digitalmars.D.learn - Finding chars in strings
- Per Nordlöw (3/3) Sep 05 2017 If a character literal has type char, always below 128, can we
- Per Nordlöw (3/6) Sep 05 2017 Follow up question: If a character literal has type char, can we
- ag0aep6g (6/8) Sep 05 2017 Strictly speaking, this is a character literal of type char: '\xC3'.
- Jonathan M Davis via Digitalmars-d-learn (14/22) Sep 05 2017 Aside from escape sequences, a literal should not result in a non-ASCII
- ag0aep6g (3/6) Sep 05 2017 Yes. You can search for ASCII characters (< 128) without decoding. The
- Jonathan M Davis via Digitalmars-d-learn (9/15) Sep 05 2017 Unfortunately, you'll have to use something like std.utf.byCodeUnit or
If a character literal has type char, always below 128, can we always search for its first byte offset in a string without decoding the string to a range of dchars?
Sep 05 2017
On Tuesday, 5 September 2017 at 15:43:02 UTC, Per Nordlöw wrote:
> If a character literal has type char, always below 128, can we always search for its first byte offset in a string without decoding the string to a range of dchars?

Follow-up question: If a character literal has type char, can we always assume it's an ASCII character?
Sep 05 2017
On 09/05/2017 05:54 PM, Per Nordlöw wrote:
> Follow-up question: If a character literal has type char, can we always assume it's an ASCII character?

Strictly speaking, this is a character literal of type char: '\xC3'. It's clearly above 0x7F, and not an ASCII character. So, no. But if it's an actual character, not an escape sequence, then yes (I think). A wrong encoding setting in your text editor could mess with that, though.
Sep 05 2017
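A minimal D sketch of ag0aep6g's point, assuming a current DMD and Phobos: an escape sequence such as '\xC3' yields a char code unit above 0x7F, which std.ascii.isASCII rejects, while an ordinary literal like 'a' stays ASCII. The variable names are illustrative only.

```d
import std.ascii : isASCII;

void main()
{
    char plain = 'a';      // ordinary character literal: an ASCII value
    char escaped = '\xC3'; // escape sequence: a char whose value is above 0x7F

    assert(isASCII(plain));
    assert(escaped == 0xC3 && escaped > 0x7F);
    assert(!isASCII(escaped)); // it is a char, but not an ASCII character
}
```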
On Tuesday, September 05, 2017 18:04:16 ag0aep6g via Digitalmars-d-learn wrote:
> On 09/05/2017 05:54 PM, Per Nordlöw wrote:
>> Follow-up question: If a character literal has type char, can we always assume it's an ASCII character?
>
> Strictly speaking, this is a character literal of type char: '\xC3'. It's clearly above 0x7F, and not an ASCII character. So, no. But if it's an actual character, not an escape sequence, then yes (I think). A wrong encoding setting in your text editor could mess with that, though.

Aside from escape sequences, a literal should not result in a non-ASCII value for a char. In general, though, it's a bad idea to assume that a char is an ASCII character unless you've already verified it, or you know from where the input came that the chars you're dealing with are all ASCII. You also have to remember that VRP (value range propagation) is in play, so if it gets involved, you could end up with a char that's not an ASCII character. And IIRC, character literals are almost always treated as dchar unless a cast or VRP gets involved. So I wouldn't be in a hurry to assume that using character literals guarantees that you're dealing with only ASCII. Ultimately, std.ascii.isASCII is your friend if there's any risk of something not being ASCII when you need it to be ASCII.

- Jonathan M Davis
Sep 05 2017
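Jonathan's advice can be sketched in D as follows, assuming a recent Phobos. The hypothetical helper allASCII checks every UTF-8 code unit with std.ascii.isASCII through std.utf.byCodeUnit, so nothing is decoded. The 'a' + 100 line shows value range propagation (VRP) accepting a constant that fits in a char but is not ASCII.

```d
import std.algorithm.searching : all;
import std.ascii : isASCII;
import std.utf : byCodeUnit;

// Hypothetical helper: true if every UTF-8 code unit of s is ASCII.
// byCodeUnit iterates the chars directly, so no autodecoding happens.
bool allASCII(string s)
{
    return s.byCodeUnit.all!isASCII;
}

void main()
{
    // VRP: the constant 'a' + 100 == 197 fits in a char, so this implicit
    // conversion compiles, yet the resulting char is not ASCII.
    char c = 'a' + 100;
    assert(c == 197);
    assert(!isASCII(c));

    assert(allASCII("key=value"));
    assert(!allASCII("Nordlöw")); // 'ö' is a two-byte UTF-8 sequence
}
```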
On 09/05/2017 05:43 PM, Per Nordlöw wrote:
> If a character literal has type char, always below 128, can we always search for its first byte offset in a string without decoding the string to a range of dchars?

Yes. You can search for ASCII characters (< 128) without decoding. The values in multibyte sequences (lead bytes and continuation bytes alike) are always above 127, so an ASCII byte can never match in the middle of a multibyte character.
Sep 05 2017
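A minimal sketch of the search ag0aep6g describes, using a hypothetical helper asciiByteOffset: iterating a string with an explicitly typed char loop variable walks the UTF-8 code units without decoding, and because every byte of a multibyte sequence is at least 0x80, an ASCII needle can only match a real ASCII character.

```d
import std.ascii : isASCII;

// Hypothetical helper: byte offset of the first occurrence of an ASCII
// character in s, or -1 if it does not occur. No decoding takes place.
ptrdiff_t asciiByteOffset(string s, char needle)
{
    assert(isASCII(needle), "needle must be ASCII (< 0x80)");
    // Typing the loop variable as `char` iterates code units; bytes that
    // belong to a multibyte sequence are all >= 0x80 and can never match.
    foreach (i, char cu; s)
    {
        if (cu == needle)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    // 'ö' is encoded as the two bytes 0xC3 0xB6, so ':' sits at byte offset 8.
    assert(asciiByteOffset("Nordlöw: x", ':') == 8);
    assert(asciiByteOffset("Nordlöw", '=') == -1);
}
```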
On Tuesday, September 05, 2017 17:55:20 ag0aep6g via Digitalmars-d-learn wrote:
> On 09/05/2017 05:43 PM, Per Nordlöw wrote:
>> If a character literal has type char, always below 128, can we always search for its first byte offset in a string without decoding the string to a range of dchars?
>
> Yes. You can search for ASCII characters (< 128) without decoding. The values in multibyte sequences are always above 127.

Unfortunately, you'll have to use something like std.utf.byCodeUnit or std.string.representation to do it; otherwise, you get hit with autodecoding. But yes, UTF-8 is designed to be compatible with ASCII, so every ASCII character is encoded as a single UTF-8 code unit and doesn't require decoding. Decoding is only required when you're dealing with non-ASCII characters, which is another reason why autodecoding is annoying.

- Jonathan M Davis
Sep 05 2017
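To make the autodecoding pitfall Jonathan mentions concrete, here is a small sketch assuming current Phobos; the helper names offsetByCodeUnit and offsetByRepresentation are hypothetical. std.algorithm.searching.countUntil on a plain string counts decoded code points, while the std.utf.byCodeUnit and std.string.representation views count bytes.

```d
import std.algorithm.searching : countUntil;
import std.string : representation;
import std.utf : byCodeUnit;

// Hypothetical helper: byte offset of a needle via std.utf.byCodeUnit.
ptrdiff_t offsetByCodeUnit(string s, char needle)
{
    return s.byCodeUnit.countUntil(needle);
}

// Hypothetical helper: the same via std.string.representation,
// which exposes the string as immutable(ubyte)[].
ptrdiff_t offsetByRepresentation(string s, char needle)
{
    return s.representation.countUntil(cast(ubyte) needle);
}

void main()
{
    string s = "Nordlöw: x";
    // Autodecoding: a plain string is iterated as dchar, so countUntil
    // counts code points ('ö' counts once) rather than bytes.
    assert(s.countUntil(':') == 7);
    // The code-unit views report the byte offset ('ö' is two bytes).
    assert(offsetByCodeUnit(s, ':') == 8);
    assert(offsetByRepresentation(s, ':') == 8);
}
```

Either view works for an ASCII needle: representation yields raw bytes, so the needle has to be cast to ubyte, while byCodeUnit keeps char as the element type.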