digitalmars.D.learn - Using a char value >= 128

Per Nordlöw <per.nordlow gmail.com> writes:
In which circumstances can a `char` be initialized to a non-7-bit 
value (>= 128)? Is it possible only in non-@safe code?

And, if so, what will be the result of casting such a value to 
`dchar`? Will that result in an exception or will it interpret 
the `char` using an 8-bit character encoding?

I'm asking because I'm pondering about how to specialize the 
non-7-bit `needle`-case of the following array-overload of 
`startsWith` when `T` is `char`:

bool startsWith(T)(scope const(T)[] haystack,
                    scope const T needle)
{
     // TODO convert needle to `char[]` and call itself
     static if (is(T : char)) { assert(needle < 128); }
     if (haystack.length >= 1)
     {
         return haystack[0] == needle;
     }
     return false;
}
Oct 27 2019
Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 27 October 2019 at 12:44:05 UTC, Per Nordlöw wrote:
 In which circumstances can a `char` be initialized to a non-7-bit 
 value (>= 128)? Is it possible only in non-@safe code?
In all circumstances: `char`'s default initializer is 255 (0xFF), so `char a;` already holds 255.
 And, if so, what will be the result of casting such a value to 
 `dchar`? Will that result in an exception or will it interpret 
 the `char` using an 8-bit character encoding?
It will treat the numeric value as a Unicode code point then.
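
A minimal sketch of the two answers so far, relying only on the language's built-in conversion rules. The helper name `widen` is made up for illustration and is not part of any library:

```d
// `char` default-initializes to 0xFF, a byte that never occurs in valid UTF-8.
static assert(char.init == 0xFF);

// Hypothetical helper: char -> dchar is a plain widening conversion.
// Nothing is decoded and nothing is thrown; only the numeric value carries over.
dchar widen(char c) @safe pure nothrow @nogc
{
    return c; // implicit conversion, same result as cast(dchar) c
}

@safe unittest
{
    char a;                                   // default-initialized, legal in @safe code
    assert(a == 255);
    assert(widen(cast(char) 0xE9) == 0x00E9); // read as code point U+00E9
    assert(widen(a) == 0x00FF);
}
```

In effect the conversion behaves as if the `char` were Latin-1, because the first 256 Unicode code points share their numbering with that encoding.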
 I'm asking because I'm pondering about how to specialize the 
 non-7-bit `needle`-case of the following array-overload of 
 `startsWith` when `T` is `char`:
I'd say that is just plain invalid and it should throw; I'm of the opinion that the assert there is correct. But you could also cast the needle to `dchar`, then call `std.utf.encode` (http://dpldocs.info/experimental-docs/std.utf.encode.1.html) to get it back to UTF-8 and compare against that. It'd spit out a two-byte sequence that is probably the closest thing to what the user intended. But I'm just not convinced the library should be guessing what the user intended to begin with.
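
The cast-then-encode alternative could look roughly like the sketch below. `startsWithEncoded` is a hypothetical name, and the key assumption (labeled in the code) is that a needle >= 0x80 is meant as the code point with that number; this is the guess the post argues a library should perhaps not make.

```d
import std.utf : encode;

// Hypothetical variant of the thread's `startsWith`, for `char` only.
// Assumption: a needle >= 0x80 is meant as the code point with that number.
bool startsWithEncoded(scope const(char)[] haystack, char needle) @safe pure
{
    if (needle < 0x80) // ASCII: a single code unit, compare directly
        return haystack.length >= 1 && haystack[0] == needle;

    char[4] buf;                                     // room for any UTF-8 sequence
    immutable len = encode(buf, cast(dchar) needle); // 2 for U+0080 .. U+00FF
    return haystack.length >= len && haystack[0 .. len] == buf[0 .. len];
}

@safe unittest
{
    assert(startsWithEncoded("\u00E9cole", cast(char) 0xE9)); // "école"
    assert(!startsWithEncoded("ecole", cast(char) 0xE9));
    assert(startsWithEncoded("abc", 'a'));
    assert(!startsWithEncoded("", 'a'));
}
```

Keeping the `assert(needle < 128)` from the original instead of this branch is the stricter choice: it rejects the ambiguous input rather than interpreting it.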
Oct 27 2019
Ernesto Castellotti <erny.castell gmail.com> writes:
On Sunday, 27 October 2019 at 12:44:05 UTC, Per Nordlöw wrote:
 [...]
`char` in D is always unsigned; this is not implementation-specific. Therefore it can take values up to (2^8)-1. If you want a signed 8 byte type you can use `byte`, which ranges from -(2^7) to (2^7)-1.
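
For reference, the ranges of D's 8-bit types can be checked at compile time; `byte` is the signed one and `ubyte` the unsigned one. This is a small sketch using only built-in type properties:

```d
// Ranges of D's 8-bit types, checked at compile time.
static assert(char.sizeof == 1 && byte.sizeof == 1 && ubyte.sizeof == 1);
static assert(char.max  == 255);                      // unsigned UTF-8 code unit
static assert(ubyte.min == 0    && ubyte.max == 255); // unsigned integer
static assert(byte.min  == -128 && byte.max  == 127); // signed integer
```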
Oct 27 2019
Ernesto Castellotti <erny.castell gmail.com> writes:
On Sunday, 27 October 2019 at 14:36:54 UTC, Ernesto Castellotti 
wrote:
 On Sunday, 27 October 2019 at 12:44:05 UTC, Per Nordlöw wrote:
 [...]
 signed 8 byte
Correction: that is obviously 8 bits, not 8 bytes.
Oct 27 2019
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Sunday, October 27, 2019 6:44:05 AM MDT Per Nordlöw via Digitalmars-d-learn wrote:
 [...]
A `char` holds a value above 127 all the time, because every code unit of a multibyte code point in UTF-8 (the first byte as well as the continuation bytes) has a value above 127. Also, as Adam points out, the default value for `char` is 255 (specifically in order to give it an invalid value). That being said, it doesn't make sense to use `startsWith` with a single `char` which isn't ASCII, because no such `char` would be valid UTF-8 on its own. - Jonathan M Davis
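
The last point can be checked with `std.utf.validate`, which throws a `UTFException` on malformed UTF-8. This is only a sketch, and the helper name `validAlone` is made up for illustration:

```d
import std.utf : validate, UTFException;

// Hypothetical helper: is the one-element string holding `c` valid UTF-8 by itself?
bool validAlone(char c)
{
    char[1] unit = c;     // a one-code-unit "string"
    try
    {
        validate(unit[]); // throws UTFException on malformed UTF-8
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

unittest
{
    assert(validAlone('a'));              // ASCII stands on its own
    assert(!validAlone(cast(char) 0xC3)); // lead byte of "é" without its continuation
    assert(!validAlone(cast(char) 0xA9)); // continuation byte without a lead byte
    assert(!validAlone(char.init));       // 0xFF never appears in valid UTF-8
}
```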
Oct 27 2019