digitalmars.D - Suggestion: char.init, wchar.init and dchar.init

Arcane Jill <Arcane_member pathlink.com> writes:
Hi,

The default value of NaN for floating point numbers is an excellent idea. I
suggest that we do the same thing for chars, wchars and dchars.

The init value for char should (IMO) be 0xFF. Rationale - char by definition
contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
sequence. It is a clear indication of an unassigned value.

The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
(equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
This codepoint will remain forever unassigned, precisely so that it may be used
for purposes such as this.
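
To make the two claims above concrete, here is a minimal D sketch. The helper names `neverInUtf8` and `isNoncharacter` are made up for illustration and are not part of any library; the byte ranges assume UTF-8 as restricted by RFC 3629, and the codepoint ranges assume the Unicode noncharacter list.

```d
/// Hypothetical helper: true if byte `b` can never appear anywhere in
/// well-formed UTF-8 (assumption: UTF-8 as restricted by RFC 3629).
bool neverInUtf8(ubyte b)
{
    // 0xC0 and 0xC1 could only start overlong encodings;
    // 0xF5..0xFF would encode values beyond U+10FFFF.
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

/// Hypothetical helper: true if `cp` is a Unicode noncharacter, i.e.
/// U+FDD0..U+FDEF or the last two codepoints of any plane.
bool isNoncharacter(uint cp)
{
    if (cp > 0x10FFFF)
        return false; // outside the Unicode codespace altogether
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

void main()
{
    assert(neverInUtf8(0xFF));      // the suggested char.init
    assert(!neverInUtf8(0x41));     // 'A' is an ordinary UTF-8 byte
    assert(isNoncharacter(0xFFFF)); // the suggested wchar/dchar init
    assert(!isNoncharacter(0x41));
}
```

Both suggested defaults pass these checks, whereas 0 would not: it is a valid UTF-8 byte and an assigned control character.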

Be it noted that the codepoint 0 is a bad choice for a default value. It
might have made sense in C, where '\0' has special meaning as a string
terminator, but in D '\0' is just another character. Unicode defines '\0' as a
control character whose interpretation is implementation dependent. Better, I
feel, to use a value with universal meaning.

Jill
Jun 07 2004
Ilya Minkov <minkov cs.tum.edu> writes:
Gets my vote!

-eye
Jun 07 2004
"Walter" <newshound digitalmars.com> writes:
That's a good idea.
Jun 07 2004
Hauke Duden <H.NS.Duden gmx.net> writes:
I like the 0 initialization. It is consistent and easy to understand and
remember. And it has an important function. If anyone ever passes an
uninitialized D memory block to functions that expect a 0-terminated string
then nothing bad will happen.

But then again, I also don't like that floats are initialized to NaN.

If it HAS to be done then there should definitely be an easy-to-remember
property for the char types to test for this. Otherwise many programmers
will have a hard time remembering which value means "not a char".

Hauke
Jun 07 2004
"Ben Hinkle" <bhinkle mathworks.com> writes:
 If it HAS to be done then there should definitely be an easy-to-remember
 property for the char types to test for this. Otherwise many programmers
 will have a hard time remembering which value means "not a char".
.init?
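
Spelled out, that test might look like the following D sketch. `isUnset` is a hypothetical helper name, not an existing library function, and the comparison works whatever value `.init` ends up having.

```d
import std.stdio : writeln;

/// Hypothetical helper: true if `c` still holds its type's default value.
bool isUnset(T)(T c)
    if (is(T == char) || is(T == wchar) || is(T == dchar))
{
    return c == T.init;
}

void main()
{
    char c;         // default-initialized, so it holds char.init
    dchar d = 'x';  // explicitly assigned

    assert(isUnset(c));
    assert(!isUnset(d));

    writeln("c unset: ", isUnset(c), ", d unset: ", isUnset(d));
}
```

Such a test cannot distinguish "never assigned" from "explicitly assigned the init value", which is one reason to pick an init value that no valid text would contain.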
Jun 07 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca2754$h5k$1 digitaldaemon.com>, Hauke Duden says...

 If it HAS to be done then there should definitely be an easy-to-remember
 property for the char types to test for this. Otherwise many programmers
 will have a hard time remembering which value means "not a char".
You're not supposed to /test/ for uninitialized variables - you're simply
supposed to initialize them! And that error, of course, is exactly what we're
trying to catch. Anyway, you could always test for "if (c == char.init)" no
matter what char.init was.

By the way, I got to look at your Unichar code today. Excellent stuff. It's on
my machine now. Also, you were right about doxygen, judging by the quality of
your documentation - it really does rock.

Jill
Jun 07 2004