digitalmars.D - Suggestion: char.init, wchar.init and dchar.init


Arcane Jill <Arcane_member pathlink.com> writes:

Hi,

The default value of NaN for floating point numbers is an excellent idea. I
suggest that we do the same thing for chars, wchars and dchars.

The init value for char should (IMO) be 0xFF. Rationale - char by definition
contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
sequence. It is a clear indication of an unassigned value.
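A minimal D sketch of the UTF-8 claim, assuming a current Phobos in which `std.utf.validate` throws `UTFException` on malformed input. A string containing the byte 0xFF is rejected, while plain ASCII passes; the variable name `bad` is illustrative only.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // 0xFF is never a legal UTF-8 lead byte or continuation byte,
    // so validation fails wherever it appears.
    char[] bad = ['a', cast(char) 0xFF, 'b'];
    assertThrown!UTFException(validate(bad));

    // A well-formed string validates without throwing.
    validate("abc");
}
```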

The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
(equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
This codepoint will remain forever unassigned, precisely so that it may be used
for purposes such as this.
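A small D check of the U+FFFF claim, assuming Phobos's `std.uni.isNonCharacter`, which reports code points with no assigned abstract character (general category Cn). It also shows that the value fits in a single wchar and a single dchar; the variable names are illustrative only.

```d
import std.uni : isNonCharacter;

void main()
{
    // U+FFFF is a permanent Unicode noncharacter, so it can never
    // collide with real text.
    assert(isNonCharacter(0xFFFF));

    // Ordinary characters are not flagged.
    assert(!isNonCharacter('A'));

    // The same value is representable as one wchar and one dchar.
    wchar w = 0xFFFF;
    dchar d = 0x0000FFFF;
    assert(w == d);
}
```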

Be it noted that the codepoint 0 is a bad choice for a default value. It
might have made sense in C, where '\0' has special meaning as a string
terminator, but in D '\0' is just another character. Unicode defines '\0' as a
control character whose interpretation is implementation dependent. Better, I
feel, to use a value with universal meaning.

Jill

Jun 07 2004

Ilya Minkov <minkov cs.tum.edu> writes:

Gets my vote!

-eye

Jun 07 2004

"Walter" <newshound digitalmars.com> writes:

That's a good idea.

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:ca17qq$224t$1 digitaldaemon.com...
 Hi,

 The default value of NaN for floating point numbers is an excellent idea. I
 suggest that we do the same thing for chars, wchars and dchars.

 The init value for char should (IMO) be 0xFF. Rationale - char by definition
 contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
 sequence. It is a clear indication of an unassigned value.

 The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
 dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
 (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
 is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
 Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
 This codepoint will remain forever unassigned, precisely so that it may be used
 for purposes such as this.

 Be it noted that the codepoint 0 is a bad choice for a default value. It
 might have made sense in C, where '\0' has special meaning as a string
 terminator, but in D '\0' is just another character. Unicode defines '\0' as a
 control character whose interpretation is implementation dependent. Better, I
 feel, to use a value with universal meaning.

 Jill

Jun 07 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 Hi,
 
 The default value of NaN for floating point numbers is an excellent idea. I
 suggest that we do the same thing for chars, wchars and dchars.
 
 The init value for char should (IMO) be 0xFF. Rationale - char by definition
 contains a UTF-8 fragment. The byte 0xFF will never occur in a valid UTF-8
 sequence. It is a clear indication of an unassigned value.
 
 The init value for wchar and dchar should be 0xFFFF (that is, 0x0000FFFF for
 dchar). Rationale - wchar and dchar by definition contain UTF-16 and UTF-32
 (equivalent to plain Unicode within their defined ranges). The codepoint U+FFFF
 is not a legitimate Unicode character, and, furthermore, it is guaranteed by the
 Unicode Consortium that 0xFFFF will NEVER be a legitimate Unicode character.
 This codepoint will remain forever unassigned, precisely so that it may be used
 for purposes such as this.
 
 Be it noted that the codepoint 0 is a bad choice for a default value. It
 might have made sense in C, where '\0' has special meaning as a string
 terminator, but in D '\0' is just another character. Unicode defines '\0' as a
 control character whose interpretation is implementation dependent. Better, I
 feel, to use a value with universal meaning.

I like the 0 initialization. It is consistent and easy to understand and 
remember.

And it has an important function. If anyone ever passes an uninitialized 
D memory block to functions that expect a 0-terminated string then 
nothing bad will happen.

But then again, I also don't like that floats are initialized to NaN.

If it HAS to be done then there should definitely be an easy-to-remember 
property for the char types to test for this. Otherwise many programmers 
will have a hard time remembering which value means "not a char".

Hauke

Jun 07 2004

"Ben Hinkle" <bhinkle mathworks.com> writes:

 If it HAS to be done then there should definitely be an easy-to-remember
 property for the char types to test for this. Otherwise many programmers
 will have a hard time remembering which value means "not a char".

.init?

Jun 07 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ca2754$h5k$1 digitaldaemon.com>, Hauke Duden says...

If it HAS to be done then there should definitely be an easy-to-remember 
property for the char types to test for this. Otherwise many programmers 
will have a hard time remembering which value means "not a char".

You're not supposed to /test/ for uninitialized variables - you're simply
supposed to initialize them! And that error, of course, is exactly what we're
trying to catch.

Anyway, you could always test for "if (c == char.init)" no matter what char.init
was.
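For reference, D as it exists today uses exactly these defaults: char.init is 0xFF, wchar.init is 0xFFFF and dchar.init is 0x0000FFFF. A minimal sketch of the `c == char.init` test described above; the variable names are illustrative only.

```d
import std.stdio : writeln;

void main()
{
    // Compile-time checks of the default values in current D.
    static assert(char.init == 0xFF);
    static assert(wchar.init == 0xFFFF);
    static assert(dchar.init == 0x0000FFFF);

    char c;             // no initializer, so c holds char.init
    assert(c == char.init);

    c = 'x';            // after assignment the test no longer matches
    if (c != char.init)
        writeln("c has been assigned: ", c);

    // Elements of a newly allocated dynamic array also start as char.init.
    auto buf = new char[3];
    foreach (ch; buf)
        assert(ch == char.init);
}
```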

By the way, I got to look at your Unichar code today. Excellent stuff. It's on
my machine now. Also, you were right about doxygen, judging by the quality of
your documentation - it really does rock.

Jill

Jun 07 2004
