D - Documentation Error

Unicode User <Unicode_member pathlink.com> writes:
Hi,

Not sure if this is the right place to report this. I am very, VERY impressed
with D - especially with the UTF support. Spending some time learning D now.

But there's an error in the documentation of the Basic Data Types. It says:
"char = unsigned 8 bit ASCII".

I would point out that ASCII, *BY DEFINITION*, is only seven bits wide, and
therefore that the phrase "8 bit ASCII" is meaningless and open to all sorts of
possible misinterpretation. Three corrections are possible, and I don't know
which one is right:
1. char = unsigned 7 bit ASCII.
2. char = unsigned 8 bit UTF-8
3. char = unsigned 8-bit ISO 8859-1 (aka LATIN-1)

Please note that while choice 3 is a subset of Unicode, it is incompatible with
choice 2. The three choices differ in how byte values 0x80 to 0xFF are
interpreted. Specifically:
1. ASCII - byte values 0x80 to 0xFF are undefined
2. UTF-8 - byte values 0x80 to 0xFF are reserved for multi-byte UTF-8 sequences
3. ISO 8859-1 - byte values 0x80 to 0xFF represent Unicode chars U+0080 to
U+00FF.
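
To make the difference concrete, here is a minimal sketch in Python (not D) that decodes a single byte under each of the three candidate interpretations. The helper name `interpret` is hypothetical and only serves this illustration; Python's built-in codecs stand in for the three definitions listed above.

```python
# Hypothetical helper for illustration; not part of D or its documentation.
def interpret(byte_value: int) -> dict:
    """Decode one byte under each of the three candidate encodings."""
    raw = bytes([byte_value])
    results = {}
    for name in ("ascii", "utf-8", "latin-1"):
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # The byte is not a valid standalone sequence in this encoding.
            results[name] = None
    return results

low = interpret(0x41)   # 'A' lies in the 7-bit range shared by all three
high = interpret(0xE9)  # a byte from the disputed 0x80..0xFF range

print(low)
print(high)
```

For 0x41 all three agree. For 0xE9 only the Latin-1 interpretation yields a character (U+00E9); ASCII has no definition for it, and in UTF-8 it is only the start of a longer sequence, so it cannot stand alone.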

This seems a simple thing to fix. If this is not the right place to report this,
please can someone point me to the right place. Thanks.
Mar 11 2004
"Walter" <walter digitalmars.com> writes:
You must be looking at an old version. The current doc defines char as
unsigned 8 bit UTF-8. -Walter
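
As a rough illustration of what "unsigned 8 bit UTF-8" implies, the Python sketch below (again not D, and the sample word is my own assumption) shows that an 8-bit unit holds one UTF-8 code unit rather than one whole character, so a character above U+007F occupies more than one unit.

```python
text = "caf\u00e9"            # four code points; the last is U+00E9
units = text.encode("utf-8")  # the same text as 8-bit UTF-8 code units

code_points = len(text)
code_units = len(units)
last_char_units = units[-2:].hex()  # the two units that encode U+00E9

print(code_points, code_units, last_char_units)
```

The three ASCII letters take one unit each, while U+00E9 takes two, which is why the code unit count exceeds the code point count.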
Mar 11 2004