digitalmars.D - Why is char initialized to 0xFF ?

James Blachly <james.blachly gmail.com> writes:

Disclaimer: I am not a unicode expert.

Background: I have added UTF8 character type support to lldb in 
conjunction with adding support for D string/wstring/dstring.

Dlang char is analogous to C++20 char8_t[1] AFAICT.

The default initialization value in C++20 is u8'\0', whereas in D 
char.init is '\xFF'[2]. Likewise, wchar.init is 0xFFFF and dchar.init is 
0x0000FFFF.

char is a UTF8 character, but 0xFF is specifically forbidden[3] by the 
UTF8 specification.
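
The values quoted above can be checked with a short program. This is only a sketch: `std.utf.validate` is the Phobos routine that throws `UTFException` on malformed input, while the helper name `isValidUtf8` is invented for this illustration.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if the code units form valid UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Default initializers of the three character types.
    static assert(char.init == 0xFF);
    static assert(wchar.init == 0xFFFF);
    static assert(dchar.init == 0x0000FFFF);

    char c;                        // default-initialized, holds 0xFF
    char[1] one = [c];
    assert(!isValidUtf8(one[]));   // 0xFF is rejected as malformed UTF-8
    assert(isValidUtf8("\0"));     // the null character is valid UTF-8
}
```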

What is the reasoning behind this? Is it related to zero-termination of 
C strings? Should it be considered for change?

It is surprising that these do not init to the null value, which is 
valid UTF.

Kind regards
James


[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r6.html
[2] https://dlang.org/spec/type.html
[3] https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

Jun 08 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
 char is a UTF8 character, but 0xFF is specifically forbidden[3] 
 by the UTF8 specification.

And that is exactly why it is the default: the idea here is to 
make uninitialized variables obvious, because they will be a 
predictable, but invalid value when they appear.

Same reason why floats are nan and classes are null btw. `int` is 
the exception as being default initialized as something that 
happens to be really useful.

(and arrays kinda are special too. technically they are null, but 
the runtime will automatically allocate null arrays when needed, 
so it works transparently anyway... and ends up being super 
useful)
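
A minimal sketch of the defaults listed above, one per kind of type. The class name `Widget` is a placeholder made up for the example.

```d
import std.math : isNaN;

class Widget {}   // placeholder class for the illustration

void main()
{
    float f;
    double d;
    Widget obj;
    int i;
    int[] arr;

    assert(isNaN(f));        // floating point defaults to NaN
    assert(isNaN(d));
    assert(obj is null);     // class references default to null
    assert(i == 0);          // int defaults to a usable value
    assert(arr is null);     // a dynamic array starts out null ...

    arr ~= 42;               // ... but appending allocates transparently
    assert(arr == [42]);
}
```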

Jun 08 2019

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
 On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
 char is a UTF8 character, but 0xFF is specifically 
 forbidden[3] by the UTF8 specification.

 And that is exactly why it is the default: the idea here is to 
 make uninitialized variables obvious, because they will be a 
 predictable, but invalid value when they appear.

To me they are not really obvious or useful, especially when I 
interface with C/C++. I pass some default-initialized char or 
float to a C/C++ library (by mistake), and I get some weird 
output written in some distant data field. The end result is 
either broken data somewhere down the line, or garbled output in 
the UI.

I much prefer default values which are correct for 99% of the 
intended use-cases. I make full use of the fact integers 
default-initialize to zero, I think it's a great "feature". If 
there was a NaN for integers, I'd probably hate it..

I would prefer it if the compiler (or a tool!) had a switch 
--check-use-before-initialize or something of the sort, with 
code-flow analysis and all that good stuff.

Jun 08 2019

KnightMare <black80 bk.ru> writes:

On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
 On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
 char is a UTF8 character, but 0xFF is specifically 
 forbidden[3] by the UTF8 specification.

 And that is exactly why it is the default: the idea here is to 
 make uninitialized variables obvious, because they will be a 
 predictable, but invalid value when they appear.

double d;
most compilers raise the error "using uninitialized variable".
on the other side: "I (the D compiler) will tell u nothing about that, but 
u'll get a shit! haha"

ok. let's look at structs now
struct S { double d; }
S s;
in most compilers s will contain zeros. in C/C++ - garbage.
people come to D not as a first language, they have had trouble with 
garbage in structs already, and they still forget to initialize it 
right (I do), so the rule "all initialization is zeros" is the best 
and right thing that can be.
if u don't want to initialize, using "= void" is good too.
but initializing ints as 0, ptrs as null, chars as #FF, doubles as 
NaN - that was invented under mushrooms

people come to D and see char=#ff, double=NaN
https://www.youtube.com/watch?v=Qsa41csyNU8

Jun 09 2019

KnightMare <black80 bk.ru> writes:

On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
 double d;
 most compilers raise the error "using uninitialized variable".

not exactly in this line, but when we try to read from it first 
like "d += ..."

Jun 09 2019

Mike Parker <aldacron gmail.com> writes:

On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:

 ok. let's look at structs now
 struct S { double d; }
 S s;

You can set the default initializer in this case:

struct S { double d = 0.0; }


 but initializing ints as 0, ptrs as null, chars as #FF, doubles 
 as NaN - that was invented under mushrooms

Not at all. It's quite practical for debugging. Uninitialized 
variables are a pain in C and C++. Default initializing to 
invalid values makes them stand out in the debugger. The drawback 
is that the integrals (and bool) have no invalid value, so we're 
stuck with 0 (and false).
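
As a sketch of the field initializer suggested above, the two structs below differ only in whether the field has an explicit default. The names `Plain` and `Zeroed` are made up for the comparison.

```d
import std.math : isNaN;

struct Plain  { double d; }         // field keeps double.init, i.e. NaN
struct Zeroed { double d = 0.0; }   // field initializer replaces the default

void main()
{
    Plain p;
    Zeroed z;

    assert(isNaN(p.d));
    assert(z.d == 0.0);

    // The field initializer also becomes part of the struct's .init value.
    assert(Zeroed.init.d == 0.0);
}
```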

Jun 09 2019

KnightMare <black80 bk.ru> writes:

On Sunday, 9 June 2019 at 08:26:45 UTC, Mike Parker wrote:

 Not at all. It's quite practical for debugging. Uninitialized 
 variables are a pain in C and C++. Default initializing to 
 invalid values makes them stand out in the debugger. The 
 drawback is that the integrals (and bool) have no invalid 
 value, so we're stuck with 0 (and false).

I agree that memory must be initialized unless otherwise stated.
I disagree that garbage (the uninit value) should be FF and NaN.
again, "all zeroes" is the best and right thing.
people are the main resource, they have expectations, they expect 
zeroes. u can poll them: "what values should be used for 
uninitialized vars?" and if u think about it, what would they 
answer? any man on the street. no, in an IT park.

imo, since nobody uses FF and NaN in D code now (so, the default is 
FF, so I just do "ch += 1" and I've got 00! I am cool hacker!), 
we can change it to the most expected values (I think zero). In 
any case we can do a poll among D users to begin with.
or let's set up a tagline for D: "We have our own way, dont boomboom 
our brain!". joke. maybe a little bit trolled.
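
The "ch += 1" arithmetic from the parenthetical can be checked directly. A small sketch, relying only on the fact that `char` is an 8-bit type, so the addition wraps around:

```d
void main()
{
    char ch;              // default-initialized to 0xFF
    assert(ch == 0xFF);

    ch += 1;              // 0xFF + 1 wraps around in 8 bits
    assert(ch == 0);
}
```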

Jun 09 2019

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
 On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
 On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
 char is a UTF8 character, but 0xFF is specifically 
 forbidden[3] by the UTF8 specification.

 And that is exactly why it is the default: the idea here is to 
 make uninitialized variables obvious, because they will be a 
 predictable, but invalid value when they appear.

 double d;
 most compilers raise the error "using uninitialized variable".

Which is technically not possible in D because D always 
initializes variables. In C and C++, if you declared
double d=0.0; you wouldn't get the "using uninitialized variable" 
warning either, regardless of whether 0 is the right or the wrong 
init value.

 on the other side: "I (the D compiler) will tell u nothing about that, but 
 u'll get a shit! haha"

 ok. let's look at structs now
 struct S { double d; }
 S s;
 in most compilers s will contain zeros. in C/C++ - garbage.
 people come to D not as a first language, they have had trouble with 
 garbage in structs already, and they still forget to initialize it 
 right (I do), so the rule "all initialization is zeros" is the best 
 and right thing that can be.

No, by putting NaN in d you have a deterministic error. In C and 
C++ you will have undefined behaviour that will vary with 
compiler, version, options, OS version, architecture, position of 
the moon, etc. and sometimes undetectable bugs.

 if u don't want to initialize, using "= void" is good too.
 but initializing ints as 0, ptrs as null, chars as #FF, doubles 
 as NaN - that was invented under mushrooms

No. If there were an equivalent of NaN for ints it would also be 
used (personally I really would prefer int.init == int.min 
and uint.init == uint.max).

Default initialisation of variables is there to give deterministic 
behaviour between versions and runs, i.e. to get rid of nasal 
demons, not to mind-read the appropriate initial value of a 
variable; that is something the programmer is still 
responsible for.
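
To illustrate the "deterministic error" point, here is a sketch of an accumulator that forgets to initialize its total. The function name `buggySum` is invented for the example. In D the result is NaN on every run, whatever the input.

```d
import std.math : isNaN;

// Hypothetical accumulator that forgets to set its starting value.
double buggySum(const double[] xs)
{
    double total;          // double.init, i.e. NaN
    foreach (x; xs)
        total += x;        // NaN + x is still NaN
    return total;
}

void main()
{
    // The mistake shows up the same way on every run.
    assert(isNaN(buggySum([1.0, 2.0, 3.0])));

    // The integer extremes mentioned above are spelled int.min and uint.max.
    static assert(int.min == -2_147_483_648);
    static assert(uint.max == 4_294_967_295);
}
```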


 people come to D and see char=#ff, double=NaN
 https://www.youtube.com/watch?v=Qsa41csyNU8

Jun 09 2019

KnightMare <black80 bk.ru> writes:

On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:

I read the bible too. I know the reasons why the leaders decided to use NaN 
and FF.
but
what is the best solution:
do some math and get garbage in C++ or NaN in D?
or have the compiler tell you "using uninitialized variable" before any 
math?

Jun 09 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, June 9, 2019 3:27:39 AM MDT KnightMare via Digitalmars-d wrote:
 On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:

 I read the bible too. I know the reasons why the leaders decided to use NaN
 and FF.
 but
 what is the best solution:
 do some math and get garbage in C++ or NaN in D?
 or have the compiler tell you "using uninitialized variable" before any
 math?

Given how init works in D and how it's used all over the place, it really
isn't feasible to have the compiler tell you to initialize the variable. A
prime example would be with dynamic arrays. Mutating the length of a dynamic
array has to use the init value. e.g.

arr.length += 15;

wouldn't work if init weren't used. Another example would be out parameters.
They get assigned the init value for the type when the function is called.
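
Both cases can be sketched in a few lines. The function name `leaveUntouched` is hypothetical; it exists only to show that an `out` parameter is reset even when the callee never writes to it.

```d
import std.math : isNaN;

// Hypothetical function: `out` resets `result` to double.init on entry.
void leaveUntouched(out double result)
{
}

void main()
{
    double[] arr = [1.0, 2.0];
    arr.length += 3;                 // new slots are filled with double.init
    assert(arr.length == 5);
    assert(arr[0] == 1.0 && arr[1] == 2.0);
    assert(isNaN(arr[2]) && isNaN(arr[3]) && isNaN(arr[4]));

    double x = 42.0;
    leaveUntouched(x);
    assert(isNaN(x));                // the old value 42.0 is gone

    char[] text;
    text.length = 2;                 // same rule for char: slots hold 0xFF
    assert(text[0] == char.init && text[1] == char.init);
}
```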

A lot of aspects of D are built around the fact that every type has an init
value and the fact that values of that type are always initialized to that
init value if they're not given an explicit value. At some point during the
language's development, the ability to @disable the default initialization of
a type was added, but even those types still have an init value. And while
it works, disabling default initialization causes all kinds of subtle
problems precisely because D was built around the idea that every type could
be default-initialized.
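
A sketch of what disabling default initialization looks like in practice. The struct name `Handle` and its field are invented for the example. The declaration without an initializer is rejected, yet the type's init value can still be named.

```d
// Hypothetical struct whose default construction is disabled.
struct Handle
{
    int fd;

    @disable this();                 // forbids `Handle h;`
    this(int fd) { this.fd = fd; }
}

void main()
{
    // A plain declaration no longer compiles ...
    static assert(!__traits(compiles, { Handle bad; }));

    // ... so an explicit constructor call is required.
    auto h = Handle(3);
    assert(h.fd == 3);

    // The type nevertheless still has an init value.
    assert(Handle.init.fd == 0);
}
```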

Sure, there are some downsides to D's approach (such as getting unexpected
NaNs or not having default constructors for structs), but it also solves a
whole class of problems that other languages like C and C++ have with
garbage values. Even Java has problems with garbage values in spite of it
requiring that you initialize variables before using them (e.g. it's quite
possible to use a static variable in Java before it's initialized because of
circular reference issues). D, on the other hand, never has garbage values
unless you use @system features like = void to force it.
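
For reference, a minimal sketch of the `= void` opt-out mentioned above. The buffer is written in full before it is read, since its initial contents are unspecified.

```d
import std.math : isNaN;

void main()
{
    double[4] buf = void;    // skips default initialization, contents are garbage
    buf[] = 1.5;             // every element must be written before any read
    assert(buf[0] == 1.5 && buf[3] == 1.5);

    double[4] checked;       // without `= void`, each element is double.init
    assert(isNaN(checked[0]) && isNaN(checked[3]));
}
```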

- Jonathan M Davis

Jun 09 2019

KnightMare <black80 bk.ru> writes:

On Sunday, 9 June 2019 at 13:45:07 UTC, Jonathan M Davis wrote:
 Mutating the length of a dynamic array has to use the init 
 value. e.g.
 arr.length += 15;
 wouldn't work if init weren't used.

this means that u should initialize the added elements to some 
value (usually 0.0) again, coz NaN is not useful at all, there is no case 
where u can use it. tada! double work!

 A lot of aspects of D are built around the fact that every type 
 has an init value and the fact that values of that type are 
 always initialized to that init value if they're not given an 
 explicit value.

I agree that data should be initialized.. but with "all 
zeroes", not NaN or \xFF.

 @disable..

I found only this https://dlang.org/spec/attribute.html#disable
I feel that I miss something. what is your point?

 not having default constructors for structs

I accept that, it's my responsibility alone to assign some 
values to fields. but "all fields are zeroes" is a good choice for 
me.
NaN - means that I should to think small details when I switch 

do nothing.

Jun 09 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
 No, by putting NaN in d you have a deterministic error. In C 
 and C++ you will have undefined behaviour that will vary with 
 compiler, version, options, OS version, architecture, position 
 of the moon, etc. and sometimes undetectable bugs.

I don't think it is undefined though… If something has an 
arbitrary value, you could still compute with it, if the 
algorithm takes that into account. Assuming that all bit-patterns 
provide a defined value (which is the case for IEEE floating 
point bit-patterns).

Anyway, the obvious advantage with having structs default 
initialized to all-bits-zero is that you can have an allocator 
that clears bits in the background (bypassing caches so they are 
not polluted).

Then you have no penalty when allocating an array of one million 
struct values. Which is very useful. Just allocate memory-chunks 
that are already set to zero-bits.

You usually want an array of floating point values to be 
pre-initialized to zeros. You almost never want an array of 
floating point values being initialized to NaN.
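
The cost being described can be sketched as follows: a freshly allocated `double` array arrives filled with NaN, so getting zeros takes a second, explicit pass, whereas an `int` array is already all zero. The array size here is arbitrary.

```d
import std.algorithm.searching : all;
import std.math : isNaN;

void main()
{
    auto values = new double[](1_000);     // every element is double.init
    assert(values.all!(v => isNaN(v)));

    values[] = 0.0;                        // explicit second pass to get zeros
    assert(values.all!(v => v == 0.0));

    auto counts = new int[](1_000);        // int.init is already 0
    assert(counts.all!(c => c == 0));
}
```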

Jun 09 2019

James Blachly <james.blachly gmail.com> writes:

On 6/9/19 8:19 AM, Ola Fosheim Grøstad wrote:
 On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
 No, by putting NaN in d you have a deterministic error. In C and C++ 
 you will have undefined behaviour that will vary with compiler, 
 version, options, OS version, architecture, position of the moon, etc. 
 and sometimes undetectable bugs.

 
 I don't think it is undefined though… If something has an arbitrary 
 value, you could still compute with it, if the algorithm takes that into 
 account. Assuming that all bit-patterns provide a defined value (which 
 is the case for IEEE floating point bit-patterns).
 
 Anyway, the obvious advantage with having structs default initialized to 
 all-bits-zero is that you can have an allocator that clears bits in the 
 background (bypassing caches so they are not polluted).
 
 Then you have no penalty when allocating an array of one million struct 
 values. Which is very useful. Just allocate memory-chunks that are 
 already set to zero-bits.
 
 You usually want an array of floating point values to be pre-initialized 
 to zeros. You almost never want an array of floating point values being 
 initialized to NaN.
 


Yes, and further I would suggest that non-zero-bit initializations 
violate the principle of least surprise.

As someone posted upthread, it would be interesting to take a poll of 
new users (or non users, or perhaps the D-curious) and ask what their 
best guess is for each default value.

Jun 09 2019

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Sunday, 9 June 2019 at 12:19:43 UTC, Ola Fosheim Grøstad wrote:
 On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
 No, by putting NaN in d you have a deterministic error. In C 
 and C++ you will have undefined behaviour that will vary with 
 compiler, version, options, OS version, architecture, position 
 of the moon, etc. and sometimes undetectable bugs.

 I don't think it is undefined though…

It is undefined behaviour by the definition of the standard. 
Undefined behaviour includes behaviour that can be explained.

Jun 09 2019

lithium iodate <whatdoiknow doesntexist.net> writes:

On Sunday, 9 June 2019 at 18:27:07 UTC, Patrick Schluter wrote:
 On Sunday, 9 June 2019 at 12:19:43 UTC, Ola Fosheim Grøstad 
 wrote:
 On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
 No, by putting NaN in d you have a deterministic error. In C 
 and C++ you will have undefined behaviour that will vary with 
 compiler, version, options, OS version, architecture, 
 position of the moon, etc. and sometimes undetectable bugs.

 I don't think it is undefined though…

 It is undefined behaviour by the definition of the standard. 
 undefined behaviour includes behaviour that can be explained.

To be fair, the C standard (C11) is a bit self-contradictory 
there. Variables of automatic storage duration that are not 
explicitly initialized contain an unspecified value (i.e. any 
valid value) or a trap representation. Types such as integers 
usually don't have any trap representations, so reading them 
should be defined on most platforms (unless C permits some sort 
of compile-time-only trap representation? not sure.). Then 
there's informative(!) annex J, which explicitly lists it as 
undefined behavior.

Jun 09 2019
