digitalmars.D - Why is char initialized to 0xFF ?
- James Blachly (18/18) Jun 08 2019 Disclaimer: I am not a unicode expert.
- Adam D. Ruppe (11/13) Jun 08 2019 And that is exactly why it is the default: the idea here is to
- Andrej Mitrovic (14/20) Jun 08 2019 To me they are not really obvious or useful, especially when I
- KnightMare (18/24) Jun 09 2019 double d;
- KnightMare (3/5) Jun 09 2019 not exactly in this line, but when we try to read from it first
- Mike Parker (8/13) Jun 09 2019 You can set the default initializer in this case:
- KnightMare (14/19) Jun 09 2019 I agree that memory must be initialized unless otherwise stated.
- Patrick Schluter (18/43) Jun 09 2019 Which is technically not possible in D because D always
- KnightMare (8/8) Jun 09 2019 On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
- Jonathan M Davis (25/33) Jun 09 2019 Given how init works in D and how it's used all over the place, it reall...
- KnightMare (14/24) Jun 09 2019 this mean that u should initialize added elements to some
- Ola Fosheim Grøstad (16/20) Jun 09 2019 I don't think it is undefined though… If something has an
- James Blachly (6/29) Jun 09 2019 Yes, and further I would suggest that non-zero-bit initializations
- Patrick Schluter (3/9) Jun 09 2019 It is undefined behaviour by the definition of the standard.
- lithium iodate (10/21) Jun 09 2019 To be fair, the C standard (C11) is a bit self-contradicting
Disclaimer: I am not a Unicode expert.

Background: I have added UTF8 character type support to lldb in conjunction with adding support for D string/wstring/dstring.

D's char is analogous to C++20 char8_t[1], AFAICT. The default initialization value in C++20 is u8'\0', whereas in D char.init is '\xFF'[2]. Likewise, wchar.init is 0xFFFF and dchar.init is 0x0000FFFF.

char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.

What is the reasoning behind this? Is it related to zero-termination of C strings? Should it be considered for change? It is surprising that these do not init to the null value, which is valid UTF.

Kind regards
James

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r6.html
[2] https://dlang.org/spec/type.html
[3] https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
Jun 08 2019
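The defaults described in the opening post can be checked directly. The following is a minimal sketch, assuming a current D compiler and Phobos; it only illustrates the values quoted from the spec and shows that a buffer of default-initialized chars is rejected by `std.utf.validate`, while a buffer of zero bytes is accepted.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    char c;   // default-initialized, no explicit value given
    wchar w;
    dchar d;

    assert(c == 0xFF);
    assert(w == 0xFFFF);
    assert(d == 0x0000FFFF);

    // A freshly allocated char[] is filled with char.init ...
    char[] buf = new char[3];
    // ... so it is not valid UTF-8 until something is written into it.
    assertThrown!UTFException(validate(buf));

    // The zero byte, by contrast, is a valid code unit (U+0000).
    char[] zeros = ['\0', '\0', '\0'];
    validate(zeros); // does not throw
}
```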
On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.

And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will hold a predictable but invalid value when they appear. It is the same reason why floats are NaN and classes are null, btw.

`int` is the exception, being default-initialized to something that happens to be really useful. (Arrays are kind of special too: technically they are null, but the runtime will automatically allocate null arrays when needed, so it works transparently anyway... and ends up being super useful.)
Jun 08 2019
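The defaults listed in the reply above can be sketched as follows. This is only an illustration; the class name `Widget` is hypothetical and exists solely to show the null default for class references.

```d
import std.math : isNaN;

class Widget {} // hypothetical class, used only to show the null default

void main()
{
    float f;
    double d;
    int i;
    bool b;
    Widget obj;
    int[] arr;

    assert(isNaN(f));    // floating point defaults to NaN
    assert(isNaN(d));
    assert(i == 0);      // integers have no invalid value, so they get 0
    assert(!b);          // bool defaults to false
    assert(obj is null); // class references default to null
    assert(arr is null); // so do dynamic arrays ...

    arr ~= 42;           // ... but appending to a null array just works
    assert(arr == [42]);
}
```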
On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
> On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
>> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.
> And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.

To me they are not really obvious or useful, especially when I interface with C/C++. I pass some default-initialized char or float to a C/C++ library (by mistake), and I get some weird output written in some distant data field. The end result is either broken data somewhere down the line, or garbled output in the UI.

I much prefer default values which are correct for 99% of the intended use-cases. I make full use of the fact that integers default-initialize to zero; I think it's a great "feature". If there was a NaN for integers, I'd probably hate it.

I would prefer it if the compiler (or a tool!) had a switch --check-use-before-initialize or something of the sort, with code-flow analysis and all that good stuff.
Jun 08 2019
On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
> On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
>> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.
> And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.

double d;
Most compilers raise an error: "using uninitialized variable". On the other side: "I (the D compiler) will tell u nothing about that, but u'll get a shit! haha"

OK, let's look at structs now:
struct S { double d; }
S s;
In most compilers s will contain zeros; in C/C++, garbage. People don't come to D as their first language. They have already had trouble with garbage in structs, and they still forget to initialize things properly (I do), so the rule "all initialization is zeros" is the best and right thing there can be. If u don't want initialization, using "= void" is good too. But initializing ints as 0, ptrs as null, chars as #FF, doubles as NaN - that was invented under mushrooms.

People come to D and see char=#ff, double=NaN: https://www.youtube.com/watch?v=Qsa41csyNU8
Jun 09 2019
On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
> double d;
> Most compilers raise an error: "using uninitialized variable".

Not exactly on this line, but when we first try to read from it, like "d += ...".
Jun 09 2019
On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
> OK, let's look at structs now:
> struct S { double d; }
> S s;

You can set the default initializer in this case:
struct S { double d = 0.0; }

> But initializing ints as 0, ptrs as null, chars as #FF, doubles as NaN - that was invented under mushrooms.

Not at all. It's quite practical for debugging. Uninitialized variables are a pain in C and C++. Default initializing to invalid values makes them stand out in the debugger. The drawback is that the integrals (and bool) have no invalid value, so we're stuck with 0 (and false).
Jun 09 2019
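The per-field default initializer mentioned above can be sketched like this. The struct names `Plain` and `Zeroed` are hypothetical and only serve to contrast the two cases.

```d
import std.math : isNaN;

struct Plain  { double d; }       // field falls back to double.init (NaN)
struct Zeroed { double d = 0.0; } // field has its own default initializer

void main()
{
    Plain p;
    Zeroed z;

    assert(isNaN(p.d));
    assert(z.d == 0.0);

    // The field initializer becomes part of the struct's .init value,
    // so it also applies to array elements.
    assert(Zeroed.init.d == 0.0);
    auto arr = new Zeroed[4];
    foreach (e; arr)
        assert(e.d == 0.0);
}
```

The design choice here is that the type author decides once what the default is, so no call site has to remember to initialize the field.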
On Sunday, 9 June 2019 at 08:26:45 UTC, Mike Parker wrote:
> Not at all. It's quite practical for debugging. Uninitialized variables are a pain in C and C++. Default initializing to invalid values makes them stand out in the debugger. The drawback is that the integrals (and bool) have no invalid value, so we're stuck with 0 (and false).

I agree that memory must be initialized unless otherwise stated. I disagree that the garbage (uninit value) should be FF and NaN. Again, "all zeroes" is the best and right thing. People are the main resource; they have expectations, and they expect zeroes. U can poll them: "what values should be used for uninitialized vars?" And if u think about it, what will they answer? Any man on the street. No, in an IT park.

IMO, since nobody uses FF and NaN in D code now (so, the default is FF, so I just do "ch += 1" and I've got 00! I am cool hacker!), we can change it to the most expected values (I think that is zero). In any case, we can start with a poll among D users.

Or let's set up a tagline for D: "We have our own way, dont boomboom our brain!" Joke. Maybe a little bit trolled.
Jun 09 2019
On Sunday, 9 June 2019 at 07:48:46 UTC, KnightMare wrote:
> On Saturday, 8 June 2019 at 18:04:46 UTC, Adam D. Ruppe wrote:
>> On Saturday, 8 June 2019 at 17:55:07 UTC, James Blachly wrote:
>>> char is a UTF8 character, but 0xFF is specifically forbidden[3] by the UTF8 specification.
>> And that is exactly why it is the default: the idea here is to make uninitialized variables obvious, because they will be a predictable, but invalid value when they appear.
> double d;
> Most compilers raise an error: "using uninitialized variable".

Which is technically not possible in D because D always initializes variables. In C and C++, if you declared double d=0.0; you wouldn't get the "using uninitialized variable" warning either, regardless of whether 0 is the right or the wrong init value.

> On the other side: "I (the D compiler) will tell u nothing about that, but u'll get a shit! haha"
> OK, let's look at structs now:
> struct S { double d; }
> S s;
> In most compilers s will contain zeros; in C/C++, garbage. People don't come to D as their first language. They have already had trouble with garbage in structs, and they still forget to initialize things properly (I do), so the rule "all initialization is zeros" is the best and right thing there can be.

No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.

> If u don't want initialization, using "= void" is good too. But initializing ints as 0, ptrs as null, chars as #FF, doubles as NaN - that was invented under mushrooms.

No. If there were an equivalent of NaN for ints it would also be used (personally I really would prefer int.init == int.min and uint.init == uint.max). Default initialisation of variables is there to give deterministic behaviour between versions and runs, i.e. to get rid of nasal demons, not to mind-read the appropriate initial value of a variable. That is something the programmer is still responsible for.

> People come to D and see char=#ff, double=NaN: https://www.youtube.com/watch?v=Qsa41csyNU8
Jun 09 2019
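The "deterministic error" argument above can be illustrated with a small sketch. The function names `sumForgotInit` and `sumWithInit` are hypothetical; the point is only that a forgotten initializer yields NaN on every run and every platform, rather than a value that depends on whatever happened to be in memory.

```d
import std.math : isNaN;

// Hypothetical helper: the accumulator is accidentally left at double.init.
double sumForgotInit(const double[] xs)
{
    double total; // NaN by default
    foreach (x; xs)
        total += x; // NaN + anything is still NaN
    return total;
}

// Same helper with the initializer the programmer actually wanted.
double sumWithInit(const double[] xs)
{
    double total = 0.0;
    foreach (x; xs)
        total += x;
    return total;
}

void main()
{
    // The bug shows up the same way on every run.
    assert(isNaN(sumForgotInit([1.0, 2.0, 3.0])));
    assert(sumWithInit([1.0, 2.0, 3.0]) == 6.0);
}
```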
On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:

I read the bible too. I know the reasons why the leaders decided to use NaN and FF. But what is the best solution: do some math and get garbage in C++ or NaN in D? Or have the compiler say "using uninitialized variable" before any math?
Jun 09 2019
On Sunday, June 9, 2019 3:27:39 AM MDT KnightMare via Digitalmars-d wrote:
> On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
>
> I read the bible too. I know the reasons why the leaders decided to use NaN and FF. But what is the best solution: do some math and get garbage in C++ or NaN in D? Or have the compiler say "using uninitialized variable" before any math?

Given how init works in D and how it's used all over the place, it really isn't feasible to have the compiler tell you to initialize the variable. A prime example would be with dynamic arrays. Mutating the length of a dynamic array has to use the init value. e.g. arr.length += 15; wouldn't work if init weren't used. Another example would be out parameters. They get assigned the init value for the type when the function is called.

A lot of aspects of D are built around the fact that every type has an init value and the fact that values of that type are always initialized to that init value if they're not given an explicit value. At some point during the language's development, the ability to disable the default initialization of a type was added, but even those types still have an init value. And while it works, disabling default initialization causes all kinds of subtle problems precisely because D was built around the idea that every type could be default-initialized.

Sure, there are some downsides to D's approach (such as getting unexpected NaNs or not having default constructors for structs), but it also solves a whole class of problems that other languages like C and C++ have with garbage values. Even Java has problems with garbage values in spite of it requiring that you initialize variables before using them (e.g. it's quite possible to use a static variable in Java before it's initialized because of circular reference issues). D, on the other hand, never has garbage values unless you use @system features like = void to force it.

- Jonathan M Davis
Jun 09 2019
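The two examples given above (growing an array's length and out parameters), plus the `= void` escape hatch, can be sketched as follows. The function name `compute` is hypothetical.

```d
import std.math : isNaN;

// Hypothetical function: an out parameter is reset to its type's
// init value on entry, whatever the caller passed in.
void compute(out double result)
{
    assert(isNaN(result));
    result = 1.5;
}

void main()
{
    double[] arr = [1.0, 2.0];
    arr.length += 3; // the three new elements are filled with double.init
    assert(arr.length == 5);
    assert(arr[0] == 1.0 && arr[1] == 2.0);
    foreach (v; arr[2 .. $])
        assert(isNaN(v));

    double y = 42.0;
    compute(y); // 42.0 is overwritten with NaN before the body runs
    assert(y == 1.5);

    // Opting out of default initialization is explicit.
    double raw = void; // contents are garbage until written
    raw = 0.0;         // so write before any read
    assert(raw == 0.0);
}
```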
On Sunday, 9 June 2019 at 13:45:07 UTC, Jonathan M Davis wrote:
> Mutating the length of a dynamic array has to use the init value. e.g. arr.length += 15; wouldn't work if init weren't used.

This means that u have to initialize the added elements to some value (usually 0.0) again, coz NaN is not useful at all; there is no case where u can use it. Tada! Double work!

> A lot of aspects of D are built around the fact that every type has an init value and the fact that values of that type are always initialized to that init value if they're not given an explicit value.

I agree that data should be initialized, but with "all zeroes", not NaN or \xFF.

> disable..

I found only this: https://dlang.org/spec/attribute.html#disable
I feel that I am missing something. What is your point?

> not having default constructors for structs

I accept that; it is my responsibility alone to assign values to the fields. But "all fields are zeroes" is a good choice for me. NaN means that I have to think about small details in cases where I would otherwise do nothing.
Jun 09 2019
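On the `@disable` question above: one way to read the earlier remark is that it refers to `@disable this();` on a struct, which forbids default construction while the type still has an init value. A minimal sketch under that assumption follows; the struct name `NoDefault` is hypothetical.

```d
struct NoDefault
{
    int x;

    @disable this(); // forbid "NoDefault n;" without arguments

    this(int x) { this.x = x; }
}

void main()
{
    // NoDefault n;      // would not compile: default construction is disabled
    auto n = NoDefault(5); // must be constructed explicitly
    assert(n.x == 5);

    // The type still has an init value, which is why disabling
    // default construction does not remove init from the language.
    assert(NoDefault.init.x == 0);
}
```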
On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
> No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.

I don't think it is undefined though… If something has an arbitrary value, you could still compute with it, if the algorithm takes that into account. Assuming that all bit-patterns provide a defined value (which is the case for IEEE floating point bit-patterns).

Anyway, the obvious advantage of having structs default-initialized to all-bits-zero is that you can have an allocator that clears bits in the background (bypassing caches so they are not polluted). Then you have no penalty when allocating an array of one million struct values. Which is very useful. Just allocate memory chunks that are already set to zero bits.

You usually want an array of floating point values to be pre-initialized to zeros. You almost never want an array of floating point values to be initialized to NaN.
Jun 09 2019
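The point about zeroed memory can be sketched as follows. This is an illustration only, assuming a struct whose init value is all-zero bits (here the hypothetical `Point`, which has only int fields); for such a type, memory that is already zeroed is indistinguishable from default-initialized values, whereas a double array has to be overwritten explicitly to get zeros.

```d
import core.stdc.stdlib : calloc, free;
import std.math : isNaN;

struct Point { int x; int y; } // hypothetical type; its init is all-zero bits

void main()
{
    // For an all-zero init, pre-zeroed memory already holds valid defaults.
    auto p = cast(Point*) calloc(1, Point.sizeof);
    assert(p !is null);
    assert(*p == Point.init);
    free(p);

    // A double array starts out as NaN and needs a second pass for zeros.
    auto values = new double[1000];
    assert(isNaN(values[0]));
    values[] = 0.0; // explicit fill
    foreach (v; values)
        assert(v == 0.0);
}
```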
On 6/9/19 8:19 AM, Ola Fosheim Grøstad wrote:
> On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
>> No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.
> I don't think it is undefined though… If something has an arbitrary value, you could still compute with it, if the algorithm takes that into account. Assuming that all bit-patterns provide a defined value (which is the case for IEEE floating point bit-patterns).
>
> Anyway, the obvious advantage of having structs default-initialized to all-bits-zero is that you can have an allocator that clears bits in the background (bypassing caches so they are not polluted). Then you have no penalty when allocating an array of one million struct values. Which is very useful. Just allocate memory chunks that are already set to zero bits.
>
> You usually want an array of floating point values to be pre-initialized to zeros. You almost never want an array of floating point values to be initialized to NaN.

Yes, and further I would suggest that non-zero-bit initializations violate the principle of least surprise. As someone posted upthread, it would be interesting to take a poll of new users (or non-users, or perhaps the D-curious) and ask what their best guess is for each default value.
Jun 09 2019
On Sunday, 9 June 2019 at 12:19:43 UTC, Ola Fosheim Grøstad wrote:
> On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
>> No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.
> I don't think it is undefined though…

It is undefined behaviour by the definition of the standard. Undefined behaviour includes behaviour that can be explained.
Jun 09 2019
On Sunday, 9 June 2019 at 18:27:07 UTC, Patrick Schluter wrote:
> On Sunday, 9 June 2019 at 12:19:43 UTC, Ola Fosheim Grøstad wrote:
>> On Sunday, 9 June 2019 at 08:36:30 UTC, Patrick Schluter wrote:
>>> No, by putting NaN in d you have a deterministic error. In C and C++ you will have undefined behaviour that will vary with compiler, version, options, OS version, architecture, position of the moon, etc., and sometimes undetectable bugs.
>> I don't think it is undefined though…
> It is undefined behaviour by the definition of the standard. Undefined behaviour includes behaviour that can be explained.

To be fair, the C standard (C11) is a bit self-contradictory there. Variables of automatic storage duration that are not explicitly initialized contain an unspecified value (i.e. any valid value) or a trap representation. Types such as integers usually don't have any trap representations, so reading them should be defined on most platforms (unless C permits some sort of compile-time-only trap representation? Not sure.). Then there's informative(!) Annex J, which explicitly lists it as undefined behavior.
Jun 09 2019