digitalmars.D - Null pointer dereferencing in D

reply "bearophile" <bearophileHUGS lycos.com> writes:
This was surely discussed in the past, but I don't remember the 
answer (so perhaps this is a better fit for D.learn).

Dereferencing the null pointer in C is undefined behaviour, so in 
most cases the program segfaults, but sometimes the compiler 
assumes a dereferenced pointer can't be null, so it optimizes 
away tests and other parts, leading to bugs and problems, 
including exploits:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
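
To make the pattern concrete, here is a minimal sketch of the "dereference first, test later" shape the linked article warns about, written in D syntax for consistency with the rest of the thread. The function name `readOrDefault` is hypothetical, and whether a particular D compiler would actually drop the check depends on how it treats a null load, which is the open question of this post.

```d
// Hypothetical illustration only; not taken from any compiler's test suite.
int readOrDefault(int* p)
{
    int value = *p;      // the load happens before the null test
    if (p is null)       // a compiler that treats a null load as undefined
        return -1;       // behaviour may assume p !is null and drop this branch
    return value;
}

void main()
{
    int x = 42;
    // Only the well-defined path is exercised here.
    assert(readOrDefault(&x) == 42);
}
```

Moving the null test above the load removes the ambiguity regardless of what the optimizer assumes.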

What's the solution for heavily optimizing D compilers like GDC, now 
and in the future? Surely the C non-solution is not acceptable in a 
safe language like D.

Perhaps D can define dereferencing the null pointer in C as 
segfaulting in all cases. Is this acceptable for people who 
might want to write a kernel in D?

An alternative possibility is to go the Java way, and add a 
compiler switch that adds an assert before every pointer 
dereference after the compiler has optimized the code (and remove 
some of such asserts where the compiler is certain they can't be 
null).
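
As a rough sketch of what such an inserted check amounts to, the helper below performs the test by hand. The name `checkedDeref` is hypothetical, and it throws an `Exception` rather than using `assert` so that the check survives release builds; a real compiler switch could choose either.

```d
import std.exception : assertThrown;

// Hypothetical helper standing in for the check a compiler switch might insert.
ref T checkedDeref(T)(T* p)
{
    if (p is null)
        throw new Exception("null pointer dereference");
    return *p;
}

void main()
{
    int x = 10;
    checkedDeref(&x) += 1;          // behaves like *(&x) += 1
    assert(x == 11);

    int* missing = null;
    assertThrown!Exception(checkedDeref(missing));
}
```

The cost is one compare and branch per dereference, which is why the post suggests removing the checks wherever the compiler can prove the pointer is not null.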

What's the solution used by Rust (besides not having to deal with 
nulls in many cases)?

Bye,
bearophile
Jun 13 2014
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
This has indeed been discussed already. Long story short:

Making dereferencing null a defined behavior is extremely
expensive from the optimizer's perspective. It means that every
single load can have a side effect, so the optimizer can't optimize
loads away unless it can prove that the pointer is not null. It
also can't reorder loads anymore unless it can prove the pointers
are not null.

This means two things:
   - D programs will be much slower
   - We won't be able to reuse existing optimizers out of the box.

That sounds like a terrible idea to me. And ultimately, it won't
solve the problem of null pointers that can be all over the place
(this is a major issue in some codebases).

The approach of having non-nullable pointers/references
by default is the one that is gaining traction, and for good
reasons.
Jun 13 2014
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Fri, 13 Jun 2014 21:23:00 +0000
deadalnix via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 The approach consisting in having non nullable pointers/reference
 by default is the one that is gaining traction and for good
 reasons.
That interacts _really_ badly with D's approach of requiring init values
for all types. We have enough problems with @disable this() as it is.

- Jonathan M Davis
Jun 13 2014
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/13/2014 11:45 PM, Jonathan M Davis via Digitalmars-d wrote:
 That interacts _really_ badly with D's approach of requiring init
 values for all types. We have enough problems with @disable this()
 as it is.

@disable this() and nested structs etc. Trying to require init values
for everything isn't an extraordinarily good idea. It roughly extends
'nullable by default' to all _structs_ with non-trivial invariants.
Jun 13 2014
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 14 Jun 2014 00:34:51 +0200
Timon Gehr via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 @disable this() and nested structs etc. Trying to require init values
 for everything isn't an extraordinarily good idea. It roughly extends
 'nullable by default' to all _structs_ with non-trivial invariants.

True, some types become problematic when you have to have an init value
(like a NonNullable struct to make nullable pointers non-nullable), but
generic code is way more of a pain to write when you can't rely on an
init value existing, and there are a number of places where the language
requires an init value (like arrays), making types which don't have
init values problematic to use. Overall, I think that adding
@disable this() to the language was a mistake.

- Jonathan M Davis
Jun 13 2014
next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 14 June 2014 at 00:40:06 UTC, Jonathan M Davis via
Digitalmars-d wrote:
 True, some types become problematic when you have to have an 
 init value (like
 a NonNullable struct to make nullable pointers non-nullable), 
 but generic code
 is way more of a pain to write when you can't rely on an init 
 value existing,
 and there are a number of places that the language requires an 
 init value
 (like arrays), making types which don't have init values 
 problematic to use.
 Overall, I think that adding @disable this() to the language 
 was a mistake.

 - Jonathan M Davis
Yes, because who cares about the code being correct, or even about what it does being defined? As long as it is easy to write.
Jun 13 2014
prev sibling next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Saturday, 14 June 2014 at 00:40:06 UTC, Jonathan M Davis via 
Digitalmars-d wrote:
 [...] making types which don't have init values problematic to use.
 Overall, I think that adding @disable this() to the language was a
 mistake.

Huh? Types with `@disable this()` still have an `init` value. All it
does is disallow instantiating the type without specifying an
initializer (e.g. a struct literal, a value returned from a factory
function, or `static opCall()`).
Jun 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 14 June 2014 at 10:15:49 UTC, Marc Schütz wrote:
 Huh? Types with `@disable this()` still have an `init` value. 
 All it does is disallow instantiating the type without 
 specifying an initializer (e.g. a struct literal, a value 
 returned from a factory function, or `static opCall()`).

Which is effectively a type system hole with @disable this:

    struct A { @disable this(); }
    auto a = A.init;
Jun 14 2014
next sibling parent reply "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Saturday, 14 June 2014 at 12:33:28 UTC, Dicebot wrote:
 Which is effectively a type system hole with @disable this:

     struct A { @disable this(); }
     auto a = A.init;

Why is this a type hole if the initializer is explicitly provided? The
idea of @disable this() is to prevent default initialization, not to
reject a potentially buggy one.
Jun 14 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 14 June 2014 at 13:38:40 UTC, Maxim Fomin wrote:
 Why is this a type hole if the initializer is explicitly provided?
 The idea of @disable this() is to prevent default initialization,
 not to reject a potentially buggy one.

Well, consider an imaginary NotNullable struct that uses
"@disable this()" to guarantee that an instance of that struct always
has a meaningful state. By using (NotNullable!T).init you can get a
value of that type which is in fact null and pass it as an argument to
a function that expects NotNullable to always be non-null. With no casts
involved, you have just circumvented the guarantees the static type
system was supposed to give.
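
The scenario can be sketched in a few lines. The wrapper below is a hypothetical stand-in for the "imaginary NotNullable" (the name `NotNull` and the helper `holdsNull` are made up for illustration), and it assumes the compiler accepts `.init` on a type with a disabled default constructor, as described in this thread.

```d
// Hypothetical wrapper; names are illustrative, not a Phobos type.
struct NotNull(T)
{
    private T* ptr;

    @disable this();                 // forbids "NotNull!int n;"

    this(T* p)
    {
        assert(p !is null, "NotNull constructed from null");
        ptr = p;
    }

    bool holdsNull() const { return ptr is null; }
}

void main()
{
    int x = 42;
    auto ok = NotNull!int(&x);
    assert(!ok.holdsNull);

    // No cast involved, yet the wrapped pointer is null.
    auto hole = NotNull!int.init;
    assert(hole.holdsNull);
}
```

Any function taking a `NotNull!int` parameter would accept `hole` without complaint, which is the circumvention being described.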
Jun 14 2014
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 14 June 2014 at 14:51:10 UTC, Dicebot wrote:
 By using (NotNullable!T).init you can get a value of that type which
 is in fact null and pass it as an argument to a function that expects
 NotNullable to always be non-null. With no casts involved, you have
 just circumvented the guarantees the static type system was supposed
 to give.

Hole in the type system: yes
Necessarily a bad thing: no

Some data-types require runtime initialisation to be valid. By using
.init you are explicitly circumventing any runtime initialisation. It's
an explicit hole, just like cast.

It appears that it is possible (in 2.065 at least) for a struct to
provide its own init, which can of course be

    template init() { static assert(false); }

Is the ability to manually specify .init a bug or a feature? I feel
like it's a bug.

I would like to see a logicalInit added to druntime/phobos and adopted
where appropriate instead of .init

    auto logicalInit(T)()
    {
        static if(hasMember!(T, "logicalInit"))
        {
            return T.logicalInit;
        }
        else
        {
            return T.init;
        }
    }

and then types can define their own logicalInit, whether it's just a
static assert or something else. Perhaps it should be limited to
compile-time values only to avoid it being a default constructor
workaround.
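
For readers who want to try the proposal, here is a self-contained sketch of it. The `logicalInit` function follows the post; the two example types `Plain` and `Handle` are hypothetical, and `Handle` supplies its value through a static function, which is only one of the ways a type could opt in.

```d
import std.traits : hasMember;

// The proposed helper: prefer a type's own logicalInit, else fall back to .init.
auto logicalInit(T)()
{
    static if (hasMember!(T, "logicalInit"))
        return T.logicalInit;
    else
        return T.init;
}

// Hypothetical type without a custom logical initial value.
struct Plain { int x = 7; }

// Hypothetical type whose raw .init (fd == -1) is not a usable state.
struct Handle
{
    int fd = -1;
    static Handle logicalInit() { return Handle(0); }
}

void main()
{
    assert(logicalInit!Plain().x == 7);    // falls back to Plain.init
    assert(logicalInit!Handle().fd == 0);  // uses Handle.logicalInit
    assert(Handle.init.fd == -1);          // the language-level .init is unchanged
}
```

Note that nothing here stops code from reaching for `T.init` directly; the proposal only works if generic code adopts the helper.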
Jun 14 2014
parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Saturday, 14 June 2014 at 15:41:10 UTC, John Colvin wrote:
 Hole in the type system: yes
 Necessarily a bad thing: no

It depends on what you mean by type safety. If the commonly accepted
definition is chosen (for example, what the type safety article means),
then there is no type safety problem here.
 Some data-types require runtime initialisation to be valid. By 
 using .init you are explicitly circumventing any runtime 
 initialisation. It's an explicit hole, just like cast.
Yes, this is known. But note that it is the user, not the language, who
circumvents initialization. By the way, what many people expect from
the feature (a reference that always points to a preallocated, valid
object) is hardly achievable in a system-level language where, due to
free access to memory and memory bugs, a reference may hold any value.
For example, a class object may turn into an integer with value 12345:

    class A {}
    int I = 12345;
    A a = cast(A) I;

With the same reasoning I could say that the language is at fault here
because it allowed me to circumvent my own wishful(!) assumption that
references should always be allocated and point to valid memory.

The fact that a pointer or reference can be null is a tiny, tiny
problem. You can test for null, but you cannot check whether a pointer
contains a valid address. For example, try to figure out whether it is
safe to write to address 0xFEFFABCD in some particular context.
 It appears that it is possible (in 2.065 at least) for a struct to 
 provide its own init, which can of course be

 template init() { static assert(false); }

 Is the ability to manually specify .init a bug or a feature? I 
 feel like it's a bug.

As far as I remember, the ability to define an init property has
existed since 2.058. The semantics would be that if you access init
explicitly, you get the overridden property, but in other cases (for
example, allocation) the semantics are those of the language's init.
It was probably filed as a bug, I don't remember.
Jun 14 2014
prev sibling parent reply "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Saturday, 14 June 2014 at 14:51:10 UTC, Dicebot wrote:
 With no casts involved, you have just circumvented the guarantees the
 static type system was supposed to give.

You can complain if the language itself produces code which accesses the
init property. This would be a violation of the @disable premise.
However, it is clearly you who intentionally asked for the init
property, so there is no language fault here. For example, some time
ago issues with implicit creation of @disable'd structs (in the context
of array copy, out parameters, etc.) were filed and fixed. If you face
a situation where the compiler generates a default @disable'd struct
without the user's permission, this would be a bug.

The case which you described is not a type safety problem. There is no
reinterpretation of an object of one type as another type (or
reading/writing memory which should not have been read/written). It is
perfectly safe to assign an object of type A to the same type. What is
presented is some sort of design pattern violated by the user, not the
language (and the goal of the pattern cannot be fully achieved in the D
language).
Jun 14 2014
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Saturday, 14 June 2014 at 16:45:19 UTC, Maxim Fomin wrote:
 The case which you described is not a type safety problem.

If a struct type has a non-trivial invariant(), .init allows an object
to exist that violates it without an Error being thrown. Arguing that
this is not part of the type system would be splitting hairs.

David
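
A small sketch of the invariant case, under the assumption that invariants are compiled in (that is, not a release build). The type `Positive` is hypothetical; the point is that the constructor path checks the invariant while `.init` does not.

```d
// Hypothetical type whose invariant rules out the all-zero state.
struct Positive
{
    private int value;

    @disable this();

    this(int v) { value = v; }   // the invariant is checked after construction

    invariant() { assert(value > 0, "Positive must hold a value > 0"); }
}

void main()
{
    auto good = Positive(3);
    assert(good.value == 3);

    // No constructor runs here, so nothing checks the invariant.
    auto bad = Positive.init;
    assert(bad.value == 0);      // an object that violates its own invariant
}
```

The violation would only surface later, when a public member function is called on `bad` and the invariant finally runs.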
Jun 14 2014
next sibling parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Saturday, 14 June 2014 at 17:05:21 UTC, David Nadlinger wrote:
 If a struct type has a non-trivial invariant(), .init allows an
 object to exist that violates it without an Error being thrown.
 Arguing that this is not part of the type system would be splitting
 hairs.

Again, it may depend on your definition of type safety. In my view, it
is not related. It is a problem of an unwarranted assumption about data
correctness in a system-level language.

By the way, AFAIK the issue has already been filed in bugzilla (closed
as wontfix) and discussed in the newsgroups. After the discussion the
spec was updated to explicitly mention that the init property may be
problematic: http://dlang.org/property.html (please notice that the
invariant example is in the spec).

Another issue which popped up is that in order to fix the
@disable this() init problem, one needs to break the assumption about
init availability at compile time, which breaks CTFE. In other words,
it is impossible to fix the issue without creating a multitude of new
problems.
Jun 14 2014
prev sibling next sibling parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Saturday, 14 June 2014 at 17:05:21 UTC, David Nadlinger wrote:
 If a struct type has a non-trivial invariant(), .init allows an
 object to exist that violates it without an Error being thrown.
 Arguing that this is not part of the type system would be splitting
 hairs.

Déjà vu:

http://forum.dlang.org/thread/mohceehplxdhsdllxkzt@forum.dlang.org#post-mailman.550.1349377293.5162.digitalmars-d:40puremagic.com
https://issues.dlang.org/show_bug.cgi?id=7021

If I'm not mistaken, it was Kenji who updated the init spec.
Jun 14 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/14/14, 10:05 AM, David Nadlinger wrote:
 If a struct type has a non-trivial invariant(), .init allows an
 object to exist that violates it without an Error being thrown.

Yah, that does make it a loophole. -- Andrei
Jun 14 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/14/14, 5:33 AM, Dicebot wrote:
 Which is effectively a type system hole with @disable this:

     struct A { @disable this(); }
     auto a = A.init;

I disagree it's a loophole. A.init is a preexisting object. That said, I do understand that sometimes one wants to inhibit A.init, which can be achieved by e.g. making it a private member. -- Andrei
Jun 14 2014
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 14 June 2014 at 15:22:07 UTC, Andrei Alexandrescu
wrote:
 I disagree it's a loophole. A.init is a preexisting object. That said,
 I do understand that sometimes one wants to inhibit A.init, which can
 be achieved by e.g. making it a private member. -- Andrei

template init() { static assert(false); }
Jun 14 2014
prev sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 14 June 2014 at 15:22:07 UTC, Andrei Alexandrescu 
wrote:
 I disagree it's a loophole. A.init is a preexisting object. That said,
 I do understand that sometimes one wants to inhibit A.init, which can
 be achieved by e.g. making it a private member. -- Andrei

By that rationale, so is "void". If you want the default initial state,
you just use "A()".

Using "A.init" is just the same as using "void": it's "fuck you
compiler, I know what I'm doing."

I strongly oppose any notion of being able to customize the ".init"
value or access: it should be a built-in and well-defined concept. If
we do allow things like that, then *any* low-level function, such as
emplace, move, swap, destroy or even __ctor will exhibit unspecified
behavior.

".init" should simply mean "the default bit state of the object". Let's
not make it into anything more complicated than that.
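
The three spellings being compared can be put side by side. This is only a sketch with a hypothetical struct `A` that has no disabled constructor, so that all three forms compile in the same program.

```d
// Hypothetical struct with an ordinary default state.
struct A
{
    int x = 5;
}

void main()
{
    auto viaLiteral = A();      // default state through the struct literal
    auto viaInit    = A.init;   // the same bits, requested explicitly
    assert(viaLiteral == viaInit);

    A viaVoid = void;           // no initialization at all
    viaVoid = A(9);             // must be assigned before it is read
    assert(viaVoid.x == 9);
}
```

In this reading, `A.init` and `= void` are both explicit requests to skip whatever a constructor would have established.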
Jun 14 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 14 June 2014 at 16:41:46 UTC, monarch_dodra wrote:
 ".init" should simply mean "the default bit state of the 
 object". Let's not make it into anything more complicated than 
 that.
I guess this is the root of disagreement. There is no place in documentation that says that "T.init" is in any way unsafe or dangerous, I see it as a perfectly casual feature often used in any kind of generic code. Putting it in the same league as cast which is explicitly designed to punch holes in the type system? No way.
Jun 14 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Saturday, 14 June 2014 at 17:15:16 UTC, Dicebot wrote:
 I guess this is the root of disagreement. There is no place in
 documentation that says that "T.init" is in any way unsafe or
 dangerous, I see it as a perfectly casual feature often used in any
 kind of generic code. Putting it in the same league as cast which is
 explicitly designed to punch holes in the type system? No way.

I'm not saying it's there to punch holes in the type system, but (if
memory serves right) TDPL simply defines it as the default bit pattern
that gets written before constructors are called (ergo the default
value when no constructors are called).

It's not unsafe, but it *is* a way to request an object explicitly
initialized to that "initial" state, which may not actually be ready
for use until a corresponding constructor is called upon it. Which is
what "@disable this" does.
Jun 14 2014
prev sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Saturday, 14 June 2014 at 16:41:46 UTC, monarch_dodra wrote:
 By that rationale, so is "void". If you want the default initial 
 state, you just use "A()".

 Using "A.init" is just the same as using "void": it's "fuck you 
 compiler, I know what I'm doing."
Someone is trying to mess up with Andrei slice for DConf next year.
Jun 14 2014
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/14/2014 02:39 AM, Jonathan M Davis via Digitalmars-d wrote:
 True, some types become problematic when you have to have an init value
 (like a NonNullable struct to make nullable pointers non-nullable),
 but generic code is way more of a pain to write when you can't rely on
 an init value existing,
Examples?
 and there are a number of places that the language requires an init value
 (like arrays),
Just use std.array.array.
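
As a sketch of that suggestion: `std.array.array` builds the array from a range of already-constructed elements, so no default value is needed for the element type. The struct `Id` is hypothetical.

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota;

// Hypothetical element type without a default constructor.
struct Id
{
    int n;
    @disable this();
    this(int n) { this.n = n; }
}

void main()
{
    // Every element comes from an explicit constructor call,
    // so no default-initialized Id is ever required.
    auto ids = iota(3).map!(i => Id(i + 1)).array;
    assert(ids.length == 3);
    assert(ids[0].n == 1 && ids[2].n == 3);
}
```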
 making types which don't have init values problematic to use.
The solution is to have explicit nullable types, not to force a default value on every type.
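
One existing way to spell "explicitly nullable" in D is `std.typecons.Nullable`; the sketch below uses it for a lookup that may fail. The function `findIndex` is hypothetical, and this is a library type, not the language-level feature being argued for.

```d
import std.typecons : Nullable;

// Hypothetical lookup: absence is part of the return type, not a null pointer.
Nullable!size_t findIndex(const int[] haystack, int needle)
{
    foreach (i, v; haystack)
    {
        if (v == needle)
            return Nullable!size_t(i);
    }
    return Nullable!size_t.init;     // the "no value" state
}

void main()
{
    auto hit = findIndex([4, 8, 15], 8);
    assert(!hit.isNull && hit.get == 1);

    auto miss = findIndex([4, 8, 15], 99);
    assert(miss.isNull);
}
```

Callers have to go through `isNull` or `get`, so the possibility of absence is visible at every use site.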
 Overall, I think that adding @disable this() to the language was a mistake.
...
Jun 14 2014
prev sibling parent "Wanderer" <no-reply no-reply.org> writes:
On Friday, 13 June 2014 at 21:13:20 UTC, bearophile wrote:
 An alternative possibility is to go the Java way, and add a 
 compiler switch that adds an assert before every pointer 
 dereference after the compiler has optimized the code (and 
 remove some of such asserts where the compiler is certain they 
 can't be null).

Since you mentioned Java: the NPE (NullPointerException), which is thrown every time null gets dereferenced, is a real plague of the language, hard to spot, hard to debug, etc. So even without segfaults or exploits, nulls are a PITA there, and many programmers just refrain from using null values at all, explicitly initializing all references in 100% of cases and marking all usages as NotNull. Maybe, if D decides to encourage programmers to use a similar technique, it would be possible to optimize away most such runtime checks and keep the language both fast *and* safe.
Jun 14 2014