
digitalmars.D - Lints, Condate and bugs

reply bearophile <bearophileHUGS lycos.com> writes:
From my experience I've seen that lint tools are very useful for quickly finding
many bugs; on the other hand, most of my friends don't use them. This is a
problem I've felt for some time. Recently I found an interesting paper
that expresses my feelings better:

"Condate: A Proto-language at the Confluence Between Checking and Compiling" by
Nic Volanschi
http://mygcc.free.fr/condate-ppdp06.pdf


A quote from the first part of the paper:


Existing tools use various approaches, but share a common, apparently minor
design choice: they are specialized tools, doing only program checking. There
are several important drawbacks of this design:

- most of the tools are completely decoupled from existing development
environments

- they duplicate a considerable amount of work on program parsing and program
analysis; this is true even for tools that achieve a superficial level of
integration by being called automatically from existing IDEs or makefiles

- they afford to perform costly analyses, which make them unsuitable for daily
use throughout development; at best, existing tools aim only at scalable
analyses

- last but not least, many programmers completely ignore their existence.

As a consequence of these and maybe other reasons, for instance related to
usability or to limited distribution policies (some proprietary tools being
kept as a competitive advantage), checking tools are not used on a large scale
nowadays.

To solve the above design-related issues, we propose to integrate some amount
of user-defined checking within the core of every development process -- the
compiler.


All this relates to "mygcc", the kind of plug-in for GCC I have talked about
recently. It uses half a page of rules to find lots of bugs in C code. I don't know
whether similar rules can be written to catch bugs in D code too, because the bugs
there are often of a different kind (and maybe less simple).

If you want more info about Condate:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.69.2171

Bye,
bearophile
Oct 26 2010
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 From my experience I've seen that lint tools are very useful to quickly find
 many bugs, on the other hand most of my friends don't use them. This is a
 problem I've felt for some time. Recently I have found an interesting paper
 that expresses my feelings better:
 
 "Condate: A Proto-language at the Confluence Between Checking and Compiling"
 by Nic Volanschi http://mygcc.free.fr/condate-ppdp06.pdf
I've been interested in rule based static checkers for years. I'd hoped to be able
to obsolete them by designing the need for such checks out of the language. There's
no need to check for an error that cannot be expressed.

Looking at what the rule based analyzers do falls into predictable categories:

1. Memory allocation errors - failure to free, dangling pointers, redundant frees

2. Use of uninitialized data

3. Related to (1), failure to clean up properly after allocating some resource

4. Memory corruption, such as buffer overflows

5. Failure to do transaction cleanup properly in the event part of the transaction
failed

6. Failure to deal with error returns

7. Null pointer dereferencing

8. Signed/unsigned mismatching

Keep in mind that such tools can also produce large quantities of false positives,
requiring ugly workarounds or causing the programmer to miss the real bugs. Keep in
mind also that these tools are often way oversold - they catch a few kinds of bugs,
but not logic errors. Over time, I've found that the bugs in my own code that such
tools might catch have become rarer and rarer. The bugs in my code are logic errors
that no tool could catch.

Here's how D deals with them:

1. Garbage collection.

2. Data is guaranteed to be initialized.

3. RAII does this.

4. Array bounds checking, and safe mode in general, solves this.

5. D's scope guard does a very good job with this.

6. Exceptions solve this.

7. Certainly the idea of non-null types has a lot of adherents; D doesn't have that.

8. The only successful solution to this I've seen is Java's simply not having
unsigned types. Analysis tools just produce false positives.

I don't think there's much value left for add-on static analysis tools.
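As a minimal sketch of point 5 - the transfer function and its limit below are
invented for illustration, not code from dmd or Phobos - D's scope(failure) rolls
back the completed part of a transaction when a later step throws:

    import std.stdio;

    void transfer(ref int from, ref int to, int amount)
    {
        from -= amount;
        // Undo the withdrawal if anything after this point throws.
        scope(failure) from += amount;

        if (amount > 1000)
            throw new Exception("limit exceeded");

        to += amount;
    }

    void main()
    {
        int a = 500, b = 0;
        try transfer(a, b, 2000);
        catch (Exception e) writeln(e.msg);
        writeln(a, " ", b); // prints "500 0": the withdrawal was rolled back
    }

The same idea covers point 3: scope(exit) releases a resource on every path out of
the function, without a separate cleanup routine.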
Oct 26 2010
next sibling parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 10/26/2010 04:21 PM, Walter Bright wrote:
 7. Certainly the idea of non-null types has a lot of adherents, D
 doesn't have that.

 8. The only successful solution to this I've seen is Java's simply not
 having unsigned types. Analysis tools just produce false positives.
Or Python's variation: not having fixed integer types at all.

Question: would it be feasible to make D use some sort of bignum type as the default
index type for arrays, etc., but make it possible for the programmer to use uint or
ulong or whatever for e.g. places where performance is an issue?
 I don't think there's much value left for add-on static analysis tools.
I went to the trouble of modifying dmd to warn on unsigned/signed comparison. It
found me some bugs which probably would not have been noticed otherwise. Did it
produce false positives? Yes. Did that make me wish I hadn't done it? Hell no.

As long as there are things which dmd doesn't warn about, there will be value in
add-on static analysis tools. The key idea is to leave the warnings off unless the
programmer explicitly asks for them.
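For readers who haven't hit this class of bug, a small sketch (an invented example,
not Ellery's patch or its output) of the kind of comparison such a warning flags:

    import std.stdio;

    void main()
    {
        int[] data;      // empty array, length 0
        int i = -1;
        // For the comparison, i is implicitly converted to the unsigned type
        // of data.length, so -1 becomes a huge value and the test silently
        // goes the wrong way.
        if (i < data.length)
            writeln("in range");
        else
            writeln("out of range"); // this branch runs
    }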
Oct 26 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Ellery Newcomer:

 I went to the trouble of modifying dmd to warn on unsigned/signed 
 comparison. It found me some bugs which probably would not have been 
 noticed otherwise. Did it produce false positives? Yes. Did that make me 
 wish I hadn't done it? Hell no.
Please put this patch in a Bugzilla entry :-)

(More comments on this thread later.)

Bye,
bearophile
Oct 26 2010
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Short story: long time ago, lost code, think it's already in bugzilla (259?)

On 10/26/2010 08:25 PM, bearophile wrote:
 Ellery Newcomer:

 I went to the trouble of modifying dmd to warn on unsigned/signed
 comparison. It found me some bugs which probably would not have been
 noticed otherwise. Did it produce false positives? Yes. Did that make me
 wish I hadn't done it? Hell no.
Please, put this patch in a Bugzilla entry :-) (Later more comments on this thread) Bye, bearophile
Oct 26 2010
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Ellery Newcomer wrote:
 Question: would it be feasible to make D use some sort of bignum type as 
 the default index type for arrays, etc, but make it possible for the 
 programmer to use uint or ulong or whatever for e.g. places where 
 performance is an issue?
I suspect that's a big jump in complexity for very little gain. I also suspect that it would result in a drastic performance problem (remember, Python is 100x slower than native code), and will disable the advantages D has with type inference (as the user will be back to explicitly naming the type).
 I don't think there's much value left for add-on static analysis tools.
I went to the trouble of modifying dmd to warn on unsigned/signed comparison. It found me some bugs which probably would not have been noticed otherwise. Did it produce false positives? Yes. Did that make me wish I hadn't done it? Hell no.
You might want to consider changing your coding style to eschew the use of unsigned types.
 The key idea is leave the 
 warnings off unless the programmer explicitly asks for it.
That's a good sentiment, but it doesn't work that way in practice. Warnings always become de-facto requirements. They aren't the solution to a disagreement about how the language should work.
Oct 26 2010
next sibling parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 10/26/2010 09:26 PM, Walter Bright wrote:
 That's a good sentiment, but it doesn't work that way in practice.
 Warnings always become de-facto requirements. They aren't the solution
 to a disagreement about how the language should work.
My point was that your claim to be trying to put static analysers out of business while refusing to include simple functionality I want is silly.
Oct 26 2010
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 Ellery Newcomer wrote:
 I don't think there's much value left for add-on static analysis tools.
I went to the trouble of modifying dmd to warn on unsigned/signed comparison. It found me some bugs which probably would not have been noticed otherwise. Did it produce false positives? Yes. Did that make me wish I hadn't done it? Hell no.
You might want to consider changing your coding style to eschew the use of unsigned types.
I would strongly support that. But it doesn't really work. The problem is size_t.
The fact that it's unsigned is a root of all kinds of evil. It means .length is
unsigned!!!

Personally I think that any creation or access to an object which is larger in size
than half the memory space should be impossible without a special function call.
Providing syntax sugar for this incredibly rare scenario introduces a plethora of
bugs.
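A short sketch of the kind of trap Don is describing - an unsigned .length makes an
innocent-looking expression wrap around (invented example):

    import std.stdio;

    void main()
    {
        int[] a;   // empty array
        // a.length - 1 is computed in size_t, so for an empty array it wraps
        // to size_t.max instead of becoming -1, and the loop condition holds.
        for (size_t i = 0; i < a.length - 1; ++i)
            writeln(a[i]); // out-of-bounds access on the first iteration
    }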
Oct 27 2010
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Don wrote:
 I would strongly support that. But it doesn't really work.
 The problem is size_t. The fact that it's unsigned is a root of all 
 kinds of evil. It means .length is unsigned!!!
 Personally I think that any creation or access to an object which is 
 larger in size than half the memory space, should be impossible without 
 a special function call. Providing syntax sugar for this incredibly rare 
 scenario introduces a plethora of bugs.
size_t is unsigned to line up with C's. ptrdiff_t is the signed version. You can do
things like:

    ptrdiff_t d = a.length;

Making size_t signed would have consequences that are largely unknown.
Oct 27 2010
next sibling parent Don <nospam nospam.com> writes:
Walter Bright wrote:
 Don wrote:
 I would strongly support that. But it doesn't really work.
 The problem is size_t. The fact that it's unsigned is a root of all 
 kinds of evil. It means .length is unsigned!!!
 Personally I think that any creation or access to an object which is 
 larger in size than half the memory space, should be impossible 
 without a special function call. Providing syntax sugar for this 
 incredibly rare scenario introduces a plethora of bugs.
size_t is unsigned to line up with C's. ptrdiff_t is the signed version. You can do things like: ptrdiff_t d = a.length; Making size_t signed would have consequences that are largely unknown.
I know. But it's the root of the problem.
Oct 27 2010
prev sibling next sibling parent reply KennyTM~ <kennytm gmail.com> writes:
On Oct 27, 10 17:53, Walter Bright wrote:
 Don wrote:
 I would strongly support that. But it doesn't really work.
 The problem is size_t. The fact that it's unsigned is a root of all
 kinds of evil. It means .length is unsigned!!!
 Personally I think that any creation or access to an object which is
 larger in size than half the memory space, should be impossible
 without a special function call. Providing syntax sugar for this
 incredibly rare scenario introduces a plethora of bugs.
size_t is unsigned to line up with C's. ptrdiff_t is the signed version. You can do things like: ptrdiff_t d = a.length; Making size_t signed would have consequences that are largely unknown.
Why can't .length return a signed integer (not size_t) by default?
Oct 27 2010
parent reply Eric Poggel <dnewsgroup2 yage3d.net> writes:
 Making size_t signed would have consequences that are largely unknown.
Why .length cannot return a signed integer (not size_t) by default?
What if you're working in 32-bit mode and want to open a 3GB memory-mapped file? Then you will also need an unsignedLength property.
Oct 27 2010
parent reply KennyTM~ <kennytm gmail.com> writes:
On Oct 28, 10 03:36, Eric Poggel wrote:
 Making size_t signed would have consequences that are largely unknown.
Why .length cannot return a signed integer (not size_t) by default?
What if you're working in 32-bit mode and want to open a 3GB memory-mapped file? Then you will also need an unsignedLength property.
I was talking about the .length property of arrays. std.mmfile.MmFile.length()
already returns a 64-bit number (ulong). Making it "63-bit" (long) won't cause it to
fail with a 3GB memory-mapped file.
Oct 27 2010
parent Eric Poggel <dnewsgroup2 yage3d.net> writes:
On 10/27/2010 4:11 PM, KennyTM~ wrote:
 On Oct 28, 10 03:36, Eric Poggel wrote:
 Making size_t signed would have consequences that are largely unknown.
Why .length cannot return a signed integer (not size_t) by default?
What if you're working in 32-bit mode and want to open a 3GB memory-mapped file? Then you will also need an unsignedLength property.
I was talking about the .length property of arrays. std.mmfile.MmFile.length() is already returning a 64-bit number (ulong). Making it "63-bit" (long) won't cause it fail with a 3GB memory-mapped file.
It had been a while since I used them. For some reason I was thinking they were exposed through a standard array, but that doesn't make sense now that I think about it.
Oct 29 2010
prev sibling parent reply Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 Don wrote:
 I would strongly support that. But it doesn't really work.
 The problem is size_t. The fact that it's unsigned is a root of all 
 kinds of evil. It means .length is unsigned!!!
 Personally I think that any creation or access to an object which is 
 larger in size than half the memory space, should be impossible without 
 a special function call. Providing syntax sugar for this incredibly rare 
 scenario introduces a plethora of bugs.
size_t is unsigned to line up with C's. ptrdiff_t is the signed version. You can do things like: ptrdiff_t d = a.length; Making size_t signed would have consequences that are largely unknown.
Use a .LongLength property for really large arrays. Who knows how large arrays can
get even on a 32-bit target?
Nov 03 2010
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Kagamin schrieb:
 Who knows, how large arrays can get even on 32bit target?
Probably not longer than 2^32 elements (minus a few for the rest of the program) because more couldn't be addressed.
Nov 03 2010
parent retard <re tard.com.invalid> writes:
Wed, 03 Nov 2010 12:48:59 +0100, Daniel Gibson wrote:

 Kagamin schrieb:
 Who knows, how large arrays can get even on 32bit target?
Probably not longer than 2^32 elements (minus a few for the rest of the program) because more couldn't be addressed.
Having the whole 32-bit address space available is a bit unrealistic on 32-bit
targets if they have a modern operating system. The user processes may only have 3
GB available if a 3/1 division is used, even if on x86 the 36-bit PAE is in use.
Even then the whole 3 GB isn't available for a single array because all kinds of
other data must reside in the same address space. This means that the maximum
realistic length of an array of bytes is closer to 2^31, and with an array of
shorts it's closer to 2^30. An array of ints (most typical data size?) could only
grow to a bit over 2^29.

This is just from an application programmer's point of view. Sure, if you're
building your own operating system, having unsigned length fields might have some
merit.

In practice, if I need to deal with arrays with billions of elements, I'd use a
64-bit operating system in any case. Then having a signed length isn't a bad
tradeoff because the physical architecture doesn't even have buses that wide. From
Wikipedia:

"The original implementation of the AMD64 architecture implemented 40-bit physical
addresses and so could address up to 1 TB of RAM. Current implementations of the
AMD64 architecture extend this to 48-bit physical addresses and therefore can
address up to 256 TB of RAM. The architecture permits extending this to 52 bits in
the future (limited by the page table entry format); this would allow addressing of
up to 4 PB of RAM. For comparison, x86 processors are limited to 64 GB of RAM in
Physical Address Extension (PAE) mode, or 4 GB of RAM without PAE mode."

http://en.wikipedia.org/wiki/X86-64
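A back-of-the-envelope version of the element counts above, taking the roughly 2^31
contiguous bytes as a given (this is the estimate from the post, not a hard limit):

    import std.stdio;

    void main()
    {
        ulong maxBytes = 1UL << 31;        // ~2 GiB for one contiguous array
        writeln(maxBytes / ubyte.sizeof);  // 2^31 ubyte elements
        writeln(maxBytes / short.sizeof);  // 2^30 short elements
        writeln(maxBytes / int.sizeof);    // 2^29 int elements
    }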
Nov 03 2010
prev sibling parent reply Roman Ivanov <isroman.del ete.km.ru> writes:
On 10/27/2010 5:36 AM, Don wrote:
 Walter Bright wrote:
 Ellery Newcomer wrote:
 I don't think there's much value left for add-on static analysis tools.
I went to the trouble of modifying dmd to warn on unsigned/signed comparison. It found me some bugs which probably would not have been noticed otherwise. Did it produce false positives? Yes. Did that make me wish I hadn't done it? Hell no.
You might want to consider changing your coding style to eschew the use of unsigned types.
I would strongly support that. But it doesn't really work. The problem is size_t. The fact that it's unsigned is a root of all kinds of evil. It means .length is unsigned!!!
This probably has been discussed to death before, but what are the big issues with checking for overflows and prohibiting (or giving warnings) on implicit unsigned-to-signed conversion?
Oct 27 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Roman Ivanov:

 This probably has been discussed to death before,
Don't worry, it was not discussed enough. Discussions that go nowhere aren't enough.
 but what are the big issues with checking for overflows
work. A syntax is needed to disable them locally, but this is not hard to invent.
Plus probably two compilation switches to disable/enable them locally. People that
don't want them just disable them. (My theory is that the slowdown caused by them is
small enough that lots of people will want to keep them enabled in release mode
too.)

Fixing Phobos to make it overflow-checking compliant will require some work, so it's
better to introduce them sooner rather than later. Some other technical difficulties
may remain, but in the end it's mostly a matter of finding someone willing to
implement this feature and of what Walter will accept.
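While waiting for compiler support, the check can be sketched as a small library
helper (illustrative only; checkedAdd is an invented name, and a compiler-inserted
check would use the CPU's overflow flag rather than a wider intermediate type):

    import std.stdio;

    // Adds two ints and throws instead of silently wrapping around.
    int checkedAdd(int a, int b)
    {
        long wide = cast(long) a + b;
        if (wide < int.min || wide > int.max)
            throw new Exception("integer overflow in checkedAdd");
        return cast(int) wide;
    }

    void main()
    {
        try
            writeln(checkedAdd(2_000_000_000, 2_000_000_000));
        catch (Exception e)
            writeln(e.msg); // reports the overflow instead of wrapping silently
    }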
 and prohibiting (or giving warnings) on implicit unsigned-to-signed conversion?
This warning is useful; I use it by default when I write C code. Walter doesn't like
warnings in general (because he says warnings become de facto errors), and because
he says this warning creates too many false positives.

Regarding warnings, I am waiting for some DMD warnings to become errors, like:
http://d.puremagic.com/issues/show_bug.cgi?id=4216
http://d.puremagic.com/issues/show_bug.cgi?id=3836

Plus there are other warnings I'd like to have, like the "unused variable" one.

Bye,
bearophile
Oct 28 2010
next sibling parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Here's an idea: if warnings as a requirement actually become a 
phenomenon for a compiler which warns on ambiguous issues:

insert a warning which gives a random warning on randomly selected code

On 10/28/2010 06:15 AM, bearophile wrote:
 This warning is useful, I use it on default when I write C code. Walter
doesn't like warning in general (because he says warnings become de facto
errors), and because he says this warning creates too many false positives.
Oct 28 2010
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 but what are the big issues with checking for overflows
There are no big issues for checking for overflows.
Consider that every add instruction:

    ADD EAX,3

becomes 2 instructions:

    ADD EAX,3
    JC overflow

and every:

    LEA EAX,7[EBX*8][ECX]

becomes:

    MOV EAX,EBX
    IMUL EAX,3
    JC overflow
    ADD EAX,7
    JC overflow
    ADD EAX,ECX
    JC overflow

This is not a small penalty. Adds, multiplies, and subtracts are the bread and
butter of what the executable code is.
Oct 28 2010
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Walter Bright (newshound2 digitalmars.com)'s article
 bearophile wrote:
 but what are the big issues with checking for overflows
There are no big issues for checking for overflows.
Consider that every add instruction:

    ADD EAX,3

becomes 2 instructions:

    ADD EAX,3
    JC overflow

and every:

    LEA EAX,7[EBX*8][ECX]

becomes:

    MOV EAX,EBX
    IMUL EAX,3
    JC overflow
    ADD EAX,7
    JC overflow
    ADD EAX,ECX
    JC overflow

This is not a small penalty. Adds, multiplies, and subtracts are the bread and
butter of what the executable code is.
I don't consider it a high priority because I've found that integer overflow is such an uncommon bug in practice, but I would like to have overflow and sign checking in D eventually. As long as it can be disabled by a compiler switch for a whole program, or an annotation for a single performance-critical function, you can still have your safety the 90% of the time when the hit doesn't matter and only live dangerously when you gain something in the tradeoff.
Oct 28 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
dsimcha wrote:
 I don't consider it a high priority because I've found that integer overflow is
 such an uncommon bug in practice, but I would like to have overflow and sign
 checking in D eventually.  As long as it can be disabled by a compiler switch
for
 a whole program, or an annotation for a single performance-critical function,
you
 can still have your safety the 90% of the time when the hit doesn't matter and
 only live dangerously when you gain something in the tradeoff.
I agree that overflow is a pretty rare issue and way down the list. With 64 bit
targets, it's probably several orders of magnitude even less of an issue. Of far
more interest is improving abstraction abilities to prevent logic errors.

It's also possible in D to build a "SafeInt" library type that will check for
overflow. These classes exist for C++, but nobody seems to have much interest in
them.

Another way to deal with this is to use Don's most excellent std.bigint arbitrary
precision integer data type. It can't overflow.
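To make the "SafeInt" idea concrete, a minimal sketch of such a library type (not an
existing Phobos type; only addition is shown, and the overflow test uses a wider
intermediate for clarity):

    import std.stdio;

    struct SafeInt
    {
        int value;

        SafeInt opBinary(string op : "+")(SafeInt rhs) const
        {
            long wide = cast(long) value + rhs.value;
            if (wide < int.min || wide > int.max)
                throw new Exception("SafeInt overflow");
            return SafeInt(cast(int) wide);
        }
    }

    void main()
    {
        auto a = SafeInt(int.max);
        try
        {
            auto c = a + SafeInt(1); // a bare int would wrap to int.min here
            writeln(c.value);
        }
        catch (Exception e)
            writeln(e.msg); // "SafeInt overflow"
    }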
Oct 28 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 This is not a small penalty. Adds, multiplies, and subtracts are the bread and
 butter of what the executable code is.
I have used overflow tests in production Delphi code. I am aware of the performance difference they cause.
 becomes 2 instructions:
      ADD EAX,3
      JC overflow
Modern CPUs have speculative execution. That JC has a very low probability of being
taken, and the CPU executes several instructions past it. That speculated path is
almost never discarded, because overflows are very rare; the result is that from
that code you see a significant slowdown only on simpler CPUs like Atoms. On normal
good CPUs the performance loss is not too large (probably less than 50% for the most
integer-heavy code I've found).
 I agree that overflow is a pretty rare issue and way down the list. With 64
bit 
 targets, it's probably several orders of magnitude even less of an issue.
Every time I am comparing a signed with an unsigned I have an overflow risk in D. And overflow tests are a subset of more general range tests (see the recent thread about bound/range integers).
 Of far more interest are improving abstraction abilities to prevent logic
errors.
I am working on this too :-) I have proposed some little extensions of the D type system, they are simple annotations. But they are all additive changes, so they may wait for D3 too.
 It's also possible in D to build a "SafeInt" library type that will check for 
 overflow. These classes exist for C++, but nobody seems to have much interest
in 
 them.
People use integer overflow checks in Delphi code; I have seen them used even in
code written by other people. But those tests are uncommon in the C++ code I've
seen. Maybe the cause is that using a SafeInt is a pain and it's not handy.

How many people use array bound tests in C? Not many, even though surely some people
would like to. How many people use array bound tests in D? Well, probably everyone,
because they're built-in and you just need a switch to disable them. In C++ vector
you even need a different way to use bound tests:
http://www.cplusplus.com/reference/stl/vector/at/
I have seen C++ code that uses at(), but it's probably much less common than
transparent array bound tests as done in D, which don't need a different syntax.
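For comparison, the built-in D bounds test needs no special call syntax - a minimal
example (the switch that disables the checks has varied between compiler releases,
e.g. -release or -noboundscheck):

    import std.stdio;

    void main()
    {
        int[] a = [1, 2, 3];
        writeln(a[1]); // fine
        writeln(a[5]); // throws a RangeError when bounds checks are enabled
    }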
 Another way to deal with this is to use Don's most excellent std.bigint 
 arbitrary precision integer data type. It can't overflow.
Then I'd like a compiler switch that works very well to automatically change all
integral numbers in a program into bigints (and works well with all the int, short,
ubyte and ulong etc. type annotations too, of course).

Have you tried to use the current bigints as a replacement for all ints in a
program? They don't cast automatically to size_t (and there are a few other
troubles; some time ago I started a thread about this), so every time you use them
as array indexes you need casts or more. And you can't even print them with a
writeln. You care about the performance loss coming from replacing an "ADD EAX,3"
with an "ADD EAX,3 JC overflow", but here you suggest that I replace integers with
heap-allocated bigints.

Bye,
bearophile
Oct 28 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 becomes 2 instructions: ADD EAX,3 JC overflow
Modern CPUs have speculative execution. That JC has a very low probability, and the CPU executes several instructions under it.
It still causes the same slowdown. If the CPU can speculatively execute 5 instructions ahead, then you're reducing it to 4. Substantially increasing the code size has additional costs with cache misses, etc.
 Every time I am comparing a signed with an unsigned I have an overflow risk
 in D.
Not every time, no. In fact, it's rare. I believe you are *way* overstating the case. If you were right I'd be reading all the time about integer overflow bugs, not buffer overflow bugs.
 And overflow tests are a subset of more general range tests (see the
 recent thread about bound/range integers).
Languages which have these have failed to gain traction. That might be for other
reasons, but it's not auspicious.
 It's also possible in D to build a "SafeInt" library type that will check
 for overflow. These classes exist for C++, but nobody seems to have much
 interest in them.
People use integer overflows in Delphi code,
Delphi has also failed in the marketplace. Again, surely other reasons factor into its failure, but if you're going to cite other languages it is more compelling to cite successful ones.
 I have seen them used even in
 code written by other people too. But those tests are uncommon in the C++
 code I've seen. Maybe the cause it's because using a SafeInt is a pain and
 it's not handy.
Or perhaps because people really aren't having a problem with integer overflows.
 Then I'd like a compiler switch that works very well to automatically change
 all integral numbers in a program into bigints (and works well with all the
 int, short, ubyte and ulong etc type annotations too, of course).
Such a switch is completely impractical, because such a language would then have two quite incompatible variants.
 Have you tried to use the current bigints as a replacement for all ints in a
 program? They don't cast automatically to size_t (and there are few other
 troubles, time ago I have started a thread about this), so every time you use
 them as array indexes you need casts or more. And you can't even print them
 with a writeln. You care for the performance loss coming from replacing an
 "ADD EAX,3" with an "ADD EAX,3 JC overflow" but here you suggest me to
 replace integers with heap-allocated bigints.
Bigint can probably be improved. My experience with Don is he is very interested in
and committed to improving his designs.

BTW, bigints aren't heap allocated if their range fits in a ulong. They are structs,
i.e. value types (that got bashed in another thread as unnecessary, but here's an
example where they are valuable).

Also, *you* care about performance, as you've repeatedly posted benchmarks
complaining about D's performance, including the performance on integer arithmetic.
I don't see that you'd be happy with the marked slowdown your proposal will produce.

I'm willing to go out on a limb here. I welcome you to take a look at the dmd source
and Phobos source code. Find any places actually vulnerable to a signed/unsigned
error or overflow error (not theoretically vulnerable). For example, an overflow
that would not happen unless the program had run out of memory long before is not an
actual bug. The index into the vtable[] is not going to overflow. The line number
counter is not going to overflow. The number of parameters is not going to overflow.
There are also some places with overflow checks, like in turning numeric literals
into binary.
Oct 28 2010
parent Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 Every time I am comparing a signed with an unsigned I have an overflow risk
 in D.
Not every time, no. In fact, it's rare. I believe you are *way* overstating the case. If you were right I'd be reading all the time about integer overflow bugs, not buffer overflow bugs.
You can put overflow checks in asserts. And you can prove to yourself why
language-integrated asserts are helpful.
 Find any places actually vulnerable to a 
 signed/unsigned error or overflow error (not theoretically vulnerable). For 
 example, an overflow that would not happen unless the program had run out of 
 memory long before is not an actual bug. The index into the vtable[] is not 
 going to overflow. The line number counter is not going to overflow. The
number 
 of parameters is not going to overflow. There are also some places with
overflow 
 checks, like in turning numeric literals into binary.
Are they signed? If they're not going to overflow, they don't need to be unsigned.
Nov 03 2010
prev sibling parent reply KennyTM~ <kennytm gmail.com> writes:
On Oct 27, 10 10:26, Walter Bright wrote:
 Ellery Newcomer wrote:
 Question: would it be feasible to make D use some sort of bignum type
 as the default index type for arrays, etc, but make it possible for
 the programmer to use uint or ulong or whatever for e.g. places where
 performance is an issue?
I suspect that's a big jump in complexity for very little gain. I also suspect that it would result in a drastic performance problem (remember, Python is 100x slower than native code), and will disable the advantages D has with type inference (as the user will be back to explicitly naming the type).
Python is slow because it's interpreted with almost no optimization, not because of bignum. You should try Haskell, which has the built-in Integer type (bignum) and Int type (fixed-size integer). The performance drop is at most 20% when the size is compatible with Int.
 I don't think there's much value left for add-on static analysis tools.
I went to the trouble of modifying dmd to warn on unsigned/signed comparison. It found me some bugs which probably would not have been noticed otherwise. Did it produce false positives? Yes. Did that make me wish I hadn't done it? Hell no.
You might want to consider changing your coding style to eschew the use of unsigned types.
 The key idea is leave the warnings off unless the programmer
 explicitly asks for it.
That's a good sentiment, but it doesn't work that way in practice. Warnings always become de-facto requirements. They aren't the solution to a disagreement about how the language should work.
Oct 27 2010
parent Walter Bright <newshound2 digitalmars.com> writes:
KennyTM~ wrote:
 You should try Haskell, which has the built-in Integer type (bignum) and 
 Int type (fixed-size integer). The performance drop is at most 20% when 
 the size is compatible with Int.
20% is an unacceptably huge penalty.
Oct 27 2010
prev sibling parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 Looking at what the rule based analyzers do falls into predictable 
 categories:
 
 1. Memory allocation errors - failure to free, dangling pointers, 
 redundant frees
 
 2. Use of uninitialized data
 
 3. Related to (1), failure to clean up properly after allocating some 
 resource
 
 4. Memory corruption, such as buffer overflows
 
 5. Failure to do transaction cleanup properly in the event part of the 
 transaction failed
 
 6. Failure to deal with error returns
 
 7. Null pointer dereferencing
 
 8. Signed/unsigned mismatching
 
 Keep in mind that such tools can also produce large quantities of false 
 positives, requiring ugly workarounds or causing the programmer to miss 
 the real bugs. Keep in mind also that these tools are often way oversold 
 - they catch a few kinds of bugs, but not logic errors. Over time, I've 
 found my own coding bugs that such tools might catch get less and less 
 rare. The bugs in my code are logic errors that no tool could catch.
With the bugs I've fixed in the DMD source, I've seen very many cases of 7, several
cases of 2 and 6, and only one case of 8. Many bugs are also caused by dangerous
casts (where a pointer is cast from one type to another). But almost everything else
has been caused by a logic error.

I am certain that there are still many null pointer bugs in DMD.
Oct 28 2010
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Don wrote:
 With the bugs I've fixed in the DMD source, I've seen very many cases of 
 7, several cases of 2 and 6, and only one case of 8.
 Many bugs are also caused by dangerous casts (where a pointer is cast 
 from one type to another).
 But almost everything else been caused by a logic error.
 
 I am certain that there are still many null pointer bugs in DMD.
None of the null pointer bugs dmd has had would have been prevented by using
non-nullable types. I.e. they were not "I forgot to initialize this pointer", but
were instead the result of logic errors, like running off the end of a list.

NULL in dmd represents "this datum has not been computed yet" or "this datum has an
invalid value" or "this datum does not exist". With non-nullable types, they'd have
to be set to a datum that asserts whenever it is accessed, leading to the same
behavior.

Having the program abort due to an assert failure rather than a segment violation is
not the great advance it's sold as. I think one should carefully consider whether a
particular null pointer problem is a symptom of a logic bug or not before claiming
that eliminating null pointers will magically resolve it.

Otherwise, you're just shooting the messenger.
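A small sketch of the "datum that asserts" scenario described above (the
NotComputedType class is invented for illustration; this is not dmd code):

    import std.stdio;

    class Type
    {
        string name;
        this(string n) { name = n; }
    }

    // Hypothetical sentinel standing in for "not computed yet" once null
    // is no longer available.
    class NotComputedType : Type
    {
        this() { super("<not computed>"); }
        override string toString()
        {
            assert(0, "datum accessed before it was computed");
        }
    }

    void main()
    {
        Type t = new NotComputedType(); // instead of: Type t = null;
        writeln(t); // still aborts at the point of use, via an assert
                    // failure rather than a segfault
    }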
Oct 28 2010
parent reply Roman Ivanov <isroman.del ete.km.ru> writes:
On 10/28/2010 1:40 PM, Walter Bright wrote:
 Don wrote:
 With the bugs I've fixed in the DMD source, I've seen very many cases
 of 7, several cases of 2 and 6, and only one case of 8.
 Many bugs are also caused by dangerous casts (where a pointer is cast
 from one type to another).
 But almost everything else been caused by a logic error.

 I am certain that there are still many null pointer bugs in DMD.
None of the null pointer bugs dmd has had would have been prevented by using non-nullable types. I.e. they were not "I forgot to initialize this pointer", but were instead the result of logic errors, like running off the end of a list.
Preventing bad initializations at compile time is not the only benefit of
non-nullable types worth considering.

They would be a great help in debugging programs, for example. NullPointerException
is probably the most common error I see in Java. 95% of the time it gets thrown in
some weird context, which gives you no idea about what happened. The result is a
long and tedious debugging session. Here is a common case:

    String name = Config.get("name"); // returns null, no error
    //...
    if (name.length() > 0) { // NullPointerException!

The problem is that the second line could be reached after executing hundreds of
lines of (unrelated) code. The null could be passed around, propagated through
method calls. The exception could even be thrown from a deeply nested call in a
third party library. Stuff like that is common and really difficult to debug.

With non-nullable types, the exception will be thrown on the first line (where the
assignment happens), making it obvious which piece of code is at fault and
potentially saving hours of the programmer's time.

Another benefit of non-nullable types is that they can serve as a form of
documentation, making the intent behind the code clearer, and making the code easier
to read.

Old Java:

    if (myString != null && !myString.equals(""))
    if (!String.IsNullOrEmpty(myString))

Could be:

    if (myString != "")
 NULL in dmd represents "this datum has not been computed yet" or "this
 datum has an invalid value" or "this datum does not exist". With
 non-nullable types, they'd have to be set to a datum that asserts
 whenever it is accessed, leading to the same behavior.
 
 Having the program abort due to an assert failure rather than a segment
 violation is not the great advance it's sold as. I think one should
 carefully consider whether a particular null pointer problem is a
 symptom of a logic bug or not before claiming that eliminating null
 pointers will magically resolve it.
 
 Otherwise, you're just shooting the messenger.
Oct 29 2010
parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 29.10.2010 09:26, schrieb Roman Ivanov:
 They would be a great help in debugging programs, for example.
 NullPointerException is probably the most common error I see in Java.
 95% of all times it gets thrown in some weird context, which gives you
 no idea about what happened. The result is a long and tedious debugging
 session.
100% correct - but nullable types help to write code faster in the prototyping
phase, and not having them will also change the way developers are "forced" to write
code.

And there are millions of developers out there who like and use nullable values for
flow control - if the nullable "feature" is removed without something that keeps
that style working, you will lose them, or, much more evil, they will try to code
around the non-nullable style to get back to their well-known nullable behavior, by
using bools, ints, strings, whatever -> that will not help library growth around D.

Try coming up with a pattern that keeps both pros/cons...
Oct 29 2010
next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring <dl.soluz gmx.net>  
wrote:

 Am 29.10.2010 09:26, schrieb Roman Ivanov:
 They would be a great help in debugging programs, for example.
 NullPointerException is probably the most common error I see in Java.
 95% of all times it gets thrown in some weird context, which gives you
 no idea about what happened. The result is a long and tedious debugging
 session.
100% correct - but to have null-able types help to writer code faster in the prototype phase, and not having them will also change the way developers are "forced" to write code and there are million developers out there who likes/and use null-able values for flow-control - if the null-able "feature" is removed without something that keeps the style working, you will loose them, or much more evil, they will try to code around the non-null-able-style getting back to there well known null-able behavior, by using bools, ints, strings whatever -> that will not help in library growth around D try comming up with an pattern that keeps both pro/cons...
No one is talking about removing nullable references but rather adding non-nullable
types and making them the default. You could still achieve the old behavior if it is
needed (the most commonly proposed syntax):

    Foo? foo = stuff.find(predicate);
    if (foo is null) {
        // not found
    }
Oct 29 2010
parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 29.10.2010 11:07, schrieb Denis Koroskin:
 On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring<dl.soluz gmx.net>
 wrote:

  Am 29.10.2010 09:26, schrieb Roman Ivanov:
  They would be a great help in debugging programs, for example.
  NullPointerException is probably the most common error I see in Java.
  95% of all times it gets thrown in some weird context, which gives you
  no idea about what happened. The result is a long and tedious debugging
  session.
100% correct - but to have null-able types help to writer code faster in the prototype phase, and not having them will also change the way developers are "forced" to write code and there are million developers out there who likes/and use null-able values for flow-control - if the null-able "feature" is removed without something that keeps the style working, you will loose them, or much more evil, they will try to code around the non-null-able-style getting back to there well known null-able behavior, by using bools, ints, strings whatever -> that will not help in library growth around D try comming up with an pattern that keeps both pro/cons...
No one is talking about removing nullable references but rather adding non-nullable types and making them default. You could still achieve old behavior if it is needed (most proposed proposed syntax): Foo? foo = stuff.find(predicate); if (foo is null) { // not found }
 No one is talking about removing nullable references
sorry
 most proposed proposed syntax
sounds very similar to the long-discussed "make parameters const per default"
proposal - which is also still not there :(
Oct 29 2010
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 29 October 2010 03:06:56 dennis luehring wrote:
 Am 29.10.2010 11:07, schrieb Denis Koroskin:
 On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring<dl.soluz gmx.net>
 No one is talking about removing nullable references but rather adding
 non-nullable types and making them default. You could still achieve old
 behavior if it is needed (most proposed proposed syntax):
 
 Foo? foo = stuff.find(predicate);
 if (foo is null) {
 
       // not found
 
 }
 
> No one is talking about removing nullable references sorry > most proposed proposed syntax :) sound very similar to the long talked about "make parameters const per default" proposal - which is also still not there :(
Personally, I think that both would be horrible. Having const is great, and having
non-nullable references could be great, but I sure wouldn't want them to be the
default. In addition to that, however, having them as the default would make porting
code from other C-based languages a total nightmare - not to mention it totally
shatters the general principle that C/C++ code either is valid D code with the exact
same behavior or it doesn't compile. That alone makes making them the default
untenable.

- Jonathan M Davis
Oct 29 2010
next sibling parent reply tls <do notha.ev> writes:
Jonathan M Davis Wrote:

 On Friday 29 October 2010 03:06:56 dennis luehring wrote:
 Am 29.10.2010 11:07, schrieb Denis Koroskin:
 On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring<dl.soluz gmx.net>
 No one is talking about removing nullable references but rather adding
 non-nullable types and making them default. You could still achieve old
 behavior if it is needed (most proposed proposed syntax):
 
 Foo? foo = stuff.find(predicate);
 if (foo is null) {
 
       // not found
 
 }
 
> No one is talking about removing nullable references sorry > most proposed proposed syntax :) sound very similar to the long talked about "make parameters const per default" proposal - which is also still not there :(
Personally, I think that both would be horrible. Having const is great, and having non-nullable references could be great, but I sure wouldn't want them to be the default. In addition to that, however, having them as the default would make porting code from other C-based languages a total nightmare - not to mention it totally shatters the general principle that either C/C++ code is valid D code with the exact same behavior it doesn't compile. That alone makes making them the default untenable.
Sometimes not having safety is better. You see, I am now writing a GUI program for
users that uses the xview toolkit. xview uses K&R C
(http://bytes.com/topic/c/answers/215340-k-r-style-function-declarations-good-bad).
I am very sad to find D does not support standard K&R C, so I am considering
updating the dmd frontend to support K&R C. Much easier to read:

    void * newKlElem (frame_size, num_blocks, num_frames, frame_locator)
        size_t frame_size;
        unsigned short num_blocks;
        unsigned short num_frames;
        Kl_frame_locator *locator;
    {

instead of the modern confuser syntax. This shows how important supporting legacy
is. It cost me many hours writing a new dmd frontend. If const and nulls are also
broken, then this code won't run at all! No good. Keep legacy.
Oct 29 2010
parent dennis luehring <dl.soluz gmx.net> writes:
Am 29.10.2010 13:54, schrieb tls:
 Jonathan M Davis Wrote:

  On Friday 29 October 2010 03:06:56 dennis luehring wrote:
  >  Am 29.10.2010 11:07, schrieb Denis Koroskin:
  >  >  On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring<dl.soluz gmx.net>
  >  >  No one is talking about removing nullable references but rather adding
  >  >  non-nullable types and making them default. You could still achieve old
  >  >  behavior if it is needed (most proposed proposed syntax):
  >  >
  >  >  Foo? foo = stuff.find(predicate);
  >  >  if (foo is null) {
  >  >
  >  >        // not found
  >  >
  >  >  }
  >  >
  >   >  No one is talking about removing nullable references
  >
  >  sorry
  >
  >   >  most proposed proposed syntax
  >

  >  :)
  >
  >  sound very similar to the long talked about "make parameters const per
  >  default" proposal - which is also still not there :(

  Personally, I think that both would be horrible. Having const is great, and
  having non-nullable references could be great, but I sure wouldn't want them
to
  be the default. In addition to that, however, having them as the default would
  make porting code from other C-based languages a total nightmare - not to
  mention it totally shatters the general principle that either C/C++ code is
  valid D code with the exact same behavior it doesn't compile. That alone makes
  making them the default untenable.
Sometime not having safety is better. You see I write now GUI program for users uses xview toolkit. xview using k&r C (http://bytes.com/topic/c/answers/215340-k-r-style-function-dec arations-good-bad). I very sad find D not support standard k&r C so me consider update dmd frontend to support k&r C. much easy to read: void * newKlElem (frame_size,num_blocks,num_frames,frame_locator) size_t frame_size; unsigned short num_blocks; unsigned short num_frames; Kl_frame_locator *locator; { instead modern confuser sintax. This show how important support legacy is. Cost many hours writing new dmd frontend. If also const broken and nulls then this coode won't run at all! No good. Keep legacy.
Sorry, but your arguments are losing any sense; adding K&R support only for the ease
of porting very old-style code is nothing more than 100% stupid ... sorry, but you
should not involve yourself in language design discussions.
Oct 29 2010
prev sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:
Jonathan M Davis wrote:
 Personally, I think that both would be horrible. Having const is great, and
 having non-nullable references could be great, but I sure wouldn't want them to
 be the default. In addition to that, however, having them as the default would
 make porting code from other C-based languages a total nightmare - not to
 mention it totally shatters the general principle that either C/C++ code is
 valid D code with the exact same behavior it doesn't compile. That alone makes
 making them the default untenable.
How does making const and/or non-nullable default break this principle? If the code
relies on nullable variables, then it won't compile; otherwise it will work exactly
the same way as in C.

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Oct 29 2010
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, October 29, 2010 10:04:03 Jérôme M. Berger wrote:
 Jonathan M Davis wrote:
 Personally, I think that both would be horrible. Having const is great,
 and having non-nullable references could be great, but I sure wouldn't
 want them to be the default. In addition to that, however, having them
 as the default would make porting code from other C-based languages a
 total nightmare - not to mention it totally shatters the general
 principle that either C/C++ code is valid D code with the exact same
 behavior it doesn't compile. That alone makes making them the default
 untenable.

 How does making const and/or non-nullable default break this principle?
 If the code relies on nullable variables, then it won't compile,
 otherwise it will work exactly the same way as C.

 Jerome

Hmm. I suppose that in the general case it wouldn't (though the change in syntax
for const (what would you do, add mutable or unconst?) would certainly result in
a _lot_ of code having to be changed), but once you add in shared libraries
having non-nullable be the default could be a problem since the code which deals
with the fact that something could be null could be disconnected from the code
which actually declares the non-nullable references or pointers such that it
compiles just fine, but the code which uses it chokes. But you're right that it's
not as bad as I was thinking.

Regardless, I don't think that it's a good idea. And this late in the game, even
if it were a good idea, I think that we should think long and hard before making
that large breaking change. D is supposed to be stabilizing now, not making
language changes which break tons of code.

- Jonathan M Davis
Oct 29 2010
prev sibling parent reply tls <do notha.ev> writes:
dennis luehring Wrote:

 Am 29.10.2010 11:07, schrieb Denis Koroskin:
 On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring<dl.soluz gmx.net>
 wrote:

  Am 29.10.2010 09:26, schrieb Roman Ivanov:
  They would be a great help in debugging programs, for example.
  NullPointerException is probably the most common error I see in Java.
  95% of all times it gets thrown in some weird context, which gives you
  no idea about what happened. The result is a long and tedious debugging
  session.
100% correct - but to have null-able types help to writer code faster in the prototype phase, and not having them will also change the way developers are "forced" to write code and there are million developers out there who likes/and use null-able values for flow-control - if the null-able "feature" is removed without something that keeps the style working, you will loose them, or much more evil, they will try to code around the non-null-able-style getting back to there well known null-able behavior, by using bools, ints, strings whatever -> that will not help in library growth around D try comming up with an pattern that keeps both pro/cons...
No one is talking about removing nullable references but rather adding non-nullable types and making them default. You could still achieve old behavior if it is needed (most proposed proposed syntax): Foo? foo = stuff.find(predicate); if (foo is null) { // not found }
> No one is talking about removing nullable references sorry > most proposed proposed syntax sound very similar to the long talked about "make parameters const per default" proposal - which is also still not there :(
const parameters are no good per default. I reuse parameters all the time for space
conservation. Imagine if the browser needed twice the space for all functions; it
would make them too big. It already needs 2 GB of my 4 GB system. We need to fight
bloat with every weapon.

They would fix this bug:

    this(int x, int y)
    {
        x = x;
        this.y = y;
    }

but only bad coders make mistakes. D programmers are old C++ veterans, so they make
no mistakes ever.
Oct 29 2010
parent dennis luehring <dl.soluz gmx.net> writes:
Am 29.10.2010 13:41, schrieb tls:
 dennis luehring Wrote:

  Am 29.10.2010 11:07, schrieb Denis Koroskin:
  >  On Fri, 29 Oct 2010 11:58:56 +0400, dennis luehring<dl.soluz gmx.net>
  >  wrote:
  >
  >>   Am 29.10.2010 09:26, schrieb Roman Ivanov:
  >>>   They would be a great help in debugging programs, for example.
  >>>   NullPointerException is probably the most common error I see in Java.
  >>>   95% of all times it gets thrown in some weird context, which gives you
  >>>   no idea about what happened. The result is a long and tedious debugging
  >>>   session.
  >>
  >>   100% correct - but to have null-able types help to writer code faster in
  >>   the prototype phase, and not having them will also change the way
  >>   developers are "forced" to write code
  >>
  >>   and there are million developers out there who likes/and use null-able
  >>   values for flow-control - if the null-able "feature" is removed without
  >>   something that keeps the style working, you will loose them, or much
  >>   more evil, they will try to code around the non-null-able-style getting
  >>   back to there well known null-able behavior, by using bools, ints,
  >>   strings whatever ->   that will not help in library growth around D
  >>
  >>   try comming up with an pattern that keeps both pro/cons...
  >
  >  No one is talking about removing nullable references but rather adding
  >  non-nullable types and making them default. You could still achieve old
  >  behavior if it is needed (most proposed proposed syntax):
  >
  >  Foo? foo = stuff.find(predicate);
  >  if (foo is null) {
  >        // not found
  >  }

   >  No one is talking about removing nullable references
  sorry

   >  most proposed proposed syntax


  sound very similar to the long talked about "make parameters const per
  default" proposal - which is also still not there :(
const parameters no good per default. I reuse parameters whole time to space conservations. Imagine if browser need twice space for all functions, make them too big. Already needs 2 GigaB of my 4 GigaB system. We need fight bloat with every weapons. They fix this bug: this(int x, int y) { x = x; this.y = y; } but only bad coder make mistakes. D programmers old C++ veterans so make no mistakes ever.
In a library that is not written by you, what should be:

    void functionX(ref int refered) { refered = 10; }

was mistakenly typed as:

    void functionX(int refered) { refered = 10; }

Now start to find the error in your million lines of code. I usually find errors
like these in the projects I'm involved in.
Oct 29 2010
prev sibling parent reply tls <do notha.ev> writes:
dennis luehring Wrote:

 Am 29.10.2010 09:26, schrieb Roman Ivanov:
 They would be a great help in debugging programs, for example.
 NullPointerException is probably the most common error I see in Java.
 95% of all times it gets thrown in some weird context, which gives you
 no idea about what happened. The result is a long and tedious debugging
 session.
100% correct - but to have null-able types help to writer code faster in the prototype phase, and not having them will also change the way developers are "forced" to write code and there are million developers out there who likes/and use null-able values for flow-control - if the null-able "feature" is removed without something that keeps the style working, you will loose them, or much more evil, they will try to code around the non-null-able-style getting back to there well known null-able behavior, by using bools, ints, strings whatever -> that will not help in library growth around D try comming up with an pattern that keeps both pro/cons...
I need null the whole time. Null = empty array. Null = empty object. Null = 0
integer. Null = 0 floating point. Null = "" string with length 0. Null = void return
from a function. It is so good in many places. Not having null would make me not
know how to program! Then I would switch to C++!
Oct 29 2010
parent dennis luehring <dl.soluz gmx.net> writes:
Am 29.10.2010 13:36, schrieb tls:
 dennis luehring Wrote:

  Am 29.10.2010 09:26, schrieb Roman Ivanov:
  >  They would be a great help in debugging programs, for example.
  >  NullPointerException is probably the most common error I see in Java.
  >  95% of all times it gets thrown in some weird context, which gives you
  >  no idea about what happened. The result is a long and tedious debugging
  >  session.

  100% correct - but to have null-able types help to writer code faster in
  the prototype phase, and not having them will also change the way
  developers are "forced" to write code

  and there are million developers out there who likes/and use null-able
  values for flow-control - if the null-able "feature" is removed without
  something that keeps the style working, you will loose them, or much
  more evil, they will try to code around the non-null-able-style getting
  back to there well known null-able behavior, by using bools, ints,
  strings whatever ->  that will not help in library growth around D

  try comming up with an pattern that keeps both pro/cons...
I need null whole time. Null = empty array. Null = empty object. Null = 0 integer. Null = 0 floating point. Null = "" String with length 0. Null = void return in function. It so good many places. Not have null make me not know how to program! then I switch to C++!
ok, you're one of these null-using guys - does null also mean, time by time, an error to you ... so what is null in your world? a maybe-no-result, a maybe-error, or even both?
Oct 29 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Sorry for the answering delay.

Don:

 With the bugs I've fixed in the DMD source, I've seen very many cases of
 7, several cases of 2 and 6, and only one case of 8.
 Many bugs are also caused by dangerous casts (where a pointer is cast
 from one type to another).
 But almost everything else been caused by a logic error.
 
 I am certain that there are still many null pointer bugs in DMD.
Thank you for your numbers. We may think about ways to reduce similar bugs in D code. ------------------ Walter Bright:
 None of the null pointer bugs dmd has had would have been prevented by using
 non-nullable types. I.e. they were not "I forgot to initialize this pointer",
 but were instead the result of logic errors, like running off the end of a
list.
Non-null references/pointers are meant to protect against more than just forgotten initializations. There are ways to help against some of those that you call "logic errors".
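For example, even without language support part of the idea can be emulated in library code; this is only a rough sketch (NotNull and notNull are hypothetical names, not an existing Phobos API):

struct NotNull(T) if (is(T == class))
{
    private T payload;

    // The only way to build one is through a run-time checked constructor,
    // so code receiving a NotNull!T never has to test for null again.
    this(T obj)
    {
        assert(obj !is null, "null passed where a non-null was required");
        payload = obj;
    }

    @property T get() { return payload; }
    alias get this;  // let it be used wherever a T is expected
}

NotNull!T notNull(T)(T obj) { return NotNull!T(obj); }

class SomeClass { void hello() {} }

// A function with this parameter type cannot receive a raw, maybe-null reference.
void foo(NotNull!SomeClass c) { c.hello(); }

void main()
{
    auto c = notNull(new SomeClass);
    foo(c);
}

The point is that the null check happens once, at the boundary where the non-null value is created, instead of at every use site.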
 NULL in dmd represents "this datum has not been computed yet" or "this datum
has
 an invalid value" or "this datum does not exist". With non-nullable types,
 they'd have to be set to a datum that asserts whenever it is accessed, leading
 to the same behavior.
If not-null pointers are supported, then probably a language has to ask you to manage the null case explicitly (in D this is less easy to do). --------------------------- Walter:
 It still causes the same slowdown. If the CPU can speculatively execute 5
 instructions ahead, then you're reducing it to 4.
I don't think the situation is as linear as you say :-) Anyway, both from a few years of Delphi usage and from some experiments I have seen that the slowdown they cause is comparable to or less than the slowdown caused by array bounds tests.
 If you were right I'd be reading all the time about integer overflow bugs,
 not buffer overflow bugs.
Good C compilers (and lints) warn against mixing signed and unsigned values :-) So maybe programmers are able to avoid those bugs. But I agree that buffer overflow bugs are more common in C code.
 Languages which have these have failed to gain traction. That might be for
other
 reasons, but it's not ausipcious.
:-)
 Delphi has also failed in the marketplace.
Delphi (and TurboPascal) has been used for many years, all over the world; probably it will eventually die because people like C-family languages more, but so far it has had 100 times the success of D. Delphi devs are not alone, either: people use Ada in high-safety software, and people still use Ada for such purposes.
 Or perhaps because people really aren't having a problem with integer
overflows.
I don't know.
 Such a switch is completely impractical, because such a language would then
have
 two quite incompatible variants.
I agree; that's why I prefer run-time integer overflow checks instead.
 Also, *you* care about performance,
I care more for correct programs and for a language that helps me spot bugs very quickly. --------------------------- dennis luehring:
 and there are million developers out there who likes/and use null-able
 values for flow-control - if the null-able "feature" is removed without
 something that keeps the style working, you will loose them, or much
 more evil, they will try to code around the non-null-able-style getting
 back to there well known null-able behavior, by using bools, ints,
 strings whatever -> that will not help in library growth around D
 
 try comming up with an pattern that keeps both pro/cons...
Now it's too late to introduce not-nullable as the default, so this isn't a problem. --------------------------- Denis Koroskin:
 No one is talking about removing nullable references but rather adding  
 non-nullable types and making them default. You could still achieve old  
 behavior if it is needed (most proposed proposed syntax):
 
 Foo? foo = stuff.find(predicate);
 if (foo is null) {
      // not found
 }
See my answer to dennis luehring. Now we can hope to do the opposite: introduce some syntax to denote which pointers/references can't be null.
---------------------------
Now references/pointers in D are nullable by default. But a light syntax may be added to denote pointers that can't be null. I have suggested adding a trailing @ (or +) to denote a not-nullable reference or pointer:

class T {}
T nullable_reference;
T@ nonnullable_reference = new T@();

struct S {}
S nullable_pointer;
S@ nonnullable_pointer = new S@();

Or, with the + suffix:

class T {}
T nullable_reference;
T+ nonnullable_reference = new T+();

struct S {}
S nullable_pointer;
S+ nonnullable_pointer = new S+();

With OOP there is a problem in using this kind of reference; see this unfinished proposal for more info:
http://d.puremagic.com/issues/show_bug.cgi?id=4571
I think this first part may be implemented well, and it will avoid some null bugs. Then Walter:
 Having the program abort due to an assert failure rather than a segment
 violation is not the great advance it's sold as.
The compiler enforces the not-null nature at compile-time, so some run-time errors may be avoided. You can't give a nullable pointer to a function with this annotation, so there is no risk of forgetting to test for null and it can't cause a null reference error:

void foo(SomeClass@ c) { ... }

In other situations, when the not-nullable reference is created it may generate an assert error, but this is better than a segment violation later, because the bug is spotted when the reference is built and not when it's used, so the line number of the assert error is closer to the true location of the bug. This speeds up debugging.

But you are right that not-null references/pointers can't catch all "logic errors", because in many/some situations you need nullable references/pointers. So there's an optional second half of the proposal. To have a more null-safe language you need to ask for a test every time a nullable pointer is about to be dereferenced, unless you already are inside a branch of an "if" where the reference/pointer is statically known to be non-null (or below an assert(some_pointer)). In the presence of gotos and exceptions this becomes harder to do, so the compiler may need to act conservatively and ask for a test (or for that assert(some_pointer)) where it can't infer.

Bye,
bearophile
Oct 31 2010
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 I'd hoped to be able to obsolete them by designing the need for such checks
out of the language.
This is a good purpose; I agree that when possible it's the best solution, and indeed D makes many of the checks done by C lints useless (or less useful). That Condate tool is simple (its rule set is less than a page long), so it can't find very complex kinds of bugs. Much more refined lints (for C), which contain quite a bit more than a few simple rules, are able to find more complex bugs. There is a free online demo for a good lint.
 There's no need to check for an error that cannot be expressed.
Right :-)
 Looking at what the rule based analyzers do falls into predictable categories:
Commercial lints for C are probably able to find other kinds of bugs too. Even Splint (a free lint for C) is probably able to do more than your list (but you need to add some semantic annotations to the C code if you want Splint to do that).
 Keep in mind that such tools can also produce large quantities of false
positives,
That Condate tool is able to find only simple bugs, but I think its false positives aren't many. The static analyzer of Clang is supposed to have a really low number of false positives, so low that I think it may be configured to submit bug reports automatically :-)
 requiring ugly workarounds or causing the programmer to miss the real
 bugs. Keep in mind also that these tools are often way oversold - they catch a
 few kinds of bugs, but not logic errors. Over time, I've found my own coding
 bugs that such tools might catch get less and less rare. The bugs in my code
are
 logic errors that no tool could catch.
I am quite sure that if I ran a good C/C++ lint on the D front-end it could catch a large number (hundreds) of bugs. But even if you are right that you don't write bugs that a simple rule-based analyzer is able to catch, the world is full of people who don't have your level of experience in C/C++ coding, so for them a lint may be useful. I have some practice with C and I keep my brain switched on when I write C code, but I have found a good C lint able to find some mistakes in my C code. It gives many false positives (well, most of them are true "bugs", but ones I don't want to fix, like giving a signed integer to the malloc function), yet in the end I want to keep using it. I am sold on it.
 Here's how D deals with them:
Once in a while I use a low-level coding style in D; in such cases I may fall into several of C's traps. A lint tool for D may help in those parts of the code too. I have merged your two parallel lists of points, to make my answers more readable.
 1. Memory allocation errors - failure to free, dangling pointers, redundant
frees
 1. Garbage collection.
The GC avoids a large number of bugs. For the situations where the GC is unfit, Ada shows the usage of more than one class of pointer, with different capabilities. This may reduce the bug rate (but introduces some extra complexity). In the past, for example, I have suggested statically differentiating pointers to GC-managed memory from pointers to manually managed memory (so they are two different kinds of pointers), because they are quite different (example: putting tags inside a GC-managed pointer is a bad idea). You answered me that this introduces too much complexity in the language.
 4. Memory corruption, such as buffer overflows
 4. Array bounds checking, and safe mode in general, solves this.
Array bounds checking slows down code a lot, often by more than 20%. See below for my answer to point 8. (Static analysis in recent Java VMs is able to infer that many of those checks are useless and remove them with no harm to the code.)
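In D the programmer can already remove individual checks by hand where the index is known to be in range; a minimal sketch (the function names are just illustrative):

int sumChecked(int[] a)
{
    int s = 0;
    for (size_t i = 0; i < a.length; i++)
        s += a[i];       // bounds-checked access (in non-release builds)
    return s;
}

int sumUnchecked(int[] a)
{
    int s = 0;
    for (size_t i = 0; i < a.length; i++)
        s += a.ptr[i];   // raw pointer access: no bounds check, so be sure i < a.length
    return s;
}

void main()
{
    auto data = [1, 2, 3, 4];
    assert(sumChecked(data) == 10 && sumUnchecked(data) == 10);
}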
 6. Failure to deal with error returns
 6. Exceptions solve this
Yet I don't see exceptions used much in Phobos yet :-] Example: in Python list.index() throws an exception if the item is missing, while Phobos's indexOf returns -1.
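A minimal sketch of the two styles side by side (indexOfChecked is a hypothetical wrapper, not a Phobos function; it assumes std.string.indexOf and std.exception.enforce with their usual behavior):

import std.string : indexOf;
import std.exception : enforce;

// Sentinel style: the caller must remember to test for -1.
// Exception style: forgetting the check cannot silently corrupt the logic.
ptrdiff_t indexOfChecked(string haystack, char needle)
{
    auto i = indexOf(haystack, needle);
    enforce(i != -1, "character not found");
    return i;
}

void main()
{
    assert(indexOf("hello", 'x') == -1);        // sentinel: easy to ignore
    assert(indexOfChecked("hello", 'e') == 1);  // throws on failure instead
}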
 7. Null pointer dereferencing
 7. Certainly the idea of non-null types has a lot of adherents, D doesn't have
that.
It's too late now to make D2 references non-null by default, so that discussion is closed. But in bug http://d.puremagic.com/issues/show_bug.cgi?id=4571 I have proposed something different that, once improved, may still be added and may improve the situation. The idea is in two parts.

The first part is to add a simple syntax to denote a non-null reference or pointer. A possibility is to use the @ suffix:

class T {}
T nullable_reference;
T@ nonnullable_reference = new T@();

struct S {}
S nullable_pointer;
S@ nonnullable_pointer = new S@();

A possible alternative is to use the - (or +) suffix:

class T {}
T nullable_reference;
T+ nonnullable_reference = new T+();

struct S {}
S nullable_pointer;
S+ nonnullable_pointer = new S+();

A possible problem with non-null class references can be seen with this D program that uses the trailing @ syntax:

class Foo {}

class A {
    Foo@ name;
    this(Foo@ s) {
        this.name = s;
        this.m();
    }
    void m() { /*...*/ }
}

class B : A {
    Foo@ path;
    this(Foo@ p, Foo@ s) {
        super(s);
        this.path = p;
    }
    override void m() {
        // here this.path is null despite being declared non-null
        assert(this.path !is null);
    }
}

void main() {
    new B(new Foo, new Foo);
}

I have adapted that example from this paper, which also discusses partially uninitialized objects:
http://research.microsoft.com/pubs/67461/non-null.pdf

A comment about that program from the paper:
The problem with the code is that during the base call to A's constructor, the
virtual method B.m may be invoked. At this time, field path of the object under
construction has not yet been initialized. Thus, accesses of this.path in
method B.m may yield a possibly-null value, even though the field has been
declared as being non-null.<
[Another, more recent idea not present in bug 4571: an island of null-safe code may be present inside more code that is not null-safe. The soundness of the whole code may be assured if the compiler adds run-time tests at the interface points between the two realms. I have seen this solution used to introduce "gradually" dependently typed code inside a larger amount of code that lacks dependent type annotations: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.4557 ]

That's just the first half of a solution. Besides introducing non-null pointers/references, and a handy syntax to denote them, to have a null-safe language you also need to require an explicit test every time a nullable pointer/reference is about to be dereferenced; after this test, in the else branch, the reference type "becomes" a non-nullable one. This requires a small amount of flow analysis. This second part of the solution is optional, but it makes the overall solution good enough.
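Today that discipline can only be followed by convention; a minimal sketch of the pattern such a compiler would enforce (the names are just illustrative):

class Node { Node next; int value; }

int headValue(Node n)   // n is a nullable reference
{
    if (n is null)
        return -1;          // the null case is handled explicitly
    // In the proposed scheme, inside this branch the compiler would treat
    // n as non-nullable, so the dereference below needs no further test.
    return n.value;
}

void main()
{
    assert(headValue(null) == -1);
    auto n = new Node;
    n.value = 42;
    assert(headValue(n) == 42);
}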
 8. Signed/unsigned mismatching
 8. The only successful solution to this I've seen is Java's simply not having
 unsigned types. Analysis tools just produce false positives.

When I write C code I always keep this GCC warning active, and I find it useful. A partial solution to this problem is compile-time/run-time overflow checks on integral values: if I assign -1 to an unsigned value (without an explicit static cast) there is a compile-time error. In your list you are also forgetting integral overflows, which for example static analysis in SPARK is often able to avoid (but you need a lot of time and brain to write such code, so it's not a general solution for D or other languages; it's fit only for special code). See below for more answers.
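For reference, a minimal D sketch of the kind of signed/unsigned mix-up the GCC warning targets (as far as I know this compiles without any complaint):

import std.stdio;

void main()
{
    int  balance = -1;
    uint limit   = 100;

    // The signed value is implicitly converted to uint, so -1 becomes
    // 4294967295 and the comparison goes the "wrong" way, with no warning.
    if (balance > limit)
        writeln("surprising: -1 compares greater than 100u");
}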
 I don't think there's much value left for add-on static analysis tools.
In the end I don't have a D lint to test/try, so I can't know if you are right :-) Recently in this newsgroup I have shown something about dependent types; their type system is able to statically catch higher-order bugs (well, to enforce certain higher-order invariants) in the code :-) D is not able to catch such bugs: http://www.bluishcoder.co.nz/2010/09/01/dependent-types-in-ats.html Thank you for all your answers. -------------------------- Ellery Newcomer:
 or python's variation: not having fixed integer types at all.
 
 Question: would it be feasible to make D use some sort of bignum type as
 the default index type for arrays, etc, but make it possible for the
 programmer to use uint or ulong or whatever for e.g. places where
 performance is an issue?
Another solution is run-time integral overflow checks.
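As a rough illustration of what such checks amount to, here is a minimal library-level sketch (CheckedInt is a hypothetical type and only addition is shown):

struct CheckedInt
{
    int value;

    // Do the addition in a wider type and assert that the result still
    // fits in an int, instead of silently wrapping around.
    CheckedInt opBinary(string op : "+")(CheckedInt rhs) const
    {
        long wide = cast(long)value + rhs.value;
        assert(wide >= int.min && wide <= int.max, "integer overflow");
        return CheckedInt(cast(int)wide);
    }
}

void main()
{
    auto d = CheckedInt(40) + CheckedInt(2);
    assert(d.value == 42);
    // CheckedInt(int.max) + CheckedInt(1) would trip the assert
    // instead of silently wrapping to a negative number.
}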
 I went to the trouble of modifying dmd to warn on unsigned/signed
 comparison. It found me some bugs which probably would not have been
 noticed otherwise. Did it produce false positives? Yes. Did that make me
 wish I hadn't done it? Hell no.
I guess most non-DMD D compilers will have this warning. -------------------------- Walter:
 I suspect that's a big jump in complexity for very little gain.
Well, avoiding many integral-related bugs is a nice gain.
 I also suspect that it would result in a drastic performance problem (remember,
 Python is 100x slower than native code),
Python is 100-130x slower than native code only if your native code is numerically intensive and really well written for performance; otherwise in most cases you don't end up more than 20-50x slower. If your Python code uses strings, it uses functions written in C that are usually faster than the ones found in std.string, so your Python code is not slow. If your D code is high-level and looks like Java code, the gap is probably no more than 4-20x. Regarding the source of the low Python performance you are quite wrong. Psyco is a JIT for Python; if you use Psyco you keep using multi-precision numbers, and it's very uncommon to see good Psyco programs run more than 10-15 times slower than perfectly optimized D programs. Python is slow first of all because it's dynamic, and because its interpreter is designed to be simple and hackable by not very expert C open source programmers. CLisp and OCaml implementations use tagged integers, and from what I have seen you can't expect CLisp code that uses a lot of integers to be more than 2-3 times slower than D code. Often the difference is lower (and very often the difference in speed doesn't come from tagged integral numbers but has other causes, first of all that a lot of CLisp code is dynamically typed). If you add integral overflow tests to D code you usually don't see the code more than 50% slower, even if the code is using integer numbers all the time :-) Usually the percentage is lower. In practice this is not a problem, especially if you are debugging your program :-)
 and will disable the advantages D has
 with type inference (as the user will be back to explicitly naming the type).
You may be right, but here I don't fully understand you.
 You might want to consider changing your coding style to eschew the use of
 unsigned types.
This is unfortunately impossible in D: Phobos, arrays and other things are full of unsigned numbers...
 That's a good sentiment, but it doesn't work that way in practice. Warnings
 always become de-facto requirements. They aren't the solution to a disagreement
 about how the language should work.
Comparing signed and unsigned values is a hazard in D. ----------------------------- KennyTM~:
 Why .length cannot return a signed integer (not size_t) by default?
I have a bug report on this: http://d.puremagic.com/issues/show_bug.cgi?id=3843 Recently it was discussed, but not appreciated :-) ----------------------------- Walter:
 20% is an unacceptably huge penalty.
In D there are array bounds tests that often slow down array-intensive code by more than 20%. You probably accept that slowdown because it's optional (a lot of future D programmers will not want to remove those run-time tests; you can't assume that most people will want to remove them!). I am using OCaml, which uses tagged integers, and it's far from being a slow language. And I am willing to accept an optional 50% slowdown (or even 100%, meaning the code runs at half the speed) for optional integral overflow tests (on non-growable integral values). I'd like to learn to modify DMD just to introduce those overflow tests; maybe in a few more years I'll be able to do it :-)

Bye,
bearophile
Oct 27 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
 Yet I don't see exceptions used much in Phobos yet :-]
Andrei has used enforce() in many places, so there are many exceptions. But enforce() is used like a non-removable assert(), and the exceptions it generates are generic... Once DMD ships two compiled Phobos libs inside its binary distribution (one with asserts enabled too), those enforce() calls would be better replaced by asserts.

Bye,
bearophile
Oct 27 2010
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
bearophile wrote:
 Commercial lints for C are probably able to find other kind of bugs too. Even
 Splint (a free lint for C) is probably able to do more than your list (but
 you need to add some semantics annotations to the C code if you want Split to
 do that).
When you find yourself adding semantic annotations to the C code, or providing extra semantic rules to the static analyzer, what that really is saying is that the abstract type abilities of the language being analyzed are deficient.
 The static analyzer of Clang is supposed to have a
 really low amount of false positives, so low that I think it may be
 configured to submit bug reports automatically :-)
We'll see. A lot of organizations treat false positives as "bugs" simply because it's easier to deal with them that way.
 I am quite sure that if I run a good C/C++ lint on the D front-end it may
 catch a large (hundreds) of bugs.
I don't believe it, but feel free to prove me wrong. I'll be happy to fix any bugs found that way, but not false positives.
 But even if you are right, that you don't
 write bugs that a simple rule-based analyzers is able to catch, the world is
 full of people that don't have your level of experience in C/C++ coding. So
 for them a lint may be useful.
I found lint useful for maybe a year or so, and then it just stopped finding any problems in my code. Not that there weren't bugs, not at all, but I had simply learned to not do the kinds of things lint detects.
 1. Memory allocation errors - failure to free, dangling pointers, redundant
 frees 1. Garbage collection.
The GC avoids a large number of bugs. For the situations where the GC is unfit Ada shows the usage of more than one class of pointers, with different capabilities. This may reduce the bug rate (but introduces some extra complexity). In past for example I have suggested to statically differentiate pointers to GC-managed memory from pointers to manually managed memory (so they are two different kind of pointers), because they are quite different (example: putting tags inside a GC-managed pointer is a bad idea). You answered me that this introduces too much complexity in the language.
Microsoft's Managed C++ does exactly this. While a technical success, it is a complete failure in regards to pleasing programmers. I'm also very familiar with using multiple pointer types, which are a necessity for DOS programming. I'm sick of it. It sucks. I don't want to go back to it, and I don't know anyone who does.
 4. Memory corruption, such as buffer overflows 4. Array bounds checking,
 and safe mode in general, solves this.
Array bounds checking slows down code a lot. Often more than 20%. See below for my answer to point 8.
That's why it's selectable with a switch.
 (Static analysis in recent JavaVMs is able to infer that many of those tests
 checks are useless and removed them with no harm for the code).
I know about data flow analysis, and I was able to implement such checking for array bounds. But it is of only limited effectiveness.
 6. Failure to deal with error returns 6. Exceptions solve this
Yet I don't see exceptions used much in Phobos yet :-] Example: in Python list.index() throws an exception if the item is missing, while indexOf returns -1.
That is because there is a different idea of what an "error" is when indexing a list.
 8. Signed/unsigned mismatching 8. The only successful solution to this I've
 seen is Java's simply not having
unsigned types. Analysis tools just produce false positives. When I write C code I always keep this GCC warning active, and I find it useful.
The rate of false positives for such checks makes them not suitable for inclusion in the language.
 In your list you are forgetting integral overflows too, that for example
 static analysis in SPARK is often able to avoid (but you need lot of time and
 brain to write such code, so it's not a general solution for D or other
 languages, it's fit only for special code).
I did that deliberately, as I haven't seen any focus on them in static analysis tools. But I do know that it is of particular interest to you.
 I also suspect that it would result in a drastic performance problem
 (remember, Python is 100x slower than native code),
Python is 100-130x slower than native code only if your native code is numerically-intensive and really well written for performance, otherwise in most cases you don't end up more than 20-50x slower.
Even 20% slower, let alone 20 times slower, is unacceptable.
 CLisp and OCaML implementations use tagged integers, and from what I have
 have seen you can't expect CLisp code that uses lot of integers more than 2-3
 times slower than D code.
Such slowdowns are simply not acceptable for D.

 code you usually don't see the code more than 50% slower even if the code is
 using integer numbers all the time :-) Usually the percentage is lower. In
 practice this is not a problem, especially if you are debugging your program
 :-)
50% slower than C++ means that people will not switch from C++ to D. I think a total 5% slowdown relative to C++ is about the max acceptable. In particular, I do not view integer overflows as remotely big enough of a problem to justify such massive slowdowns. Yes, I've had an overflow bug here and there over the years, but nothing remotely as debilitating as uninitialized data bugs or pointer bugs.
Oct 27 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter:

 When you find yourself adding semantic annotations to the C code, or providing
 extra semantic rules to the static analyzer, what that really is saying is that
 the abstract type abilities of the language being analyzed are deficient.
Right. Some of my recent suggestions, like the outer or the optional_tag() attributes ( http://d.puremagic.com/issues/show_bug.cgi?id=5125 ), are ways to give more semantics to the D compiler. On the other hand, Cyclone (and heavily annotated C code for Splint) shows that too many annotations make writing code an unpleasant experience (or even a pain; writing D2 code is a bit more painful just because of const correctness), so you must keep a balance. That idea about islands in that dynamic dependent types paper is a possible way to solve this problem with annotations.
 I found lint useful for maybe a year or so, and then it just stopped finding
any
 problems in my code.
I have been using C for quite a bit more than a year, yet I still find a good lint useful. You are surely much better than me at C.
 Microsoft's Managed C++ does exactly this. While a technical success, it is a
 complete failure in regards to pleasing programmers.
I see.
 I'm also very familiar with using multiple pointer types, which are a necessity
 for DOS programming. I'm sick of it. It sucks. I don't want to go back to it,
 and I don't know anyone who does.
I see. I will need to program more in Ada to see if you are right regarding the way Ada uses pointers.
 I know about data flow analysis, and I was able to implement such checking for
 array bounds. But it is of only limited effectiveness.
This is interesting; you have done a lot of things. (Some time ago I used the Java SciMark 2.0 benchmark before and after the introduction of that static analysis, and I saw the difference.)
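For illustration, a minimal sketch of the kind of loop where such data flow analysis can prove every access in bounds and elide the checks (sum is just an illustrative name):

int sum(int[] a)
{
    int s = 0;
    // The loop variable is constrained to 0 .. a.length, so a sufficiently
    // smart compiler can prove a[i] never goes out of bounds and drop the check.
    foreach (i; 0 .. a.length)
        s += a[i];
    return s;
}

void main()
{
    assert(sum([1, 2, 3, 4]) == 10);
}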
 as I haven't seen any focus on them in static analysis tools.
Most lints don't focus on them because they are hard to spot statically, especially if you use the C language. To avoid overflow bugs with static analysis you need big guns, as done by SPARK on a subset of Ada, but that's a language for special purposes.
 [integer overflows]
 50% slower than C++ means that people will not switch from C++ to D. I think a
 total 5% slowdown relative to C++ is about the max acceptable.
 ...
 In particular, I do not view integer overflows as remotely big enough of a
 problem to justify such massive slowdowns.
(There I was talking about integer overflow tests.) I answer with one of your own answers:
 That's why it's selectable with a switch.
(But for integral overflows in D I suggest two switches, one for unsigned and one for both signed and unsigned values).
 Yes, I've had an overflow bug here
 and there over the years, but nothing remotely as debilitating as uninitialized
 data bugs or pointer bugs.
D allows you to remove a large percentage of the memory-related bugs, so now the percentage of integer-related bugs (over the new total of bugs) becomes higher :-) Thank you for your many answers. Bye, bearophile
Oct 27 2010
parent bearophile <bearophileHUGS lycos.com> writes:
 (But for integral overflows in D I suggest two switches, one for unsigned and
one for both signed and unsigned values).
I meant one for signed and one for both signed and unsigned values, sorry. Bye, bearophile
Oct 27 2010