digitalmars.D.learn - Empty Array is Null?
- Brian White (5/5) Mar 19 2008 char[] array = "".dup;
- Steven Schveighoffer (28/32) Mar 19 2008 Here is my guess:
- Brian White (2/3) Mar 19 2008 It confirms I'm not going insane, and that's always helpful. :-)
- Frits van Bommel (54/88) Mar 19 2008 Sorry, but your guess is wrong:
- Steven Schveighoffer (18/67) Mar 19 2008 My view is that array is null should not compile, as array is not a poin...
- Frits van Bommel (16/41) Mar 19 2008 Indeed, no program should be able to get a non-empty array with .ptr ==
- Steven Schveighoffer (7/48) Mar 19 2008 Good point. I wonder if comparing any struct to null is equivalent to
- BCS (6/10) Mar 19 2008 no this is the weird part:
- Frits van Bommel (4/19) Mar 19 2008 It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing...
- torhu (6/26) Mar 19 2008 Can't see why that would be undefined. It's pretty clear what it means.
- Frits van Bommel (12/39) Mar 19 2008 Yes, it means constructing a completely invalid array. :)
- bearophile (4/7) Mar 20 2008 This can be acceptable for a little C compiler, like TinyCC, but D langu...
char[] array = "".dup;
assert(array !is null);

This will exit because the assert condition is false. Why is that?

-- Brian
Mar 19 2008
"Brian White" wrote
> char[] array = "".dup;
> assert(array !is null);
>
> This will exit because the assert condition is false. Why is that?

Here is my guess:

The compiler does not allocate a piece of memory for "", and so the array struct for it looks like:

{ ptr = null, length = 0 }

If you dup this, it gives you the same thing (no need to allocate an array of size 0).

Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:

array !is null

is translated to:

array.ptr !is null

And this is why the program fails the assert.

The sucky part about all this is that if you have an empty array where the pointer is NOT null, then you get a different result (that array is not considered to be null).

So an array is null ONLY if the pointer is null. An array is empty if the length is 0. If you want to check for an empty array, just check that the length is 0. If you want to make sure that the pointer is null (which implies the length is 0), then check against null.

array = null;
array.length = 5; // you would expect a segfault here

Because array is really a struct with some compiler magic, the variable array itself can never truly be null.

Anyways, hope this helps.

-Steve
Mar 19 2008
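The point about "array = null; array.length = 5;" can be checked with a minimal sketch, assuming a current D2 compiler. The helper name resizeFromNull is hypothetical and not from the thread:

```d
// Sketch only: `resizeFromNull` is a hypothetical helper, not from the thread.
char[] resizeFromNull(size_t n)
{
    char[] array = null;   // { ptr = null, length = 0 }
    assert(array.ptr is null && array.length == 0);

    // No segfault: assigning .length makes the runtime allocate storage.
    array.length = n;
    return array;
}

void main()
{
    char[] a = resizeFromNull(5);
    assert(a.length == 5);
    assert(a.ptr !is null);   // the array now owns real memory
    assert(a !is null);       // and no longer compares identical to null
}
```

The variable holds a pointer/length pair by value, which is why assigning null to it and then growing it behaves like growing any other empty array.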
> Anyways, hope this helps.

It confirms I'm not going insane, and that's always helpful. :-)

-- Brian
Mar 19 2008
Steven Schveighoffer wrote:
> "Brian White" wrote
>> char[] array = "".dup;
>> assert(array !is null);
>>
>> This will exit because the assert condition is false. Why is that?
>
> Here is my guess:
>
> The compiler does not allocate a piece of memory for "", and so the array struct for it looks like:
>
> { ptr = null, length = 0 }

Sorry, but your guess is wrong:
---
urxae urxae:~/tmp$ cat test.d
import std.stdio;

void main()
{
    writefln("%s", "".ptr);
}
urxae urxae:~/tmp$ dmd -run test.d
805C41C
---

> If you dup this, it gives you the same thing (no need to allocate an array of size 0).

Since, as I mentioned above, the input wasn't null, it's not "the same thing". Otherwise, this is correct (including the reason given; actually allocating 0 bytes is pretty useless).

The fact that empty_arr.dup returns null has been the topic of some discussion in the newsgroups IIRC, but the fact is that it's equivalent to allocating a zero-byte array on the heap in the most important aspects:
* The returned array has the correct length.
* All elements of the returned array are identical to those of the original array. [1]
* All of the returned array's elements can be freely modified without modifying the original array. [1]
* Changing any of the original elements doesn't change the returned array. [1]
* Appending anything to the returned value doesn't risk changing anything previously allocated (as the GC will allocate a new block of memory when appending to a non-gc-allocated array; which includes null arrays).

On top of all that, it's also very efficient since it doesn't require any allocation (at least, until anything is appended onto it).

The *only* property it doesn't have that 'normal' .dups do have is that normal .dups return unique non-null values. The only ways to even detect that are by 'is'-comparing to null (or a null-valued array) or (implicitly or explicitly) casting it to a boolean. All other behavior is completely consistent.

The discussion on the NGs was, IIRC, between those who considered 'null' to mean "no string" while considering other empty strings as "empty string" and those who just don't see any reason to explicitly distinguish between the two. In the end, I believe, it came down to "Walter is in the latter camp".

[1]: These are trivially true since having no elements that can be read or written means they don't actually require anything for empty arrays.

> Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:
>
> array !is null
>
> is translated to:
>
> array.ptr !is null
>
> And this is why the program fails the assert.

Actually, if you compare an array to null (using 'is') DMD performs an 'or' instruction on the .ptr and .length and tests for the flag that it sets if the result is zero. This is just an optimization; this is equivalent to checking if both .ptr and .length are 0 (though presumably faster, since it's a single instruction that doesn't even implement full comparison).

> The sucky part about all this is that if you have an empty array where the pointer is NOT null, then you get a different result (that array is not considered to be null)

Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.

> So an array is null ONLY if the pointer is null. An array is empty if the length is 0. If you want to check for an empty array, just check that the length is 0. If you want to make sure that the pointer is null (which implies the length is 0), then check against null.

Other ways to check for an empty array are 'arr == ""' or 'arr == null' (using '==' instead of 'is').
Mar 19 2008
"Frits van Bommel" wrote
> Steven Schveighoffer wrote:
>> "Brian White" wrote
>>> char[] array = "".dup;
>>> assert(array !is null);
>>>
>>> This will exit because the assert condition is false. Why is that?
>>
>> Here is my guess:
>>
>> The compiler does not allocate a piece of memory for "", and so the array struct for it looks like:
>>
>> { ptr = null, length = 0 }
>
> Sorry, but your guess is wrong:
> ---
> urxae urxae:~/tmp$ cat test.d
> import std.stdio;
>
> void main()
> {
>     writefln("%s", "".ptr);
> }
> urxae urxae:~/tmp$ dmd -run test.d
> 805C41C

Hm... ok, like I said it was a guess :)

> On top of all that, it's also very efficient since it doesn't require any allocation (at least, until anything is appended onto it).
>
> The *only* property it doesn't have that 'normal' .dups do have is that normal .dups return unique non-null values. The only ways to even detect that are by 'is'-comparing to null (or a null-valued array) or (implicitly or explicitly) casting it to a boolean. All other behavior is completely consistent.
>
> The discussion on the NGs was, IIRC, between those who considered 'null' to mean "no string" while considering other empty strings as "empty string" and those who just don't see any reason to explicitly distinguish between the two. In the end, I believe, it came down to "Walter is in the latter camp".

My view is that 'array is null' should not compile, as array is not a pointer type. Having statements like this confuses new coders into thinking array is a pure pointer or reference type, when in fact it is a struct. This is to an array being a heap-allocated type. But I seriously doubt my view is going to change anything like others before me :)

> Actually, if you compare an array to null (using 'is') DMD performs an 'or' instruction on the .ptr and .length and tests for the flag that it sets if the result is zero. This is just an optimization; this is equivalent to checking if both .ptr and .length are 0 (though presumably faster, since it's a single instruction that doesn't even implement full comparison).

Huh? Why does it do that? If you have a null pointer, then clearly the length should be 0. An optimization in my mind would be to just replace 'array is null' with 'array.ptr is null'. Is there a good reason to have a null pointer array with a non-zero length?

>> The sucky part about all this is that if you have an empty array where the pointer is NOT null, then you get a different result (that array is not considered to be null)
>
> Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.

I would guess that the newest D compiler would not allow that, since comparing to null is now an error except for using 'x is null'. Of course, this is another guess, since I haven't downloaded the new compiler yet :)

-Steve
Mar 19 2008
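The 'is' versus '==' distinction discussed in the posts above can be checked directly. This is a minimal sketch assuming a current D2 compiler; the helper names isNullArray and isEmptyArray are hypothetical and only illustrate the two comparisons:

```d
// Hypothetical helpers for illustration; neither name comes from the thread.
bool isNullArray(const(char)[] a)  { return a is null; }   // identity: .ptr and .length
bool isEmptyArray(const(char)[] a) { return a == null; }   // equality: contents only

void main()
{
    char[] nullArr = null;
    string emptyLit = "";   // literal: zero length, but .ptr is not null

    assert(isNullArray(nullArr));
    assert(!isNullArray(emptyLit));

    assert(isEmptyArray(nullArr));
    assert(isEmptyArray(emptyLit));
    assert(emptyLit == "" && nullArr == "");
    assert(nullArr.length == 0 && emptyLit.length == 0);
}
```

For code that only cares whether there is anything to iterate over, the length check or '==' is the safer choice, since it gives the same answer for both kinds of empty array.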
Steven Schveighoffer wrote:
> "Frits van Bommel" wrote
>> Actually, if you compare an array to null (using 'is') DMD performs an 'or' instruction on the .ptr and .length and tests for the flag that it sets if the result is zero. This is just an optimization; this is equivalent to checking if both .ptr and .length are 0 (though presumably faster, since it's a single instruction that doesn't even implement full comparison).
>
> Huh? Why does it do that? If you have a null pointer, then clearly the length should be 0. An optimization in my mind would be to just replace 'array is null' with 'array.ptr is null'. Is there a good reason to have a null pointer array with a non-zero length?

Indeed, no program should be able to get a non-empty array with .ptr == null. However, it appears the compiler currently doesn't use that as an optimization opportunity. Maybe even only because Walter didn't think of it, or just because it doesn't really save that much and it isn't worth the trouble of checking if one of the values is known to be null at compile time.

The 'or' is itself an optimization that only applies when comparing to a 0-length null array, but this optimization may well be implemented completely in the compiler backend which doesn't know that the length should always be zero if the pointer is null; it may only know that it needs to compare these two numbers against those other two numbers and jump based on the result...

>>> The sucky part about all this is that if you have an empty array where the pointer is NOT null, then you get a different result (that array is not considered to be null)
>>
>> Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.
>
> I would guess that the newest D compiler would not allow that, since comparing to null is now an error except for using 'x is null'. Of course, this is another guess, since I haven't downloaded the new compiler yet :)

I'm pretty sure it's only an error when comparing class instances. It shouldn't be an error to compare pointers or arrays against null. (There's no reason for it to be since they don't use vtables)
Mar 19 2008
"Frits van Bommel" wrote
> Steven Schveighoffer wrote:
>> "Frits van Bommel" wrote
>>> Actually, if you compare an array to null (using 'is') DMD performs an 'or' instruction on the .ptr and .length and tests for the flag that it sets if the result is zero. This is just an optimization; this is equivalent to checking if both .ptr and .length are 0 (though presumably faster, since it's a single instruction that doesn't even implement full comparison).
>>
>> Huh? Why does it do that? If you have a null pointer, then clearly the length should be 0. An optimization in my mind would be to just replace 'array is null' with 'array.ptr is null'. Is there a good reason to have a null pointer array with a non-zero length?
>
> Indeed, no program should be able to get a non-empty array with .ptr == null. However, it appears the compiler currently doesn't use that as an optimization opportunity. Maybe even only because Walter didn't think of it, or just because it doesn't really save that much and it isn't worth the trouble of checking if one of the values is known to be null at compile time.
>
> The 'or' is itself an optimization that only applies when comparing to a 0-length null array, but this optimization may well be implemented completely in the compiler backend which doesn't know that the length should always be zero if the pointer is null; it may only know that it needs to compare these two numbers against those other two numbers and jump based on the result...

Good point. I wonder if comparing any struct to null is equivalent to comparing if all its values are 0...

>>>> The sucky part about all this is that if you have an empty array where the pointer is NOT null, then you get a different result (that array is not considered to be null)
>>>
>>> Actually, 'array == null' should return true for any empty array. Testing arrays with 'is' explicitly requests comparing .ptr and .length directly, not paying any attention to the contents; 'is' checks for identity, '==' for equivalence.
>>
>> I would guess that the newest D compiler would not allow that, since comparing to null is now an error except for using 'x is null'. Of course, this is another guess, since I haven't downloaded the new compiler yet :)
>
> I'm pretty sure it's only an error when comparing class instances. It shouldn't be an error to compare pointers or arrays against null. (There's no reason for it to be since they don't use vtables)

I think you are right. Now that I look at Walter's message, he said specifically comparing class to null is invalid...

Thanks

-Steve
Mar 19 2008
Steven Schveighoffer wrote:
> Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:

no this is the weird part:

IIRC this passes.

char* cp = cast(char*)null;
char[] ca = ap[0..15];
assert(ca.ptr == null && ca.length == 15);
Mar 19 2008
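A runnable version of the snippet above might look like the following sketch, assuming a current D2 compiler and assuming `ap` was meant to be `cp`. The helper sliceFrom is a hypothetical name. The sketch only constructs the slice and never reads or writes its elements, since whether doing so is defined is exactly what the thread leaves open:

```d
// Hypothetical helper: builds a slice from a raw pointer without any checks.
char[] sliceFrom(char* p, size_t n)
{
    return p[0 .. n];   // records p and n; nothing is dereferenced
}

void main()
{
    char* cp = null;
    char[] ca = sliceFrom(cp, 15);

    assert(ca.ptr is null && ca.length == 15);

    // Identity with null requires both fields to be zero.
    assert(ca !is null);

    // Touching ca[0] would dereference a null pointer, so the
    // sketch never reads or writes the elements.
}
```

Because pointer slicing is unchecked, the resulting pair has a null pointer and a non-zero length, which is the one case where comparing only .ptr against null and comparing the whole array with 'is null' give different answers.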
BCS wrote:
> Steven Schveighoffer wrote:
>> Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:
>
> no this is the weird part:
>
> IIRC this passes.
>
> char* cp = cast(char*)null;
> char[] ca = ap[0..15];
> assert(ca.ptr == null && ca.length == 15);

It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing is invoking undefined behavior. Or at least it should be, but I can't seem to find any mention of it in the spec...
Mar 19 2008
Frits van Bommel wrote:
> BCS wrote:
>> Steven Schveighoffer wrote:
>>> Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:
>>
>> no this is the weird part:
>>
>> IIRC this passes.
>>
>> char* cp = cast(char*)null;
>> char[] ca = ap[0..15];
>> assert(ca.ptr == null && ca.length == 15);
>
> It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing is invoking undefined behavior. Or at least it should be, but I can't seem to find any mention of it in the spec...

Can't see why that would be undefined. It's pretty clear what it means.

Perhaps it should be an error when the compiler detects that you're setting .ptr to null but .length to nonzero. But the compiler can't be expected to detect that in the general case, so it would be of limited usefulness. A bit like disallowing comparing objects to 'null' with ==.
Mar 19 2008
torhu wrote:
> Frits van Bommel wrote:
>> BCS wrote:
>>> Steven Schveighoffer wrote:
>>>> Now, here is the weird part. The compiler does some magic with arrays. If you are comparing an array with null, it changes the code to actually just compare the array pointer to null. So, the following code:
>>>
>>> no this is the weird part:
>>>
>>> IIRC this passes.
>>>
>>> char* cp = cast(char*)null;
>>> char[] ca = ap[0..15];
>>> assert(ca.ptr == null && ca.length == 15);
>>
>> It does with DMD, if you s/ap/cp/, but I'm pretty sure what you're doing is invoking undefined behavior. Or at least it should be, but I can't seem to find any mention of it in the spec...
>
> Can't see why that would be undefined. It's pretty clear what it means.

Yes, it means constructing a completely invalid array. :)
Though perhaps it should only be undefined if you ever try to read or write the elements?

> Perhaps it should be an error when the compiler detects that you're setting .ptr to null but .length to nonzero. But the compiler can't be expected to detect that in the general case, so it would be of limited usefulness. A bit like disallowing comparing objects to 'null' with ==.

I didn't say it should be detected, only that the compiler should be well within its rights to make your code crash if you do that :P. An error message would also be nice of course, but by no means required.

Though as mentioned above, maybe the undefined behavior could be postponed until you actually try to read or write to the array. It's quite similar to dereferencing null pointers: the compiler can refuse to compile code that tries to do it, but most compilers will just generate crashing code...
Mar 19 2008
Frits van Bommel:
> It's quite similar to dereferencing null pointers: the compiler can refuse to compile code that tries to do it, but most compilers will just generate crashing code...

This can be acceptable for a little C compiler, like TinyCC, but the D language is supposed to be a safer and less bug-prone language. Otherwise it's just sugared C++ ;-)

Bye,
bearophile
Mar 20 2008