digitalmars.D.learn - Checking if a string is null
- Max Samukha (22/22) Jul 24 2007 Using '== null' and 'is null' with strings gives odd results (DMD
- Hoenir (3/35) Jul 24 2007 Makes sense to me. is compares the pointer and == the content or
- Max Samukha (5/40) Jul 25 2007 Then, it's unclear what null content means. If it is the same as empty
- Regan Heath (36/67) Jul 25 2007 Not I, it's inconsistent IMO and it gets worse:
- Regan Heath (14/24) Jul 25 2007 There have been several, I did a brief search and came up with:
- Max Samukha (7/74) Jul 25 2007 You didn't update all writefln's :)
- Ald (2/2) Jul 25 2007 I believe the manual says that, when comparing, the compiler tries to ca...
- Regan Heath (49/58) Jul 25 2007 Not that I can find. The array page does say:
- Frits van Bommel (47/121) Jul 25 2007 As Max said, you forgot to update some writeflns. The output of the
- Regan Heath (34/76) Jul 25 2007 True. I guess what I meant to say was I'm in the '3 distict states'
- Frits van Bommel (36/77) Jul 25 2007 At least with that last paragraph I can agree ;)
- Regan Heath (15/36) Jul 25 2007 I can't tell in which way you're joking so I'm just going to come out
- Don Clugston (7/25) Jul 25 2007 I don't think that's really what's happening here.
- Derek Parnell (6/32) Jul 25 2007 But arrays are not vectors.
- Carlos Santander (5/8) Jul 25 2007 But empty arrays are not null. You could even argue that null arrays don...
- Derek Parnell (18/62) Jul 25 2007 Not in my world. I see that null arrays have no length. That is to say, ...
- Frits van Bommel (17/76) Jul 25 2007 But the fact of the matter is, 'T[] x = null;' reserves space for the
- Derek Parnell (23/94) Jul 25 2007 I'm trying not to set in concrete the ABI of variable-length arrays. So
- Oskar Linde (12/26) Jul 25 2007 Uhu... Why whould a slice of the full addressable memory space be a good...
- Derek Parnell (10/32) Jul 25 2007 Maybe x.ptr = size_t.max and x.length = size_t.max might be useful
- Frits van Bommel (8/21) Jul 26 2007 It's not the *full* addressable memory space for 1-byte types (the last
- Bruno Medeiros (9/21) Jul 26 2007 Today's T[] is "a slice type with value semantics and some provisions
- Regan Heath (25/27) Jul 26 2007 No, definately not. This is one of the things I love about arrays,
- Oskar Linde (15/36) Jul 25 2007 But that is not how T[] behaves in D. T[]s are of a dual slice/array
- Regan Heath (22/29) Jul 26 2007 Not true, the two arrays you mention below would still compare 'true' as...
- Derek Parnell (8/11) Jul 25 2007 I don't think this is such a good idea. How does one address the array o...
- Frits van Bommel (6/14) Jul 25 2007 I'm pretty sure the only way to obtain such an array would be to have
- Derek Parnell (14/29) Jul 25 2007 There is no basis for assuming that any RAM location is not addressable....
- Frits van Bommel (16/42) Jul 26 2007 I'm sorry, but what would then be the problem with accessing
- Derek Parnell (10/21) Jul 26 2007 Duh! I am so stupid! I misread Regan's original post. When he said "I...
- Regan Heath (7/15) Jul 26 2007 What I meant was:
- Bruno Medeiros (13/23) Jul 25 2007 The .ptr of empty arrays may be different than the .ptr of null arrays,
- Regan Heath (9/32) Jul 25 2007 Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and
- Bruno Medeiros (8/45) Jul 25 2007 I meant that in current D they are semantically the same. (I should have...
- Regan Heath (6/21) Jul 25 2007 Yes, I remember it. I just forgot who was involved and what their
- Derek Parnell (9/18) Jul 25 2007 No they are not! Conceptually they are different things. However, D
- Bruno Medeiros (8/28) Jul 26 2007 Check my reply to Regan just above, what I meant to say is that in
- Derek Parnell (13/18) Jul 25 2007 However,
Using '== null' and 'is null' with strings gives odd results (DMD 1.019):

    void main()
    {
        char[] s;
        if (s is null) writefln("s is null");
        if (s == null) writefln("s == null");
    }

Output:
    s is null
    s == null

----

    void main()
    {
        char[] s = "";
        if (s is null) writefln("s is null");
        if (s == null) writefln("s == null");
    }

Output:
    s == null

----

Can anybody explain why s == null is true in the second example?
Jul 24 2007
Max Samukha wrote:
> Can anybody explain why s == null is true in the second example?

Makes sense to me. 'is' compares the pointer and '==' compares the content, or something like that.
Jul 24 2007
On Wed, 25 Jul 2007 08:32:52 +0200, Hoenir <mrmocool gmx.de> wrote:
> Makes sense to me. is compares the pointer and == the content or
> something like that.

Then it's unclear what null content means. If it is the same as an empty string (ptr != null and length == 0), I remain confused. If it means a null string (ptr == null and length == 0), the second example should output nothing, since s.ptr != null.
Jul 25 2007
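The two representations described above (pointer part and length part) can be printed side by side. The following is only a minimal sketch, assuming a current D2 compiler rather than the DMD 1.019 used in the thread; the variable names are illustrative and `writeln` replaces the D1-style `writefln` calls.

```d
import std.stdio;

void main()
{
    string nullStr = null;   // pointer part is null, length is 0
    string emptyStr = "";    // length is 0, but the literal has storage

    // Inspect the two parts of each array directly.
    writeln("nullStr : ptr is null = ", nullStr.ptr is null,
            ", length = ", nullStr.length);
    writeln("emptyStr: ptr is null = ", emptyStr.ptr is null,
            ", length = ", emptyStr.length);

    // 'is' compares pointer and length; '==' compares contents.
    writeln("nullStr  is null: ", nullStr is null);
    writeln("emptyStr is null: ", emptyStr is null);
    writeln("nullStr  == null: ", nullStr == null);
    writeln("emptyStr == null: ", emptyStr == null);
}
```

If the behaviour reported in the thread still holds, only the `is` lines differ between the two strings.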
Max Samukha wrote:
> Can anybody explain why s == null is true in the second example?

Not I, it's inconsistent IMO and it gets worse:

    import std.stdio;

    void main()
    {
        foo(null);
        foo("");
    }

    void foo(string s)
    {
        writefln(s.ptr, ", ", s.length);
        if (s is null) writefln("s is null");
        if (s == null) writefln("s == null");
        if (s < null) writefln("s < null");
        if (s > null) writefln("s < null");
        if (s <= null) writefln("s <= null");
        if (s >= null) writefln("s < null");
        writefln("");
    }

Output:
    0000, 0
    s is null
    s == null
    s <= null
    s < null

    415080, 0
    s == null
    s <= null
    s < null

So, "" is < and == null!? and <=,== but not >=!?

This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point.

I'm in the 'distinguishable' camp. I can see the merit. At the very least it should be consistent!

Regan
Jul 25 2007
Manfred Nowak wrote:
> Regan Heath wrote
>> This all boils down to the empty vs null string debate where some
>> people want to be able to distinguish between them and some see no
>> point.
>
> I haven't seen such a debate.

There have been several, I did a brief search and came up with:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55270 (this one was my fault)
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=25804
http://www.digitalmars.com/d/archives/digitalmars/D/learn/3521.html
http://www.digitalmars.com/d/archives/21782.html
http://www.digitalmars.com/d/archives/digitalmars/D/27123.html
http://www.digitalmars.com/d/archives/16905.html
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/Issue_1314_New_Dupping_an_empty_array_creates_a_null_array_11585.html
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=D&artnum=17083

Some of those go back a long, long way.

> Does it mean that it is not possible to implement a Kleene Algebra for
> strings in D because there is no neutral element for the alternative
> operator?

I have no idea. :)

Regan
Jul 25 2007
On Wed, 25 Jul 2007 11:12:19 +0100, Regan Heath <regan netmail.co.nz> wrote:
> So, "" is < and == null!? and <=,== but not >=!?

You didn't update all writefln's :)

> I'm in the 'distinguishable' camp. I can see the merit. At the very
> least it should be consistent!

Anyway, it feels like an undefined area in the language. Do the specs say anything about how exactly arrays/strings/delegates should compare to null? It seems to be more than comparing the pointer part of the structs.
Jul 25 2007
I believe the manual says that, when comparing, the compiler tries to call the opEquals() method, and calling that through a null pointer yields undefined behavior. You should use the '!is null' construct instead.
Jul 25 2007
>> So, "" is < and == null!? and <=,== but not >=!?
>
> You didn't update all writefln's :)

<hangs head in shame> What can I say, I'm having a bad morning.

> Anyway, it feels like an undefined area in the language. Do the specs
> say anything about how exactly arrays/strings/delegates should compare
> to null? It seems to be more than comparing the pointer part of the
> structs.

Not that I can find. The array page does say:

"Strings can be copied, compared, concatenated, and appended:"
..
"with the obvious semantics."

but not much more on the topic. Under "Array Initialization" we see:

* Pointers are initialized to null.
..
* Dynamic arrays are initialized to having 0 elements.
..

Which does not state that an array will be initialised to "null" but rather to something with 0 elements. To my mind something with 0 elements is 'empty' as opposed to being 'nonexistent', which is typically represented by 'null' or a similar value (like NAN for floats, 0xFF for char, etc).

So, it seems the spec is hinting/saying that arrays cannot be nonexistent, only empty (or not empty). And yet in the current implementation there is clearly a difference between 'null' and "" when it comes to arrays.

I'm still firmly in favour of there being 3 distinct states for an array:

* nonexistent (null)
* empty ("", length == 0)
* not empty (length > 0)

That said, I'm also firmly in favour of not getting a seg-fault when I have a reference to a nonexistent array (we currently have this behaviour and it's perfect).

All I think needs 'fixing' goes back to your initial test case:

    char[] s = "";
    if (s is null) writefln("s is null");
    if (s == null) writefln("s == null");

Neither of these tests should evaluate 'true'.

The fact that the latter does indicates to me that the array compare is first comparing length, seeing they're both 0 and assuming the arrays must be equal. I think instead it should also check the data pointer, because in the case of "" the data pointer is non-null.

The same is true for a zero-length slice, i.e. s[0..0]: it exists (data pointer is non-null) but is empty (length is zero).

In short, the compare function should recognise the 3 states:

* nonexistent (data pointer is null)
* empty (data pointer is non-null, length is zero)
* not empty (length is > zero)

and never make the mistake of calling an array in one state equal to an array in another state.

Regan

p.s. I am cross-posting and setting followup to digitalmars.D as it has become more of a theory/discussion on D than a learning exercise :)

p.p.s. Plus, I figure if Manfred cannot recall a discussion on this topic we probably need another one about now.
Jul 25 2007
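The three states listed above can be told apart today by looking at `.ptr` and `.length` directly. This is a hedged sketch for a current D2 compiler; `ArrayState` and `stateOf` are hypothetical names introduced only for illustration and are not part of the language or of Phobos.

```d
import std.stdio;

// Hypothetical names, used only to illustrate the three proposed states.
enum ArrayState { nonexistent, empty, notEmpty }

ArrayState stateOf(T)(T[] a)
{
    if (a.length > 0)
        return ArrayState.notEmpty;
    // Length is zero: the data pointer separates "null" from "empty".
    return (a.ptr is null) ? ArrayState.nonexistent : ArrayState.empty;
}

void main()
{
    string none = null;
    string literal = "";
    string text = "hello";

    writeln(stateOf(none));
    writeln(stateOf(literal));
    writeln(stateOf(text[0 .. 0]));   // zero-length slice of existing data
    writeln(stateOf(text));
}
```

Note that this only classifies an array; it does not change what the built-in `==` does.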
Regan Heath wrote:
> So, "" is < and == null!? and <=,== but not >=!?

As Max said, you forgot to update some writeflns. The output of the corrected version is:
===
0000, 0
s is null
s == null
s <= null
s >= null

805BEF0, 0
s == null
s <= null
s >= null
===
Seems perfectly consistent to me. Anything with an equality comparison (==, <=, >=) is true in both cases, and 'is' is only true when the pointer as well as the length is equal.

> This all boils down to the empty vs null string debate where some
> people want to be able to distinguish between them and some see no
> point.
>
> I'm in the 'distinguishable' camp. I can see the merit. At the very
> least it should be consistent!

They *are* distinguishable. That's why the above code returns different results for the 'is' comparison...

I for one am perfectly fine with "cast(char[]) null" meaning ".length == 0 && .ptr == null" and with comparisons of arrays using == and friends only inspecting the contents (not location) of the data.

Now, about comparisons: array comparisons basically operate like this:
---
int opEquals(T)(T[] u, T[] v) { // bah to int return type
    // arrays of different length are never equal
    if (u.length != v.length) return false;
    for (size_t i = 0; i < u.length; i++) {
        if (u[i] != v[i]) return false;
    }
    return true;
}

int opCmp(T)(T[] u, T[] v) {
    // compare element-wise over the common prefix
    size_t len = min(u.length, v.length);
    for (size_t i = 0; i < len; i++) {
        if (auto diff = u[i].opCmp(v[i])) {
            return diff;
        }
    }
    // equal prefix: the shorter array sorts first
    return cast(int)u.length - cast(int)v.length;
}
---
(Taken from object.TypeInfo_Array and converted to templates instead of void*s + casting + element TypeInfo.{equals/compare} for readability)

Since both the null string and "" have .length == 0, they compare equal using those methods (having no contents to compare and equal length).

This is all perfectly consistent (and even useful) to me...
Jul 25 2007
>> I'm in the 'distinguishable' camp. I can see the merit. At the very
>> least it should be consistent!
>
> They *are* distinguishable. That's why above code returns different
> results for the 'is' comparison...

True. I guess what I meant to say was I'm in the '3 distinct states' camp (which may be a camp of 1 for all I know). See my reply to digitalmars.D for a definition of the 3 states.

> I for one am perfectly fine with "cast(char[]) null" meaning
> ".length == 0 && .ptr == null"

Same here.

> and with comparisons of arrays using == and friends only inspecting
> the contents (not location) of the data.

I don't think an empty string (non-null, length == 0) should compare equal to a nonexistent string (null, length == 0). And vice-versa.

The only thing that should compare equal to null is null. Likewise an empty array should only compare equal to another empty array.

My reasoning for this is consistency, see at end.

Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content. This could save a bit of time on comparisons of large arrays.

> Now, about comparisons: array comparisons basically operate like this:
> [opEquals / opCmp listing]
> (Taken from object.TypeInfo_Array and converted to templates instead of
> void*s + casting + element TypeInfo.{equals/compare} for readability)

Thanks.

> Since both the null string and "" have .length == 0, that means they
> compare equal using those methods (having no contents to compare and
> equal length)

This is the bit I don't like.

> This is all perfectly consistent (and even useful) to me...

It's not consistent with other reference types, types which can represent 'nonexistent', e.g.

    char *p = null; // nonexistent
    if (p == null) writefln("p == null");
    if (p == "") writefln("p == \"\"");

Output:
    p == null

Compare that to:

    char[] p = null;
    if (p == null) writefln("p == null");
    if (p == "") writefln("p == \"\"");

Output:
    p == null
    p == ""

All that I would like changed is for the compare, in the case of length == 0, to check the data pointers, e.g.

> int opEquals(T)(T[] u, T[] v) {
>     if (u.length != v.length) return false;
      if (u.length == 0) return (u.ptr == v.ptr);
>     for (size_t i = 0; i < u.length; i++) {
>         if (u[i] != v[i]) return false;
>     }
>     return true;
> }

This should mean "" == "" but not "" == null, likewise null == null but not null == "".

Regan
Jul 25 2007
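The short-circuit mentioned in the aside above (same location and same length means equal) could look roughly like this. It is a sketch only, assuming D2; `equalsWithShortcut` is a hypothetical helper, not the runtime's actual comparison routine. One caveat worth stating: for element types where a value can be unequal to itself (a floating-point NaN), skipping the element loop changes the result.

```d
import std.stdio;

// Hypothetical helper illustrating the pointer short-circuit.
bool equalsWithShortcut(T)(T[] u, T[] v)
{
    if (u.length != v.length)
        return false;
    // Same start address and same length: the same memory, so skip the loop.
    if (u.ptr is v.ptr)
        return true;
    foreach (i; 0 .. u.length)
    {
        if (u[i] != v[i])
            return false;
    }
    return true;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a.dup;          // same contents, different memory

    writeln(equalsWithShortcut(a, a));   // takes the shortcut
    writeln(equalsWithShortcut(a, b));   // falls through to the element loop
}
```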
Regan Heath wrote:
> The only thing that should compare equal to null is null. Likewise an
> empty array should only compare equal to another empty array.
>
> My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.

> Aside: If the location and length are identical you can short-circuit
> the compare, returning true and ignoring the content, this could save
> a bit of time on comparisons of large arrays.

At least with that last paragraph I can agree ;)

Now, about this:

> All that I would like changed is for the compare, in the case of
> length == 0, to check the data pointers, eg.
>
> > int opEquals(T)(T[] u, T[] v) {
> >     if (u.length != v.length) return false;
>       if (u.length == 0) return (u.ptr == v.ptr);
> >     for (size_t i = 0; i < u.length; i++) {
> >         if (u[i] != v[i]) return false;
> >     }
> >     return true;
> > }
>
> This should mean "" == "" but not "" == null, likewise null == null
> but not null == "".

Let's look at this code:
---
import std.stdio;

void main()
{
    char[][] strings = ["hello world!", "", null];
    foreach (str; strings) {
        auto str2 = str.dup;
        if (str == str2)
            writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
        else
            writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
    }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated a new empty string like you probably want instead of returning null). How would that be consistent in any way?
(Same goes for other ways to create different-ptr empty strings)

What you might have meant on that extra line might be more like:
---
if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
---
which will return true if both .ptr values are null or both are non-null.
Jul 25 2007
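Putting the corrected zero-length check into a complete comparison gives something like the following. This is a sketch of the proposal under discussion, not of what the D runtime does; `stateAwareEquals` is a hypothetical name and the example assumes a D2 compiler.

```d
import std.stdio;

// Hypothetical comparison implementing the proposed rule:
// null equals only null, and empty equals only empty.
bool stateAwareEquals(T)(T[] u, T[] v)
{
    if (u.length != v.length)
        return false;
    // Both are zero-length: equal only if both are null or both are non-null.
    if (u.length == 0)
        return (u.ptr is null) == (v.ptr is null);
    foreach (i; 0 .. u.length)
    {
        if (u[i] != v[i])
            return false;
    }
    return true;
}

void main()
{
    string none = null;
    string literal = "";
    string text = "hello";

    writeln(stateAwareEquals(none, none));            // null vs null
    writeln(stateAwareEquals(literal, text[0 .. 0])); // two empties, different ptr
    writeln(stateAwareEquals(none, literal));         // null vs empty
    writeln(none == literal);                         // built-in comparison, for contrast
}
```

Unlike the `u.ptr == v.ptr` version, two empty strings at different addresses still compare equal here.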
>> The only thing that should compare equal to null is null. Likewise an
>> empty array should only compare equal to another empty array.
>>
>> My reasoning for this is consistency, see at end.
>
> Since null arrays have length 0, they *are* empty arrays :P.

I can't tell in which way you're joking so I'm just going to come out with...

The length of something, be it an array, a car, a <insert thing>, is totally independent of whether it exists (though a nonexistent item cannot have a length). It either exists or it does not. If it exists, it has a length which may or may not be zero.

Something which exists cannot be equal to something which doesn't. Period.

>> Aside: If the location and length are identical you can short-circuit
>> the compare, returning true and ignoring the content, this could save
>> a bit of time on comparisons of large arrays.
>
> At least with that last paragraph I can agree ;)

:)

> Your change would change the second line (even if it actually
> allocated a new empty string like you probably want instead of
> returning null). How would that be consistent in any way?

Oops, my bad. My suggested code change is totally incorrect. That'll teach me for posting while working on something else at the same time.

> (Same goes for other ways to create different-ptr empty strings)
>
> What you might have meant on that extra line might be more like:
> ---
> if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
> ---
> which will return true if both .ptr values are null or both are
> non-null.

Yes, and yes, I want "".dup to allocate a new 1-byte block, point at it and set length to 0.

Regan
Jul 25 2007
Regan Heath wrote:
>> Since null arrays have length 0, they *are* empty arrays :P.
>
> I can't tell in which way you're joking so I'm just going to come out
> with...
>
> The length of something be it an array, a car, a <insert thing> is
> totally independent of whether it exists (though a nonexistent item
> cannot have a length). It either exists or it does not. If it exists,
> it has a length which may or may not be zero.
>
> Something which exists cannot be equal to something which doesn't.

I don't think that's really what's happening here.
Consider vectors. If a vector has a length of zero, the direction doesn't exist. Take two arbitrary vectors with different directions, a and b. a*0 == b*0, even though the direction of a is completely different to that of b.

This is the same model which is being used for arrays; if the .length is zero, the .ptr is irrelevant.
Jul 25 2007
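The "if .length is zero, .ptr is irrelevant" model can be observed with two zero-length slices taken at different offsets of the same string. A minimal sketch, assuming D2; the variable names are illustrative only.

```d
import std.stdio;

void main()
{
    string text = "hello world";

    // Two zero-length slices that start at different addresses.
    string a = text[0 .. 0];
    string b = text[5 .. 5];

    writeln("same pointer: ", a.ptr is b.ptr);
    writeln("a == b:       ", a == b);   // content comparison, nothing to compare
    writeln("a is b:       ", a is b);   // identity: pointer and length
}
```

This is the array analogue of a*0 == b*0: the "direction" (the pointer) differs, yet the contents are equally empty.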
On Wed, 25 Jul 2007 22:07:15 +0200, Don Clugston wrote:
> I don't think that's really what's happening here.
> Consider vectors. If a vector has a length of zero, the direction
> doesn't exist. Take two arbitrary vectors with different directions,
> a and b. a*0 == b*0, even though the direction of a is completely
> different to that of b.
>
> This is the same model which is being used for arrays; if the .length
> is zero, the .ptr is irrelevant.

But arrays are not vectors.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Frits van Bommel wrote:
> Since null arrays have length 0, they *are* empty arrays :P.

But empty arrays are not null. You could even argue that null arrays don't have a length, thus they can't be empty.

--
Carlos Santander Bernal
Jul 25 2007
On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
> Since null arrays have length 0, they *are* empty arrays :P.

Not in my world. I see that null arrays have no length. That is to say, they do not have any length, which is different from saying they have a length and that length is zero.

> Let's look at this code:
> [the "hello world!" / "" / null .dup example and its output]
> Your change would change the second line (even if it actually
> allocated a new empty string like you probably want instead of
> returning null). How would that be consistent in any way?

Your example is misleading for at least two reasons:

** The '==' operator compares the contents of the strings. A null string has no content so there is nothing to compare. This should fail but it doesn't in the current D. It should fail in the same manner that a null object reference fails the '==' operator.

** The output is writefln's attempt at giving a string representation of the data presented. It (aka Walter) has decided that the string representation of a null array is an empty string. This does not mean that a null array is an empty string but just that writefln represents it as such.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Derek Parnell wrote:
> On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
>> Since null arrays have length 0, they *are* empty arrays :P.
>
> Not in my world. I see that null arrays have no length. That is to
> say, they do not have any length, which is different from saying they
> have a length and that length is zero.

But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it.
Or would you prefer a segfault or diagnostic when accessing (cast(T[])null).length? That'd introduce overhead on every .length access (unless the compiler can statically determine whether an array reference is null).

> Your example is misleading for at least two reasons:
>
> ** The '==' operator compares the contents of the strings. A null
> string has no content so there is nothing to compare. This should fail
> but it doesn't in the current D. It should fail in the same manner
> that a null object reference fails the '==' operator.

This wasn't the point of the example. I could have left out the third element and changed the .dup in the second line to a different empty string (e.g. a 0-length slice of the first one) and the point would remain the same: the proposed change would break comparison by '==' for empty non-null strings.

> ** The output is writefln's attempt at giving a string representation
> of the data presented. It (aka Walter) has decided that the string
> representation of a null array is an empty string. This does not mean
> that a null array is an empty string but just that writefln represents
> it as such.

Like I said, the point of the example didn't actually have anything to do with null strings, but rather with a bug in a change Regan proposed to make null strings and non-null empty strings compare unequal, which resulted in non-null empty strings comparing unequal.
Jul 25 2007
On Thu, 26 Jul 2007 07:47:03 +0200, Frits van Bommel wrote:
> But the fact of the matter is, 'T[] x = null;' reserves space for the
> .length and sets it to 0. If you have a suggestion for a different
> value to put there, by all means make it.

I'm trying not to set in concrete the ABI of variable-length arrays. So even though the current D definition is that a VL array consists of a two-element struct and zero or one block of RAM, conceptually a null array doesn't point to anything and does not have a length. So to me it doesn't matter that D allocates space for the .length and .ptr portions of the null VL array, because it still should not use the .length value.

But, because theoretically every possible RAM address could be stored in the .ptr portion, including zero, I concede that in D the .ptr and .length both being zero is needed to indicate a null array, even though this disallows the conceptual empty array beginning at address zero.

> Or would you prefer a segfault or diagnostic when accessing
> (cast(T[])null).length? That'd introduce overhead on every .length
> access (unless the compiler can statically determine whether an array
> reference is null).

Yes I would. However, too many people are relying on this inconsistency so I'll live with that wart in the language.

> This wasn't the point of the example.

Sorry for misunderstanding.

> I could have left out the third element and change the .dup in the
> second line to a different empty string (f.e. a 0-length slice of the
> first one) and the point would remain the same: the proposed change
> would break comparison by '==' for empty non-null strings.

I agree with you. Two empty non-null strings should compare as equal because the equality test is against the contents of the array and not the addresses of the array. A null array has no content so one has nothing to compare it with; this is why I think that it is an illegal/meaningless operation.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Manfred Nowak wrote:
> Frits van Bommel wrote:
>> But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it.
>
> Suggestion: After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' will no more be a valid length for an array.

Uhu... Why would a slice of the full addressable memory space be a good initialization value?

> This is a hack to avoid some overhead in some places, but may introduce more overhead in other places.

This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well-defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

> Note: after `T[] x= null;' `x' holds an untyped array and so `y= x;' should be a legal assignment for every `y' declared as `U[] y;' for some type `U'---duck and run.

So you are proposing adding runtime type errors? :P

--
Oskar
Jul 25 2007
On Thu, 26 Jul 2007 08:37:13 +0200, Oskar Linde wrote:
> Manfred Nowak wrote:
>> Frits van Bommel wrote:
>>> But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it.
>>
>> Suggestion: After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' will no more be a valid length for an array.
>
> Uhu... Why would a slice of the full addressable memory space be a good initialization value?

Maybe x.ptr = size_t.max and x.length = size_t.max might be a useful representation of a null array, as it is an illegal RAM reference otherwise. But I know, it's too late now and probably too expensive at run-time to implement.

>> This is a hack to avoid some overhead in some places, but may introduce more overhead in other places.
>
> This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well-defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

You may very well be correct.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Oskar Linde wrote:
> Manfred Nowak wrote:
>> Frits van Bommel wrote:
>>> But the fact of the matter is, 'T[] x = null;' reserves space for the .length and sets it to 0. If you have a suggestion for a different value to put there, by all means make it.
>>
>> Suggestion: After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e. `size_t.max' will no more be a valid length for an array.
>
> Uhu... Why would a slice of the full addressable memory space be a good initialization value?

It's not the *full* addressable memory space for 1-byte types (the last byte of the address space has an address equal to .ptr (0) + .length (size_t.max), which isn't a member of the array), and it's more than the address space for bigger types (though I guess it does indeed cover the entire address space, possibly several times over, due to wraparound on overflow...).
</pedantic>
Jul 26 2007
Oskar Linde wrote:
> Manfred Nowak wrote:
>> This is a hack to avoid some overhead in some places, but may introduce more overhead in other places.
>
> This entire discussion is trying to make today's T[] -- a slice type with value semantics and some provisions for making it behave as an array in some cases -- into a pure array type with a well-defined null. You can't do that without breaking its slice semantics. A much better suggestion is Walter's T[new]. Make T[] remain the slice type it is today and make a distinct array type (preferably a by-reference type).

Today's T[] is "a slice type with value semantics and some provisions for making it behave as an array in some cases"? Whoa. What do you mean by "making it behave as an array in some cases"? What's the difference between a slice type and an array? And why would having null arrays in D break its slice semantics?

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 26 2007
Frits van Bommel wrote:
> Or would you prefer a segfault or diagnostic when accessing (cast(T[])null).length?

No, definitely not. This is one of the things I love about arrays: they're both value and reference types. It takes a while to get your head round (if the many discussions on these forums are any indication), but once you have it worked out it's quite powerful. In fact, it's the reason slicing can work the way it does.

Further, for those cases where we do not care to differentiate between null and "", checking length == 0 is the perfect solution.

I'm not interested in an array implementation which is 'pure' in any academic sense, but rather one which is consistent, in that null arrays do not become empty and vice versa under any conditions (other than explicitly assigning those values).

For example: in the past, setting length to 0 would free the data pointer. The result was that a zero-length (empty) array became a non-existent (null) array. And the problem we have now is that calling .dup on an empty array results in a null array. It is cases like these which I want to remove.

The other thing I want is for == to tell me that null and "" are not the same. I suspect very little existing code relies on the existing behaviour, as it will likely be checking length as opposed to comparing to "" or null (note: comparing with ==, not checking identity with "is").

Regan
Jul 26 2007
Derek Parnell wrote:
> On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:
>> Since null arrays have length 0, they *are* empty arrays :P.
>
> Not in my world. I see that null arrays have no length. That is to say, they do not have any length, which is different from saying they have a length and that length is zero.

But that is not how T[] behaves in D. T[]s are of a dual slice/array nature with semantics closer to a slice than an array. That is something Walter's T[new] suggestion has the potential to remedy. There is no difference between a "null" array and a slice starting at memory location null, 0 elements long. In my opinion, it would be quite strange for zero-length slices to behave any differently if the starting position happens to be null.

There is a very easy way to get the behavior you want, BTW:

class Array(T) { ... }

:)

> All that I would like changed is for the compare, in the case of length == 0, to check the data pointers, e.g.
>
> > int opEquals(T)(T[] u, T[] v) {
> >     if (u.length != v.length) return false;
>       if (u.length == 0) return (u.ptr == v.ptr);
> >     for (size_t i = 0; i < u.length; i++) {
> >         if (u[i] != v[i]) return false;
> >     }
> >     return true;
> > }
>
> This should mean "" == "" but not "" == null, likewise null == null but not null == "".

This would mean that "two arrays are equal if all elements are equal" would no longer hold. (Consider two zero-length slices at arbitrary memory locations, neither of them null.)

--
Oskar
Jul 25 2007
Oskar Linde wrote:
>> This should mean "" == "" but not "" == null, likewise null == null but not null == "".
>
> This would mean that "two arrays are equal if all elements are equal" would no longer hold.

Not true, the two arrays you mention below would still compare 'true' as their contents are still equal. Ignore the suggested code changes; mine was patently incorrect, and the first step is to make it clear what behaviour is desired, something I have obviously not done.

> (Consider two zero length slices at arbitrary memory location, neither of them null).

The content of these arrays is equal and would compare so.

The case(s) I want to stop comparing as equal are:

null == ""
"" == null

The cases which should continue to compare equal are:

null == null
"" == "" (your example above)

No more, no less.

Regan

p.s. I know I said ignore the suggested code changes but it would have to go something like:

if (lhs.length == 0) {
    if (lhs.ptr && rhs.ptr) return true;   //"" == ""
    if (lhs.ptr || rhs.ptr) return false;  //"" == null && null == ""
    return true;                           //null == null
}
Jul 26 2007
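The behaviour Regan asks for above can be sketched as a free function, leaving the built-in `==` untouched. This is only an illustration: the name `strictEquals` and the restriction to `char` arrays are assumptions made here, not something proposed in the thread, and the sketch assumes a current D2 compiler in which the literal "" has a non-null .ptr.

```d
import std.stdio;

// Hypothetical helper (not from the thread): behaves like ==, except that
// a null array and an empty non-null array never compare equal.
bool strictEquals(const(char)[] lhs, const(char)[] rhs)
{
    if (lhs.length != rhs.length) return false;
    if (lhs.length == 0)
    {
        // Both null, or both empty and non-null: equal. Mixed: not equal.
        // The addresses of two non-null empty slices are NOT compared.
        return (lhs.ptr is null) == (rhs.ptr is null);
    }
    return lhs == rhs; // ordinary element-wise comparison
}

void main()
{
    string s = "hello";
    writeln(strictEquals("", ""));               // empty vs empty
    writeln(strictEquals(null, null));           // null vs null
    writeln(strictEquals("", null));             // empty vs null
    writeln(strictEquals(s[0 .. 0], s[2 .. 2])); // empty slices at different addresses
}
```

Because only null-ness is compared for zero-length arrays, Oskar's two zero-length slices at different locations still compare equal, which is the case the earlier pointer-comparing version got wrong.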
On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Derek Parnell wrote:
> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>
> I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on) and as such it doesn't really matter whether or not two array references to it compare equal or not...
Jul 25 2007
On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
> Derek Parnell wrote:
>> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>>
>> I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?
>
> I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on) and as such it doesn't really matter whether or not two array references to it compare equal or not...

There is no basis for assuming that any RAM location is not addressable. I know that some operating systems prevent unprivileged programs from accessing certain locations, and that some RAM is hardware-mapped to I/O ports, but in theory, D as a system language should be able to address any RAM location.

For example, if D had been implemented for the Amiga system, access to RAM address 4 would have been vital, as that location contained the 32-bit address of the list that contains all addresses of the loaded shared libraries, and every program needed to access that location.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Derek Parnell wrote:
> On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
>> Derek Parnell wrote:
>>> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>>>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>>>
>>> I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?
>>
>> I'm pretty sure the only way to obtain such an array would be to have already invoked Undefined Behavior (assuming 4 is an invalid memory location on the platform the program's running on) and as such it doesn't really matter whether or not two array references to it compare equal or not...
>
> There is no basis for assuming that any RAM location is not addressable. I know that some operating systems prevent unprivileged programs from accessing certain locations, and that some RAM is hardware-mapped to I/O ports, but in theory, D as a system language should be able to address any RAM location.
>
> For example, if D had been implemented for the Amiga system, access to RAM address 4 would have been vital, as that location contained the 32-bit address of the list that contains all addresses of the loaded shared libraries, and every program needed to access that location.

I'm sorry, but what would then be the problem with accessing (cast(byte*)4)[0..4] if it's a valid memory location? I thought your question implied it was an invalid memory location, though I'm very aware that's not always the case (which was why I had the parenthesized sentence in there).

By the way, null is a valid address on x86 too, but most operating systems don't map the first page to any memory, so that null pointer dereferences generate page faults (and IIRC Linux treats the last page similarly, for null pointers with negative indices). IIRC DOS didn't do this (and probably couldn't on machines of the time); the interrupt table was located there (which would seem to be a pretty bad idea for a system without memory protection -- a null pointer write could potentially crash the entire system...).

Also, there's no particular reason null has to be cast(whatever)0; that just happens to be a convenient, easily-checked-for value...
Jul 26 2007
On Thu, 26 Jul 2007 09:28:16 +0200, Frits van Bommel wrote:
> Derek Parnell wrote:
>> On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
>>> Derek Parnell wrote:
>>>> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>>>>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>
> I'm sorry, but what would then be the problem with accessing (cast(byte*)4)[0..4] if it's a valid memory location?

Duh! I am so stupid! I misread Regan's original post. When he said "If the location and length are identical" I incorrectly read that as "if an array's location and length are identical" and not "if the locations and lengths of two arrays are identical".

Sorry (as he sulks off hoping no one notices) ...

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 26 2007
Derek Parnell wrote:
> On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
>> Aside: If the location and length are identical you can short-circuit the compare, returning true and ignoring the content, this could save a bit of time on comparisons of large arrays.
>
> I don't think this is such a good idea. How does one address the array of four bytes at RAM location 4?

What I meant was:

if (lhs.length == rhs.length && lhs.ptr == rhs.ptr) return true;

Not:

if (lhs.length == lhs.ptr) return true;

;)

Regan
Jul 26 2007
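The short-circuit Regan clarifies above can be sketched as a wrapper around the normal comparison. The function name `fastEquals` and the choice of `int` elements are assumptions for illustration only; nothing here claims to describe how the D runtime actually implements `==`.

```d
import std.stdio;

// Hypothetical helper: compare two int arrays, skipping the element loop
// when both slices refer to the same memory with the same length.
bool fastEquals(const(int)[] lhs, const(int)[] rhs)
{
    if (lhs.length == rhs.length && lhs.ptr is rhs.ptr)
        return true;   // same location and same length: contents must match
    return lhs == rhs; // otherwise fall back to element-wise comparison
}

void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a;     // same memory as a
    int[] c = a.dup; // equal contents, different memory

    writeln(fastEquals(a, b));                 // short-circuit path
    writeln(fastEquals(a, c));                 // element-wise path, equal
    writeln(fastEquals(a[0 .. 2], a[2 .. 4])); // element-wise path, different
}
```

One caveat with this design: for element types where a value need not equal itself (floating-point NaN, for instance), the shortcut would give a different answer than an element-by-element comparison, so it is only a safe optimisation for types with reflexive equality.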
Frits van Bommel wrote:
> Regan Heath wrote:
>> This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point. I'm in the 'distinguishable' camp. I can see the merit. At the very least it should be consistent!
>
> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...

The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable. Example:

writefln("" is null); // false
writefln("".dup is null); // true

"".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 25 2007
Bruno Medeiros wrote:
> Frits van Bommel wrote:
>> Regan Heath wrote:
>>> This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point. I'm in the 'distinguishable' camp. I can see the merit. At the very least it should be consistent!
>>
>> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...
>
> The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable. Example:
>
> writefln("" is null); // false
> writefln("".dup is null); // true
>
> "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and create a reference to it with length of 0.

What do you mean by "empty arrays are conceptually the same as null arrays"?

To me null arrays (non-existent) and "" arrays (empty) are conceptually different. null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set).

Regan
Jul 25 2007
Regan Heath wrote:
> Bruno Medeiros wrote:
>> Frits van Bommel wrote:
>>> Regan Heath wrote:
>>>> This all boils down to the empty vs null string debate where some people want to be able to distinguish between them and some see no point. I'm in the 'distinguishable' camp. I can see the merit. At the very least it should be consistent!
>>>
>>> They *are* distinguishable. That's why above code returns different results for the 'is' comparison...
>>
>> The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable. Example:
>>
>> writefln("" is null); // false
>> writefln("".dup is null); // true
>>
>> "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).
>
> Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and create a reference to it with length of 0.
>
> What do you mean by "empty arrays are conceptually the same as null arrays"?

I meant that in current D they are semantically the same. (I should have used those words.)

> To me null arrays (non-existent) and "" arrays (empty) are conceptually different. null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set).
>
> Regan

I know, and I agree. Don't you recall the V2 string discussion?
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 25 2007
Bruno Medeiros wrote:
> Regan Heath wrote:
>> What do you mean by "empty arrays are conceptually the same as null arrays"?
>
> I meant that in current D they are semantically the same. (I should have used those words.)

:)

>> To me null arrays (non-existent) and "" arrays (empty) are conceptually different. null indicates the array does not exist (no set at all), "" indicates it does but contains no items (an empty set).
>>
>> Regan
>
> I know, and I agree. Don't you recall the V2 string discussion?
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388

Yes, I remember it. I just forgot who was involved and what their opinions were. I have a hard enough time keeping track of my own opinion, let alone others'.

Regan
Jul 25 2007
On Wed, 25 Jul 2007 14:31:28 +0100, Bruno Medeiros wrote:
> The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable.

No they are not! Conceptually they are different things. However, D sometimes implements them as the same thing.

> Example:
>
> writefln("" is null); // false
> writefln("".dup is null); // true
>
> "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).

But I believe that the implementation here is wrong. "".dup should create another empty string and not a null string.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
Derek Parnell wrote:
> On Wed, 25 Jul 2007 14:31:28 +0100, Bruno Medeiros wrote:
>> The .ptr of empty arrays may be different than the .ptr of null arrays, but they are conceptually the same, and thus not safely distinguishable.
>
> No they are not! Conceptually they are different things. However, D sometimes implements them as the same thing.

Check my reply to Regan just above; what I meant to say is that in current D they are semantically the same.

>> Example:
>>
>> writefln("" is null); // false
>> writefln("".dup is null); // true
>>
>> "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, because empty arrays are conceptually the same as null arrays, and trying to use .ptr to distinguish them is unsafe, implementation-dependent behavior (aka a program error).
>
> But I believe that the implementation here is wrong. "".dup should create another empty string and not a null string.

The implementation is not wrong; it is according to Walter's intention, as you know. If anything, it is Walter's intention that is wrong. ^^'

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 26 2007
On Wed, 25 Jul 2007 15:05:25 +0200, Frits van Bommel wrote:
> Since both the null string and "" have .length == 0, that means they compare equal using those methods (having no contents to compare and equal length). This is all perfectly consistent (and even useful) to me...

However,

string x = "";

means that 'x' is not null, because it has a pointer and that points to a string with no content. Something that is null has no pointer, and therefore the length component is not significant. But of course, in order to represent something that really does have the address of zero, we should only consider 'x' to be null when x.ptr and x.length are both zero. In every other case it is not null.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Jul 25 2007
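The distinction Derek draws above (null only when both .ptr and .length are zero) can be written down as two small predicates. This is a sketch under assumptions: the helper names `isNullArray` and `isEmptyNonNull` are invented here, and the example relies on a current D2 compiler, where the literal "" has a non-null .ptr.

```d
import std.stdio;

// Hypothetical helpers, named here for illustration only.

// Null in Derek's sense: no pointer and no length.
bool isNullArray(const(char)[] a)
{
    return a.ptr is null && a.length == 0;
}

// Empty but existing: a real pointer, zero elements.
bool isEmptyNonNull(const(char)[] a)
{
    return a.ptr !is null && a.length == 0;
}

void main()
{
    string n = null;
    string e = "";

    writeln(n.length == e.length); // both lengths are 0
    writeln(n == e);               // == looks only at length and contents
    writeln(isNullArray(n), " ", isNullArray(e));
    writeln(isEmptyNonNull(n), " ", isEmptyNonNull(e));
}
```

Code that does not care about the difference can keep testing `length == 0`, as Regan notes earlier in the thread; the predicates only matter where null and empty are meant to carry different meanings.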