digitalmars.D.learn - Checking if a string is null

Max Samukha (22/22) Jul 24 2007 Using '== null' and 'is null' with strings gives odd results (DMD

Hoenir (3/35) Jul 24 2007 Makes sense to me. is compares the pointer and == the content or

Max Samukha (5/40) Jul 25 2007 Then, it's unclear what null content means. If it is the same as empty

Regan Heath (36/67) Jul 25 2007 Not I, it's inconsistent IMO and it gets worse:

Regan Heath (14/24) Jul 25 2007 There have been several, I did a brief search and came up with:
Max Samukha (7/74) Jul 25 2007 You didn't update all writefln's :)

Ald (2/2) Jul 25 2007 I believe the manual says that, when comparing, the compiler tries to ca...
Regan Heath (49/58) Jul 25 2007 Not that I can find. The array page does say:

Frits van Bommel (47/121) Jul 25 2007 As Max said, you forgot to update some writeflns. The output of the

Regan Heath (34/76) Jul 25 2007 True. I guess what I meant to say was I'm in the '3 distinct states'

Frits van Bommel (36/77) Jul 25 2007 At least with that last paragraph I can agree ;)

Regan Heath (15/36) Jul 25 2007 I can't tell in which way you're joking so I'm just going to come out

Don Clugston (7/25) Jul 25 2007 I don't think that's really what's happening here.

Derek Parnell (6/32) Jul 25 2007 But arrays are not vectors.

Carlos Santander (5/8) Jul 25 2007 But empty arrays are not null. You could even argue that null arrays don...
Derek Parnell (18/62) Jul 25 2007 Not in my world. I see that null arrays have no length. That is to say, ...

Frits van Bommel (17/76) Jul 25 2007 But the fact of the matter is, 'T[] x = null;' reserves space for the

Derek Parnell (23/94) Jul 25 2007 I'm trying not to set in concrete the ABI of variable-length arrays. So
Oskar Linde (12/26) Jul 25 2007 Uhu... Why would a slice of the full addressable memory space be a good...

Derek Parnell (10/32) Jul 25 2007 Maybe x.ptr = size_t.max and x.length = size_t.max might be useful
Frits van Bommel (8/21) Jul 26 2007 It's not the *full* addressable memory space for 1-byte types (the last
Bruno Medeiros (9/21) Jul 26 2007 Today's T[] is "a slice type with value semantics and some provisions

Regan Heath (25/27) Jul 26 2007 No, definitely not. This is one of the things I love about arrays,

Oskar Linde (15/36) Jul 25 2007 But that is not how T[] behaves in D. T[]s are of a dual slice/array

Regan Heath (22/29) Jul 26 2007 Not true, the two arrays you mention below would still compare 'true' as...

Derek Parnell (8/11) Jul 25 2007 I don't think this is such a good idea. How does one address the array o...

Frits van Bommel (6/14) Jul 25 2007 I'm pretty sure the only way to obtain such an array would be to have

Derek Parnell (14/29) Jul 25 2007 There is no basis for assuming that any RAM location is not addressable....

Frits van Bommel (16/42) Jul 26 2007 I'm sorry, but what would then be the problem with accessing

Derek Parnell (10/21) Jul 26 2007 Duh! I am so stupid! I misread Regan's original post. When he said "I...

Regan Heath (7/15) Jul 26 2007 What I meant was:

Bruno Medeiros (13/23) Jul 25 2007 The .ptr of empty arrays may be different than the .ptr of null arrays,

Regan Heath (9/32) Jul 25 2007 Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and

Bruno Medeiros (8/45) Jul 25 2007 I meant that in current D they are semantically the same. (I should have...

Regan Heath (6/21) Jul 25 2007 Yes, I remember it. I just forgot who was involved and what their

Derek Parnell (9/18) Jul 25 2007 No they are not! Conceptually they are different things. However, D

Bruno Medeiros (8/28) Jul 26 2007 Check my reply to Regan just above, what I meant to say is that in

Derek Parnell (13/18) Jul 25 2007 However,

Max Samukha <samukha voliacable.com.removethis> writes:

Using '== null' and 'is null' with strings gives odd results (DMD
1.019):

void main()
{
	char[] s;

	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");		
}

Output:
s is null
s == null

----

void main()
{
	char[] s = "";

	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");		
}

Output:
s == null

----

Can anybody explain why s == null is true in the second example?

Jul 24 2007

Hoenir <mrmocool gmx.de> writes:

Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?
 

Makes sense to me. 'is' compares the pointer and '==' compares the content, or 
something like that.

Jul 24 2007

Max Samukha <samukha voliacable.com.removethis> writes:

On Wed, 25 Jul 2007 08:32:52 +0200, Hoenir <mrmocool gmx.de> wrote:

Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?
 

Makes sense to me. 'is' compares the pointer and '==' compares the content, or 
something like that.

Then, it's unclear what null content means. If it is the same as empty
string (ptr != null and length == 0), I remain confused. If it means a
null string (ptr == null and length == 0), the second example should
output nothing since s.ptr != null.
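
The two cases can be told apart by printing the pointer and length next to both comparisons. The following is a minimal sketch, assuming a current D compiler (so `string` is used where the posts use `char[]`); `describe` is a hypothetical helper name, not something from the thread.

```d
import std.stdio;

// Prints the raw slice fields next to the results of both comparisons.
// 'describe' is an illustrative name only.
void describe(string label, string s)
{
    writefln("%s: ptr is null = %s, length = %s, (s is null) = %s, (s == null) = %s",
             label, s.ptr is null, s.length, s is null, s == null);
}

void main()
{
    string unset;        // default-initialized: null pointer, length 0
    string empty = "";   // literal: non-null pointer, length 0

    describe("unset", unset);
    describe("empty", empty);
}
```

With these inputs, both strings should report length 0 and `s == null` as true, while only the default-initialized one should report `s is null` as true.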

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?

Not I, it's inconsistent IMO and it gets worse:

import std.stdio;

void main()
{
	foo(null);
	foo("");	
}

void foo(string s)
{
	writefln(s.ptr, ", ", s.length);
	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");
	if (s < null)  writefln("s <  null");
	if (s > null)  writefln("s <  null");
	if (s <= null) writefln("s <= null");
	if (s >= null) writefln("s <  null");
	writefln("");
}

Output:
0000, 0
s is null
s == null
s <= null
s <  null

415080, 0
s == null
s <= null
s <  null

So, "" is < and == null!?
and <=,== but not >=!?


This all boils down to the empty vs null string debate where some people 
want to be able to distinguish between them and some see no point.

I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
least it should be consistent!

Regan

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

Manfred Nowak wrote:
 Regan Heath wrote
 
 This all boils down to the empty vs null string debate where some
 people want to be able to distinguish between them and some see no
 point. 

 
 I haven't seen such a debate.

There have been several, I did a brief search and came up with:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55270
(this one was my fault)

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=25804
http://www.digitalmars.com/d/archives/digitalmars/D/learn/3521.html
http://www.digitalmars.com/d/archives/21782.html
http://www.digitalmars.com/d/archives/digitalmars/D/27123.html
http://www.digitalmars.com/d/archives/16905.html
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/Issue_1314_New_Dupping_an_empty_array_creates_a_null_array_11585.html
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=D&artnum=17083

Some of those go back a long, long way.

 Does it mean that it is not possible to implement a Kleene Algebra for 
 strings in D because there is no neutral element for the alternative 
 operator?

I have no idea. :)

Regan

Jul 25 2007

Max Samukha <samukha voliacable.com.removethis> writes:

On Wed, 25 Jul 2007 11:12:19 +0100, Regan Heath <regan netmail.co.nz>
wrote:

Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):
 
 void main()
 {
 	char[] s;
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s is null
 s == null
 
 ----
 
 void main()
 {
 	char[] s = "";
 
 	if (s is null) writefln("s is null");
 	if (s == null) writefln("s == null");		
 }
 
 Output:
 s == null
 
 ----
 
 Can anybody explain why s == null is true in the second example?

Not I, it's inconsistent IMO and it gets worse:

import std.stdio;

void main()
{
	foo(null);
	foo("");	
}

void foo(string s)
{
	writefln(s.ptr, ", ", s.length);
	if (s is null) writefln("s is null");
	if (s == null) writefln("s == null");
	if (s < null)  writefln("s <  null");
	if (s > null)  writefln("s <  null");
	if (s <= null) writefln("s <= null");
	if (s >= null) writefln("s <  null");
	writefln("");
}

Output:
0000, 0
s is null
s == null
s <= null
s <  null

415080, 0
s == null
s <= null
s <  null

So, "" is < and == null!?
and <=,== but not >=!?

You didn't update all writefln's :)

This all boils down to the empty vs null string debate where some people 
want to be able to distinguish between them and some see no point.

I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
least it should be consistent!

Regan

Anyway, it feels like an undefined area in the language. Do the specs
say anything about how exactly arrays/strings/delegates should compare
to null? It seems to be more than comparing the pointer part of the
structs.

Jul 25 2007

Ald <aldarri_s yahoo.com> writes:

I believe the manual says that, when comparing, the compiler tries to call the
opEquals() method.  And calling that on a null pointer yields undefined
behavior.  You should use the _!is null_ construct instead.


Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

 So, "" is < and == null!?
 and <=,== but not >=!?

 
 You didn't update all writefln's :)

<hangs head in shame> What can I say, I'm having a bad morning.

 Anyway, it feels like an undefined area in the language. Do the specs
 say anything about how exactly arrays/strings/delegates should compare
 to null? It seems to be more than comparing the pointer part of the
 structs.

Not that I can find.  The array page does say:

"Strings can be copied, compared, concatenated, and appended:"
..
"with the obvious semantics."

but not much more on the topic.  Under "Array Initialization" we see:

     * Pointers are initialized to null.
     ..
     * Dynamic arrays are initialized to having 0 elements.
     ..

Which does not state that an array will be initialised to "null" but 
rather to something with 0 elements.

To my mind something with 0 elements is 'empty' as opposed to being 
'nonexistent', which is typically represented by 'null' or a similar value 
(like NAN for floats, 0xFF for char, etc).

So, it seems the spec is hinting/saying that arrays cannot be 
nonexistent, only empty (or not empty).

And yet in the current implementation there is clearly a difference 
between 'null' and "" when it comes to arrays.

I'm still firmly in favour of there being 3 distinct states for an array:
  * nonexistent  (null)
  * empty        ("", length == 0)
  * not empty    (length > 0)

That said, I'm also firmly in favour of not getting a seg-fault when I 
have a reference to a nonexistent array (we currently have this 
behaviour and it's perfect).

All I think that needs 'fixing', and going back to your initial test case:

char[] s = "";

if (s is null) writefln("s is null");
if (s == null) writefln("s == null");		

neither of these tests should evaluate 'true'.

The fact that the latter does indicates to me that the array compare is 
first comparing length, seeing they're both 0 and assuming the arrays 
must be equal.

I think instead it should also check the data pointer because in the 
case of "" the data pointer is non-null.  The same is true for a zero 
length slice i.e. s[0..0], it exists (data pointer is non-null) but is 
empty (length is zero).

In short, the compare function should recognise the 3 states:
  * nonexistent  (data pointer is null)
  * empty        (data pointer is non-null, length is zero)
  * not empty    (length is > zero)

and never make the mistake of calling an array in one state equal to an 
array in another state.
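
The three states listed above can be sketched as a small classifier. This is only an illustration, assuming a current D compiler; `ArrayState` and `stateOf` are hypothetical names, not part of the language or its library.

```d
import std.stdio;

// Hypothetical names for the three states described in the post.
enum ArrayState { nonExistent, empty, notEmpty }

// Classifies a slice by looking at its length first, then its data pointer.
ArrayState stateOf(T)(T[] a)
{
    if (a.length > 0) return ArrayState.notEmpty;
    return a.ptr is null ? ArrayState.nonExistent : ArrayState.empty;
}

void main()
{
    string unset;            // null pointer, length 0
    string text = "hello";

    writeln(stateOf(unset));
    writeln(stateOf(""));            // non-null pointer, length 0
    writeln(stateOf(text[0 .. 0]));  // zero-length slice of existing data
    writeln(stateOf(text));
}
```

Note that this inspects `.ptr` directly; the built-in `==` does not, which is the behaviour under discussion.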

Regan

p.s. I am cross-posting and setting followup to digitalmars.D as it has 
become more of a theory/discussion on D than a learning exercise :)

p.p.s Plus, I figure if Manfred cannot recall a discussion on this topic 
we probably need another one about now.

Jul 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Regan Heath wrote:
 Max Samukha wrote:
 Using '== null' and 'is null' with strings gives odd results (DMD
 1.019):

 void main()
 {
     char[] s;

     if (s is null) writefln("s is null");
     if (s == null) writefln("s == null");       
 }

 Output:
 s is null
 s == null

 ----

 void main()
 {
     char[] s = "";

     if (s is null) writefln("s is null");
     if (s == null) writefln("s == null");       
 }

 Output:
 s == null

 ----

 Can anybody explain why s == null is true in the second example?

 
 Not I, it's inconsistent IMO and it gets worse:
 
 import std.stdio;
 
 void main()
 {
     foo(null);
     foo("");   
 }
 
 void foo(string s)
 {
     writefln(s.ptr, ", ", s.length);
     if (s is null) writefln("s is null");
     if (s == null) writefln("s == null");
     if (s < null)  writefln("s <  null");
     if (s > null)  writefln("s <  null");
     if (s <= null) writefln("s <= null");
     if (s >= null) writefln("s <  null");
     writefln("");
 }
 
 Output:
 0000, 0
 s is null
 s == null
 s <= null
 s <  null
 
 415080, 0
 s == null
 s <= null
 s <  null
 
 So, "" is < and == null!?
 and <=,== but not >=!?

As Max said, you forgot to update some writeflns. The output of the 
corrected version is:
===
0000, 0
s is null
s == null
s <= null
s >= null

805BEF0, 0
s == null
s <= null
s >= null
===

Seems perfectly consistent to me. Anything with an equality comparison 
(==, <=, >=) is true in both cases, and 'is' is only true when the 
pointer as well as the length is equal.

 This all boils down to the empty vs null string debate where some people 
 want to be able to distinguish between them and some see no point.
 
 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

They *are* distinguishable. That's why above code returns different 
results for the 'is' comparison...

I for one am perfectly fine with "cast(char[]) null" meaning ".length == 
0 && .ptr == null" and with comparisons of arrays using == and friends 
only inspecting the contents (not location) of the data.

Now, about comparisons: array comparisons basically operate like this:
---
int opEquals(T)(T[] u, T[] v) {              // bah to int return type
     if (u.length != v.length) return false;
     for (size_t i = 0; i < u.length; i++) {
         if (u[i] != v[i]) return false;
     }
     return true;
}

int opCmp(T)(T[] u, T[] v) {
     size_t len = min(u.length, v.length);
     for (size_t i = 0; i < len; i++) {
         if (auto diff = u[i].opCmp(v[i])) {
             return diff;
         }
     }
     return cast(int)u.length - cast(int)v.length;
}
---
(Taken from object.TypeInfo_Array and converted to templates instead of 
void*s + casting + element TypeInfo.{equals/compare} for readability)

Since both the null string and "" have .length == 0, that means they 
compare equal using those methods (having no contents to compare and 
equal length)

This is all perfectly consistent (and even useful) to me...
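
The content-only comparison described above can be tried out directly. The sketch below assumes a current D compiler and re-implements the equality loop under the hypothetical name `contentEquals`; it is not the runtime's actual routine, only a compilable restatement of the pseudocode.

```d
import std.stdio;

// Content-only equality: lengths must match, then elements are compared.
// The data pointers are never inspected. 'contentEquals' is an illustrative name.
bool contentEquals(T)(T[] u, T[] v)
{
    if (u.length != v.length) return false;
    foreach (i; 0 .. u.length)
    {
        if (u[i] != v[i]) return false;
    }
    return true;
}

void main()
{
    string unset;                 // null pointer, length 0
    string empty = "";            // non-null pointer, length 0
    string a = "abc";
    string b = "abcd"[0 .. 3];    // same contents as a, different storage

    // Each line prints the sketch's result next to the built-in '=='.
    writeln(contentEquals(unset, empty), " ", unset == empty);
    writeln(contentEquals(a, b), " ", a == b);
    writeln(contentEquals(a, empty), " ", a == empty);
}
```

Because the loop body never runs for two zero-length arrays, a null string and "" come out equal under this model.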

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

 
 They *are* distinguishable. That's why above code returns different 
 results for the 'is' comparison...

True.  I guess what I meant to say was I'm in the '3 distinct states' 
camp (which may be a camp of 1 for all I know).  See my reply to 
digitalmars.D for a definition of the 3 states.

 I for one am perfectly fine with "cast(char[]) null" meaning ".length == 
 0 && .ptr == null" 

Same here.

 and with comparisons of arrays using == and friends
 only inspecting the contents (not location) of the data.

I don't think an empty string (non-null, length == 0) should compare 
equal to a nonexistent string (null, length == 0).  And vice-versa.

The only thing that should compare equal to null is null.  Likewise an 
empty array should only compare equal to another empty array.

My reasoning for this is consistency, see at end.

Aside: If the location and length are identical you can short-circuit 
the compare, returning true and ignoring the content, this could save a 
bit of time on comparisons of large arrays.
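
The short-circuit in this aside could look like the sketch below, assuming a current D compiler. `fastEquals` is a hypothetical helper name; whether the runtime already performs such a check is not something the thread states.

```d
import std.stdio;

// Identity first, contents second. 'fastEquals' is an illustrative name.
bool fastEquals(T)(T[] u, T[] v)
{
    // 'is' on arrays compares .ptr and .length, not the elements.
    if (u is v) return true;
    return u == v;   // fall back to the element-wise comparison
}

void main()
{
    auto big = new int[](1_000_000);
    auto same = big;        // same pointer, same length
    auto copy = big.dup;    // different pointer, same contents

    writeln(fastEquals(big, same));   // decided by the identity check
    writeln(fastEquals(big, copy));   // decided by comparing elements
}
```

One caveat: for element types where a value can compare unequal to itself (floating-point NaN, for example), the shortcut would change the result, so it is only a safe optimisation when element equality is reflexive.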

 Now, about comparisons: array comparisons basically operate like this:
 ---
 int opEquals(T)(T[] u, T[] v) {              // bah to int return type
     if (u.length != v.length) return false;
     for (size_t i = 0; i < u.length; i++) {
         if (u[i] != v[i]) return false;
     }
     return true;
 }
 
 int opCmp(T)(T[] u, T[] v) {
     size_t len = min(u.length, v.length);
     for (size_t i = 0; i < len; i++) {
         if (auto diff = u[i].opCmp(v[i])) {
             return diff;
         }
     }
     return cast(int)u.length - cast(int)v.length;
 }
 ---
 (Taken from object.TypeInfo_Array and converted to templates instead of 
 void*s + casting + element TypeInfo.{equals/compare} for readability)

Thanks.

 Since both the null string and "" have .length == 0, that means they 
 compare equal using those methods (having no contents to compare and 
 equal length)

This is the bit I don't like.

 This is all perfectly consistent (and even useful) to me...

It's not consistent with other reference types, types which can 
represent 'nonexistent', eg.

   char *p = null;  // nonexistent

   if (p == null) writefln("p == null");
   if (p == "") writefln("p == \"\"");

Output:
   p == null

Compare that to:

   char[] p = null;

   if (p == null) writefln("p == null");
   if (p == "") writefln("p == \"\"");

Output:
   p == null
   p == ""

All that I would like changed is for the compare, in the case of length 
== 0, to check the data pointers, eg.

 int opEquals(T)(T[] u, T[] v) {
     if (u.length != v.length) return false;

       if (u.length == 0) return (u.ptr == v.ptr);
     for (size_t i = 0; i < u.length; i++) {
         if (u[i] != v[i]) return false;
     }
     return true;
 }

This should mean "" == "" but not "" == null, likewise null == null but 
not null == "".

Regan

Jul 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Regan Heath wrote:
 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

 They *are* distinguishable. That's why above code returns different 
 results for the 'is' comparison...

 True.  I guess what I meant to say was I'm in the '3 distinct states' 
 camp (which may be a camp of 1 for all I know).  See my reply to 
 digitalmars.D for a definition of the 3 states.

 I for one am perfectly fine with "cast(char[]) null" meaning ".length 
 == 0 && .ptr == null" 

 Same here.

  > and with comparisons of arrays using == and friends
 only inspecting the contents (not location) of the data.

 I don't think an empty string (non-null, length == 0) should compare 
 equal to a nonexistent string (null, length == 0).  And vice-versa.

 The only thing that should compare equal to null is null.  Likewise an 
 empty array should only compare equal to another empty array.

 My reasoning for this is consistency, see at end.

Since null arrays have length 0, they *are* empty arrays :P.

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

At least with that last paragraph I can agree ;)

Now, about this:

 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

Let's look at this code:
---
import std.stdio;

void main()
{
     char[][] strings = ["hello world!", "", null];

     foreach (str; strings) {
         auto str2 = str.dup;
         if (str == str2)
             writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, 
str2.ptr);
         else
             writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, 
str2.ptr);
     }
}
---
The output is currently (on my machine):
=====
"hello world!" == "hello world!" (805BE60, F7CFBFE0)
"" == "" (805BE78, 0000)
"" == "" (0000, 0000)
=====
Your change would change the second line (even if it actually allocated 
a new empty string like you probably want instead of returning null). 
How would that be consistent in any way?
(Same goes for other ways to create different-ptr empty strings)

What you might have meant on that extra line might be more like:
---
        if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
---
which will return true if both .ptr values are null or both are non-null.
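
The difference between the two zero-length checks can be seen side by side. This is a sketch assuming a current D compiler; `equalsPtrCheck` and `equalsNullCheck` are hypothetical names for the two variants discussed above, and both defer to the built-in `==` for non-empty arrays.

```d
import std.stdio;

// First proposal: zero-length arrays are equal only if their pointers match.
bool equalsPtrCheck(T)(T[] u, T[] v)
{
    if (u.length != v.length) return false;
    if (u.length == 0) return u.ptr is v.ptr;
    return u == v;
}

// Amended check: only the null-ness of the two pointers has to match.
bool equalsNullCheck(T)(T[] u, T[] v)
{
    if (u.length != v.length) return false;
    if (u.length == 0) return (u.ptr is null) == (v.ptr is null);
    return u == v;
}

void main()
{
    string text = "hello world!";
    string unset;                 // null pointer, length 0
    string head = text[0 .. 0];   // empty, points at the start of text
    string tail = text[$ .. $];   // empty, points just past the end of text

    // Two distinct non-null empty strings: the variants disagree here.
    writeln(equalsPtrCheck(head, tail), " ", equalsNullCheck(head, tail));
    // Non-null empty versus null: both variants report unequal.
    writeln(equalsPtrCheck(head, unset), " ", equalsNullCheck(head, unset));
    // Null versus null: both variants report equal.
    writeln(equalsPtrCheck(unset, unset), " ", equalsNullCheck(unset, unset));
}
```

The first line is the case the .dup example is about: under the pointer comparison, two empty strings stored at different addresses stop being equal.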

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

 The only thing that should compare equal to null is null.  Likewise an 
 empty array should only compare equal to another empty array.

  >
  > My reasoning for this is consistency, see at end.
 
 Since null arrays have length 0, they *are* empty arrays :P.

I can't tell in which way you're joking so I'm just going to come out 
with...

The length of something, be it an array, a car, a <insert thing>, is 
totally independent of whether it exists (though a nonexistent item 
cannot have a length).

It either exists or it does not.  If it exists, it has a length which 
may or may not be zero.

Something which exists cannot be equal to something which doesn't.

Period.

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save 
 a bit of time on comparisons of large arrays.

 
 At least with that last paragraph I can agree ;)

:)

 Your change would change the second line (even if it actually allocated 
 a new empty string like you probably want instead of returning null). 
 How would that be consistent in any way?

Oops, my bad.  My suggested code change is totally incorrect.  That'll 
teach me for posting while working on something else at the same time.

 (Same goes for other ways to create different-ptr empty strings)
 
 What you might have meant on that extra line might be more like:
 ---
        if (u.length == 0) return ((u.ptr is null) == (v.ptr is null));
 ---
 which will return true if both .ptr values are null or both are non-null.

Yes, and yes, I want "".dup to allocate a new 1-byte block, point at it, and 
set length to 0.

Regan

Jul 25 2007

Don Clugston <dac nospam.com.au> writes:

Regan Heath wrote:
 The only thing that should compare equal to null is null.  Likewise 
 an empty array should only compare equal to another empty array.

  >
  > My reasoning for this is consistency, see at end.

 Since null arrays have length 0, they *are* empty arrays :P.

 I can't tell in which way you're joking so I'm just going to come out 
 with...

 The length of something, be it an array, a car, a <insert thing>, is 
 totally independent of whether it exists (though a nonexistent item 
 cannot have a length).

 It either exists or it does not.  If it exists, it has a length which 
 may or may not be zero.

 Something which exists cannot be equal to something which doesn't.

I don't think that's really what's happening here.
Consider vectors. If a vector has a length of zero, the direction doesn't exist.
Take two arbitrary vectors with different directions, a and b.
a*0 == b*0, even though the direction of a is completely different to that of b.
This is the same model which is being used for arrays; if the .length is zero, 
the .ptr is irrelevant.

Jul 25 2007

Derek Parnell <derek psyc.ward> writes:

On Wed, 25 Jul 2007 22:07:15 +0200, Don Clugston wrote:

 Regan Heath wrote:
 The only thing that should compare equal to null is null.  Likewise 
 an empty array should only compare equal to another empty array.

  >
  > My reasoning for this is consistency, see at end.

 Since null arrays have length 0, they *are* empty arrays :P.

 I can't tell in which way you're joking so I'm just going to come out 
 with...

 The length of something, be it an array, a car, a <insert thing>, is 
 totally independent of whether it exists (though a nonexistent item 
 cannot have a length).

 It either exists or it does not.  If it exists, it has a length which 
 may or may not be zero.

 Something which exists cannot be equal to something which doesn't.

 I don't think that's really what's happening here.
 Consider vectors. If a vector has a length of zero, the direction doesn't exist.
 Take two arbitrary vectors with different directions, a and b.
 a*0 == b*0, even though the direction of a is completely different to that of b.
 This is the same model which is being used for arrays; if the .length is zero, 
 the .ptr is irrelevant.

But arrays are not vectors.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Carlos Santander <csantander619 gmail.com> writes:

Frits van Bommel wrote:
 
 Since null arrays have length 0, they *are* empty arrays :P.
 

But empty arrays are not null. You could even argue that null arrays don't have 
a length, thus they can't be empty.

-- 
Carlos Santander Bernal

Jul 25 2007

Derek Parnell <derek psyc.ward> writes:

On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:

 Since null arrays have length 0, they *are* empty arrays :P.

Not in my world. I see that null arrays have no length. That is to say, they
do not have any length, which is different from saying they have a length
and that length is zero.

 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

 Let's look at this code:
 ---
 import std.stdio;

 void main()
 {
      char[][] strings = ["hello world!", "", null];

      foreach (str; strings) {
          auto str2 = str.dup;
          if (str == str2)
              writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, 
 str2.ptr);
          else
              writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, 
 str2.ptr);
      }
 }
 ---
 The output is currently (on my machine):
 =====
 "hello world!" == "hello world!" (805BE60, F7CFBFE0)
 "" == "" (805BE78, 0000)
 "" == "" (0000, 0000)
 =====
 Your change would change the second line (even if it actually allocated 
 a new empty string like you probably want instead of returning null). 
 How would that be consistent in any way?

Your example is misleading for at least two reasons:
** The '==' operator compares the contents of the strings. A null string
has no content so there is nothing to compare. This should fail but it
doesn't in the current D. It should fail in the same manner that a null
object reference fails the '==' operator.
** The output is writefln's attempt at giving a string representation of the
data presented. It (aka Walter) has decided that the string representation
of a null array is an empty string. This does not mean that a null array is
an empty string but just that writefln represents it as such.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Derek Parnell wrote:
 On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:

 Since null arrays have length 0, they *are* empty arrays :P.

 Not in my world. I see that null arrays have no length. That is to say, they
 do not have any length, which is different from saying they have a length
 and that length is zero.

But the fact of the matter is, 'T[] x = null;' reserves space for the 
.length and sets it to 0. If you have a suggestion for a different value 
to put there, by all means make it.
Or would you prefer a segfault or diagnostic when accessing 
(cast(T[])null).length? That'd introduce overhead on every .length 
access (unless the compiler can statically determine whether an array 
reference is null).

 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

 Let's look at this code:
 ---
 import std.stdio;

 void main()
 {
      char[][] strings = ["hello world!", "", null];

      foreach (str; strings) {
          auto str2 = str.dup;
          if (str == str2)
              writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, 
 str2.ptr);
          else
              writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, 
 str2.ptr);
      }
 }
 ---
 The output is currently (on my machine):
 =====
 "hello world!" == "hello world!" (805BE60, F7CFBFE0)
 "" == "" (805BE78, 0000)
 "" == "" (0000, 0000)
 =====
 Your change would change the second line (even if it actually allocated 
 a new empty string like you probably want instead of returning null). 
 How would that be consistent in any way?

 Your example is misleading for at least two reasons:
 ** The '==' operator compares the contents of the strings. A null string
 has no content so there is nothing to compare. This should fail but it
 doesn't in the current D. It should fail in the same manner that a null
 object reference fails the '==' operator.

This wasn't the point of the example. I could have left out the third 
element and changed the .dup in the second line to a different empty 
string (e.g. a 0-length slice of the first one) and the point would 
remain the same: the proposed change would break comparison by '==' for 
empty non-null strings.

 ** The output is writefln's attempt at giving a string representation of the
 data presented. It (aka Walter) has decided that the string representation
 of a null array is an empty string. This does not mean that a null array is
 an empty string but just that writefln represents it as such.

Like I said, the point of the example didn't actually have anything to 
do with null strings, but rather with a bug in a change Regan proposed 
to make null strings and non-null empty strings compare unequal, which 
resulted in non-null empty strings comparing unequal.

Jul 25 2007

Derek Parnell <derek psyc.ward> writes:

On Thu, 26 Jul 2007 07:47:03 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:

 Since null arrays have length 0, they *are* empty arrays :P.

 Not in my world. I see that null arrays have no length. That is to say, they
 do not have any length, which is different from saying they have a length
 and that length is zero.

 But the fact of the matter is, 'T[] x = null;' reserves space for the 
 .length and sets it to 0. If you have a suggestion for a different value 
 to put there, by all means make it.

I'm trying not to set in concrete the ABI of variable-length arrays. So
even though the current D definition is that a VL array consists of a
two-element struct and zero or one block of RAM, conceptually a null array
doesn't point to anything and does not have a length. So to me it doesn't
matter that D allocates space for .length and .ptr portions of the null VL
array, because it still should not use the .length value. But, because
theoretically every possible RAM address could be stored in the .ptr
portion, including zero, I concede that in D the .ptr and .length both
being zero is needed to indicate a null array, even though this disallows
the conceptual empty array beginning at address zero.

 Or would you prefer a segfault or diagnostic when accessing 
 (cast(T[])null).length? That'd introduce overhead on every .length 
 access (unless the compiler can statically determine whether an array 
 reference is null).

Yes I would. However, too many people are relying on this inconsistency so
I'll live with that wart in the language.

 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

 Let's look at this code:
 ---
 import std.stdio;

 void main()
 {
      char[][] strings = ["hello world!", "", null];

      foreach (str; strings) {
          auto str2 = str.dup;
          if (str == str2)
              writefln(`"%s" == "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
          else
              writefln(`"%s" != "%s" (%s, %s)`, str, str2, str.ptr, str2.ptr);
      }
 }
 ---
 The output is currently (on my machine):
 =====
 "hello world!" == "hello world!" (805BE60, F7CFBFE0)
 "" == "" (805BE78, 0000)
 "" == "" (0000, 0000)
 =====
 Your change would change the second line (even if it actually allocated 
 a new empty string like you probably want instead of returning null). 
 How would that be consistent in any way?

 Your example is misleading for at least two reasons:
 ** The '==' operator compares the contents of the strings. A null string
 has no content so there is nothing to compare. This should fail but it
 doesn't in the current D. It should fail in the same manner that a null
 object reference fails the '==' operator.

 This wasn't the point of the example. 

Sorry for misunderstanding.

 I could have left out the third 
 element and changed the .dup in the second line to a different empty 
 string (e.g. a 0-length slice of the first one) and the point would 
 remain the same: the proposed change would break comparison by '==' for 
 empty non-null strings.

I agree with you. Two empty non-null strings should compare as equal
because the equality test is against the contents of the array and not the
addresses of the array. A null array has no content so one has nothing to
compare it with; this is why I think that it is an illegal/meaningless
operation.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Manfred Nowak wrote:
 Frits van Bommel wrote
 
 But the fact of the matter is, 'T[] x = null;' reserves space for
 the .length and sets it to 0. If you have a suggestion for a
 different value to put there, by all means make it.

 
 Suggestion:
 After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e.  
 `size_t.max' will no more be a valid length for an array.

Uhu... Why would a slice of the full addressable memory space be a good 
initialization value?

 This is a hack to avoid some overhead in some places, but may introduce  
 more overhead in other places.

This entire discussion is trying to make today's T[] -- a slice type with 
value semantics and some provisions for making it behave as an array in 
some cases -- into a pure array type with a well defined null. You can't 
do that without breaking its slice semantics. A much better suggestion 
is Walter's T[new]. Make T[] remain the slice type it is today and make 
a distinct array type (preferably a by-reference type).

 Note: after `T[] x= null;' `x' holds an untyped array and so `y= x;' 
 should be a legal assignment for every `y' declared as `U[] y;' for 
 some type `U'---duck and run.

So you are proposing adding runtime type errors? :P


-- 
Oskar

Jul 25 2007

Derek Parnell <derek psyc.ward> writes:

On Thu, 26 Jul 2007 08:37:13 +0200, Oskar Linde wrote:

 Manfred Nowak wrote:
 Frits van Bommel wrote
 
 But the fact of the matter is, 'T[] x = null;' reserves space for
 the .length and sets it to 0. If you have a suggestion for a
 different value to put there, by all means make it.

 
 Suggestion:
 After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', i.e.  
 `size_t.max' will no more be a valid length for an array.

 
 Uhu... Why would a slice of the full addressable memory space be a good 
 initialization value?

Maybe x.ptr = size_t.max and x.length = size_t.max might be a useful
representation of a null array as it is an illegal RAM reference otherwise.
But I know, it's too late now and probably too expensive at run-time to
implement.

 This is a hack to avoid some overhead in some places, but may introduce  
 more overhead in other places.

 
 This entire discussion is trying to make today's T[] -- a slice type with 
 value semantics and some provisions for making it behave as an array in 
 some cases -- into a pure array type with a well defined null. You can't 
 do that without breaking its slice semantics. A much better suggestion 
 is Walter's T[new]. Make T[] remain the slice type it is today and make 
 a distinct array type (preferably a by-reference type).

You may very well be correct.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Oskar Linde wrote:
 Manfred Nowak wrote:
 Frits van Bommel wrote

 But the fact of the matter is, 'T[] x = null;' reserves space for
 the .length and sets it to 0. If you have a suggestion for a
 different value to put there, by all means make it.

 Suggestion:
 After `T[] x= null;' `x.length == size_t.max' and `x.ptr == null', 
 i.e.  `size_t.max' will no more be a valid length for an array.

 
 Uhu... Why would a slice of the full addressable memory space be a good 
 initialization value?

It's not the *full* addressable memory space for 1-byte types (the last 
byte of the address space has an address equal to .ptr(0) + 
.length(size_t.max), which isn't a member of the array) and it's more 
than the address space for bigger types (though I guess it does indeed 
cover the entire address space, possibly several times over, due to 
wraparound on overflow...).
</pedantic>

Jul 26 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Oskar Linde wrote:
 Manfred Nowak wrote:
 
 This is a hack to avoid some overhead in some places, but may 
 introduce  more overhead in other places.

 
 This entire discussion is trying to make today's T[] -- a slice type with 
 value semantics and some provisions for making it behave as an array in 
 some cases -- into a pure array type with a well defined null. You can't 
 do that without breaking its slice semantics. A much better suggestion 
 is Walter's T[new]. Make T[] remain the slice type it is today and make 
 a distinct array type (preferably a by-reference type).
 

Today's T[] is "a slice type with value semantics and some provisions 
for making it behave as an array in some cases"? Whoa. What do you mean 
"making it behave as an array in some cases" ? What's the difference 
between a slice type and an array? And why would having null arrays in D 
break its slice semantics?


-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 26 2007

Regan Heath <regan netmail.co.nz> writes:

Frits van Bommel wrote:
 Or would you prefer a segfault or diagnostic when accessing 
 (cast(T[])null).length? 

No, definitely not.  This is one of the things I love about arrays, 
they're both value and reference type.  It takes a while to get your 
head round (if the many discussions on these forums are any indication) 
but once you have it worked out it's quite powerful.  In fact it's the 
reason slicing can work the way it does.

Further, for those cases where we do not care to differentiate between 
null and "", checking length == 0 is the perfect solution.

I'm not interested in an array implementation which is 'pure' in any 
academic sense but rather one which is consistent in that null arrays do 
not become empty and vice-versa under any conditions (other than 
explicitly assigning those values).

For example:

In the past, setting length to 0 would free the data pointer.  The result 
of which was that a zero length (empty) array became a nonexistent 
(null) array.

And the problem we have now is that calling .dup on an empty array 
results in a null array.

It is cases like these which I want to remove.

The other thing I want is for == to tell me that null and "" are not the 
same.

I suspect very little existing code is relying on the existing behaviour 
as it will likely be checking length as opposed to comparing to "" or 
null (note; comparing with == not checking identity with "is").

Regan

Jul 26 2007

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Derek Parnell wrote:

 On Wed, 25 Jul 2007 19:01:57 +0200, Frits van Bommel wrote:

 Since null arrays have length 0, they *are* empty arrays :P.

 Not in my world. I see that null arrays have no length. That is to say, they
 do not have any length, which is different from saying they have a length
 and that length is zero.

But that is not how T[] behaves in D. T[]s are of a dual slice/array 
nature with semantics closer to a slice than an array. That is something 
Walter's T[new] suggestion has a potential to remedy.

There is no difference between a "null" array and a slice starting at 
memory location null, 0 elements long. In my opinion, it would be quite 
strange for zero length slices to behave any differently if the starting 
position happens to be null.

There is a very easy way to get the behavior you want BTW:

class Array(T) { ... } :)

 All that I would like changed is for the compare, in the case of length 
 == 0, to check the data pointers, eg.

  > int opEquals(T)(T[] u, T[] v) {
  >     if (u.length != v.length) return false;
       if (u.length == 0) return (u.ptr == v.ptr);
  >     for (size_t i = 0; i < u.length; i++) {
  >         if (u[i] != v[i]) return false;
  >     }
  >     return true;
  > }

 This should mean "" == "" but not "" == null, likewise null == null but 
 not null == "".

This would mean that "two arrays are equal if all elements are equal" 
would no longer hold. (Consider two zero length slices at arbitrary 
memory location, neither of them null).

-- 
Oskar

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

Oskar Linde wrote:
 This should mean "" == "" but not "" == null, likewise null == null 
 but not null == "".



 
 This would mean that "two arrays are equal if all elements are equal" 
 would no longer hold. 

Not true, the two arrays you mention below would still compare 'true' as 
their contents are still equal.

Ignore the suggested code changes, my one was patently incorrect and the 
first step is to make it clear what behaviour is desired, something I 
have obviously not done.

 (Consider two zero length slices at arbitrary
 memory location, neither of them null).

The content of these arrays is equal and would compare so.

The case(s) I want to stop comparing as equal are:

null == ""
"" == null

The cases which should continue to compare equal are:

null == null
"" == ""  (your example above)

No more, no less.

Regan

p.s. I know I said ignore the suggested code changes but it would have 
to go something like:

if (lhs.length == 0) {
   if (lhs.ptr && rhs.ptr) return true;  //"" == ""
   if (lhs.ptr || rhs.ptr) return false; //"" == null && null == ""
   return true;                          //null == null
}
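
Fleshed out into a complete function, the rule above might look like the following sketch. It assumes a current D2 compiler, and `strictEquals` is a hypothetical name, not something in D or Phobos; the built-in `==` is shown last for contrast:

```d
import std.stdio;

// Hypothetical helper (an assumption for illustration): equality that
// treats null and empty arrays as different values.
bool strictEquals(T)(T[] lhs, T[] rhs)
{
    if (lhs.length != rhs.length) return false;
    if (lhs.length == 0)
        // Both empty or both null compare equal; a null/empty mix does not.
        return (lhs.ptr is null) == (rhs.ptr is null);
    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

void main()
{
    string empty = "";     // non-null pointer, length 0
    string nothing = null; // null pointer, length 0

    writeln(strictEquals(empty, empty));     // true
    writeln(strictEquals(nothing, nothing)); // true
    writeln(strictEquals(empty, nothing));   // false
    writeln(strictEquals(nothing, empty));   // false
    writeln(empty == nothing);               // true (built-in ==)
}
```

Unlike a direct pointer comparison, this only asks whether each pointer is null, so two empty slices at different addresses still compare equal.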

Jul 26 2007

Derek Parnell <derek psyc.ward> writes:

On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

I don't think this is such a good idea. How does one address the array of
four bytes at RAM location 4?
 

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
 
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

 
 I don't think this is such a good idea. How does one address the array of
 four bytes at RAM location 4?

I'm pretty sure the only way to obtain such an array would be to have 
already invoked Undefined Behavior (assuming 4 is an invalid memory 
location on the platform the program's running on) and as such it 
doesn't really matter whether or not two array references to it compare 
equal or not...

Jul 25 2007

Derek Parnell <derek psyc.ward> writes:

On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
 
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

 
 I don't think this is such a good idea. How does one address the array of
 four bytes at RAM location 4?

 
 I'm pretty sure the only way to obtain such an array would be to have 
 already invoked Undefined Behavior (assuming 4 is an invalid memory 
 location on the platform the program's running on) and as such it 
 doesn't really matter whether or not two array references to it compare 
 equal or not...

There is no basis for assuming that any RAM location is not addressable. I
know that some operating systems prevent unprivileged programs from
accessing certain locations, and that some RAM is hardware-mapped to I/O
ports, but in theory, D as a system language should be able to address any
RAM location.

For example, if D had been implemented for the Amiga system, access to RAM
address 4 would have been vital, as that location contained the 32-bit
address of the list that contains all addresses of the loaded shared
libraries, and every program needed to access that location.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Derek Parnell wrote:
 On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
 
 Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

 I don't think this is such a good idea. How does one address the array of
 four bytes at RAM location 4?

 I'm pretty sure the only way to obtain such an array would be to have 
 already invoked Undefined Behavior (assuming 4 is an invalid memory 
 location on the platform the program's running on) and as such it 
 doesn't really matter whether or not two array references to it compare 
 equal or not...

 
 There is no basis for assuming that any RAM location is not addressable. I
 know that some operating systems prevent unprivileged programs from
 accessing certain locations, and that some RAM is hardware-mapped to I/O
 ports, but in theory, D as a system language should be able to address any
 RAM location.
 
 For example, if D had been implemented for the Amiga system, access to RAM
 address 4 would have been vital, as that location contained the 32-bit
 address of the list that contains all addresses of the loaded shared
 libraries, and every program needed to access that location.

I'm sorry, but what would then be the problem with accessing 
(cast(byte*)4)[0..4] if it's a valid memory location?
I thought your question implied it was an invalid memory location, 
though I'm very aware that's not always the case (which was why I had 
the parenthesized sentence in there).

By the way, null is a valid address on x86 too, but most operating 
systems don't map the first page to any memory to generate pagefaults 
for null pointer dereferences (and IIRC Linux treats the last page 
similarly, for null pointers with negative indices). IIRC DOS didn't 
do this (and probably couldn't on machines of the time); the interrupt 
table was located there (which would seem to be a pretty bad idea for a 
system without memory protection -- a null pointer write could 
potentially crash the entire system...).

Also, there's no particular reason null has to be cast(whatever)0, that 
just happens to be a convenient easily-checked-for value...

Jul 26 2007

Derek Parnell <derek psyc.ward> writes:

On Thu, 26 Jul 2007 09:28:16 +0200, Frits van Bommel wrote:

 Derek Parnell wrote:
 On Thu, 26 Jul 2007 07:50:07 +0200, Frits van Bommel wrote:
 
 Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:

 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.





 I'm sorry, but what would then be the problem with accessing 
 (cast(byte*)4)[0..4] if it's a valid memory location?

Duh!  I am so stupid!   I misread Regan's original post. When he said "If
the location and length are identical" I incorrectly read that as "if an
array's location and length are identical" and not "if the locations and
lengths of two arrays are identical".

Sorry (as he sulks off hoping no one notices) ...

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 26 2007

Regan Heath <regan netmail.co.nz> writes:

Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:29:47 +0100, Regan Heath wrote:
 
 Aside: If the location and length are identical you can short-circuit 
 the compare, returning true and ignoring the content, this could save a 
 bit of time on comparisons of large arrays.

 
 I don't think this is such a good idea. How does one address the array of
 four bytes at RAM location 4?


What I meant was:

if (lhs.length == rhs.length && lhs.ptr == rhs.ptr) return true;

Not:

if (lhs.length == lhs.ptr) return true;

;)
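
As a sketch of the intended shortcut, assuming a current D2 compiler (`fastEquals` is a hypothetical helper, not the runtime's comparison routine):

```d
import std.stdio;

// Hypothetical helper (an assumption for illustration): content
// comparison with an identity shortcut.
bool fastEquals(T)(T[] lhs, T[] rhs)
{
    if (lhs.length != rhs.length) return false;
    if (lhs.ptr is rhs.ptr) return true; // same memory, same length
    foreach (i; 0 .. lhs.length)
        if (lhs[i] != rhs[i]) return false;
    return true;
}

void main()
{
    auto big = new int[](1_000_000);
    auto copy = big.dup;

    writeln(fastEquals(big, big));  // true, via the shortcut
    writeln(fastEquals(big, copy)); // true, after comparing every element
    copy[$ - 1] = 1;
    writeln(fastEquals(big, copy)); // false
}
```

One caveat with such a shortcut: for element types where a value can be unequal to itself, such as floating-point NaN, skipping the element-wise loop changes the result.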

Regan

Jul 26 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Frits van Bommel wrote:
 Regan Heath wrote:
 This all boils down to the empty vs null string debate where some 
 people want to be able to distinguish between them and some see no point.

 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

 
 They *are* distinguishable. That's why above code returns different 
 results for the 'is' comparison...
 

The .ptr of empty arrays may be different than the .ptr of null arrays, 
but they are conceptually the same, and thus not safely distinguishable.
Example:
	writefln("" is null); // false
	writefln("".dup is null); // true

"".ptr is not null, but "".dup.ptr is null. Such duplication is correct, 
because empty arrays are conceptually the same as null arrays, and 
trying to use .ptr to distinguish them is unsafe, 
implementation-dependent behavior (aka a program error).
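
For anyone who wants to try this on their own compiler, here is a small sketch assuming current D2 (where `writeln` replaces the bare `writefln` calls above). `describe` is a hypothetical helper that relies only on `is null` and `.length`. The result of `.dup` on an empty array is printed rather than assumed, since it depends on the implementation:

```d
import std.stdio;

// Hypothetical helper (an assumption for illustration): classifies an
// array as null, empty, or non-empty.
string describe(T)(T[] a)
{
    if (a is null) return "null";
    if (a.length == 0) return "empty";
    return "non-empty";
}

void main()
{
    string empty = "";
    string nothing = null;

    writeln(describe(empty));   // empty
    writeln(describe(nothing)); // null
    writeln(empty == nothing);  // true: == compares length and contents

    // Reported in this thread as null for the DMD of the time.
    writeln(describe(empty.dup));
}
```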

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

Bruno Medeiros wrote:
 Frits van Bommel wrote:
 Regan Heath wrote:
 This all boils down to the empty vs null string debate where some 
 people want to be able to distinguish between them and some see no 
 point.

 I'm in the 'distinguishable' camp.  I can see the merit.  At the very 
 least it should be consistent!

 They *are* distinguishable. That's why above code returns different 
 results for the 'is' comparison...

 
 The .ptr of empty arrays may be different than the .ptr of null arrays, 
 but they are conceptually the same, and thus not safely distinguishable.
 Example:
     writefln("" is null); // false
     writefln("".dup is null); // true
 
 "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, 
 because empty arrays are conceptually the same as null arrays, and 
 trying to use .ptr to distinguish them is unsafe, 
 implementation-dependent behavior (aka a program error).

Ick.  IMO "".dup should allocate 1 byte of memory, set it to '\0' and 
create a reference to it with length of 0.

What do you mean by "empty arrays are conceptually the same as null 
arrays"?

To me null arrays (nonexistent) and "" arrays (empty) are conceptually 
different.  null indicates the array does not exist (no set at all), "" 
indicates it does but contains no items (an empty set).

Regan

Jul 25 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

 Regan Heath wrote:
 Bruno Medeiros wrote:
 Frits van Bommel wrote:
 Regan Heath wrote:
 This all boils down to the empty vs null string debate where some
 people want to be able to distinguish between them and some see no
 point.

 I'm in the 'distinguishable' camp. I can see the merit. At the
 very least it should be consistent!

 They *are* distinguishable. That's why above code returns different
 results for the 'is' comparison...

 The .ptr of empty arrays may be different than the .ptr of null
 arrays, but they are conceptually the same, and thus not safely
 distinguishable.
 Example:
     writefln("" is null); // false
     writefln("".dup is null); // true

 "".ptr is not null, but "".dup.ptr is null. Such duplication is
 correct, because empty arrays are conceptually the same as null
 arrays, and trying to use .ptr to distinguish them is unsafe,
 implementation-dependent behavior (aka a program error).

 Ick. IMO "".dup should allocate 1 byte of memory, set it to '\0' and
 create a reference to it with length of 0.

 What do you mean by "empty arrays are conceptually the same as null
 arrays"?

I meant that in current D they are semantically the same. (I should have
used those words)

 To me null arrays (nonexistent) and "" arrays (empty) are conceptually
 different. null indicates the array does not exist (no set at all), ""
 indicates it does but contains no items (an empty set).

 Regan

I know, and I agree, don't you recall the V2 string discussion:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 25 2007

Regan Heath <regan netmail.co.nz> writes:

Bruno Medeiros wrote:
 Regan Heath wrote:
 What do you mean by "empty arrays are conceptually the same as null 
 arrays"?

 
 I meant that in current D they are semantically the same. (I should have 
 used those words)

:)

 To me null arrays (nonexistent) and "" arrays (empty) are 
 conceptually different.  null indicates the array does not exist (no 
 set at all), "" indicates it does but contains no items (an empty set).

 Regan

 
 I know, and I agree, don't you recall the V2 string discussion:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=55388

Yes, I remember it.  I just forgot who was involved and what their 
opinions were.  I have a hard enough time keeping track of my own 
opinion let alone others.

Regan

Jul 25 2007

Derek Parnell <derek psyc.ward> writes:

On Wed, 25 Jul 2007 14:31:28 +0100, Bruno Medeiros wrote:

 The .ptr of empty arrays may be different than the .ptr of null arrays, 
 but they are conceptually the same, and thus not safely distinguishable.

No they are not! Conceptually they are different things. However, D
sometimes implements them as the same thing.

 Example:
 	writefln("" is null); // false
 	writefln("".dup is null); // true

 "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, 
 because empty arrays are conceptually the same as null arrays, and 
 trying to use .ptr to distinguish them is unsafe, 
 implementation-dependent behavior (aka a program error).

But I believe that the implementation here is wrong. "".dup should create
another empty string and not a null string. 

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

Derek Parnell wrote:
 On Wed, 25 Jul 2007 14:31:28 +0100, Bruno Medeiros wrote:
 
 The .ptr of empty arrays may be different than the .ptr of null arrays, 
 but they are conceptually the same, and thus not safely distinguishable.

 
 No they are not! Conceptually they are different things. However, D
 sometimes implements them as the same thing.
 

Check my reply to Regan just above, what I meant to say is that in 
current D they are semantically the same.

 Example:
 	writefln("" is null); // false
 	writefln("".dup is null); // true

 "".ptr is not null, but "".dup.ptr is null. Such duplication is correct, 
 because empty arrays are conceptually the same as null arrays, and 
 trying to use .ptr to distinguish them is unsafe, 
 implementation-dependent behavior (aka a program error).

 
 But I believe that the implementation here is wrong. "".dup should create
 another empty string and not a null string. 
 

The implementation is not wrong, it is according to Walter's intention, 
as you know. If anything, it is Walter's intention that is wrong. ^^'

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 26 2007

Derek Parnell <derek psyc.ward> writes:

On Wed, 25 Jul 2007 15:05:25 +0200, Frits van Bommel wrote:

 Since both the null string and "" have .length == 0, that means they 
 compare equal using those methods (having no contents to compare and 
 equal length)
 
 This is all perfectly consistent (and even useful) to me...

However, 

   string x = "";

means that 'x' is not null because it has a pointer and that points to a
string with no content. Something that is null has no pointer and therefore
the length component is not significant. But of course, in order to
represent something that really does have the address of zero we should
only consider 'x' to be null when x.ptr and x.length are both zero. In
every other case it is not null.

-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Jul 25 2007
