digitalmars.D.bugs - [Issue 2934] New: "".dup does not return empty string

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2934

           Summary: "".dup does not return empty string
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: qian.xu funkwerk-itk.com


The following code fails with an assertion error:
  char[] s;
  assert( s.dup  is null); // OK
  assert("".dup !is null); // FAILED

"".dup is expected to be an empty string as well.

Confirmed with dmd v1 and gdc.


-- 
May 04 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2934






Sorry. I should have posted the following code:

  char[] s;
  assert(s     is null);
  assert(s.dup is null);

  assert(""     !is null); // OK
  assert("".dup !is null); // FAILED

The last two lines do not behave consistently.
Either both should fail, or both should pass.


-- 
May 04 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2934


schveiguy yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID





From posts in the newsgroup, I've determined that this bug is invalid:

1. Duplicating an empty array should always return a null array.  Otherwise,
you'd have to allocate space to store 0 data bytes in order for the result to
be non-null.

2. String literals have a null character implicitly appended to them by the
compiler.  This is done to ease calling C functions.  So a string literal's
pointer cannot be null, since it has to point to a static zero byte.

The spec specifically covers item 2 here:
http://www.digitalmars.com/d/1.0/arrays.html#strings

See the section describing "C's printf and Strings".

I could not find a reference for item 1, but I remember reading something about
it.  Regardless of whether it is identified specifically in the spec or not, it
is not a bug, as the alternative would be to allocate blocks for 0-sized arrays.
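
Item 2 can be checked directly. The following is only a minimal sketch, assuming a D2 compiler (this report is against D1, where details may differ); the variable name lit is made up for the illustration.

```d
import core.stdc.string : strlen;

unittest
{
    // Assumption: D2 semantics, where string literals are immutable.
    string lit = "";

    assert(lit.length == 0);       // the slice itself holds no characters
    assert(lit.ptr !is null);      // yet it points at real static storage
    assert(*lit.ptr == '\0');      // namely the implicitly appended zero byte
    assert(strlen(lit.ptr) == 0);  // so the pointer can be handed to C as-is
}
```

With dmd, a file containing this block can be exercised with the -unittest and -main switches.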


-- 
May 04 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 4 May 2009 17:44:56 +0000 (UTC), d-bugmail puremagic.com wrote:

 http://d.puremagic.com/issues/show_bug.cgi?id=2934
 
 schveiguy yahoo.com changed:
 
            What    |Removed                     |Added
 ----------------------------------------------------------------------------
              Status|NEW                         |RESOLVED
          Resolution|                            |INVALID
 

 From posts in the newsgroup, I've determined that this bug is invalid:
 
 1. Duplicating an empty array should always return a null array.  Otherwise,
 you'd have to allocate space to store 0 data bytes in order for the result to
 be non-null.
 
 2. String literals have a null character implicitly appended to them by the
 compiler.  This is done to ease calling c functions.  So a string literal's
 pointer cannot be null, since it has to point to a static zero byte.
 
 The spec identifies specifically item 2 here:
 http://www.digitalmars.com/d/1.0/arrays.html#strings
 
 see the section describing "C's printf and Strings"
 
 I could not find a reference for item 1, but I remember reading something about
 it.  Regardless of whether it is identified specifically in the spec or not, it
 is not a bug, as the alternative would be to allocate blocks for 0-sized arrays.

Huh??? Duplicating something should give one a duplicate. I do not think that
this is an invalid bug. Ok, so duplicating an empty array causes memory to be
allocated - so what! I asked for a duplicate so give me a duplicate, please.

To me, the "no surprise" path is simple. Duplicating an empty array should
return an empty array. Duplicating a null array should return a null array. Is
that not intuitive?

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 04 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 04 May 2009 16:56:49 -0400, Derek Parnell <derek psych.ward> wrote:

 Huh??? Duplicating something should give one a duplicate. I do not think that
 this is an invalid bug. Ok, so duplicating an empty array causes memory to be
 allocated - so what! I asked for a duplicate so give me a duplicate, please.

 To me, the "no surprise" path is simple. Duplicating an empty array should
 return an empty array. Duplicating a null array should return a null array. Is
 that not intuitive?

what's not intuitive is comparing an array (which is a struct) to null.

  char[] arr1 = "";
  char[] arr2 = null;

  assert(arr1 == arr2); // OK
  assert(arr1 == null); // FAIL

I'd say that comparing an array to null should always succeed if the array
is empty, but I guess some people may use the fact that the pointer is not
null in an empty array.  I definitely don't want the runtime to allocate
blocks of data when requested to allocate 0 bytes.

In any case, this bug is not valid, because the compiler acts as specified
by the spec.

I never compare arrays to null if I can remember, I always check the
length instead, which is consistent for both null and empty arrays.

-Steve
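
The difference between the two kinds of check can be sketched as follows. This assumes D2 semantics, and the helper names isEmpty and isNullSlice are hypothetical, chosen only for this illustration.

```d
// True for null and for empty slices alike: only the length is inspected.
bool isEmpty(const(char)[] s)
{
    return s.length == 0;
}

// True only when the slice references no storage at all.
bool isNullSlice(const(char)[] s)
{
    return s.ptr is null;
}

unittest
{
    char[] none = null;   // no storage behind it
    string empty = "";    // zero length, but backed by the literal's storage

    assert(isEmpty(none) && isEmpty(empty)); // the length check agrees on both
    assert(isNullSlice(none));
    assert(!isNullSlice(empty));             // the pointer check tells them apart
}
```

Code that only ever asks isEmpty-style questions is unaffected by whether dup returns null or a non-null empty slice.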
May 04 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 04 May 2009 17:16:45 -0400, Steven Schveighoffer wrote:


 what's not intuitive is comparing an array (which is a struct) to null.
Hmmm ... interesting. I regard the array not as a struct but as a concept implemented in D as a struct.
 char[] arr1 = "";
 char[] arr2 = null;
 
 assert(arr1 == arr2); // OK
 assert(arr1 == null); // FAIL
 
 I'd say that comparing an array to null should always succeed if the array  
 is empty, but I guess some people may use the fact that the pointer is not  
 null in an empty array.
Yes, some people rely on the distinction. However, I think that this ought to
be the case ...

  char[] arr1 = "";
  char[] arr2 = null;

  assert(arr1 == arr2); // FAIL
  assert(arr1 == null); // FAIL
  assert(arr2 == "");   // FAIL
  assert(arr2 == arr1); // FAIL
  assert(null == "");   // FAIL

Simply because an empty array is one with an allocation and a null array is
one without an allocation, therefore they are not the same thing. So the '=='
equality test should tell the coder that there are two different beasties at
play here.

I know that there is an "efficiency" aspect to this.

A "proper" test IMO is that an array is null if arr.ptr == null and
arr.length == 0, but I suspect that will be evil to the speed aficionados.

  I definitely don't want the runtime to allocate  
 blocks of data when requested to allocate 0 bytes.
Then don't allocate zero bytes.
 In any case, this bug is not valid, because the compiler acts as specified  
 by the spec.
I'm having trouble locating the specification for this.
 I never compare arrays to null if I can remember, I always check the  
 length instead, which is consistent for both null and empty arrays.
I do the same as you.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 04 2009
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 04 May 2009 20:02:01 -0400, Derek Parnell <derek psych.ward> wrote:

 On Mon, 04 May 2009 17:16:45 -0400, Steven Schveighoffer wrote:


 what's not intuitive is comparing an array (which is a struct) to null.
 Hmmm ... interesting. I regard the array not as a struct but as a concept
 implemented in D as a struct.

Yes, but null is a pointer.  Can I make just any struct with a pointer, and
expect to be able to compare it to null (and have it direct that comparison
to the pointer)?

The distinction that an array is a struct and not a pointer or reference is
one of the frequent causes of newbie frustration, because they just don't get
it at first.  I know of no other language that implements arrays like this
(where the length is local, but the data is shared).  It's also one of the
gems of D if you learn to use it correctly.

 char[] arr1 = "";
 char[] arr2 = null;

 assert(arr1 == arr2); // OK
 assert(arr1 == null); // FAIL

 I'd say that comparing an array to null should always succeed if the  
 array
 is empty, but I guess some people may use the fact that the pointer is  
 not
 null in an empty array.
 Yes, some people rely on the distinction. However, I think that this ought to
 be the case ...

 char[] arr1 = "";
 char[] arr2 = null;

 assert(arr1 == arr2); // FAIL
 assert(arr1 == null); // FAIL
 assert(arr2 == "");   // FAIL
 assert(arr2 == arr1); // FAIL
 assert(null == "");   // FAIL

 Simply because an empty array is one with an allocation and a null array is
 one without an allocation therefore they are not the same thing. So the '=='
 equality test should tell the coder that there are two different beasties at
 play here.

I would also be fine with this, as it would discourage comparing to null.
I'd also be fine with comparing an array to null being a syntax error.  You
can always do arr.ptr == null.

 I know that there is an "efficiency" aspect to this.

 A "proper" test IMO is that an array is null if arr.ptr == null and
 arr.length == 0, but I suspect that will be evil to the speed aficionados.
Such an array is an anomaly, and shouldn't ever occur, unless someone forces it by setting the ptr specifically. I don't think it's worth the extra code to cover this very rare possibility.
  I definitely don't want the runtime to allocate
 blocks of data when requested to allocate 0 bytes.
 Then don't allocate zero bytes.

Sometimes, you don't know whether it's going to be zero bytes or not until
runtime.  I don't want to have to check for zero-length arrays everywhere I
dup, when the GC does it for me.

 In any case, this bug is not valid, because the compiler acts as  
 specified
 by the spec.
 I'm having trouble locating the specification for this.

As far as the "" being not null, the spec does talk about it (although
indirectly) as I cited in the original bug resolution.

As far as returning a null array when allocating zero bytes, there is nothing
I could find in the spec, but this means it's up to the implementer.  So the
implementation does not violate the spec, and it can be considered desired
behavior, not an accident.  I'd be interested to know what Walter had in mind.

-Steve
May 05 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2934


qian.xu funkwerk-itk.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |





 2. String literals have a null character implicitly appended to them by the
 compiler.  This is done to ease calling c functions.  So a string literal's
 pointer cannot be null, since it has to point to a static zero byte.

I fully agree with you. But before ".dup" is applied, a string variable has
three possible states (null, empty or not empty). After ".dup" is applied to an
empty string, these might be reduced to two. This will break existing code if
defensive copies of strings are made. An example is as follows:

  class test {
    private char[] val;

    char[] getVal() {
      return val.dup; // make a defensive copy to avoid unexpected change from outside
    }

    void setVal(char[] val) {
      this.val = val.dup;
    }
  }

  myTestObj.setVal("");
  char[] s = myTestObj.getVal;
  if (s is null) {
    // do task 1
  } else if (s == "") {
    // do task 2
  } else {
    // do task 3
  }

In this case, task 2 is expected to be performed. However, task 1 will be
performed.

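
The three states described above can be made explicit with a small classifier. This is only a sketch under D2 semantics; SliceState and classify are hypothetical names and not part of any library.

```d
// The three states a string variable can be in, as described above.
enum SliceState { isNull, empty, nonEmpty }

// Classifies a slice by looking at its length first and its pointer second.
SliceState classify(const(char)[] s)
{
    if (s.length != 0)
        return SliceState.nonEmpty;
    // Zero length: the pointer decides between "null" and "empty".
    return s.ptr is null ? SliceState.isNull : SliceState.empty;
}

unittest
{
    assert(classify(null)  == SliceState.isNull);
    assert(classify("")    == SliceState.empty);
    assert(classify("abc") == SliceState.nonEmpty);
}
```

If dup hands back null for an empty source, as reported in this issue, then a value that classified as empty before the defensive copy would classify as isNull afterwards, which is the loss of a state being described.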
 Regardless of whether it is identified specifically in the spec or not, it is
 not a bug, as the alternative would be to allocate blocks for 0-sized arrays.

Did you mean that this is a feature request? I would regard the inconsistent
effect of dup as a defect.

-- 
May 05 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2934






From that point of view, your request makes a lot more sense.

But there are two counter arguments:

1. Comparing an array to null has limited utility.  I don't think it should be
in widespread use, as most of the time you only care whether the array is empty
or not.  There may be special cases, but in those cases, you can use arr.ptr ==
null.  It would have been much better if arr == null never compiled.

2. Duping an empty array has limited defensive utility.  You can just as easily
return the array itself.  If it weren't for the horrendous append behavior, it
would be a no-brainer:

T[] edup(T)(T[] arr)
{
   return arr.length == 0 ? arr : arr.dup;
}

usage:

return arr.edup();

Allocating data for duping an empty array is not an acceptable pessimization. 
However, I thought of another possible solution:  A dup of an empty, non-null
array can return a pointer into the read only data segment.  This would allow a
non-allocation on duping an empty array, would not return a pointer to null,
and would not accidentally overwrite the original array if appending is done.

So a fix can be done.
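
A user-level approximation of that idea might look like the sketch below. It assumes D2 semantics, the name dupKeepingEmpty is hypothetical, and it substitutes a function-local static sentinel for the read-only data segment, so it illustrates the intent rather than the proposed runtime change.

```d
// Duplicates src, but keeps the null / empty distinction without allocating.
char[] dupKeepingEmpty(const(char)[] src)
{
    if (src.length != 0)
        return src.dup;          // normal case: copy the characters
    if (src.ptr is null)
        return null;             // a null source stays null

    // Assumption: a static sentinel stands in for the read-only segment.
    // It is never read or written through the zero-length slice.
    static char sentinel;
    return (&sentinel)[0 .. 0];  // empty, non-null, no allocation
}

unittest
{
    assert(dupKeepingEmpty(null) is null);

    auto e = dupKeepingEmpty("");
    assert(e !is null && e.length == 0);

    e ~= 'x';                    // appending moves the slice to fresh memory
    assert(e == "x");

    char[] src = "abc".dup;
    auto c = dupKeepingEmpty(src);
    c[0] = 'X';                  // the copy is independent of the source
    assert(src == "abc" && c == "Xbc");
}
```

Because the sentinel is not GC-managed memory, an append has to reallocate rather than extend in place, which matches the requirement that appending must not overwrite the original array.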


-- 
May 05 2009