digitalmars.D - empty arrays - no complaints?

Farmer (108/108) Jun 27 2004 Why are there (almost) no complaints about D's support for empty arrays?

Sean Kelly (10/23) Jun 27 2004 Not really. I'd rather argue that D tries to make both usable and reduc...

Farmer (26/48) Jun 27 2004 You misunderstood me, I meant that the function interface is a good one.

Derek Parnell (15/64) Jun 27 2004 Well....the *use* of an uninitialized array it which it is assumed to be

Kris (17/20) Jun 27 2004 What I do to handle such issues is to check the array length only. See, ...

Regan Heath (27/50) Jun 27 2004 I think Derek is thinking more of this other example he gave:

Andy Friesen (14/22) Jun 27 2004 I think the problem is that D arrays almost always behave like reference...

Farmer (12/20) Jun 27 2004 Yes, this is a problem. It is a necessary evil to archive that outstandi...

Andy Friesen (33/52) Jun 27 2004 Conceptually they are. If the length is zero, then the data pointer is

Derek Parnell (21/82) Jun 27 2004 Huh? There are times when a zero-length array is valid and an uninitaliz...
Regan Heath (32/86) Jun 27 2004 D allows both empty arrays *and* null arrays.

Andy Friesen (25/57) Jun 27 2004 D arrays are implement exactly so:

Regan Heath (36/93) Jun 27 2004 I see what you're saying... the internal data pointer for the array can ...

Andy Friesen (15/32) Jun 28 2004 You say that as though it is self-evident that strings must absolutely,

Regan Heath (16/50) Jun 28 2004 I have not used C++ containers. I program in C for a living, and C++ for...

Andy Friesen (18/31) Jun 28 2004 Yeah, it's called std::string, and it's more or less the default.

Derek Parnell (8/50) Jun 28 2004 Agreed, D doesn't seem to work that way, but isn't that the issue. Some
Bent Rasmussen (4/6) Jun 28 2004 I must say, I kind of like that. I don't have to write a read/write prop...
Regan Heath (14/47) Jun 28 2004 And it's crap. IMNSHO.

Andy Friesen (6/10) Jun 28 2004 That would work, but it might be better to adjust your thinking to match...

Regan Heath (112/121) Jun 29 2004 You may be right, so in an effort to change my thinking, pls consider

Charlie (7/129) Jun 29 2004 ---

Regan Heath (7/176) Jun 29 2004 an empty char[] has a length of 0.

Andy Friesen (20/31) Jun 29 2004 In this case, I would say that the best thing to do on failure is to

Regan Heath (32/64) Jun 29 2004 Nope. This is taken from a real life example, I have a config file with ...

Andy Friesen (21/39) Jun 29 2004 I guess it's just a matter of preference. I don't have a problem with

Regan Heath (30/70) Jun 30 2004 It's more like:

Andy Friesen (27/55) Jun 30 2004 I very much doubt this. Associative arrays maintain an internal list of...

Regan Heath (39/94) Jun 30 2004 I agree totally. I am not disputing how an associative array works, what...

Andy Friesen (30/42) Jun 30 2004 This could never work anyway. Types for which null does not make sense

Regan Heath (29/70) Jun 30 2004 I think.. I agree. :)

Andy Friesen (9/21) Jul 01 2004 Right, but Cmp functions return 0 to indicate equality, which would be

Arcane Jill (13/20) Jun 30 2004 And indeed that very situation is ALSO true with integer parameters. How...

Regan Heath (14/42) Jun 30 2004 Yep. As another poster noted he had the same problem with integers,

Kevin Bealer (19/64) Jul 07 2004 The D equivalent might be to return int[] or char[][] y. Test if the le...

Farmer (17/101) Jul 09 2004 Disagree. Returning an array for a single value confuses a programmer th...

Arcane Jill (23/29) Jun 29 2004 You'll get no arguments from me there. D got it right in not having a st...

Derek Parnell (22/51) Jun 29 2004 Because that's not what is being meant. I'd like to differentiate betwee...

Arcane Jill (12/17) Jun 29 2004 Why?

Sam McCall (16/29) Jun 29 2004 The difference is in C++ it's common to use a pointer to a class (and I

Arcane Jill (8/12) Jun 29 2004 I do, but less frequently than this one as it's a slow turnover list. I ...
Matthias Becker (14/20) Jun 29 2004 Nope, wrong.

Sam McCall (4/12) Jun 29 2004 To request default behaviour a la optional arguments, without
Regan Heath (8/34) Jun 29 2004 pls read my post (2 prior to this one - sorted flat and by date, it is a...

Derek (16/38) Jun 29 2004 I don't use C++, so I'm not aware of what std::vector does or does not

Arcane Jill (15/30) Jun 29 2004 I'd use two functions for this:

Sam McCall (6/25) Jun 29 2004 Sure, but it sucks if there's a lot of them, and is impossible if the

Carlos Santander B. (63/63) Jun 29 2004 "Arcane Jill" escribi� en el mensaje
Regan Heath (8/31) Jun 29 2004 Pls read the reply I just made to Andy's post that started this branch i...

Sam McCall (59/82) Jun 29 2004 I'm still getting there... I still don't see why toUpper("hello") is

Arcane Jill (55/73) Jun 29 2004 Maybe not, but you still need something to store them in. Even if you le...

Sam McCall (54/127) Jun 29 2004 Sure, but given that the "user" shouldn't be touching chars without

Arcane Jill (24/52) Jun 29 2004 Yes it does. Java chars operate in UTF-16. If you want to store the char...

Sam McCall (55/111) Jun 29 2004 Whoops. Having never had to deal with this case (and taken a series of

Arcane Jill (38/76) Jun 30 2004 I'm led to believe there was a lot of debate about this. Some folk said ...

Sam McCall (24/102) Jun 30 2004 Sorry, I meant "if java had originally been defined to have char being

Arcane Jill (31/43) Jun 30 2004 You weren't mistaken. You were spot on.

Sam McCall (8/24) Jun 30 2004

Bent Rasmussen (8/14) Jun 29 2004 That's true. In Standard ML you could do

Sam McCall (20/34) Jun 29 2004 McCall's Law the First:

Regan Heath (5/41) Jun 29 2004 I think the current value-type-kinda nature of arrays is good, it just

Matthias Becker (13/29) Jun 29 2004 Why do you need to add member-functions to a string class, but you don't...

Bent Rasmussen (32/36) Jun 29 2004 Perhaps,

Farmer (13/18) Jun 29 2004 Arcane Jill wrote in

Sam McCall (9/20) Jun 29 2004 We don't have array literals, so we can't do this:

Farmer (12/33) Jun 30 2004 What's messy here?

Matthias Becker (5/39) Jun 29 2004 Could you please make some real world examples, where you need empty str...

Regan Heath (10/58) Jun 29 2004 Thus why I dont use references either when I need the ability to say it'...

Sean Kelly (7/9) Jun 29 2004 Why? It seems to me that this behavior would also require arrays to be

Farmer (17/29) Jun 29 2004 The .length parameter would still work with null-arrays (as they current...

Sean Kelly (21/33) Jun 29 2004 Consider the following:

Farmer (18/61) Jun 30 2004 I agree with you that this feature is quite useful.

Regan Heath (32/97) Jun 30 2004 Provably correct. :)

Farmer (2/2) Jul 01 2004 Sorry, I've posted rubbish.

Sean Kelly (5/11) Jun 30 2004 I read it that the assertion requires either the length to be zero or th...

Regan Heath (6/22) Jun 30 2004 It may very well allow it (in this code, at this level), but how do you ...
Farmer (9/28) Jul 01 2004 I blush for shame, this is too embarrassing. What a whimp I am, I can't ...

Regan Heath (9/22) Jun 29 2004 Nope. It already works, except for 2 inconsistencies (see the original

Farmer (14/16) Jun 29 2004 Andy Friesen wrote in

Andy Friesen (12/23) Jun 29 2004 They don't? Do you have a source to back that up? As far as I've ever

Regan Heath (11/31) Jun 29 2004 Sure.. can you show me how. I am having trouble doing it, it must be my ...
Farmer (22/50) Jun 30 2004 Sorry, my statement was badly expressed. I meant it more like "And proba...

Bent Rasmussen (5/5) Jun 30 2004 I hope you're not referring to the quick hack I posted. It was meant to

Farmer (9/17) Jul 01 2004 You suggested none's for int's but you don't use the term naive in your

Regan Heath (13/66) Jun 30 2004 Was it me.. these links don't work for me :(

Farmer (41/45) Jul 03 2004 Regan Heath wrote in

Farmer (13/34) Jun 30 2004 An expression like

Arcane Jill (19/20) Jun 27 2004 Actually, I think that D has got it right here. At least mostly. I'm hap...

Regan Heath (46/79) Jun 27 2004 This (now?) works.

Derek (6/12) Jun 27 2004 Agreed. A non-existant array is not the same as an array with no element...
Arcane Jill (19/23) Jun 28 2004 Indeed, I think it has always worked. It was just me misremembering the ...

Sean Kelly (12/19) Jun 28 2004 Yes it is. But I think it's the syntax that's the problem in this case....
Andy Friesen (10/16) Jun 28 2004 Something which just occurred to me that would resolve this issue would

Sean Kelly (5/9) Jun 28 2004 This might be very handy. If so, I wouldn't mind seeing rbegin and rend

Sam McCall (6/18) Jun 29 2004 Huh? They're pointers... wouldn't rbegin == end and rend == begin?

Sean Kelly (7/15) Jun 29 2004 It does apply to associative arrays IMO. I iterate through the contents...

Sam McCall (10/30) Jun 29 2004 We're talking about pointers for low level iteration, this doesn't apply...

Sean Kelly (8/13) Jun 30 2004 This is easy enough to do with free functions anyway. Something like:

Farmer (7/11) Jun 28 2004 The expression cast(elementtype*)a+n , does that.
Regan Heath (20/52) Jun 28 2004 Interestingly..
Norbert Nemec (9/22) Jun 29 2004 No, I disagree here. In general, that address would point to nothing.

Arcane Jill (7/14) Jun 29 2004 Such a pointer is never used for reading OR writing. It /is/, however, u...

Norbert Nemec (3/8) Jun 29 2004 Well - that's a workaround but not a clean solution.

Farmer (8/16) Jun 29 2004 [snip]

Farmer (9/16) Jun 28 2004 Regan Heath wrote in

Farmer (12/34) Jun 27 2004 I'm a bit confused, since in my sample, the array 'empty2' is created fr...

Farmer (6/19) Jun 28 2004

Farmer <itsFarmer. freenet.de> writes:

Why are there (almost) no complaints about D's support for empty arrays?


Just to get ex-BASIC programmers in touch with this aspect of D arrays, 
here's a (not so) small D sample that shows how to create 
   a)null arrays (named: null1, null2, null3)
   b)empty arrays (named: empty1, empty2, empty3)
and also shows how they differ.

[D arrays have sooooo obvious semantics that D programmers should feel free 
to skip to the end of this post and read the conclusion.]


--------------------- array sample code ---------------------


void printTraits(char[] array, char[] name)
{

   printf("\n%10.*s%-13.*s", name, ".length == 0");
   if (array.length == 0)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("%10.*s%-13.*s", name, " is null");
   if (array is null)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("\n%10.*s%-13.*s", name, " == null");
   if (array == null)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("%10.*s%-13.*s", name, " == \"\"");
   if (array == "")
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");
}


int main(char args[][])
{
   char[] empty1=(new char[1])[0..0];
   char[] empty2="1"[1..1];   // empty2="1"[2..2]  causes ArrayBoundsError
   char[] empty3="";

   char[] null1;
   char[] null2=new char[0];
   char[] null3=empty1;
   null3.length=0;

   printTraits(null1, "null1");
   printTraits(null2, "null2");
   printTraits(null3, "null3");
   printf("\n");
   printTraits(empty1, "empty1");
   printTraits(empty2, "empty2");
   printTraits(empty3, "empty3");
   printf("\n\n");
   if (null1 == null)
      printf("%20.*s","null1 == null   ");
   if (empty1 == null1)
      printf("%20.*s","empty1 == null1  ");
   if (empty1 != null)
      printf("%20.*s","but  empty1 != null");
   printf("\n");

   return 0;
}


Built with DMD 0.93 (Windows), the output is:

     null1.length == 0    is true     null1 is null        is true
     null1 == null        is true     null1 == ""          is true
     null2.length == 0    is true     null2 is null        is true
     null2 == null        is true     null2 == ""          is true
     null3.length == 0    is true     null3 is null        is true
     null3 == null        is true     null3 == ""          is true

    empty1.length == 0    is true    empty1 is null       is false
    empty1 == null       is false    empty1 == ""          is true
    empty2.length == 0    is true    empty2 is null       is false
    empty2 == null       is false    empty2 == ""          is true
    empty3.length == 0    is true    empty3 is null       is false
    empty3 == null       is false    empty3 == ""          is true

    null1 == null      empty1 == null1   but  empty1 != null


--------------------- end of array sample ---------------------



Conclusion: D does have empty-arrays and null-arrays but the language tries  
to blur them. 

This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functionally rich but simple API interfaces:

Imagine a function that returns the value of an attribute of an XML element:
char[] getAttrValue(char[] name)

The attribute value could be non-existent (the attribute doesn't exist), be 
empty, or have a non-empty value.

If empty-arrays vs. null-arrays are blurred, the interface gets more bloated: 
// additional parameter
char[] getAttrValue(char[] name, out bit isNull)  
// additional function, potentially wasting a slot in the VTable
bit hasAttrValue(char[] name)
// additional indirection
Attribute getAttribute(char[] name) 


2) Initialization bugs are not detected at runtime.

D has
-null-references for objects
-null for pointers
-nan's for FP types
-invalid characters for unicode characters
-guaranteed initialization of structs (Constructors are coming soon!)
-and strong typedefs that empower the programmer to define application 
specific 'not-initialized' values for integer types 

to make a ubiquitous source of bugs easy to spot and fix. 
But if empty/null arrays are commonly treated as being the same thing, 
uninitialized arrays will cause subtle bugs here and there.


3) This aspect of array behaviour is not obvious!

Ok, what's obvious is always a moot point. (If I knew, what's obvious, I 
would write posts about bit vs. bool vs. strong bool types.)
But I know that the array behaviour is definitely not obvious to all D/C/C++ 
programmers.
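
To make point 1 concrete for readers on a present-day compiler, here is a minimal sketch of the null-vs-empty return convention. It is written in current D (D2) syntax rather than the DMD 0.93 dialect of the sample above, and `Element`, its `attrs` table and the attribute names are made up for the illustration; only `getAttrValue` comes from the post.

```d
import std.stdio;

// Hypothetical element type; the backing table is an assumption.
struct Element
{
    string[string] attrs;

    // Returns null when the attribute is absent,
    // otherwise its value (which may be empty).
    string getAttrValue(string name)
    {
        if (auto p = name in attrs)
            return *p;
        return null;
    }
}

void main()
{
    Element e;
    e.attrs["id"] = "main";
    e.attrs["class"] = "";   // present, but empty

    foreach (name; ["id", "class", "style"])
    {
        string v = e.getAttrValue(name);
        if (v is null)
            writefln("%s: no such attribute", name);
        else if (v.length == 0)
            writefln("%s: present, empty", name);
        else
            writefln("%s: %s", name, v);
    }
}
```

The caller has to test with `is null`, not `== null`, and the distinction only survives as long as nothing in between rebuilds the empty value, which is exactly the fragility being complained about.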


So, why doesn't anyone complain?


Farmer.

Jun 27 2004

Sean Kelly <sean f4.ca> writes:

In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
Conclusion: D does have empty-arrays and null-arrays but the language tries  
to blur them.

Not really.  I'd rather argue that D tries to make both usable and reduce odd
errors resulting from uninitialized arrays.

This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functional rich but simple API interfaces:

Imagine a function that returns the value of attributes of a XML-element
char[] getAttrValue(char[] name)

The attribute value could be non-existant (the attribute doesn't exist), be 
empty, or have a non-empty value.

I'd say this is an interface or documentation problem, not a language problem.

2) Initialization bugs are not detected at runtime.

This makes sense in this case.  I don't like the idea of having to distinguish
between an initialized array with no elements and an uninitialized array, as
both are equivalent IMO.  Further, setting the length property will cause a
reallocation for both types of arrays.

to make an ubiquitous source of bugs, easy to spot and fix. 
But if empty/null arrays are commonly treated as being the same thing, 
uninitialized arrays will cause subtle bugs here and there.

I believe the opposite would be true.


Sean

Jun 27 2004

Farmer <itsFarmer. freenet.de> writes:

Sean Kelly <sean f4.ca> wrote in news:cbn29h$rpo$1 digitaldaemon.com:


 Not really.  I'd rather argue that D tries to make both usable and
 reduce odd errors resulting from uninitialized arrays.

I think D tries to *hide* errors resulting from uninitialized arrays. 

 
This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functional rich but simple API interfaces:

Imagine a function that returns the value of attributes of a XML-element
char[] getAttrValue(char[] name)

The attribute value could be non-existant (the attribute doesn't exist),
be empty, or have a non-empty value.

 
 I'd say this is an interface or documentation problem, not a language
 problem. 

You misunderstood me, I meant that the function interface is a good one.
I could document the function like this:
/* 
   Returns the value of the attribute with the given name.
    @param  name  name of the attribute
    @return  null if the attribute doesn't exist,
             the value of the attribute otherwise
*/
char[] getAttrValue(char[] name)

But the other functions I mentioned would be a necessary workaround if you 
couldn't distinguish between null and empty arrays. And these functions are a 
waste of both CPU cycles and developer brain. 


2) Initialization bugs are not detected at runtime.

 
 This makes sense in this case.  I don't like the idea of having to
 distinguish between an initialized array with no elements and an
 uninitialized array, as both are equivalent IMO.  Further, setting the
 length property will cause a reallocation for both types of arrays.

Well, it's quite easy to distinguish between an empty and a null array: An 
uninitialized array (null array) is a bug in either the programmer's code or 
in the code of a library. An initialized array (empty array) is a perfectly 
legal thing.

Why is the idea of distinguishing between a bug and correct program behaviour 
such an unpleasant thing?


Reallocation occurs if the length is greater than the allocated size. I'm 
fine with that, the length 'property' is such an oddity that whatever it 
does, I would call it consistent.

Reallocation is guaranteed not to happen if the new length is less than or 
equal to the allocated size (Walter said so). Well, except when the new length 
happens to be 0. Talk about consistency.

Jun 27 2004

Derek Parnell <derek psych.ward> writes:

On Sun, 27 Jun 2004 22:55:46 +0000 (UTC), Farmer wrote:

 Sean Kelly <sean f4.ca> wrote in news:cbn29h$rpo$1 digitaldaemon.com:
 
 Not really.  I'd rather argue that D tries to make both usable and
 reduce odd errors resulting from uninitialized arrays.

 
 I think, D tries to *hide* errors resulting from uninitialized arrays. 
 
 
This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functional rich but simple API interfaces:

Imagine a function that returns the value of attributes of a XML-element
char[] getAttrValue(char[] name)

The attribute value could be non-existant (the attribute doesn't exist),
be empty, or have a non-empty value.

 
 I'd say this is an interface or documentation problem, not a language
 problem. 

 
 You misunderstood me, I meant that the function interface is a good one.
 I could document the function like this:
 /* 
    Function returns the value the attribute of the given name.
     param  name  name of the attribute
     return  returns null if the attribute doesn't exist
             returns value of the attribute otherwise
 */
 char[] getAttrValue(char[] name)
 
 But the other functions, I mentioned would be a necessary workaround if you 
 couldn't distinguish between null and empty arrays. And these functions are a 
 waste of both cpu cycles and developer brain. 
 
2) Initialization bugs are not detected at runtime.

 
 This makes sense in this case.  I don't like the idea of having to
 distinguish between an initialized array with no elements and an
 uninitialized array, as both are equivalent IMO.  Further, setting the
 length property will cause a reallocation for both types of arrays.

 
 Well, it's quite easy to distinguish between an empty and a null array: An 
 uninitialized array (null array) is a bug in either the programmer's code or 
 in the code of a library. An initialized array (empty array) is a perfectly 
 legal thing.


Well....the *use* of an uninitialized array in which it is assumed to be
initialized is a bug. The fact, or presence, of an uninitialized array in
itself is not really a bug. Also, the use of an empty array may well be a
bug in other circumstances, even though it is 'a legal thing'.

 Why is the idea of distinguishing between a bug and correct program behaviour 
 such an unpleasant thing?

It's not, and no one said it was. We are talking about distinguishing
between an array that has not been set to anything specific *yet*, and one
that has been set explicitly through assignment, to contain zero elements.

There is a timing issue here. For example, it might be prudent in some
situations to only initialize an array if it's actually going to be used.
This is a run-time decision and not a compile time decision.



-- 
Derek
Melbourne, Australia
28/Jun/04 10:44:13 AM

Jun 27 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

"Derek Parnell" <derek psych.ward> wrote:
 There is a timing issue here. For example, it might be prudent in some
 situations to only initialize an array if its actually going to be used.
 This is a run-time decision and not a compile time decision.

What I do to handle such issues is to check the array length only. See, even
if the array is unallocated the length is still valid (because arrays are a
pointer/length pair). If the length is zero, you move on. If not, then the
pointer *should* be valid. That is, a length-check can perform double duty.
For example:

void foo (char[] bar)
{
    if (bar.length)
        // do something
        ;
}

void main ()
{
    foo (null);
}
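
A slightly fuller sketch of the same length-only check, in present-day D (D2) syntax rather than the dialect used in this thread. The helper name `hasData` is made up for the illustration; the point is only that `.length` can be read safely whether or not the array was ever allocated.

```d
import std.stdio;

// Hypothetical helper: true only when there is at least one element.
bool hasData(const(char)[] bar)
{
    return bar.length != 0;
}

void main()
{
    char[] unset;               // never assigned: null pointer, length 0

    writeln(hasData(unset));    // false
    writeln(hasData(null));     // false
    writeln(hasData(""));       // false
    writeln(hasData("abc"));    // true
}
```

Note what the check gives up: null and empty land in the same branch, so it suits code that does not care about the difference.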

- Kris

Jun 27 2004

Regan Heath <regan netwin.co.nz> writes:

On Sun, 27 Jun 2004 18:09:05 -0700, Kris 
<someidiot earthlink.dot.dot.dot.net> wrote:
 "Derek Parnell" <derek psych.ward> wrote:
 There is a timing issue here. For example, it might be prudent in some
 situations to only initialize an array if its actually going to be used.
 This is a run-time decision and not a compile time decision.

 What I do to handle such issues is to check the array length only. See, 
 even
 if the array is unallocated the length is still valid (because arrays 
 are a
 pointer/length pair). If the length is zero, you move on. If not, then 
 the
 pointer *should* be valid. That is, a length-check can perform double 
 duty.
 For example:

 void foo (char[] bar)
 {
     if (bar.length)
         // do something
         ;
 }

 main ()
 {
     foo (null);
 }

I think Derek is thinking more of this other example he gave:

   if (a === null)
     { /* Initialize it */ }
   else
     { if (a.length == 0)
       {
         // Empty situation. I DO NOT WANT TO INITIALIZE IT HERE!
       }
       else
       {
         // Use the non-empty array
       }
     }

The array above is initialized if it's null. Otherwise it is handled based 
on whether it has items in it.

We need to be able to tell the difference between empty and null, and it 
needs to be consistent. The inconsistencies as I see them are:

empty array == null        //true
empty array == null array  //true

whereas both should be false.

No change needs to be made to the way the length property works, as you 
say it's useful if you do not need to handle them differently.
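
For comparison, this is how equality and identity behave on a present-day D (D2) compiler. It is a sketch for orientation, not a claim about DMD 0.93, and the variable names are made up. `==` compares contents, so any two zero-length arrays are equal; `is` compares the pointer/length pair, so it can tell the two apart.

```d
import std.stdio;

void main()
{
    int[] nothing;                    // null array: no pointer, length 0
    int[] backing = [1, 2, 3];
    int[] empty = backing[0 .. 0];    // length 0, but points at real memory

    writeln(empty == nothing);   // true:  same length, no elements to differ
    writeln(empty is nothing);   // false: the pointers differ
    writeln(empty is null);      // false
    writeln(nothing is null);    // true
}
```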

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 27 2004

Andy Friesen <andy ikagames.com> writes:

Farmer wrote:
 Why are there (almost) no complaints about D's support for empty arrays?
 
 Conclusion: D does have empty-arrays and null-arrays but the language tries  
 to blur them. 
 
 This is unfortunate ...
 
 So, why doesn't anyone complain?

I think the problem is that D arrays almost always behave like reference 
types, and therefore are almost always treated like reference types.

They aren't.  null arrays *are* empty arrays.

Arrays are value types which consist of a length and a pointer to 
memory.  Copying and slicing an array creates a brand new array whose 
data happens to (generally) be memory that is also pointed to by another 
array.

So!  Rules of thumb:

     1) think of arrays as though they are value types which can be 
cheaply copied.
     2) use .dup if you need to mutate copies made in this way. (the 
Copy-on-Write principle)
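
Both rules of thumb can be seen in a few lines. This is a sketch in present-day D (D2) syntax with made-up names; `sameStart` is a hypothetical helper that only compares start addresses.

```d
import std.stdio;

// Hypothetical helper: do two slices begin at the same address?
bool sameStart(const(char)[] x, const(char)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    char[] original = "hello".dup;   // mutable copy of the literal
    char[] view = original[0 .. 4];  // new (pointer, length) pair, same memory
    char[] copy = original.dup;      // new pair, freshly allocated memory

    view[0] = 'H';                   // shows up in original as well
    copy[1] = 'E';                   // touches only the copy

    writeln(original);                   // Hello
    writeln(view);                       // Hell
    writeln(copy);                       // hEllo
    writeln(sameStart(original, view));  // true
    writeln(sameStart(original, copy));  // false
}
```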

  -- andy

Jun 27 2004

Farmer <itsFarmer. freenet.de> writes:

Andy Friesen <andy ikagames.com> wrote in 
news:cbn3js$tgq$1 digitaldaemon.com:


 
 I think the problem is that D arrays almost always behave like reference 
 types, and therefore are almost always treated like reference types.

Yes, this is a problem. It is a necessary evil to achieve that outstanding  
performance. But it is not really related to the topic null array vs. empty 
array, since empty arrays are possible with the D array layout.

 
 They aren't.  null arrays *are* empty arrays.

No, null arrays are not empty arrays, as my sample proves.

 
 Arrays are value types which consist of a length and a pointer to 
 memory.  Copying and slicing an array creates a brand new array whose 
 data happens to (generally) be memory that is also pointed to by another 
 array.

I think there's a lapsus, slices *always* point to the same memory as the 
array from which they were created.


Regards,
   Farmer.

Jun 27 2004

Andy Friesen <andy ikagames.com> writes:

Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in 
 news:cbn3js$tgq$1 digitaldaemon.com:
 
I think the problem is that D arrays almost always behave like reference 
types, and therefore are almost always treated like reference types.

 
 Yes, this is a problem. It is a necessary evil to achieve that outstanding  
 performance. But it is not really related to the topic null array vs. empty 
 array, since empty arrays are possible with the D array layout

Sure, in the same sense that D allows 'empty' integers. :)

They aren't.  null arrays *are* empty arrays.

 
 No, null arrays are not empty arrays, as my sample proves.

Conceptually they are.  If the length is zero, then the data pointer is 
meaningless.  Testing the data pointer in such a case can be likened to 
using the result of a division by zero.  Doing things like 
mathematically 'proving' that 3==5 or that empty!==null is easy when you 
go into the twilight zone. :)

As an example:

     import std.string;

     char[] permute(char[] c) {
         // mutate that to which the array refers
         c[0] = 'H';
         // mutate the array
         c.length = 4;
         return c;
     }

     int main() {
         char[] c = "hello world!".dup;  // mutable copy; writing into the literal itself is not portable
         printf("%s\n", toStringz(c));

         char[] d = permute(c);

         printf("Post-permute\n");
         printf("%s\n", toStringz(c));
         printf("%s\n", toStringz(d));
         return 0;
     }

This program produces the output:

	hello world!
	Post-permute
	Hello world!
	Hell

The array is a value type.  The data it points to is not.

Arrays are value types which consist of a length and a pointer to 
memory.  Copying and slicing an array creates a brand new array whose 
data happens to (generally) be memory that is also pointed to by another 
array.

 
 I think there's a lapsus, slices *always* point to the same memory as the 
 array from which they were created.

In my experience, this is true, but I don't know if it *must*, so I felt 
obligated to qualify my statement.

  -- andy

Jun 27 2004

Derek Parnell <derek psych.ward> writes:

On Sun, 27 Jun 2004 17:02:27 -0700, Andy Friesen wrote:

 Farmer wrote:
 
 Andy Friesen <andy ikagames.com> wrote in 
 news:cbn3js$tgq$1 digitaldaemon.com:
 
I think the problem is that D arrays almost always behave like reference 
types, and therefore are almost always treated like reference types.

 
 Yes, this is a problem. It is a necessary evil to achieve that outstanding  
 performance. But it is not really related to the topic null array vs. empty 
 array, since empty arrays are possible with the D array layout

 Sure, in the same sense that D allows 'empty' integers. :)

They aren't.  null arrays *are* empty arrays.

 
 No, null arrays are not empty arrays, as my sample proves.

 Conceptually they are.  If the length is zero, then the data pointer is 
 meaningless.  Testing the data pointer in such a case can be likened to 
 using the result of a division by zero.  Doing things like 
 mathematically 'proving' that 3==5 or that empty!==null is easy when you 
 go into the twilight zone. :)

Huh? There are times when a zero-length array is valid and an uninitialized
array is not valid. They are simply not the same thing.

  if (a === null)
    { /* Initialize it */ }
  else
    { if (a.length == 0) 
      {
        // Empty situation. I DO NOT WANT TO INITIALIZE IT HERE!
      }
      else
      {
        // Use the non-empty array
      }
    }
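
The same three-way branch as a self-contained function, for anyone who wants to try it. This is a sketch in present-day D (D2) syntax, where the identity operator is spelled `is` instead of `===`; `classify` and the variable names are made up for the illustration.

```d
import std.stdio;

// Hypothetical helper: names the three states discussed in this thread.
string classify(const(int)[] a)
{
    if (a is null)
        return "null";          // never set: this is where you would initialize
    if (a.length == 0)
        return "empty";         // deliberately set to hold nothing
    return "non-empty";
}

void main()
{
    int[] never;                    // not assigned yet
    int[] backing = [1, 2, 3];
    int[] none = backing[0 .. 0];   // zero-length slice into real memory

    writeln(classify(never));       // null
    writeln(classify(none));        // empty
    writeln(classify(backing));     // non-empty
}
```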

 
 As an example:
 
      import std.string;
 
      char[] permute(char[] c) {
          // mutate that to which the array refers
          c[0] = 'H';
          // mutate the array
          c.length = 4;
          return c;
      }
 
      int main() {
          char[] c = "hello world!";
          printf("%s\n", toStringz(c));
 
          char[] d = permute(c);
 
          printf("Post-permute\n");
          printf("%s\n", toStringz(c));
          printf("%s\n", toStringz(d));
          return 0;
      }
 
 This program produces the output:
 
 	hello world!
 	Hello world!
 	Hell
 
 The array is a value type.  The data it points to is not.
 
Arrays are value types which consist of a length and a pointer to 
memory.  Copying and slicing an array creates a brand new array whose 
data happens to (generally) be memory that is also pointed to by another 
array.

 
 I think there's a lapsus, slices *always* point to the same memory as the 
 array from which they were created.

 In my experience, this is true, but I don't know if it *must*, so I felt 
 obligated to qualify my statement.

Yes, it could be an artifact of the D compiler rather than the D language.

-- 
Derek
Melbourne, Australia
28/Jun/04 10:51:51 AM

Jun 27 2004

Regan Heath <regan netwin.co.nz> writes:

On Sun, 27 Jun 2004 17:02:27 -0700, Andy Friesen <andy ikagames.com> wrote:
 Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in 
 news:cbn3js$tgq$1 digitaldaemon.com:

 I think the problem is that D arrays almost always behave like 
 reference types, and therefore are almost always treated like 
 reference types.

 Yes, this is a problem. It is a necessary evil to achieve that 
 outstanding  performance. But it is not really related to the topic 
 null array vs. empty array, since empty arrays are possible with the D 
 array layout


 Sure, in the same sense that D allows 'empty' integers. :)

D allows both empty arrays *and* null arrays.
It does *not* allow both empty *and* null integers.
They are different and not comparable.

 They aren't.  null arrays *are* empty arrays.

 No, null arrays are not empty arrays, as my sample proves.


 Conceptually they are. If the length is zero, then the data pointer is 
 meaningless.

I disagree. Conceptually they aren't the same, as both my example and 
Farmer's have proven for the case of a char array. Even with other array 
types there is still a conceptual difference between an array that does 
not exist and one containing no elements. In a large number of real world 
cases you would treat the two the same, but that does not make them the 
same, and is no reason to preclude the ability to treat them differently.

Even in D's implementation they aren't exactly the same, consider:

0) char[] a;
1) char[] b = "regan";
2) b = "";
3) b = null;

at 0 a's data pointer is null and length is zero
at 1 b's data pointer is non-null and length is 5
at 2 b's data pointer is non-null and length is 0

I am not 100% certain what happens at 3, either:
   at 3 b's data pointer is null and length is 0
or
   at 3 b's data pointer is non-null and length is 0

in either case 'a' (the null array) is not the same as 'b' when it is an 
empty array, and may not be even when 'b' is a null array.
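
The four steps can be checked directly by printing the pointer/length pair. This is a sketch for a present-day D (D2) compiler, so what it shows for step 3 says nothing certain about DMD 0.93. `describe` is a made-up helper, and `b` is declared as `string` because current D does not let a literal initialize a mutable `char[]`.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: reports the two fields an array consists of.
string describe(const(char)[] s)
{
    return format("ptr %s, length %s",
                  s.ptr is null ? "null" : "non-null", s.length);
}

void main()
{
    char[] a;                 // step 0: never assigned
    writeln(describe(a));     // ptr null, length 0

    string b = "regan";       // step 1
    writeln(describe(b));     // ptr non-null, length 5

    b = "";                   // step 2
    writeln(describe(b));     // ptr non-null, length 0

    b = null;                 // step 3
    writeln(describe(b));     // ptr null, length 0
}
```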

 Testing the data pointer in such a case can be likened to using the 
 result of a division by zero. Doing things like mathematically 'proving' 
 that 3==5 or that empty!==null is easy when you go into the twilight 
 zone. :)

 As an example:

      import std.string;

      char[] permute(char[] c) {
          // mutate that to which the array refers
          c[0] = 'H';
          // mutate the array
          c.length = 4;
          return c;
      }

      int main() {
          char[] c = "hello world!";
          printf("%s\n", toStringz(c));

          char[] d = permute(c);

          printf("Post-permute\n");
          printf("%s\n", toStringz(c));
          printf("%s\n", toStringz(d));
          return 0;
      }

 This program produces the output:

 	hello world!
 	Hello world!
 	Hell

 The array is a value type.  The data it points to is not.

 Arrays are value types which consist of a length and a pointer to 
 memory.  Copying and slicing an array creates a brand new array whose 
 data happens to (generally) be memory that is also pointed to by 
 another array.

 I think there's a lapsus, slices *always* point to the same memory as 
 the array from which they were created.

 In my experience, this is true, but I don't know if it *must*, so I felt 
 obligated to qualify my statement.
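
The claim that a slice is a new array value over the same memory can be sketched as follows. This assumes a present-day D2 compiler (hence the `.dup`, since literals are immutable there); the helper `sameStart` is a hypothetical name used only for this illustration.

```d
import std.stdio;

// Hypothetical helper: true when both array values start at the same address.
bool sameStart(const(char)[] x, const(char)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    char[] c = "hello world!".dup;  // mutable copy of the literal
    char[] d = c[0 .. 4];           // slice: a new (pointer, length) pair

    d[0] = 'H';                     // writes through to the shared data

    writeln(c);                     // Hello world!
    writeln(d);                     // Hell
    writeln(sameStart(c, d));       // true
    writeln(c.length, " ", d.length); // 12 4
}
```

The length of `c` is untouched because `d` carries its own length field; only the element data is shared.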

The simple fact remains that we require both null strings (and possibly 
other arrays) and empty strings, and that conceptually they are different; 
or rather, they can mean different things and/or demand different behaviour.

All I'm advocating is that a test for null should not compare true for an 
empty array, and thus that a null array and an empty array should not 
compare equal.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 27 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:
 Yes, this is a problem. It is a necessary evil to achieve that
 outstanding performance. But it is not really related to the topic
 null array vs. empty array, since empty arrays are possible with the 
 D array layout


 
 Sure, in the same sense that D allows 'empty' integers. :)

 
 D allows both empty arrays *and* null arrays.
 It does *not* allow both empty *and* null integers.
 They are different and not comparable.
 

D arrays are implemented exactly so:

	struct Array {
	    int length;
	    void* data;
	}

	Array a; // value type
	int i; // value type

'i' will never be null, and 'a' never will either, because both types 
exist on the stack.  'a' can be *compared* to null because an implicit 
pointer conversion is performed.  However, if 'a' does not contain any 
data, its pointer value is meaningless, so the result of such a 
comparison is undefined.  Either way, 'a' itself is *not* null any more 
than 'i' ever could be.

(I'm not saying that this is how it should be, I'm just saying that this 
is how it is)
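
The layout described above can be probed with a short sketch. It assumes a present-day D2 compiler, where the length field is `size_t` rather than `int`; `ArrayLayout` and `dataIsNull` are hypothetical names introduced only for this illustration.

```d
import std.stdio;

// Hypothetical stand-in mirroring the (length, pointer) layout.
struct ArrayLayout
{
    size_t length;
    void* data;
}

// A dynamic array value occupies exactly two machine words.
static assert((int[]).sizeof == ArrayLayout.sizeof);

// Hypothetical helper: inspects only the data pointer.
bool dataIsNull(const(int)[] a)
{
    return a.ptr is null;
}

void main()
{
    int[] a;                 // the array value itself always exists
    writeln(a.length);       // 0
    writeln(dataIsNull(a));  // true: only its data pointer is null

    a ~= 42;                 // appending allocates storage
    writeln(dataIsNull(a));  // false
}
```

Reading `a.length` on the default-initialised array is safe precisely because `a` is a value, not a reference that could itself be null.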

 They aren't.  null arrays *are* empty arrays.

 No, null arrays are not empty arrays, as my sample proves.


 
 Conceptually they are. If the length is zero, then the data pointer is 
 meaningless.

 
 I disagree. Conceptually they aren't the same, as both my example and 
 Farmer's have shown for the case of a char array. Even with other 
 array types there is still a conceptual difference between an array that 
 does not exist and one containing no elements. In a large number of real 
 world cases you would treat the 2 the same, but that does not make them 
 the same, and is no reason to preclude the ability to treat them 
 differently.

This goes back to D performing implicit pointer conversion.  Comparing 
arrays with null is not a good idea.

 The simple fact remains that we require both null strings (and possibly 
 other arrays) and empty strings and that conceptually they are 
 different, or rather they can mean different things and/or demand 
 different behaviour.
 
 All I'm advocating is that a test for null should not compare true for an 
 empty array, and thus that a null array and an empty array should not compare equal.

I'm still forming an opinion on whether this is the right thing to do or 
not.  If comparing arrays with pointers was illegal, this issue would 
never arise.

As for testing existence against emptiness, I suggest you do the same 
thing you would for an integer (or any other value type) for which nil 
and zero/empty/T.init must be distinguishable.

  -- andy

Jun 27 2004

Regan Heath <regan netwin.co.nz> writes:

On Sun, 27 Jun 2004 19:15:10 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:
 Yes, this is a problem. It is a necessary evil to achieve that
 outstanding performance. But it is not really related to the topic
 null array vs. empty array, since empty arrays are possible with the 
 D array layout


 Sure, in the same sense that D allows 'empty' integers. :)

 D allows both empty arrays *and* null arrays.
 It does *not* allow both empty *and* null integers.
 They are different and not comparable.

 D arrays are implemented exactly so:

 	struct Array {
 	    int length;
 	    void* data;
 	}

 	Array a; // value type
 	int i; // value type

 'i' will never be null, and 'a' never will either, because both types 
 exist on the stack.  'a' can be *compared* to null because an implicit 
 pointer conversion is performed.  However, if 'a' does not contain any 
 data, its pointer value is meaningless, so the result of such a 
 comparison is undefined.  Either way, 'a' itself is *not* null any more 
 than 'i' ever could be.

 (I'm not saying that this is how it should be, I'm just saying that this 
 is how it is)

I see what you're saying. However, the internal data pointer for the array 
can be null or non-null, and this is the difference between an 
uninitialized (or null) array and an empty one.

I don't care how we do it; I just know we need to be able to tell the 
difference for 'strings'. Perhaps this applies to all arrays. Perhaps 
strings need to be a specialized form of array...

 They aren't.  null arrays *are* empty arrays.

 No, null arrays are not empty arrays, as my sample proves.


 Conceptually they are. If the length is zero, then the data pointer is 
 meaningless.

 I disagree. Conceptually they aren't the same, as both my example and 
 Farmer's have shown for the case of a char array. Even with other 
 array types there is still a conceptual difference between an array 
 that does not exist and one containing no elements. In a large number 
 of real world cases you would treat the 2 the same, but that does not 
 make them the same, and is no reason to preclude the ability to treat 
 them differently.

 This goes back to D performing implicit pointer conversion.  Comparing 
 arrays with null is not a good idea.

Perhaps not, but, there is currently no other way to tell the difference 
between an empty string and a null string. This is very important.

 The simple fact remains that we require both null strings (and possibly 
 other arrays) and empty strings and that conceptually they are 
 different, or rather they can mean different things and/or demand 
 different behaviour.

 All I'm advocating is that a test for null should not compare true for an 
 empty array, and thus that a null array and an empty array should not 
 compare equal.

 I'm still forming an opinion on whether this is the right thing to do or 
 not.  If comparing arrays with pointers was illegal, this issue would 
 never arise.

True, but then you wouldn't be able to tell null strings from empty ones.

 As for testing existence against emptiness, I suggest you do the same 
 thing you would for an integer (or any other value type) for which nil 
 and zero/empty/T.init must be distinguishable.

I suspect an array's .init property *is* null, in which case

   uint[] c;
   if (c == c.init)

is equivalent to

   if (c == null)


I was just recently told by Walter not to use the init value of an array.
I was trying to re-init the array, i.e.

uint[4] c = [0,1,2,3];

c = c.init;
c[] = c.init;
c[] = c[].init;

None of those work. Walter's solution:

static uint[4] cinit = [0,1,2,3];
uint[4] c;

c[] = cinit[];


Why can't .init do this implicitly? For my original example the compiler 
would create one static array alongside my array 'c', then set c.init to 
that static array, so that

c = c.init;

would work. For an array that is not initialized, c.init can stay null, as

c = c.init;

would then be equivalent to

c = null;
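
For comparison, here is how the same ideas look in a minimal sketch, assuming a present-day D2 compiler, where `.init` always names the type's default value rather than a variable's own initialiser. The function name `reset` is hypothetical; the slice copy inside it is the approach attributed to Walter above.

```d
import std.stdio;

// Hypothetical helper: restores a fixed-size array from a stored
// initial value using an element-wise slice copy.
void reset(ref uint[4] target, const uint[4] initial)
{
    target[] = initial[];
}

void main()
{
    uint[] dyn;
    writeln(dyn is null);               // true
    writeln(typeof(dyn).init is null);  // true: .init of a dynamic array is null

    static immutable uint[4] cinit = [0, 1, 2, 3];
    uint[4] c = cinit;
    c[2] = 99;

    reset(c, cinit);
    writeln(c);                         // [0, 1, 2, 3]
    writeln((uint[4]).init);            // [0, 0, 0, 0], not the initialiser
}
```

Under these assumptions `c = c.init;` would zero the array rather than restore `[0,1,2,3]`, so a separately stored initial value is still needed.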


Regan.


Jun 27 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:

 I see what you're saying. However, the internal data pointer for the array 
 can be null or non-null, and this is the difference between an 
 uninitialized (or null) array and an empty one.
 
 I don't care how we do it; I just know we need to be able to tell the 
 difference for 'strings'. Perhaps this applies to all arrays. Perhaps 
 strings need to be a specialized form of array...

You say that as though it is self-evident that strings must absolutely, 
unequivocally be, at all costs, reference types.  Why?

C++ containers cannot represent null either.  D will (and does) get 
along just fine if its array type works the same way.

 This goes back to D performing implicit pointer conversion.  Comparing 
 arrays with null is not a good idea.

 
 Perhaps not, but, there is currently no other way to tell the difference 
 between an empty string and a null string. This is very important.

A 'null array' is a completely arbitrary concept that has been 
extrapolated from undefined behaviour. :)

(check the documentation concerning arrays.  Nowhere does the concept of 
a null array appear.  The only place the keyword 'null' even occurs is a 
blip which says that arrays are initialized with their data pointer set 
to null)

 I'm still forming an opinion on whether this is the right thing to do 
 or not.  If comparing arrays with pointers was illegal, this issue 
 would never arise.

 
 True, but then you wouldn't be able to tell null strings from empty ones.

Because there is no such thing.  As far as D is concerned, all arrays 
exist.  Some contain elements, others don't.  Whether its data pointer 
is null or not does not set it apart from any other empty array.

  -- andy

Jun 28 2004

Regan Heath <regan netwin.co.nz> writes:

On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:

 I see what you're saying. However, the internal data pointer for the array 
 can be null or non-null, and this is the difference between an 
 uninitialized (or null) array and an empty one.

 I don't care how we do it; I just know we need to be able to tell the 
 difference for 'strings'. Perhaps this applies to all arrays. Perhaps 
 strings need to be a specialized form of array...

 You say that as though it is self-evident that strings must absolutely, 
 unequivocally be, at all costs, reference types.  Why?

If it's not a reference type, then how can you signal non-existence (null)?

 C++ containers cannot represent null either.  D will (and does) get 
 along just fine if its array type works the same way.

I have not used C++ containers. I program in C for a living, and C++ as a 
hobby. Is there a C++ container for strings that cannot tell the 
difference between nonexistent and empty?

 This goes back to D performing implicit pointer conversion.  Comparing 
 arrays with null is not a good idea.

 Perhaps not, but, there is currently no other way to tell the 
 difference between an empty string and a null string. This is very 
 important.

 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)

It may be undefined, but I believe it is required.

 (check the documentation concerning arrays.  Nowhere does the concept of 
 a null array appear.  The only place the keyword 'null' even occurs is a 
 blip which says that arrays are initialized with their data pointer set 
 to null)

So it's undefined; let's define it.

 I'm still forming an opinion on whether this is the right thing to do 
 or not.  If comparing arrays with pointers was illegal, this issue 
 would never arise.

 True, but then you wouldn't be able to tell null strings from empty 
 ones.

 Because there is no such thing.

Yes there is. The concept exists, in C and in our examples.

 As far as D is concerned, all arrays exist.  Some contain elements, 
 others don't.  Whether its data pointer is null or not does not set it 
 apart from any other empty array.

Yes it does. This behaviour exists, it's just currently undefined (as you 
say) and inconsistent (as Farmer has pointed out).

The solution, IMO, is either to make the current behaviour official and 
consistent, or to change the behaviour, make that official, and provide 
another way to tell null apart from an empty string.

Regan.


Jun 28 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:
 On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> wrote:
 You say that as though it is self-evident that strings must 
 absolutely, unequivocally be, at all costs, reference types.  Why?

 If it's not a reference type, then how can you signal non-existence (null)?

You don't.

 I have not used C++ containers. I program in C for a living, and C++ as 
 a hobby. Is there a C++ container for strings that cannot tell the 
 difference between nonexistent and empty?

Yeah, it's called std::string, and it's more or less the default.

 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)

 It may be undefined, but I believe it is required.

Why?  C++ gets along without them just fine, and every C derivative I know 
of gets along fine without allowing primitive type returns to signify 
nonexistence.

Functions which return structs cannot return null either.

 The solution, IMO, is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official, and provide 
 another way to tell null apart from an empty string.

Farmer's test reports pretty consistent results if you suppose that 
comparing arrays to null is ill-formed:

     empty1.length == 0    is true
     empty1 == ""          is true
     empty2.length == 0    is true
     empty2 == ""          is true
     empty3.length == 0    is true
     empty3 == ""          is true

Don't compare arrays to null.  Don't try to differentiate between empty 
and nonexistent.  D arrays simply do not work that way.

  -- andy

Jun 28 2004

Derek Parnell <derek psych.ward> writes:

On Mon, 28 Jun 2004 16:33:25 -0700, Andy Friesen wrote:

 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.  D arrays simply do not work that way.
 
   -- andy

Agreed, D doesn't seem to work that way, but isn't that the issue? Some
people would like to distinguish between an uninitialized array, and an
initialized but empty array.

-- 
Derek
Melbourne, Australia
29/Jun/04 10:44:05 AM

Jun 28 2004

"Bent Rasmussen" <exo bent-rasmussen.info> writes:

 Don't compare arrays to null.  Don't try to differentiate between empty
 and nonexistent.  D arrays simply do not work that way.

I must say, I kind of like that. I don't have to write a read/write property
where the write property has an in/out contract to guard against
internal/external code setting an array member field to null -- goodbye
bloat!

Jun 28 2004

Regan Heath <regan netwin.co.nz> writes:

On Mon, 28 Jun 2004 16:33:25 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:
 On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> 
 wrote:
 You say that as though it is self-evident that strings must 
 absolutely, unequivocally be, at all costs, reference types.  Why?

 If it's not a reference type, then how can you signal non-existence 
 (null)?

 You don't.

Thought so..

 I have not used C++ containers. I program in C for a living, and 
 C++ as a hobby. Is there a C++ container for strings that cannot tell 
 the difference between nonexistent and empty?

 Yeah, it's called std::string, and it's more or less the default.

And it's crap. IMNSHO.

 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)

 It may be undefined, but I believe it is required.

 Why?  C++ gets along without them just fine, and every C derivative I know 
 of gets along fine without allowing primitive type returns to signify 
 nonexistence.

 Functions which return structs cannot return null either.

Which is why just about no one ever does this (in C). They all return a 
pointer to a struct.

 The solution, IMO, is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official, and provide 
 another way to tell null apart from an empty string.

 Farmer's test reports pretty consistent results if you suppose that 
 comparing arrays to null is ill-formed:

      empty1.length == 0    is true
      empty1 == ""          is true
      empty2.length == 0    is true
      empty2 == ""          is true
      empty3.length == 0    is true
      empty3 == ""          is true

 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.

Fine and dandy EXCEPT we *need* to differentiate between empty and 
nonexistent strings.

 D arrays simply do not work that way.

In that case we need an array specialisation for strings, so I'll have to 
write my own. This defeats the purpose of char[] in the first place, which 
was to be a better, more consistent string handling method than is 
possible in C/C++.

Regan.


Jun 28 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:
 ... we need an array specialisation for strings, so I'll have 
 to write my own. This defeats the purpose of char[] in the first place, 
 which was to be a better, more consistent string handling method than 
 is possible in C/C++.

That would work, but it might be better to adjust your thinking to match 
the language instead of trying to shoehorn the way you're used to 
thinking onto an abstraction that clearly wasn't built for it.  Don't 
think in Java/C++/etc.  Think in D. :)

  -- andy

Jun 28 2004

Regan Heath <regan netwin.co.nz> writes:

On Mon, 28 Jun 2004 22:54:23 -0700, Andy Friesen <andy ikagames.com> wrote:

 Regan Heath wrote:
 ... we need an array specialisation for strings, so I'll have to write 
 my own. This defeats the purpose of char[] in the first place, which 
 was to be a better, more consistent string handling method than is 
 possible in C/C++.

 That would work, but it might be better to adjust your thinking to match 
 the language instead of trying to shoehorn the way you're used to 
 thinking onto an abstraction that clearly wasn't built for it.  Don't 
 think in Java/C++/etc.  Think in D. :)

You may be right, so in an effort to change my thinking, pls consider 
this...

struct Item {
	char[] label;
	char[] value;
}

class Post {
	Item[] items;

	char[] getValue(char[] label)
	{
		foreach(Item i; items)
		{
			if (i.label == label)
				return i.value;
		}
		//return null; not allowed
		return "";
	}
}

Web page...

<form post.. >
<input type="text" name="foo" value="">
<input type="text" name="bar" value="">
</form>

Code to do something with the post.

char[] s;
Post p = new Post;

s = p.getValue("foo");
if (s) ..
s = p.getValue("bar");
if (s) ..

Right...

If I cannot return null, then (using the code above) I cannot tell whether 
foo or bar was not passed at all or was passed with an empty value.

So I have to add a function, something like

class Post {
	bool isPresent(char[] label)
	{
		foreach(Item i; items)
		{
			if (i.label == label)
				return true;
		}
		return false;
	}
}

and in my code..

if (p.isPresent("foo")) {
	s = p.getValue("foo");
	..
}

looks more complex. In addition I am searching for the label/value twice, 
doing twice the work.

To avoid that I can add a parameter to the getValue function i.e.

class Post {
	char[] getValue(char[] label, out bool isNull)
	{
		foreach(Item i; items)
		{
			if (i.label == label)
				return i.value;
		}
		//return null; not allowed
		isNull = true;
		return "";
	}
}

then my code looks like...

char[] s;
bool isn;

s = p.getValue("foo",isn);
if (!isn) {
}

More complex code again, and less obvious. A third option springs to mind: 
instead of returning a char[] from getValue I could return existence and 
fill a passed char[], i.e.

class Post {
	bool getValue(char[] label, out char[] value)
	{
		foreach(Item i; items)
		{
			if (i.label == label)
			{
				value = i.value;
				return true;
			}
		}
		return false;
	}
}

so my code now looks like...

char[] s;

if (p.getValue("foo",s)) {
}

This is perhaps the best solution so far. But let's consider what happens 
if this is extended to get 2 or more char[] values. (This is perfectly 
reasonable/likely: say they are loaded from a file; why process the file 
twice when you can do so once and get both values?)

bool getValue(out char[] val1, out char[] val2)
{
}

What do we return if val1 exists but val2 does not? A set of flags? Yuck.

It just seems to me, that all this is done to emulate a reference type.. 
so why not have a reference type?

We already have one, all it would take to make it consistent is 2 minor 
changes.

If you have a solution to the above that is both as simple, elegant and 
easy to code as being able to return null.. pls educate me.
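
One way the "value or nothing" return could be spelled out is sketched below. It assumes a present-day D2 compiler and Phobos's `std.typecons.Nullable`, neither of which was available to the thread's participants, so this is an illustration and not anyone's proposal here. `Item` follows the struct above, and this free-function `getValue` is a hypothetical variant of the member function.

```d
import std.stdio;
import std.typecons : Nullable;

struct Item
{
    string label;
    string value;
}

// Hypothetical variant: the result carries its own "present" flag,
// so an empty value and a missing label stay distinguishable.
Nullable!string getValue(Item[] items, string label)
{
    foreach (item; items)
    {
        if (item.label == label)
            return Nullable!string(item.value);
    }
    Nullable!string absent;   // default state: isNull is true
    return absent;
}

void main()
{
    auto items = [Item("foo", ""), Item("bar", "baz")];
    foreach (label; ["foo", "bar", "qux"])
    {
        auto v = getValue(items, label);
        if (v.isNull)
            writeln(label, ": not present");
        else
            writeln(label, ": present, length ", v.get.length);
    }
}
```

The same wrapper would extend to the multi-value case, since each returned value can report its own presence without a separate set of flags.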

Regan.


Jun 29 2004

Charlie <Charlie_member pathlink.com> writes:

---

s = p.getValue("foo");
if (s.length) 

---

What's wrong with this way?

Charlie


Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 00:52:17 +0000 (UTC), Charlie 
<Charlie_member pathlink.com> wrote:

 ---

 s = p.getValue("foo");
 if (s.length)

 ---

 What's wrong with this way?

An empty char[] has a length of 0, so the above would not see an empty 
value passed in a form.

Regan.


Jun 29 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:

 ... I could return existence and
 fill a passed char[]...  so my code now looks like...
 
 char[] s;
 if (getValue("foo",s))

I like this.  It's simple and obvious.

 if this were extended to get 2 or more char[] values...
 bool getValue(out char[] val1, out char[] val2) {}

In this case, I would say that the best thing to do on failure is to 
throw an exception.  Asking for a number of values all at once looks (to 
me, anyhow) to be implying that you expect them all to be present.  If 
you don't, you'll have to test them all individually at some point 
anyway, in which case the previous form allows you to test and retrieve 
in one step.

It may also be useful to return all the attributes as an associative 
array.  They're easy to mutate and iterate through.

 It just seems to me, that all this is done to emulate a reference type.. 
 so why not have a reference type?

You got me there, but it seems to me that things could get very weird if 
you need to express a non-null array of 0 length.

 If you have a solution to the above that is both as simple, elegant and 
 easy to code as being able to return null.. pls educate me.

Exposing POST data as an associative array seems like a win to me; it's 
faster and can be iterated over conveniently.  Also, as a language 
intrinsic, it's a bit more likely to plug into other APIs easily.
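
A minimal sketch of the associative-array approach follows, assuming a present-day D2 compiler. It relies on the `in` operator, which yields a pointer to the stored value or null when the key is absent. The function name `classify` and the sample keys are made up for illustration.

```d
import std.stdio;

// Hypothetical helper: distinguishes absent, present-but-empty,
// and present-with-content using a single lookup.
string classify(string[string] post, string key)
{
    auto p = key in post;   // pointer to the value, or null when absent
    if (p is null)
        return "absent";
    return (*p).length == 0 ? "empty" : "set";
}

void main()
{
    string[string] post = ["foo": "", "bar": "baz"];
    foreach (key; ["foo", "bar", "qux"])
        writeln(key, ": ", classify(post, key));
}
```

Here the presence test and the retrieval share one lookup, and nonexistence is signalled by the pointer, not by the array value.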

If you *really* need to, you could probably get away with doing 
something like:

     const char[] nadda = "nadda";
     if (s !== nadda) { ... }

  -- andy

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 19:26:22 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:

 ... I could return existence and
 fill a passed char[]...  so my code now looks like...

 char[] s;
 if (getValue("foo",s))

 I like this.  It's simple and obvious.

I agree.

 if this were extended to get 2 or more char[] values...
 bool getValue(out char[] val1, out char[] val2) {}

 In this case, I would say that the best thing to do on failure is to 
 throw an exception. Asking for a number of values all at once looks (to 
 me, anyhow) to be implying that you expect them all to be present.

Nope. This is taken from a real-life example: I have a config file with 10 
different settings, all optional. I want 3 of them at this point in the 
code, so I process the file once and load the 3 settings, which may or may 
not be present, and may or may not have a zero-length value.

 If you don't, you'll have to test them all individually at some point 
 anyway

Yes, at that point I need to be able to tell if the setting was present, 
present with zero length value, or not present at all.

 , in which case the previous form allows you to test and retrieve in one 
 step.

Which previous form? Do you mean the one that takes only one parameter? If 
so, that would involve parsing the file 3 times, which is not acceptable.

 It may also be useful to return all the attributes as an associative 
 array.  They're easy to mutate and iterate through.

It's the same problem all over again, say I have:

char[][char[]] list;
char[] s1,s2,s3;

fn(list);
s1 = list["setting1"];
s2 = list["setting2"];
s3 = list["setting3"];

The result needs to be null for setting3, empty for setting2 and "foobar" 
for setting1.

I believe this is currently the case, but!, as Farmer has shown if I then 
went

if (s2 == s3) //this would evaluate to true

and that's a problem.

 It just seems to me, that all this is done to emulate a reference 
 type.. so why not have a reference type?

 You got me there, but it seems to me that things could get very weird if 
 you need to express a non-null array of 0 length.

char[] s = ""

s is a non-null array of 0 length.

 If you have a solution to the above that is both as simple, elegant and 
 easy to code as being able to return null.. pls educate me.

 Exposing POST data as an associative array seems like a win to me;

I agree, it's a more D thing to do also :)
I believe the same problem still applies (see above)

 it's faster and can be iterated over conveniently.  Also, as a 
 language intrinsic, it's a bit more likely to plug into other APIs 
 easily.

 If you *really* need to, you could probably get away with doing 
 something like:

      const char[] nadda = "nadda";
      if (s !== nadda) { ... }

True, but this is yucky and what if a setting actually had a value of 
"nadda"?

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 29 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:
 This is taken from a real-life example: I have a config file with 
 10 different settings, all optional. I want 3 of them at this point in 
 the code, so I process the file once and load the 3 settings, which may 
 or may not be present, and may or may not have a zero-length value.

I guess it's just a matter of preference.  I don't have a problem with 
something like this:

     char[][char[]] attribs = ...;

     if ("a" in attribs && "b" in attribs && "c" in attribs) {

If nonexistence is an alias for some default, fill the array before 
parsing the file.  Attributes that are present will override those which 
are not.

Python offers a get() method which takes two arguments: a key, and a 
default value which is returned should the key not exist.  I use this a lot.
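
A rough sketch of both ideas in present-day D syntax (an assumption, as the thread predates D2). The helper name `getOr` is hypothetical and simply mirrors the Python method described above:

```d
// Hypothetical helper mirroring Python's dict.get(key, default).
string getOr(string[string] attribs, string key, string fallback)
{
    if (auto p = key in attribs)   // `in` yields a pointer to the value, or null
        return *p;
    return fallback;
}

unittest
{
    string[string] attribs = ["a": "", "b": "foobar"];
    assert("a" in attribs);                        // present, with an empty value
    assert(("c" in attribs) is null);              // absent
    assert(getOr(attribs, "c", "default") == "default");
    assert(getOr(attribs, "a", "default") == "");  // empty value is kept
}
```

The membership test never inspects the stored value, so an empty string and a missing key stay distinct.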

 things could get very weird if you need to express a non-null array of 0
 length.

 
 char[] s = ""
 
 s is a non-null array of 0 length.

What about non-char types?

 If you *really* need to, you could probably get away with doing 
 something like:

      const char[] nadda = "nadda";
      if (s !== nadda) { ... }

 
 
 True, but this is yucky and what if a setting actually had a value of 
 "nadda"?

That's why you use 'is' and not ==.  'is' performs a pointer comparison. 
   The array has to point into that exact string literal for the 
comparison to be true.  The only catch is string pooling.  It'd be okay 
as long as the string literal "nadda" isn't declared anywhere in the 
source code.

Come to think of it, this is better:

    // don't mutate me!  Just compare with 'is'!
    char[] nonString = new char[1];

I'm officially out of ideas now.  heh.

  -- andy

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 22:35:28 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:
 This is taken from a real-life example: I have a config file with 10 
 different settings, all optional. I want 3 of them at this point in the 
 code, so I process the file once and load the 3 settings, which may or 
 may not be present, and may or may not have a zero-length value.

 I guess it's just a matter of preference.  I don't have a problem with 
 something like this:

      char[][char[]] attribs = ...;

      if ("a" in attribs && "b" in attribs && "c" in attribs) {

It's more like:

if ("a" in attribs) {
}
if ("b" in attribs) {
}
if ("c" in attribs) {
}

but, you seem to have completely ignored the fact that, *if* we remove the 
ability to return null when an array type is expected (you suggested 
removing the ability to assign null to an array, it's the same thing), the 
above will cease to work altogether as I imagine the above is simply going

if (attribs["a"] != null)

which is the same as

char[] s;

s = attribs["a"];
if (s != null)

which is impossible if you cannot use null with arrays.

 If nonexistence is an alias for some default, fill the array before 
 parsing the file.  Attributes that are present will override those which 
 are not.

 Python offers a get() method which takes two arguments: a key, and a 
 default value which is returned should the key not exist.  I use this a 
 lot.

but if there is no default, you're left doing the nadda thing below which 
is simply an ugly hack (explanation below)

 things could get very weird if you need to express a non-null array of 
 0 length.

 char[] s = ""

 s is a non-null array of 0 length.

 What about non-char types?

 If you *really* need to, you could probably get away with doing 
 something like:

      const char[] nadda = "nadda";
      if (s !== nadda) { ... }


 True, but this is yucky and what if a setting actually had a value of 
 "nadda"?

 That's why you use 'is' and not ==.  'is' performs a pointer comparison. 
    The array has to point into that exact string literal for the 
 comparison to be true.  The only catch is string pooling.  It'd be okay 
 as long as the string literal "nadda" isn't declared anywhere in the 
 source code.

ahh, gotcha, so basically you're creating null with another name. Why not 
just have null. :)

 Come to think of it, this is better:

     // don't mutate me!  Just compare with 'is'!
     char[] nonString = new char[1];

Another face for the same entity, null.

 I'm officially out of ideas now.  heh.

Think of it from the other point of view, assume we make the minor 
adjustments to arrays that I suggested, what effect does it have on the 
people who cannot see themselves needing a null array? hmm.. I think none. 
IMO it simply gives us more flexibility of expression at no cost.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 30 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:

 if ("a" in attribs) { ... }
 ...
 
 you seem to have completely ignored the fact that, *if* we remove 
 the ability to return null when an array type is expected (you suggested 
 removing the ability to assign null to an array, it's the same thing), 
 the above will cease to work altogether as I imagine the above is simply 
 going
 
 if (attribs["a"] != null)

I very much doubt this.  Associative arrays maintain an internal list of 
keys and values.  In all likelihood, the 'in' operator hashes the key 
("a" in this case) and searches through the associative array's internal 
hash table for one that matches.

 If nonexistence is an alias for some default, fill the array before 
 parsing the file.  Attributes that are present will override those 
 which are not.

 Python offers a get() method which takes two arguments: a key, and a 
 default value which is returned should the key not exist.  I use this 
 a lot.

 
 but if there is no default, you're left doing the nadda thing below 
 which is simply an ugly hack (explanation below)

Right.  I am an idiot. (below)

 That's why you use 'is' and not ==.  'is' performs a pointer 
 comparison.    The array has to point into that exact string literal 
 for the comparison to be true.  The only catch is string pooling.  
 It'd be okay as long as the string literal "nadda" isn't declared 
 anywhere in the source code.

 
 ahh, gotcha, so basically you're creating null with another name. Why 
 not just have null. :)

I was thinking about this, and the conclusion that I came to is that I 
am a complete idiot for not noticing what looked to be a completely 
arbitrary distinction with respect to comparing against null and 
comparing against any other pointer.

After a tiny bit of testing, I came to the conclusion that I am an even 
bigger idiot than I could have possibly imagined.  D already gets things 
pretty much bang on:

     T[] a, b;
     a = b;     // 'a == b' and 'a is b' will both be true. (even if b is
                // null)
     a = b.dup; // 'a == b' will be true.  'a is b' will be true iff b is
                // null. (null.dup is null, evidently.  funny that)

With respect to 'a == null', my mind is quite blown.  Farmer's tests 
reliably produce situations where zero-length strings compare false 
against null.  My own tests show that empty arrays are equivalent to 
null but do not share identity.  Don't test x==null, I guess. :)

Explicitly testing for an empty, non-null array requires that you write 
'if (x !== null && x.length == 0)', which is probably okay: I can 
envision hordes of new programmers going postal because of 'name != ""' 
and 'name.length == 0' somehow both evaluating to true at the same time.
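
A small sketch of the three-way test in present-day D, where `!is` replaces the `!==` spelling used in this thread. That syntax is an assumption about current compilers, not what was tested above, and `classify` is a hypothetical helper:

```d
// Classify a slice three ways. `is null` checks identity (null pointer and
// zero length); `.length == 0` checks emptiness only.
string classify(const(char)[] s)
{
    if (s is null)
        return "null";
    if (s.length == 0)
        return "empty";
    return "non-empty";
}

unittest
{
    const(char)[] n;        // default-initialised: null
    const(char)[] e = "";   // empty literal: non-null pointer, zero length

    assert(classify(n) == "null");
    assert(classify(e) == "empty");
    assert(classify("foo") == "non-empty");

    // Equality compares contents, so it cannot tell the two apart.
    assert(e == n);
    assert(e !is n);
}
```

Identity separates null from empty, while `==` compares contents and so treats any two zero-length arrays as equal.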

  -- andy

Jun 30 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 19:02:22 -0700, Andy Friesen <andy ikagames.com> wrote:

 Regan Heath wrote:

 if ("a" in attribs) { ... }
 ...

 you seem to have completely ignored the fact that, *if* we remove the 
 ability to return null when an array type is expected (you suggested 
 removing the ability to assign null to an array, it's the same thing), 
 the above will cease to work altogether as I imagine the above is 
 simply going

 if (attribs["a"] != null)

 I very much doubt this.  Associative arrays maintain an internal list of 
 keys and values.  In all likelihood, the 'in' operator hashes the key 
 ("a" in this case) and searches through the associative array's internal 
 hash table for one that matches.

I agree totally. I am not disputing how an associative array works, what I 
am saying is, without the ability to compare an array to null, you cannot 
express 'does not exist' in terms of an associative array.

What does:
   if ("a" in attribs)

actually evaluate to, if not:
   if (attribs["a"] != null)

?

 If nonexistence is an alias for some default, fill the array before 
 parsing the file.  Attributes that are present will override those 
 which are not.

 Python offers a get() method which takes two arguments: a key, and a 
 default value which is returned should the key not exist.  I use this 
 a lot.

 but if there is no default, you're left doing the nadda thing below 
 which is simply an ugly hack (explanation below)

 Right.  I am an idiot. (below)

 That's why you use 'is' and not ==.  'is' performs a pointer 
 comparison.    The array has to point into that exact string literal 
 for the comparison to be true.  The only catch is string pooling.  
 It'd be okay as long as the string literal "nadda" isn't declared 
 anywhere in the source code.

 ahh, gotcha, so basically you're creating null with another name. Why 
 not just have null. :)

 I was thinking about this, and the conclusion that I came to is that I 
 am a complete idiot for not noticing what looked to be a completely 
 arbitrary distinction with respect to comparing against null and 
 comparing against any other pointer.

 After a tiny bit of testing, I came to the conclusion that I am an even 
 bigger idiot than I could have possibly imagined.  D already gets things 
 pretty much bang on:

      T[] a, b;
      a = b;     // 'a == b' and 'a is b' will both be true. (even if b is
                 // null)
      a = b.dup; // 'a == b' will be true.  'a is b' will be true iff b is
                 // null. (null.dup is null, evidently.  funny that)

 With respect to 'a == null', my mind is quite blown.  Farmer's tests 
 reliably produce situations where zero-length strings compare false 
 against null. My own tests show that empty arrays are equivalent to null 
 but do not share identity.  Don't test x==null, I guess. :)

 Explicitly testing for an empty, non-null array requires that you write 
 'if (x !== null && x.length == 0)', which is probably okay:

My tests, given:

char[] e = "";
char[] n;

output:

e is ""    (f)
n is ""    (f)
e is null  (f)
n is null  (t)
e is n     (f)

e == ""    (t)
n == ""    (t) incorrect?
e == null  (f)
n == null  (t)
e == n     (t) incorrect?

e === ""   (f)
n === ""   (f)
e === null (f)
n === null (t)
e === n    (f)

The != and !== tests were all the opposite of the above, so I have not 
included them.

== calls opEquals, perhaps it has a shortcut in it which says if the 
lengths are both 0 return true? this would explain the two cases above I 
have marked "incorrect?". I think these two cases are inconsistent.

To reliably test for nullness I can use '===' or '!==' or 'is'.

 I can envision hordes of new programmers going postal because of 'name 
 != ""' and 'name.length == 0' somehow both evaluating to true at the 
 same time.

Yeah.. to stop that name.length would have to have a NaN (null) value. 
Which 'int' or 'uint' does not have.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 30 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:
 I am not disputing how an associative array works, what 
 I am saying is, without the ability to compare an array to null, you 
 cannot express 'does not exist' in terms of an associative array.
 
 What does:
   if ("a" in attribs)
 
 actually evaluate to, if not:
   if (attribs["a"] != null)

This could never work anyway.  Types for which null does not make sense 
obviously can't use null to indicate nonexistence.  Types for which null 
does make sense can't do this either, as it makes perfect sense to store 
a null reference.

The fundamental idea is that you're trying to represent a "nonvalue", 
which is storable in the result variable, but not part of the variable's 
range.  This obviously won't work, as it requires two contradictory 
ideas to be simultaneously true.  Adding a 'special' value like null is 
sometimes close enough for specific application domains, but, in the 
end, all you're doing is making the range of allowable values bigger.

 == calls opEquals, perhaps it has a shortcut in it which says if the 
 lengths are both 0 return true? this would explain the two cases above I 
 have marked "incorrect?". I think these two cases are inconsistent.

Looking at internal/adi.d, it looks like it compares the lengths, then 
compares each element in succession:

     extern (C) int _adEq(Array a1, Array a2, TypeInfo ti)
     {
         if (a1.length != a2.length)
             return 0;		// not equal
         int sz = ti.tsize();
         //printf("sz = %d\n", sz);
         void *p1 = a1.ptr;
         void *p2 = a2.ptr;
         for (int i = 0; i < a1.length; i++)
         {
             if (!ti.equals(p1 + i * sz, p2 + i * sz))
                 return 0;		// not equal
         }
         return 1;			// equal
     }

How on Earth ""!=null ever comes about is beyond me.

  -- andy

Jun 30 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 22:40:28 -0700, Andy Friesen <andy ikagames.com> wrote:

 Regan Heath wrote:
 I am not disputing how an associative array works, what I am saying is, 
 without the ability to compare an array to null, you cannot express 
 'does not exist' in terms of an associative array.

 What does:
   if ("a" in attribs)

 actually evaluate to, if not:
   if (attribs["a"] != null)

 This could never work anyway.  Types for which null does not make sense 
 obviously can't use null to indicate nonexistence.  Types for which null 
 does make sense can't do this either, as it makes perfect sense to store 
 a null reference.

Yeah... you're right.

 The fundamental idea is that you're trying to represent a "nonvalue", 
 which is storable in the result variable, but not part of the variable's 
 range.  This obviously won't work, as it requires two contradictory 
 ideas to be simultaneously true.  Adding a 'special' value like null is 
 sometimes close enough for specific application domains, but, in the 
 end, all you're doing is making the range of allowable values bigger.

I think.. I agree. :)

 == calls opEquals, perhaps it has a shortcut in it which says if the 
 lengths are both 0 return true? this would explain the two cases above 
 I have marked "incorrect?". I think these two cases are inconsistent.

 Looking at internal/adi.d, it looks like it compares the lengths, then 
 compares each element in succession:

I went looking for that (not hard enough obviously)..

      extern (C) int _adEq(Array a1, Array a2, TypeInfo ti)
      {
          if (a1.length != a2.length)
              return 0;		// not equal
          int sz = ti.tsize();
          //printf("sz = %d\n", sz);
          void *p1 = a1.ptr;
          void *p2 = a2.ptr;
          for (int i = 0; i < a1.length; i++)
          {
              if (!ti.equals(p1 + i * sz, p2 + i * sz))
                  return 0;		// not equal
          }
          return 1;			// equal
      }

 How on Earth ""!=null ever comes about is beyond me.

below _adEq is..

extern (C) int _adCmp(Array a1, Array a2, TypeInfo ti)
{
     int len;

     //printf("adCmp()\n");
     len = a1.length;
     if (a2.length < len)
	len = a2.length;
     int sz = ti.tsize();
     void *p1 = a1.ptr;
     void *p2 = a2.ptr;
     for (int i = 0; i < len; i++)
     {
	int c;

	c = ti.compare(p1 + i * sz, p2 + i * sz);
	if (c)
	    return c;
     }
     return cast(int)a1.length - cast(int)a2.length;
}

which would return 0 if both lengths were 0. "" and null both have a 
length of 0.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 30 2004

Andy Friesen <andy ikagames.com> writes:

Regan Heath wrote:
 On Wed, 30 Jun 2004 22:40:28 -0700, Andy Friesen <andy ikagames.com> wrote:
 How on Earth ""!=null ever comes about is beyond me.

 
 _adEq is..
 
 extern (C) int _adCmp(Array a1, Array a2, TypeInfo ti)
 {
     [....]
 }
 
 which would return 0 if both lengths were 0. "" and null both have a 
 length of 0.

Right, but Cmp functions return 0 to indicate equality, which would be 
the right thing in this case.

My money says the cause is in that inline-assembly-optimized _adCmpChar. 
(line 360)  I freely admit that I blame it on the inline assembly 
because me and assembly have not been on speaking terms for some time 
now.  (one too many hand-coded alpha-blits that lost to MSVC's 
optimizing compiler)

  -- andy

Jul 01 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

s = p.getValue("foo");
if (s) ..
s = p.getValue("bar");
if (s) ..

Right...

If I cannot return null, then (using the code above) I cannot tell the 
difference between whether foo or bar was passed or had an empty value.


And indeed that very situation is ALSO true with integer parameters. How can
you tell the difference between an integer parameter being present and zero,
and no integer parameter being present at all?

But of course, there are various solutions to this problem, many much simpler
than you propose. For a start, you could return an int* instead of an int, or
indeed a char[]* instead of a char[]. Then you could explicitly test for ===null
in both cases.

In C++, I'd just return a std::pair<bool, T>. I'm sure that once we have a good
supply of standard templates in D we'll be able to do much the same thing. (Even
without templates, you could define a struct and return it).
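
The struct variant might look like this in present-day D (an assumption, as templates of this form postdate the thread). `Maybe` and `findInt` are hypothetical names used only for illustration:

```d
// Hypothetical pair of "was it there?" and the value itself,
// a D counterpart of C++'s std::pair<bool, T>.
struct Maybe(T)
{
    bool present;
    T value;
}

// Look up an integer parameter without reserving a magic value such as -1.
Maybe!int findInt(int[string] params, string key)
{
    if (auto p = key in params)
        return Maybe!int(true, *p);
    return Maybe!int(false, 0);
}

unittest
{
    int[string] params = ["count": 0];
    assert(findInt(params, "count").present);      // present and zero
    assert(findInt(params, "count").value == 0);
    assert(!findInt(params, "missing").present);   // not present at all
}
```

The same struct works for `char[]`, which is the point of the suggestion: presence is carried next to the value rather than encoded in it.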

Anything wrong with either of these approaches?

Arcane Jill

Jun 30 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 07:27:33 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

 s = p.getValue("foo");
 if (s) ..
 s = p.getValue("bar");
 if (s) ..

 Right...

 If I cannot return null, then (using the code above) I cannot tell the
 difference between whether foo or bar was passed or had an empty value.


 And indeed that very situation is ALSO true with integer parameters. How
 can you tell the difference between an integer parameter being present and
 zero, and no integer parameter being present at all?

Yep. As another poster noted he had the same problem with integers, 
resulting in him using a value of -1 to represent null. Yuck.

 But of course, there are various solutions to this problem, many much 
 simpler
 than you propose. For a start, you could return an int* instead of an 
 int, or
 indeed a char[]* instead of a char[]. Then you could explicitly test for 
 ===null
 in both cases.

This is the C solution. For int I cannot think of a good D solution. For 
char[] (or any array) we already have one, the array emulates/acts like a 
reference type, it's just inconsistent.

 In C++, I'd just return a std::pair<bool, T>. I'm sure that once we have 
 a good
 supply of standard templates in D we'll be able to do much the same 
 thing. (Even
 without templates, you could define a struct and return it).

You're emulating a reference type, why not just have one. This may be the 
best soln for int and other strict value types.

 Anything wrong with either of these approaches?

Yep. Neither is as simple, elegant or clean as a reference type, which we 
already have in D arrays albeit inconsistently.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 30 2004

Kevin Bealer <Kevin_member pathlink.com> writes:

In article <opsafg63rh5a2sq9 digitalmars.com>, Regan Heath says...
On Wed, 30 Jun 2004 07:27:33 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

 s = p.getValue("foo");
 if (s) ..
 s = p.getValue("bar");
 if (s) ..

 Right...

 If I cannot return null, then (using the code above) I cannot tell the
 difference between whether foo or bar was passed or had an empty value.


 And indeed that very situation is ALSO true with integer parameters. How
 can you tell the difference between an integer parameter being present and
 zero, and no integer parameter being present at all?

Yep. As another poster noted he had the same problem with integers, 
resulting in him using a value of -1 to represent null. Yuck.

 But of course, there are various solutions to this problem, many much 
 simpler
 than you propose. For a start, you could return an int* instead of an 
 int, or
 indeed a char[]* instead of a char[]. Then you could explicitly test for 
 ===null
 in both cases.

This is the C solution. For int I cannot think of a good D solution. For 
char[] (or any array) we already have one, the array emulates/acts like a 
reference type, it's just inconsistent.

The D equivalent might be to return int[] or char[][] y.  Test if the length is
zero.  If it's not, then the data is "present".  Otherwise it is missing.
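
A minimal sketch of this idea in present-day D syntax (an assumption, given the age of the thread). `getValues` and the `form` table are hypothetical:

```d
// Hypothetical lookup: every tag maps to zero or more values.
string[] getValues(string[][string] form, string key)
{
    if (auto p = key in form)
        return *p;
    return null;    // a zero-length result means "missing"
}

unittest
{
    string[][string] form = ["tag": ["", "x"]];
    assert(getValues(form, "tag").length == 2);   // present; one value is empty
    assert(getValues(form, "none").length == 0);  // missing
}
```

An empty string inside the result is still a value, so "present but empty" and "missing" no longer collide.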

For the HTML parsing example given in this thread, this may be even better
because sometimes HTML has multiple values with the same tag.

Another data point: I've also used the technique listed below (pair<bool, T>),
albeit wrapped in a template class.  The code is very readable.

Kevin

 In C++, I'd just return a std::pair<bool, T>. I'm sure that once we have 
 a good
 supply of standard templates in D we'll be able to do much the same 
 thing. (Even
 without templates, you could define a struct and return it).

You're emulating a reference type, why not just have one. This may be the 
best soln for int and other strict value types.

 Anything wrong with either of these approaches?

Yep. Neither is as simple, elegant or clean as a reference type, which we 
already have in D arrays albeit inconsistently.

Regan.

Jul 07 2004

Farmer <itsFarmer. freenet.de> writes:

Kevin Bealer <Kevin_member pathlink.com> wrote in
news:cci0tl$2dnl$1 digitaldaemon.com: 

 In article <opsafg63rh5a2sq9 digitalmars.com>, Regan Heath says...
On Wed, 30 Jun 2004 07:27:33 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

 s = p.getValue("foo");
 if (s) ..
 s = p.getValue("bar");
 if (s) ..

 Right...

 If I cannot return null, then (using the code above) I cannot tell
 the difference between whether foo or bar was passed or had an empty
 value. 


 And indeed that very situation is ALSO true with integer parameters. How
 can you tell the difference between an integer parameter being present and
 zero, and no integer parameter being present at all?

Yep. As another poster noted he had the same problem with integers, 
resulting in him using a value of -1 to represent null. Yuck.

 But of course, there are various solutions to this problem, many much 
 simpler
 than you propose. For a start, you could return an int* instead of an 
 int, or
 indeed a char[]* instead of a char[]. Then you could explicitly test
 for ===null
 in both cases.

This is the C solution. For int I cannot think of a good D solution. For
char[] (or any array) we already have one, the array emulates/acts like
a reference type, it's just inconsistent.

 
 The D equivalent might be to return int[] or char[][] y.  Test if the
 length is zero.  If it's not, then the data is "present".  Otherwise it
 is missing. 

Disagree. Returning an array for a single value confuses a programmer that 
didn't bother to fully read the function's documentation. (Don't blame the 
programmer, in most cases the documentation doesn't exist, anyway.)
 
 For the HTML parsing example given in this thread, this may be even
 better because sometimes HTML has multiple values with the same tag.
 
 Another data point: I've also used the technique listed below
 (pair<bool, T>), albeit wrapped in a template class.  The code is very
 readable. 

Since the code is very readable, why do you argue that the D-way would be 
something different, then? 
[No need to answer this, I already know one good answer.]

What do you mean by 'albeit wrapped in a template class'? Do you wrap 
'pair<bool, T>' into your own templated class to provide an  isNull()  
method?


I like the  pair<bool, T>  solution best. It expresses the meaning of the 
returned value precisely and can be generically applied to all types.
Still, in some cases using reference types (e.g. null arrays) is a simpler and 
faster solution.


Farmer.



 
 Kevin
 
 In C++, I'd just return a std::pair<bool, T>. I'm sure that once we
 have a good
 supply of standard templates in D we'll be able to do much the same 
 thing. (Even
 without templates, you could define a struct and return it).

You're emulating a reference type, why not just have one. This may be
the best soln for int and other strict value types.

 Anything wrong with either of these approaches?

Yep. Neither is as simple, elegant or clean as a reference type, which
we already have in D arrays albeit inconsistently.

Regan.

Jul 09 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 Yeah, it's called std::string, and it's more or less the default.

And it's crap. IMNSHO.

You'll get no arguments from me there. D got it right in not having a string
class. I didn't think that at first, but I've come round to the D way of
thinking. The problem with a string class is that you can't add new member
functions to it. (Oh, you may be able to subclass String, if it's not final. Oh
wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.

Besides which, what else can a char[] array possibly represent, other than a
string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the
same as a byte[] array, which could mean anything.

 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.

Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

Why? Do we also need a way to differentiate between empty and non-existent ints?


In D, there is no such thing as a non-existent int; there is no such thing as a
non-existent struct; and there is no such thing as a non-existent string.

Why not just start from the assumption that we DON'T need to differentiate
between empty and non-existent strings, and take it from there?

Maybe the real solution would be to make it a compile error to assign an array
with null, or to compare it with null. This would then force people to say what
they mean, and all such problems would go away.

(Anyway, you all KNOW my opinion that




should be a compile-time error anyway, because char[] is not boolean. But that's
another story).

Jill

Jun 29 2004

Derek Parnell <derek psych.ward> writes:

On Tue, 29 Jun 2004 07:18:20 +0000 (UTC), Arcane Jill wrote:

 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 Yeah, it's called std::string, and it's more or less the default.

And it's crap. IMNSHO.

 
 You'll get no arguments from me there. D got it right in not having a string
 class. I didn't think that at first, but I've come round to the D way of
 thinking. The problem with a string class is that you can't add new member
 functions to it. (Oh, you may be able to subclass String, if it's not final. Oh
 wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.
 
 Besides which, what else can a char[] array possibly represent, other than a
 string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the
 same as a byte[] array, which could mean anything.
 
 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.

Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

 
 Why? Do we also need a way to differentiate between empty and non-existent
 ints?
 
 In D, there is no such thing as a non-existent int; there is no such thing as a
 non-existent struct; and there is no such thing as a non-existent string.
 
 Why not just start from the assumption that we DON'T need to differentiate
 between empty and non-existent strings, and take it from there?

Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors. This non-existent thing is a
red herring. 'empty' means initialized and length of zero. 'non-existent'
means not initialized yet.












It's a workaround for the current (longer) way of handling this situation.
It's no big deal but it would be 'nice to have'. Like a strict bool type
would be nice to have.
-- 
Derek
Melbourne, Australia
29/Jun/04 6:24:19 PM

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...

Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors.

Why?

D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you
realize). In C++, there is no such thing as an uninitialized vector. Why on
Earth would you want them in D?



This non-existent thing is a
red herring. 'empty' means initialized and length of zero. 'non-existent'
means not initialized yet.

Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow
uninitialized array handles (as opposed to array content) to exist in D. It
makes no sense.

Please, can someone who is arguing in favor of allowing a distinction between
initialized and uninitialized dynamic array handles, explain exactly why you want
such a distinction to exist?


Arcane Jill

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:

 In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...
 
 
Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors.

 
 
 Why?
 
 D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you
 realize). 

The difference is in C++ it's common to use a pointer to a class (and I 
presume, a vector).
In D, an array is a struct, not a class, so to get reference semantics 
you have to use a struct pointer. In C++ this would be no big deal, but 
this doesn't seem like the D way.
Reference semantics allow me to change the length of an array and have 
it reflected in the caller, and to store nulls.

 In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?

For the same reason you use null in other situations with reference 
types. I want accessing an uninitialised member array to give an error. 
I want to be able to use a null argument to a function to trigger 
special or default behaviour (optional arguments in any position).

Sam

PS: AJ, I'm not sure if you read the forums at dsource, I posted a 
couple of deimos bugs:
http://dsource.org/forums/viewtopic.php?t=224

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbrgtm$19gj$1 digitaldaemon.com>, Sam McCall says...
PS: AJ, I'm not sure if you read the forums at dsource,

I do, but less frequently than this one as it's a slow turnover list. I get
notified when new posts are added to existing threads, but not when new threads
are added.


I posted a 
couple of deimos bugs:
http://dsource.org/forums/viewtopic.php?t=224

Okay, I'm on it. I'll let you know when they're fixed.

Maybe we could start a "Bugs" thread on Deimos. That way I'll always get
notified when anyone adds to it.

Jill

Jun 29 2004

Matthias Becker <Matthias_member pathlink.com> writes:

 In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?

For the same reason you use null in other situations with reference 
types. I want accessing an uninitialised member array to give an error. 
I want to be able to use a null argument to a function to trigger 
special or default behaviour (optional arguments in any position).

Nope, wrong.

If you use reference types that are allowed to be null (C++ references aren't,
and Nice also has references that aren't), you want to show that there may be
no object. At least, that holds in languages that also offer other kinds of
references (e.g. C++ or Nice, as mentioned above).

In languages that don't have references that can't be null, you just can't
express yourself in the code.



In C++ I never had the wish to pass a container/collection as a pointer. I
always pass them as C++ references. So I'm sure there always is a collection
and I don't have to check for this.
If there are no values to pass in, I just pass an empty collection.


Could you please give an example where it makes sense to pass no collection
instead of passing an empty collection?

-- Matthias Becker

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Matthias Becker wrote:
 In C++ I never had the wish to pass a container/collection as a pointer. I
 always pass them as C++ references. So I'm sure there always is a collection
 and I don't have to check for this.
 If there are no values to pass in, I just pass an empty collection.
 
 
 Could you please give an example where it makes sense to pass no collection
 instead of passing an empty collection?

To request default behaviour a la optional arguments, without 
restrictions on the number or position of the arguments.

Sam

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 15:58:29 +0000 (UTC), Matthias Becker 
<Matthias_member pathlink.com> wrote:

 In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?

 For the same reason you use null in other situations with reference
 types. I want accessing an uninitialised member array to give an error.
 I want to be able to use a null argument to a function to trigger
 special or default behaviour (optional arguments in any position).

 Nope, wrong.

 If you use reference types that are allowed to be null (C++ references aren't,
 and Nice also has references that aren't), you want to show that there may be
 no object. At least, that holds in languages that also offer other kinds of
 references (e.g. C++ or Nice, as mentioned above).

 In languages that don't have references that can't be null, you just can't
 express yourself in the code.



 In C++ I never had the wish to pass a container/collection as a pointer. I
 always pass them as C++ references. So I'm sure there always is a collection
 and I don't have to check for this.
 If there are no values to pass in, I just pass an empty collection.


 Could you please give an example where it makes sense to pass no collection
 instead of passing an empty collection?

Please read my post (two prior to this one when sorted flat by date; it is a 
response to Andy's post). It contains an example. I would like some 
feedback on how to achieve what I want to do...

Regan.

 -- Matthias Becker




Jun 29 2004

Derek <derek psyc.ward> writes:

On Tue, 29 Jun 2004 09:50:35 +0000 (UTC), Arcane Jill wrote:

 In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...
 
Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors.

 
 Why?
 
 D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you
 realize). In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?
 

I don't use C++, so I'm not aware of what std::vector does or does not
provide.  

Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function
that accepts a dynamic array. A zero-length array is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.

In short, an uninitialized variable contains information - namely the fact
that it *is* uninitialized. And that information could be utilized by a
coder - if they had the chance.
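
A minimal sketch of the distinction being asked for, assuming the semantics of current D compilers (behaviour in 2004 may have differed). The function `resolveOptions` and its default values are hypothetical. Note that `is` compares the array handle itself, while `==` compares contents, so only `is null` can tell the two cases apart.

```d
// Hypothetical library function: a null array means "use the defaults",
// while an empty but non-null array is taken literally.
int[] resolveOptions(int[] requested)
{
    static immutable int[] defaults = [1, 2, 3]; // assumed default values
    return requested is null ? defaults.dup : requested;
}

void main()
{
    int[] unset;                   // never assigned: pointer null, length 0
    int[] backing = [7];
    int[] empty = backing[0 .. 0]; // length 0, but the pointer is not null

    assert(unset is null);
    assert(empty !is null);
    assert(unset == empty);        // == compares contents, so both look empty

    assert(resolveOptions(unset) == [1, 2, 3]);
    assert(resolveOptions(empty).length == 0);
}
```

The distinction is fragile in practice: it survives only as long as every intermediate step preserves the pointer, which is part of why the thread disagrees about relying on it.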

 
This non-existent thing is a
red herring. 'empty' means initialized and length of zero. 'non-existent'
means not initialized yet.

 
 Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow
 uninitialized array handles (as opposed to array content) to exist in D. It
 makes no sense.

Ok, but it does to me. Sorry I can't seem to be able to explain why.

 Please, can someone who is arguing in favor of allowing a distinction between
 initialized and uninitialized dynamic array handles, explain exactly why you
 want such a distinction to exist?

Apparently not; sorry.

-- 
Derek
Melbourne, Australia

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <12vwf4nkzjzxa.17ai9mojp3dpz$.dlg 40tude.net>, Derek says...
Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function
that accepts a dynamic array. A zero-length array is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.

I'd use two functions for this:



..but only if an empty array was NOT the default. In many cases, I could
probably get away with an empty array BEING the default, in which case, I could
simply do:
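
The two approaches described here can be sketched roughly as follows. This is not the code from the original post; the names `total` and `totalOrZero` and the default values are made up for illustration.

```d
// Hypothetical library function taking explicit values.
int total(int[] values)
{
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

// Approach 1: a second overload with no argument stands for "use the defaults".
int total()
{
    return total([10, 20]); // assumed default values
}

// Approach 2: when the empty array is itself the default, a default
// argument is enough and no null/empty distinction is needed.
int totalOrZero(int[] values = [])
{
    return total(values);
}

void main()
{
    int[] none;
    assert(total([1, 2, 3]) == 6);
    assert(total(none) == 0);   // an empty array is a valid argument
    assert(total() == 30);      // no argument: defaults are used
    assert(totalOrZero() == 0);
    assert(totalOrZero([4]) == 4);
}
```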




In short, an uninitialized variable contains information - namely the fact
that it *is* uninitialized.

It's a nice argument, but it could be applied equally well to ANY types. If I
were supremely in favor of the notion that uninitializedness carries information
(which I'm not), I might argue as follows:

I'm writing a library that will be used by other coders. It has a function
that accepts a bit. Zero is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.

If I believed that, I'd be arguing for a distinction between an uninitialized
bit, and a bit containing zero. I happen not to believe that, however.



 Why would ANYONE want to allow
 uninitialized array handles (as opposed to array content) to exist in D. It
 makes no sense.


Ok, but it does to me. Sorry I can't seem to be able to explain why.

Yeah, human language is a bummer. Someone ought to invent telepathy.

Jill

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:
 In article <12vwf4nkzjzxa.17ai9mojp3dpz$.dlg 40tude.net>, Derek says...
 
Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function
that accepts a dynamic array. A zero-length array is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.

 
 
 I'd use two functions for this:


 
 ..but only if an empty array was NOT the default. In many cases, I could
 probably get away with an empty array BEING the default, in which case, I could
 simply do:
 


Sure, but it sucks if there's a lot of them, and is impossible if the 
function is variadic.
The ability to pass null to a function is very useful, I've switched 
from structs to classes more than once for this reason.

Sam

Jun 29 2004

"Carlos Santander B." <carlos8294 msn.com> writes:

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cbre1b$15j0$1 digitaldaemon.com
|
| ...
|
| Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow
| uninitialized array handles (as opposed to array content) to exist in D. It
| makes no sense.
|
| Please, can someone who is arguing in favor of allowing a distinction between
| initialized and uninitialized dynamic array handles, explain exactly why you
| want such a distinction to exist?
|
|
| Arcane Jill

Regan already said why:

"Regan Heath" <regan netwin.co.nz> wrote in message
news:opr99w0st25a2sq9 digitalmars.com
|
| ...
|
| We *need* to have *both* null and empty arrays. The reason is pretty
| simple:
|    - null means does not exist
|    - empty means exists, but has no value (or empty value)
|
| This is important in situations like the original poster mentioned and in
| my experience for example... When reading POST input from a web page, you
| get a string like so:
|
|    Setting1=Regan+Heath&Setting2=&&
|
| when requesting items you might have a function like:
|
|    char[] getFormValue(char[] label);
|
| the code to get the values for the above form might go:
|
|    char[] s;
|
|    s = getFormValue("Setting1"); // s is "Regan Heath"
|    s = getFormValue("Setting2"); // s is ""
|    s = getFormValue("Setting3"); // s is null
|
| It is important the above code can tell that Setting3 was not passed in
| the form, so it can decide not to overwrite whatever current value that
| setting has, whereas it can tell Setting2 was passed and will overwrite
| the current value with a new blank one.
|
| ...
|

Personally, I would use an associative array to represent such a thing
(instead of using a function), but it's an implementation difference, and
the language should let Regan do the way he wants.
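
A rough sketch of the associative-array alternative, assuming current D and Phobos. The helper `parseForm` is hypothetical and does only the minimum decoding needed for Regan's sample string (it turns `+` into a space and ignores percent-escapes). The `in` operator returns a pointer to the value, or null when the key is absent, so a missing setting and an empty one stay distinct without comparing arrays to null.

```d
import std.array : replace, split;
import std.string : indexOf;

// Hypothetical helper: parse "a=b&c=d" style form data into a lookup table.
string[string] parseForm(string query)
{
    string[string] values;
    foreach (pair; query.split("&"))
    {
        if (pair.length == 0)
            continue;               // skip empty segments such as a trailing "&&"
        auto eq = pair.indexOf('=');
        if (eq < 0)
            continue;               // no '=': not a label/value pair
        values[pair[0 .. eq]] = pair[eq + 1 .. $].replace("+", " ");
    }
    return values;
}

void main()
{
    auto form = parseForm("Setting1=Regan+Heath&Setting2=&&");

    assert(*("Setting1" in form) == "Regan Heath");
    assert("Setting2" in form);                   // present...
    assert((*("Setting2" in form)).length == 0);  // ...but empty
    assert(("Setting3" in form) is null);         // not passed at all
}
```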

I've run into such cases before ("" !== null), I know that. I just can't
remember any of them right now :D

Two more things: I don't think this should only be for strings, but for any
array. And I'm 100% sure this has been raised before.

-----------------------
Carlos Santander Bernal

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 09:50:35 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...

 Because that's not what is being meant. I'd like to differentiate between
 INITIALIZED and UNINITIALIZED vectors.

 Why?

 D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you
 realize). In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?



 This non-existent thing is a
 red herring. 'empty' means initialized and length of zero. 'non-existent'
 means not initialized yet.

 Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow
 uninitialized array handles (as opposed to array content) to exist in D. It
 makes no sense.

 Please, can someone who is arguing in favor of allowing a distinction between
 initialized and uninitialized dynamic array handles, explain exactly why you
 want such a distinction to exist?

Please read the reply I just made to Andy's post that started this branch of 
the thread, i.e. just go up a little bit in a threaded reader, or look for 
the post I made just prior to this one if viewing flat and sorting by date.

Regan.


Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:
 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 
Yeah, it's called std::string, and it's more or less the default.

And it's crap. IMNSHO.

 
 
 You'll get no arguments from me there. D got it right in not having a string
 class. I didn't think that at first, but I've come round to the D way of
 thinking. 

I'm still getting there... I still don't see why toUpper("hello") is 
better than "hello".toUpper(), under the assumption that the OO way has 
any merit. (If it doesn't, why do we have it?)

 The problem with a string class is that you can't add new member
 functions to it. (Oh, you may be able to subclass String, if it's not final. Oh
 wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.

I'm confused: is there a way of adding functions to array types that 
can't be used with classes?
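
For context, a small sketch of what "adding a function" to an array type looks like, assuming current D. A free function whose first parameter is an array can be called with member syntax. The function `shout` is hypothetical. As far as I can tell, this call syntax was limited to arrays at the time of the thread, whereas current compilers accept it for any type.

```d
import std.uni : toUpper;

// A free function whose first parameter is a string...
string shout(string s)
{
    return s.toUpper() ~ "!";
}

void main()
{
    assert(shout("hello") == "HELLO!");  // ordinary call
    assert("hello".shout() == "HELLO!"); // same function, member-style call
}
```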

 Besides which, what else can a char[] array possibly represent, other than a
 string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the
 same as a byte[] array, which could mean anything.

In theory you're right. The problem is when people assume "a char array 
is a list of characters", which is perfectly logical, given the name.
In theory, you should only store a list of characters in a dchar[]. But 
it's not going to happen, see std.string.maketrans (char[] is a list) 
and translate (char[] is opaque).

[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be 
fully-unicode at all.
What is the purpose of a char/wchar variable? How often do you actually 
need to be directly manipulating UTF8/16 fragments? (Hint: in a 
unicode-based language with good libraries, almost never).
*IF* D is going to be fully-unicode, that does have performance impacts. 
A single character must _always_ go in a dchar variable. So what is the 
advantage in having strings being char[] arrays? ("knowing the encoding" 
doesn't count, the user shouldn't have to care).
IMO, strings NEED to:
	* Have only one type, or one base type.
I want to write a function that accepts a string. I don't want to write 
three functions, or use a template (that has to be manually instantiated).
	* Expose character data as _characters_, not fragments.
This means characters accessed must be dchars, indexing must be 
character, not fragment-based.
	* Be efficient in the common case.
At the moment, this probably means using UTF-8 internally. This could be 
changed in the future, or there could be multiple versions with the same 
base type, because all character data would be exposed at the character 
level.
	* Be fully reference types.
At the moment, if someone passes in a string, I can modify its data, 
which is shared, and its length, which is not. This makes sense if you 
understand the implementation, but why should foo~="bar" have the truly 
odd effects it does? Always passing strings inout is ugly and confusing 
in other cases.
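
The behaviour described in the last point can be sketched as below, assuming current D, where the by-reference parameter mode is spelled `ref` (the thread calls it `inout`). The function names are hypothetical. An array argument is a copy of a (length, pointer) pair: element writes go through the shared pointer, but a change of length stays local unless the parameter is passed by reference.

```d
void mutate(int[] a)
{
    a[0] = 99; // writes through the shared data pointer: visible to the caller
    a ~= 4;    // changes only this function's copy of (length, pointer)
}

void grow(ref int[] a)
{
    a ~= 4;    // the caller's array handle itself is updated
}

void main()
{
    int[] xs = [1, 2, 3];

    mutate(xs);
    assert(xs == [99, 2, 3]);    // element change seen, append not seen

    grow(xs);
    assert(xs == [99, 2, 3, 4]); // with ref, the new length is seen too
}
```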

Based on this, the solution to me looks like a String interface that 
exposes character data, and UTF8String as the default implementation, 
which stores its data in a ubyte[], literal strings would create these.
There could then be a UTF32String implementation which would be more 
efficient for various other languages.
The "char" type should be 32 bits wide. Anything else is confusing. 
(Hey, they did it with "int"...).
[/RANT]

Now flame on, I'm sure that's not going to be too popular ;-)

Don't compare arrays to null.  Don't try to differentiate between empty 
and nonexistent.

Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

 
 Why? Do we also need a way to differentiate between empty and non-existent ints?

Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts 
of ugly things when negative numbers are perfectly valid. This is 
necessary for pragmatic reasons of efficiency, I'd love chips to treat 
0x8000... as NaN like the NaN we have in IEEE floating point. (This'd 
also balance the range of integers). I'm not saying we can/should change 
the behaviour of ints, just that I don't think this argument has merit.
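
As an aside, current Phobos offers a library-level alternative to magic values such as -1. A minimal sketch, assuming `std.typecons.Nullable`, which postdates this thread; the function `lowest` is hypothetical. Here -1 is an ordinary result and "no value" is a separate state.

```d
import std.typecons : Nullable;

// Hypothetical helper: the smallest element, or "no value" for an empty array.
Nullable!int lowest(int[] values)
{
    Nullable!int result;        // starts out null (holds no value)
    foreach (v; values)
        if (result.isNull || v < result.get)
            result = v;
    return result;
}

void main()
{
    int[] none;
    assert(lowest(none).isNull);           // nothing to report
    assert(!lowest([3, -1, 2]).isNull);
    assert(lowest([3, -1, 2]).get == -1);  // -1 is a real answer, not a marker
}
```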

I think arrays should become fully reference types, for the same reason 
as strings above. Yes, this would probably mean double indirection, 
arrays would be a pointer to the (length,data pointer) struct that they 
currently are.

Sam

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbrfn4$1805$1 digitaldaemon.com>, Sam McCall says...

[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be 
fully-unicode at all.
What is the purpose of a char/wchar variable? How often do you actually 
need to be directly manipulating UTF8/16 fragments? (Hint: in a 
unicode-based language with good libraries, almost never).

Maybe not, but you still need something to store them in. Even if you let a
library do all your UTF-8 work for you (which you should), then you still need
a type designed to contain such sequences. In D, a char array is that type.

In other words, the type char exists in order that the type char[] might exist.
I don't have a problem with that.


*IF* D is going to be fully-unicode, that does have performance impacts. 
A single character must _always_ go in a dchar variable. So what is the 
advantage in having strings being char[] arrays?

Space.


("knowing the encoding" 
doesn't count, the user shouldn't have to care).

In a strongly typed language, that would be true, but D is not a strongly typed
language. Walter is on record as stating that all char types including dchar can
be freely used as integers. If that's going to be true, you MUST care about the
encoding.



IMO, strings NEED to:
	* Have only one type, or one base type.

And, to take that reasoning further, it should have other interesting properties
too, like it should be IMPOSSIBLE IN ANY CIRCUMSTANCE to end up with a char
containing a value outside the range U+000000 to U+10FFFF inclusive. However, I
don't see this happening in D. The reason being that even a dchar is not a
character in the Unicode sense. It is a UTF-32 encoding of a character. (The
minor technical difference being that dchar values above 0x10FFFF exist, but are
invalid, whereas Unicode characters beyond U+10FFFF do not even exist).
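
The point about dchar can be checked with a short sketch, assuming current Phobos (`std.utf.isValidDchar`). A dchar occupies 32 bits, so values above 0x10FFFF are representable, but they are not valid code points.

```d
import std.utf : isValidDchar;

void main()
{
    static assert(dchar.sizeof == 4);            // stored in 32 bits

    assert(isValidDchar(cast(dchar) 0x10FFFF));  // last valid code point
    assert(!isValidDchar(cast(dchar) 0x110000)); // representable, but invalid
    assert(!isValidDchar(cast(dchar) 0xD800));   // surrogates are invalid too
}
```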



	* Expose character data as _characters_, not fragments.
This means characters accessed must be dchars, indexing must be 
character, not fragment-based.

That depends on your point of view. Unicode may be viewed on many levels. I'm
sure I could hold a reasonable argument in which I insisted that string data
should be exposed as _glyphs_, not characters (characters are, after all, merely
glyph fragments). Glyphs are what you see. If a string contains an e-acute
glyph, should your application really /care/ which characters compose that
glyph?

Somewhere along the line, you have to face the bottom level. That level is the
level of character encoding. Language support is given to the encoding level.
For anything above that, you use libraries. If such libraries don't exist yet,
we can write them.
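
The three levels mentioned here can be made concrete with a small sketch, assuming current Phobos (`std.uni.byGrapheme`, which postdates this thread). What the post calls a glyph corresponds roughly to what Unicode calls a grapheme cluster; that mapping is my assumption, not the poster's wording.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one e-acute.
    string s = "e\u0301";

    assert(s.length == 3);                // UTF-8 code units (the language level)
    assert(s.walkLength == 2);            // code points
    assert(s.byGrapheme.walkLength == 1); // user-perceived characters (library level)
}
```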




The "char" type should be 32 bits wide. Anything else is confusing. 

21 bits wide, and limited to the range 0-0x10FFFF. Anything else is confusing.
But this is D, and D is practical.


Now flame on, I'm sure that's not going to be too popular ;-)

Actually, I loved it, and I'm not flaming (and I hope nobody does). You've made
some excellent observations. But it's way too late to shape D that way now. In
the future, there may well be languages which handle characters as true, pure,
Unicode characters, but the world isn't fully Unicode-aware yet.

To give an example of what I mean: Suppose you publish a web site containing a
few musical symbols and a few exotic math symbols. (All valid Unicode). The sad
fact is, such a website won't display properly on most people's browsers. To get
them to display properly, it is currently the responsibility of VIEWERS (rather
than publishers) of web sites, to "obtain", somehow, the relevant fonts to make
it work. Usually, obtaining such fonts costs money, so who's going to bother?
It'd be like buying a book and opening it to find half the characters looking
like black blobs until you pay more money to a font-designer. And so, web site
designers tend NOT to use such characters on their web sites, preferring gif
images which everyone can view. It's a vicious circle.

In short, the world is not Unicode yet, and it's frustrating. Bits of it are
still trying to catch up with other bits. Sometimes you just want to scream at the
planet to get its act together right now. But we have to be realistic.

And realistically, things /are/ changing - but slowly. What D is doing is moving
in the right direction. The shift to full Unicode support in all things is a
long way off yet, and to get there, we must move in small steps.

Defining a char as a UTF-8 fragment may be a small step, but it is a very
important and valuable one. At least we don't say "a char is a character in some
unspecified encoding", like some other languages do.

Nice post, by the way. I enjoyed reading it.

Jill

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:

 In article <cbrfn4$1805$1 digitaldaemon.com>, Sam McCall says...
 
 
[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be 
fully-unicode at all.
What is the purpose of a char/wchar variable? How often do you actually 
need to be directly manipulating UTF8/16 fragments? (Hint: in a 
unicode-based language with good libraries, almost never).

 
 
 Maybe not, but you still need something to store them in. Even if you let a
 library do all your UTF-8 work for you (which you should), then you still need
 a type designed to contain such sequences. In D, a char array is that type.
 
 In other words, the type char exists in order that the type char[] might exist.
 I don't have a problem with that.

Sure, but given that the "user" shouldn't be touching chars without 
realising that they're more complicated than in C, byte[] would do?
Still, I'm not fussed about this.

*IF* D is going to be fully-unicode, that does have performance impacts. 
A single character must _always_ go in a dchar variable. So what is the 
advantage in having strings being char[] arrays?

 
 
 Space.

Sorry, I didn't mean char[] as opposed to dchar[], I meant char[] as 
opposed to something more opaque. The reasoning for not having a string 
class, IIRC, is "strings are lists of characters". Well, chars aren't 
characters.

("knowing the encoding" 
doesn't count, the user shouldn't have to care).

 In a strongly typed language, that would be true, but D is not a strongly typed
 language. Walter is on record as stating that all char types including dchar
 can be freely used as integers. If that's going to be true, you MUST care about
 the encoding.

If you don't use them as integers, then you don't have to care.
I'm not saying it shouldn't be well-defined, but Java doesn't require 
the user to understand the intricacies of unicode encodings to 
manipulate strings.
(Yes, java has efficiency problems with strings and presumably some 
problems with wide unicode characters due to a 16 bit char type, but I 
think that still makes sense).

IMO, strings NEED to:
	* Have only one type, or one base type.

 
 
 And, to take that reasoning further, it should have other interesting
 properties too, like it should be IMPOSSIBLE IN ANY CIRCUMSTANCE to end up with
 a char containing a value outside the range U+000000 to U+10FFFF inclusive.
 However, I don't see this happening in D. The reason being that even a dchar is
 not a character in the Unicode sense. It is a UTF-32 encoding of a character.
 (The minor technical difference being that dchar values above 0x10FFFF exist,
 but are invalid, whereas Unicode characters beyond U+10FFFF do not even exist).

Okay, I didn't realise dchars were 21 bits wide... if there's a way of 
doing this that's efficient, that'd be cool, dchar (or "char") could be 
21 bits. If it's going to be hopelessly slow, you have to trust the 
programmer to some extent, what about "any library operation involving 
an out-of-range dchar is undefined"?

 That depends on your point of view. Unicode may be viewed on many levels. I'm
 sure I could hold a reasonable argument in which I insisted that string data
 should be exposed as _glyphs_, not characters (characters are, after all,
 merely glyph fragments). Glyphs are what you see. If a string contains an
 e-acute glyph, should your application really /care/ which characters compose
 that glyph?

Probably not, although if reading an encoded string and then writing it 
again doesn't produce the same byte-output, I'm sure I could find a 
contrived example... copy-pasting text invalidating a digital signature?
Either would be much better than what we've got now, and I think 
character is more likely (though still spectacularly unlikely), because 
it has an obvious, efficient representation (32 bit unsigned number). Am 
I right in assuming a glyph can be fairly complicated?

 Somewhere along the line, you have to face the bottom level. That level is the
 level of character encoding. Language support is given to the encoding level.
 For anything above that, you use libraries. If such libraries don't exist yet,
 we can write them.

Yeah. It's just a bit disappointing after hearing "Strings are character 
arrays and everything about them makes sense" to realise that you either 
have to grok UTF-N or treat these "characters" as opaque... the 
advantages over a class are gone, and a class has reference semantics 
and member functions.

The "char" type should be 32 bits wide. Anything else is confusing. 

 21 bits wide, and limited to the range 0-0x10FFFF. Anything else is confusing.

It clearly is, because I assumed a unicode character was 32 bits wide, 
on the basis that that's what D had taught me :-\
 But this is D, and D is practical.

If it's going to be horribly inefficient to make it 21 bits, have the 
spec say "it's at least 21 bits" and alias it to uint.

 Actually, I loved it, and I'm not flaming (and I hope nobody does). You've made
 some excellent observations. But it's way too late to shape D that way now. In
 the future, there may well be languages which handle characters as true, pure,
 Unicode characters, but the world isn't fully Unicode-aware yet.

Yeah, it's the partly-there that's frustrating... my selfish side would 
be happy with just ASCII ;-). It just seems sometimes that if it's not 
easy and consistent to make things unicode-friendly, it won't happen. 
Especially in places where ASCII works fine, that's certainly easy and 
consistent! The current way seems to suggest that officially it's all 
unicode and happy, but (don't tell anyone) feel free to use ascii and 
assume chars are characters if you want. The standard library even does 
this, in std.string no less.

 To give an example of what I mean: Suppose you publish a web site containing a
 few musical symbols and a few exotic math symbols. (All valid Unicode). The sad
 fact is, such a website won't display properly on most people's browsers. To
 get them to display properly, it is currently the responsibility of VIEWERS
 (rather than publishers) of web sites, to "obtain", somehow, the relevant fonts
 to make it work. Usually, obtaining such fonts costs money, so who's going to
 bother? It'd be like buying a book and opening it to find half the characters
 looking like black blobs until you pay more money to a font-designer. And so,
 web site designers tend NOT to use such characters on their web sites,
 preferring gif images which everyone can view. It's a vicious circle.

Yeah, fonts are a problem. My ideal world would have a (huge!) complete 
system default font (or one each for serif, sans, and mono) supplied 
with the OS, that would be the fallback for nonexistent characters.

 And realistically, things /are/ changing - but slowly. What D is doing is
 moving in the right direction. The shift to full Unicode support in all things
 is a long way off yet, and to get there, we must move in small steps.

Yes. What gets me is that in 5 years we'll (hopefully) be far enough 
down the unicode road that D's approach will seem backward, and I'll 
have to wait for someone to reinvent a similar language, with a more 
thorough unicode integration.
Ah well, maybe we'll get a strong boolean next time <g>

 Defining a char as a UTF-8 fragment may be a small step, but it is a very
 important and valuable one. At least we don't say "a char is a character in
 some unspecified encoding", like some other languages do.

Yeah, definitely. I just wish it was easier to use and harder to ignore.

Sam

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbs0bj$1vhf$1 digitaldaemon.com>, Sam McCall says...
I'm not saying it shouldn't be well-defined, but Java doesn't require 
the user to understand the intricacies of unicode encodings to 
manipulate strings.

Yes it does. Java chars operate in UTF-16. If you want to store the character
U+012345 in a Java string, you need to worry about UTF-16.


Probably not, although if reading an encoded string and then writing it 
again doesn't produce the same byte-output, I'm sure I could find a 
contrived example... copy-pasting text invalidating a digital signature?

That's what normalization is for. We'll have that soon in a forthcoming version
of etc.unicode.
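
A sketch of what normalization buys in this situation, using `std.uni.normalize` from current Phobos as a stand-in. The `etc.unicode` API mentioned above is not shown in this thread, so nothing here should be read as its interface.

```d
import std.uni : NFC, NFD, normalize;

void main()
{
    string composed   = "\u00E9";  // e-acute as a single code point
    string decomposed = "e\u0301"; // 'e' plus a combining acute accent

    assert(composed != decomposed);                // different code units
    assert(normalize!NFC(decomposed) == composed); // same text once normalized
    assert(normalize!NFD(composed) == decomposed);
}
```

Comparing or signing text after normalizing both sides to the same form avoids the byte-level mismatch raised in the quoted question.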


Am I right in assuming a glyph can be fairly complicated?

Very much so. Especially if you're a font designer, since Unicode allows you to
munge any two glyphs together into a bigger glyph (a ligature). In practice,
fonts only provide a small subset of all possible ligatures (as you can
imagine!).


Yeah. It's just a bit disappointing after hearing "Strings are character 
arrays and everything about them makes sense" to realise that you either 
have to grok UTF-N or treat these "characters" as opaque... the 
advantages over a class are gone, and a class has reference semantics 
and member functions.

Not really. So long as you remember that characters <= 0x7F are OK in a char,
and that characters <= 0xFFFF are fine in a wchar, you're sorted.


Yeah, it's the partly-there that's frustrating... my selfish side would 
be happy with just ASCII ;-). It just seems sometimes that if it's not 
easy and consistent to make things unicode-friendly, it won't happen. 

Right, but it's a question of where that support comes from. To demand it all of
the language itself is asking /a lot/ from poor old Walter. If we can add it,
piece by piece, in libraries, I'd say we're not doing too badly.



Especially in places where ASCII works fine, that's certainly easy and 
consistent! The current way seems to suggest that officially it's all 
unicode and happy, but (don't tell anyone) feel free to use ascii

It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8
was designed that way.


and 
assume chars are characters if you want. The standard library even does 
this, in std.string no less.

So long as they make no assumptions about characters > 0x7F, that's perfectly
reasonable.


Yeah, fonts are a problem. My ideal world would have a (huge!) complete 
system default font (or one each for serif, sans, and mono) supplied 
with the OS, that would be the fallback for nonexistent characters.

I absolutely agree. There are free fonts which do this, but they don't display
well at small point-size because of something called "hinting", which apparently
you can't do without paying someone royalties because of some stupid IP
nonsense.


Yes. What gets me is that in 5 years we'll (hopefully) be far enough 
down the unicode road that D's approach will seem backward, and I'll 
have to wait for someone to reinvent a similar language, with a more 
thorough unicode integration.

Yup. That's the way it goes. So what else shall we imagine for D++?

Jill

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:

 In article <cbs0bj$1vhf$1 digitaldaemon.com>, Sam McCall says...
 
I'm not saying it shouldn't be well-defined, but Java doesn't require 
the user to understand the intricacies of unicode encodings to 
manipulate strings.

 
 
 Yes it does. Java chars operate in UTF-16. If you want to store the character
 U+012345 in a Java string, you need to worry about UTF-16.

Whoops. Having never had to deal with this case (and taken a series of 
CS courses where we've iterated over chars countless times and they 
never mentioned this once :-\) I hadn't thought about this.
Okay, suppose java had a 21- or 32-bit char type.

Probably not, although if reading an encoded string and then writing it 
again doesn't produce the same byte-output, I'm sure I could find a 
contrived example... copy-pasting text invalidating a digital signature?

 
 That's what normalization is for. We'll have that soon in a forthcoming version
 of etc.unicode.

Of course... so no, the program shouldn't care, but...

Am I right in assuming a glyph can be fairly complicated?

 Very much so. Especially if you're a font designer, since Unicode allows you to
 munge any two glyphs together into a bigger glyph (a ligature). In practice,
 fonts only provide a small subset of all possible ligatures (as you can
 imagine!).

Glyphs aren't really a practical option as the logical element type of 
strings if they can't be easily represented as a fixed-width number, I'd 
imagine.

Yeah. It's just a bit disappointing after hearing "Strings are character 
arrays and everything about them makes sense" to realise that you either 
have to grok UTF-N or treat these "characters" as opaque... the 
advantages over a class are gone, and a class has reference semantics 
and member functions.

 
 
 Not really. So long as you remember that characters <= 0x7F are OK in a char,
 and that characters <= 0xFFFF are fine in a wchar, you're sorted.

But you can't do obvious "list-of-characters" things like index by 
character or even slice at any offset.
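
A short sketch of what goes wrong, assuming a current D compiler and Phobos (`isValidUtf8` is a hypothetical helper written for this example): indexing and slicing a char[] work on UTF-8 code units, so a slice can end in the middle of a character.

```d
import std.utf : count, validate, UTFException;

/// True when `s` is well-formed UTF-8. (Hypothetical helper for illustration.)
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);          // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string s = "n\u00E9";             // 'n' followed by U+00E9 (two UTF-8 code units)
    assert(s.length == 3);            // .length counts code units
    assert(count(s) == 2);            // std.utf.count counts code points
    assert(isValidUtf8(s[0 .. 1]));   // "n" on its own is fine
    assert(!isValidUtf8(s[0 .. 2]));  // this slice cuts U+00E9 in half
}
```

Whether a string type should hide this behind character-based indexing is exactly the design question being argued here; the sketch only shows the current array behaviour.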

Yeah, it's the partly-there that's frustrating... my selfish side would 
be happy with just ASCII ;-). It just seems sometimes that if it's not 
easy and consistent to make things unicode-friendly, it won't happen. 

 
 
 Right, but it's a question of where that support comes from. To demand it all of
 the language itself is asking /a lot/ from poor old Walter. If we can add it,
 piece by piece, in libraries, I'd say we're not doing too badly.

A decent unicode string class could be almost entirely library based, 
and would only require a little magic language support (for string 
literals). I might have a play around with one, on the assumption that 
if people find it useful, the horribly inefficient/incorrect bits could 
be fixed by people who know what they're doing ;)

 It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8
 was designed that way.

So this means a char[] has two purposes depending on the app?
On the one hand, ASCII/Unicode being a per-app decision is fair enough.
On the other hand, that's not what it looked like to me in the docs, and 
I still think unicode should be the "default".
Also, if people are going to use char[] as ASCII, they may write 
libraries that assume char[] is ASCII or worse, "a character in some 
unknown encoding".

and 
assume chars are characters if you want. The standard library even does 
this, in std.string no less.

 So long as they make no assumptions about characters > 0x7F, that's perfectly
 reasonable.

If it were documented as only working for ASCII, sure, otherwise you 
might assume it was a UTF-8 encoded character list. And I'm still not 
sure it'd be reasonable unless a wchar/dchar version was provided, how 
good is a language's unicode support if string manipulation functions 
only work on ascii?
Anyway:
/************************************
  * Construct translation table for translate().
  */

char[] maketrans(char[] from, char[] to)
     in
     {
	assert(from.length == to.length);
     }
     body
     {
	char[] t = new char[256];
	int i;

	for (i = 0; i < 256; i++)
	    t[i] = cast(char)i;

	for (i = 0; i < from.length; i++)
	    t[from[i]] = to[i];

	return t;
     }

Yeah, fonts are a problem. My ideal world would have a (huge!) complete 
system default font (or one each for serif, sans, and mono) supplied 
with the OS, that would be the fallback for nonexistent characters.

 I absolutely agree. There are free fonts which do this, but they don't display
 well at small point-size because of something called "hinting", which apparently
 you can't do without paying someone royalties because of some stupid IP
 nonsense.

Ew, does that apply to creating fonts too? I thought most free fonts 
weren't manually hinted because it'd take forever, especially for 
unicode... I know freetype doesn't interpret hints by default, but 
there's a #define somewhere: "set this to 1 if you have permission from 
Apple Legal, or live somewhere sane". On my distro of choice, this was 
set by default :-D

Yes. What gets me is that in 5 years we'll (hopefully) be far enough 
down the unicode road that D's approach will seem backward, and I'll 
have to wait for someone to reinvent a similar language, with a more 
thorough unicode integration.

 Yup. That's the way it goes. So what else shall we imagine for D++?

Fix C's broken precedence rules?

Sam

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbsufo$a8u$1 digitaldaemon.com>, Sam McCall says...

Okay, suppose java had a 21- or 32-bit char type.

I'm led to believe there was a lot of debate about this. Some folk said that
Java's char could NOT be anything other than 16 bits wide because it was defined
that way and changing it would break things. Other folk looked under the hood of
the JVM and decided that actually it probably wouldn't break anything after all.
I don't know the ins and outs of it, but I gather the first lot won. The way
it's going to go is UTF-16 support, with functions like isLetter() taking an int
rather than a char.




Glyphs aren't really a practical option as the logical element type of 
strings if they can't be easily represented as a fixed-width number, I'd 
imagine.

Well, they can, with a bit of sneaky manipulation. The trick is to map only
those ones you actually USE to the unused codepoints between 0x110000 and
0xFFFFFFFF. So long as such a mapping stays within the application (like, don't
try to export it), you can indeed have one dchar per glyph. But it would be a
temporary one - not one you could write to a file, for example.

In general, you're right.



But you can't do obvious "list-of-characters" things like index by 
character or even slice at any offset.

True.




 It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8
 was designed that way.

So this means a char[] has two purposes depending on the app?

I'm not sure I follow that. If you say char[] a = "hello world"; then you will
get a string containing eleven chars, and it will be both valid ASCII and valid
UTF-8. It's not like you have to choose.


On the one hand, ASCII/Unicode being a per-app decision is fair enough.

That isn't what I said. It's possible we may be misunderstanding each other
somehow.



Also, if people are going to use char[] as ASCII, they may write 
libraries that assume char[] is ASCII

Well, that would be a bug, of course. It's perfectly ok to choose only to store
ASCII characters in chars, but NOT perfectly okay to assume that chars will only
contain ASCII characters. Anyone writing a library containing such a bug should
simply be press-ganged into fixing it.



or worse, "a character in some 
unknown encoding".

Again, that would be a bug, and at odds with D's definition of what a char is.


If it were documented as only working for ASCII, sure, otherwise you 
might assume it was a UTF-8 encoded character list. And I'm still not 
sure it'd be reasonable unless a wchar/dchar version was provided, how 
good is a language's unicode support if string manipulation functions 
only work on ascii?

I'm not completely clear what functions you're talking about, as I haven't read
the source code for std.string. Am I correct in assuming that the quote below is
an extract?



Anyway:
/************************************
  * Construct translation table for translate().
  */

char[] maketrans(char[] from, char[] to)
     in
     {
	assert(from.length == to.length);
     }
     body
     {
	char[] t = new char[256];
	int i;

	for (i = 0; i < 256; i++)
	    t[i] = cast(char)i;

	for (i = 0; i < from.length; i++)
	    t[from[i]] = to[i];

	return t;
     }

This is a bug. ASCII stops at 0x7F. Characters above 0x7F are not ASCII. If this
function is intended as an ASCII-only function then (a) it should be documented
as such, and (b) it should leave all bytes >0x7F unmodified. Char values between
0x80 and 0xFF are reserved for the role they play in UTF-8. You CANNOT mess
with them (unless you're a UTF-8 engine).

You're right. I'd prefer to see a dchar version of this routine. Of course, you
wouldn't want a lookup table with 0x110000 entries in it, but an associative
array should do the job.
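
A rough sketch of that idea in present-day D, with the caveat that `makeTransTable` and `translateText` are names invented for this illustration and not std.string functions: the table is an associative array keyed by code point, and the input is decoded rather than indexed byte by byte, so bytes 0x80 to 0xFF are never touched individually.

```d
import std.stdio : writeln;

/// Build a code-point translation table.
/// (Hypothetical dchar counterpart of maketrans, not a Phobos function.)
dchar[dchar] makeTransTable(const(dchar)[] from, const(dchar)[] to)
in (from.length == to.length)
{
    dchar[dchar] table;
    foreach (i, c; from)
        table[c] = to[i];
    return table;
}

/// Translate `text` one code point at a time; unmapped characters pass through.
string translateText(string text, const dchar[dchar] table)
{
    string result;
    foreach (dchar c; text)      // foreach over dchar decodes the UTF-8 input
    {
        if (auto p = c in table)
            result ~= *p;        // appending a dchar re-encodes it as UTF-8
        else
            result ~= c;
    }
    return result;
}

void main()
{
    // Map U+00E9 (e acute) to 'e' and U+00EF (i diaeresis) to 'i'.
    auto table = makeTransTable("\u00E9\u00EF"d, "ei"d);
    writeln(translateText("na\u00EFve caf\u00E9", table)); // prints: naive cafe
}
```

The associative array only stores the mappings actually requested, which avoids the oversized lookup table at the cost of a hash lookup per character.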

Assuming this is from std.string, I guess one of us should report this as a bug.

Arcane Jill

Jun 30 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:

 In article <cbsufo$a8u$1 digitaldaemon.com>, Sam McCall says...
 
 
Okay, suppose java had a 21- or 32-bit char type.

 
 
 I'm led to believe there was a lot of debate about this. Some folk said that
 Java's char could NOT be anything other than 16 bits wide because it was defined
 that way and changing it would break things. Other folk looked under the hood of
 the JVM and decided that actually it probably wouldn't break anything after all.
 I don't know the ins and outs of it, but I gather the first lot won. The way
 it's going to go is UTF-16 support, with functions like isLetter() taking an int
 rather than a char.

Sorry, I meant "if java had originally been defined to have char being 
21 bits instead of 16, and storing a unicode codepoint instead of a 
UTF-16 fragment". All java's string manipulation stuff is char-based, 
and I was convinced there was a one-to-one correspondence between chars 
and characters (or possibly some too-big char values possible). Clearly 
I was mistaken, but if they had made chars 21 bits and kept the rest the 
same, it looks to me like it'd be just about perfect. (Well, I'm sure 
the APIs could be improved in minor ways, etc, but relatively speaking).

Glyphs aren't really a practical option as the logical element type of 
strings if they can't be easily represented as a fixed-width number, I'd 
imagine.

 
 Well, they can, with a bit of sneaky manipulation. The trick is to map only
 those ones you actually USE to the unused codepoints between 0x110000 and
 0xFFFFFFFF. So long as such a mapping stays within the application (like, don't
 try to export it), you can indeed have one dchar per glyph. But it would be a
 temporary one - not one you could write to a file, for example.

Ooh, clever :) But I don't see this working in a situation where you 
have dynamic libraries, for example.

It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8
was designed that way.

So this means a char[] has two purposes depending on the app?

 
 I'm not sure I follow that. If you say char[] a = "hello world"; then you will
 get a string containing eleven chars, and it will be both valid ASCII and valid
 UTF-8. It's not like you have to choose.

 On the one hand, ASCII/Unicode being a per-app decision is fair
 enough.

 That isn't what I said. It's possible we may be misunderstanding each 
 other somehow.

Sorry, what I originally meant:
 Especially in places where ASCII works fine, that's certainly easy and
 consistent! The current way seems to suggest that officially it's all
 unicode and happy, but (don't tell anyone) feel free to use ascii

Was that although unicode is the officially designated content of these 
types, char[] looks and feels (and the standard library uses it) like 
it's ASCII, and people won't bother to use unicode, because it 
requires calling conversion functions and so on.
Especially since if you assume the language will take care of unicode 
for you like java (almost) does, then you'll end up with code that only 
works properly for ASCII data. That's probably all a lot of people will 
test it with. We should get unicode by default.

If it were documented as only working for ASCII, sure, otherwise you 
might assume it was a UTF-8 encoded character list. And I'm still not 
sure it'd be reasonable unless a wchar/dchar version was provided, how 
good is a language's unicode support if string manipulation functions 
only work on ascii?

 
 
 I'm not completely clear what functions you're talking about, as I haven't read
 the source code for std.string. Am I correct in assuming that the quote below is
 an extract?

std.string.maketrans and std.string.translate.

Anyway:
/************************************
 * Construct translation table for translate().
 */

char[] maketrans(char[] from, char[] to)
    in
    {
	assert(from.length == to.length);
    }
    body
    {
	char[] t = new char[256];
	int i;

	for (i = 0; i < 256; i++)
	    t[i] = cast(char)i;

	for (i = 0; i < from.length; i++)
	    t[from[i]] = to[i];

	return t;
    }

 
 
 This is a bug. ASCII stops at 0x7F. Characters above 0x7F are not ASCII. If this
 function is intended as an ASCII-only function then (a) it should be documented
 as such, and (b) it should leave all bytes >0x7F unmodified. Char values between
 0x80 and 0xFF are reserved for the role they play in UTF-8. You CANNOT mess
 with them (unless you're a UTF-8 engine).

It's got a single-line explanation that doesn't mention encoding. I'll 
report it.
Sam

Jun 30 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbts89$1poh$1 digitaldaemon.com>, Sam McCall says...

Sorry, I meant "if java had originally been defined to have char being 
21 bits instead of 16, and storing a unicode codepoint instead of a 
UTF-16 fragment". All java's string manipulation stuff is char-based, 
and I was convinced there was a one-to-one correspondence between chars 
and characters (or possibly some too-big char values possible). Clearly 
I was mistaken,

You weren't mistaken. You were spot on.

When Java was invented, Unicode stood at version 2.0. Possibly even earlier. At
that time, Unicode was touted as a 16-bit standard, and its maximum codepoint
was U+FFFF. At that time, there was no such thing as UTF-16. A Unicode char was
16 bits wide, and that was that. The only relevant 16-bit encodings were
UCS-2LE (which meant, emit the 16-bit codepoint low order byte first), and
UCS-2BE (which meant, emit the codepoint high order byte first).

Java simply took that on board and went with it. 

But as time went by, the Unicode folk realized that sixty five thousand
characters wasn't actually ENOUGH for all the world's scripts (including
historical ones that nobody ever uses any more), so they managed to find a way
to squeeze even more characters into that 16-bit model. They called it UTF-16,
and it extends the range from U+FFFF to U+10FFFF.

There has been some discussion on the Unicode public forum as to whether even
THIS limit will ever be extended. The Unicode Consortium currently are stating
flat out that there will never, ever, be Unicode characters with codepoints
above U+10FFFF. So, you can choose to believe them, or you can regard this
statement with as much credibility as the statements like "64K should be enough
memory for anyone" which were touted in the ZX81 days.

Java got caught out by the changing of the times. D's chars should probably be
wider than 21-bits, just in case.... (Not that I'm choosing to disbelieve the
Unicode Consortium of course!)  32 bits seems safe enough, for the foreseeable
future.




but if they had made chars 21 bits and kept the rest the 
same, it looks to me like it'd be just about perfect.

Yes. I'll bet the Java folk thought that at the time.



Was that although unicode is the officially designated content of these 
types, char[] looks and feels (and the standard library uses it) like 
it's ASCII, and people won't bother to use unicode, because it 
requires calling conversion functions and so on.

Well, of course UTF-8 was /designed/ to be compatible with ASCII, to ease
transition. That's not such a bad thing. Bugs will happen, of course, just as
they happen with any other encoding, but they can be found and fixed (and fixing
them will be easier, the more library support there is). It's just one of those
things which is going to get better with time.

Arcane Jill

Jun 30 2004

Sam McCall <tunah.d tunah.net> writes:

Arcane Jill wrote:

 In article <cbts89$1poh$1 digitaldaemon.com>, Sam McCall says...
 
 You weren't mistaken. You were spot on.

<snip>
Wow, thanks for that explanation, I really appreciate it :-)
 
 
but if they had made chars 21 bits and kept the rest the 
same, it looks to me like it'd be just about perfect.

 
 
 Yes. I'll bet the Java folk thought that at the time.
 

Okay, we'll stick with 32 bits. If they reach that in my lifetime, 
someone is going to die...

Anyway, by the time I work out how to efficiently character-index UTF-8 
in mutable strings, I'm sure I'll think unicode is thoroughly overrated :-D
Sam

 Well, of course UTF-8 was /designed/ to be compatible with ASCII, to ease
 transition. That's not such a bad thing. Bugs will happen, of course, just as
 they happen with any other encoding, but they can be found and fixed (and fixing
 them will be easier, the more library support there is). It's just one of those
 things which is going to get better with time.

Jun 30 2004

"Bent Rasmussen" <exo bent-rasmussen.info> writes:

 Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts
 of ugly things when negative numbers are perfectly valid. This is

That's true. In Standard ML you could do

val index : 'a -> int option

Then if 'a exists return SOME(x), if not, return NONE. If a function has
an option type as a domain it has to deal with both cases.

In D, you'd either use a magic value like -1 or encapsulate values in a
class; then null is NONE and not null is SOME.
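
For comparison, a minimal sketch of the SOME/NONE idea using `Nullable` from today's std.typecons (an assumption: that module did not exist when this was posted, and `findIndex` is a name made up for the example). The caller has to check for the empty case explicitly instead of testing for -1.

```d
import std.stdio : writeln;
import std.typecons : Nullable, nullable;

/// Index of the first element equal to `x`, or an empty Nullable if absent.
/// (Hypothetical helper for illustration.)
Nullable!size_t findIndex(const(int)[] items, int x)
{
    foreach (i, item; items)
        if (item == x)
            return nullable(i);      // the SOME(i) case
    return Nullable!size_t.init;     // the NONE case
}

void main()
{
    auto hit = findIndex([10, 20, 30], 20);
    auto miss = findIndex([10, 20, 30], 99);

    if (!hit.isNull)
        writeln("found at ", hit.get); // prints: found at 1
    writeln(miss.isNull);              // prints: true
}
```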

 I think arrays should become fully reference types, for the same reason
 as strings above. Yes, this would probably mean double indirection,
 arrays would be a pointer to the (length,data pointer) struct that they
 currently are.

But you can go ahead and create a class for lists, no problem at all.
Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Bent Rasmussen wrote:

Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts
of ugly things when negative numbers are perfectly valid. This is

 
 
 That's true. In Standard ML you could do
 
 val index : 'a -> int option
 
 Then if 'a exists return SOME(x), if not, return NONE. If a function has
 an option type as a domain it has to deal with both cases.

McCall's Law the First:
Every feature of a "traditional" language is a special case of a feature 
of every functional language.
McCall's Law the Second:
Every feature of every functional language is a special case of the only 
feature of Lisp.

 In D, you'd either use a magic value like -1 or encapsulate values in a
 class; then null is NONE and not null is SOME.

But this isn't ML. I will get some weird looks, and nobody will touch my 
libraries ;-)
Besides, that's exactly equivalent (AFAICS) to a reference type, 
assuming no pointer arithmetic and casting shenanigans. If this _is_ 
useful, is dereferencing one more pointer to access arrays really going 
to kill us? Or is there some case where the value-type-kinda nature of 
arrays is useful?

 But you can go ahead and create a class for lists, no problem at all.
 Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.

I'm beginning to think this is the only answer. But lists are such a 
fundamental type, using a non-standard list type would be a pain. I 
can't see room for another list type, so I guess I'll end up using DTL's 
list everywhere, and hope everyone does the same. But it does seem a 
waste of such powerful arrays in the language.

Sam

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 03:20:54 +1200, Sam McCall <tunah.d tunah.net> wrote:

 Bent Rasmussen wrote:

 Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts
 of ugly things when negative numbers are perfectly valid. This is


 That's true. In Standard ML you could do

 val index : 'a -> int option

 Then if 'a exists return SOME(x), if not, return NONE. If a function has
 an option type as a domain it has to deal with both cases.

 McCall's Law the First:
 Every feature of a "traditional" language is a special case of a feature 
 of every functional language.
 McCall's Law the Second:
 Every feature of every functional language is a special case of the only 
 feature of Lisp.

 In D, you'd either use a magic value like -1 or encapsulate values in a
 class; then null is NONE and not null is SOME.

 But this isn't ML. I will get some weird looks, and nobody will touch my 
 libraries ;-)
 Besides, that's exactly equivalent (AFAICS) to a reference type, 
 assuming no pointer arithmetic and casting shenanigans. If this _is_ 
 useful, is dereferencing one more pointer to access arrays really going 
 to kill us? Or is there some case where the value-type-kinda nature of 
 arrays is useful?

I think the current value-type-kinda nature of arrays is good, it just 
needs the 2 tweaks I mentioned to make it consistent.

 But you can go ahead and create a class for lists, no problem at all.
 Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.

 I'm beginning to think this is the only answer. But lists are such a 
 fundamental type, using a non-standard list type would be a pain. I 
 can't see room for another list type, so I guess I'll end up using DTL's 
 list everywhere, and hope everyone does the same. But it does seem a 
 waste of such powerful arrays in the language.

 Sam



-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 29 2004

Matthias Becker <Matthias_member pathlink.com> writes:

 Yeah, it's called std::string, and it's more or less the default.

And it's crap. IMNSHO.

You'll get no arguments from me there. D got it right in not having a string
class. I didn't think that at first, but I've come round to the D way of
thinking. The problem with a string class is that you can't add new member
functions to it. (Oh, you may be able to subclass String, if it's not final. Oh
wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.

Why do you need to add member-functions to a string class, but you don't on
char-arrays? Why are global functions OK for char-arrays, but aren't for a
string class?
This is kind of strange. Just because of a different notation? Does that really
matter? There are languages where you can write:

object.function()
and
function(object)

and it means the same.

I don't get your point.

 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.

Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

Why? Do we also need a way to differentiate between empty and non-existent ints?

In D, there is no such thing as a non-existent int; there is no such thing as a
non-existent struct; and there is no such thing as a non-existent string.

Something like that would be cool, just like option in SML.
I think I have to write something like this.

-- Matthias Becker

Jun 29 2004

"Bent Rasmussen" <exo bent-rasmussen.info> writes:

In D, there is no such thing as a non-existent int; there is no such thing as a
non-existent struct; and there is no such thing as a non-existent string.

 Something like that would be cool, just like option in SML.
 I think I have to write something like this.


Perhaps,

class Option(VALUE)
{
    VALUE Item;
}

template SOME(VALUE)
{
    Option!(VALUE) SOME(VALUE x)
    {
        Option!(VALUE) e = new Option!(VALUE)();
        e.Item = x;
        return e;
    }
}

alias Option!(uint) INDEX;


class Array(VALUE)
{
    ...

    INDEX Index(VALUE x)
    {
        foreach (uint i, VALUE z; Items)
        {
            if (x == z)
            {
                return SOME!(uint)(i);
            }
        }
        return null;
    }
}

Somewhat non-ideal though.

Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Arcane Jill <Arcane_member pathlink.com> wrote in
news:cbr53s$op8$1 digitaldaemon.com: 

[snip]
 Why? Do we also need a way to differentiate between empty and
 non-existent ints? 

Yes, we do. 
A slightly *naive*  but definitely opinionated soul already suggested exactly 
this. Unfortunately, this is not implementable without unacceptable 
performance loss. So we cannot have this.

[snip]
 
 Maybe the real solution would be to make it a compile error to assign an
 array with null, or to compare it with null. This would then force
 people to say what they mean, and all such problems would go away.

I agree, that would help to avoid some confusion. Unfortunately, people would 
be forced to either say 'I mean empty' or to shut up completely and use sth. 
completely different.


Farmer.

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Farmer wrote:

 Arcane Jill <Arcane_member pathlink.com> wrote in
 news:cbr53s$op8$1 digitaldaemon.com: 
 
Maybe the real solution would be to make it a compile error to assign an
array with null, or to compare it with null. This would then force
people to say what they mean, and all such problems would go away.

 
 
 I agree, that would help to avoid some confusion. Unfortunately, people would 
 be forced to either say 'I mean empty' or to shut up completely and use sth. 
 completely different.

We don't have array literals, so we can't do this:
foo( [] );
At the moment we can do this:
foo( null );
If we outlawed using nulls as arrays, we'd be left with
foo( new int[0] )
which is maybe a bit messy?
Sam

Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Sam McCall <tunah.d tunah.net> wrote in
news:cbsupg$anb$1 digitaldaemon.com: 

 Farmer wrote:
 
 Arcane Jill <Arcane_member pathlink.com> wrote in
 news:cbr53s$op8$1 digitaldaemon.com: 
 
Maybe the real solution would be to make it a compile error to assign
an array with null, or to compare it with null. This would then force
people to say what they mean, and all such problems would go away.

 
 
 I agree, that would help to avoid some confusion. Unfortunately, people
 would be forced to either say 'I mean empty' or to shut up completely
 and use sth. completely different.

 We don't have array literals, so we can't do this:
 foo( [] );
 At the moment we can do this:
 foo( null );
 If we outlawed using nulls as arrays, we'd be left with
 foo( new int[0] )
 which is maybe a bit messy?
 Sam

What's messy here? 
A bit more typing, that's it.

One disadvantage of
   foo( null );
is,  that there is no type information. 


If you had
    	foo(int[])
    	foo(float[])
you would need a cast, because it gets ambiguous.


Farmer.

Jun 30 2004

Matthias Becker <Matthias_member pathlink.com> writes:

 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)

 It may be undefined, but I believe it is required.

 Why?  C++ gets along without them just fine, and every C derivant I know 
 of gets along fine without allowing primitive type returns to signify 
 nonexistence.

 Functions which return structs cannot return null either.

Thus why just about no-one ever does this (in C). They all return a 
pointer to a struct.

Because copying a struct costs much more than just copying a pointer to it. In
C++ you have references for things like this, which can't be NULL.

 The soln IMO is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official and provide 
 another way to tell null apart from an empty string.

 Farmer's test reports pretty consistent results if you suppose that 
 comparing arrays to null is ill-formed:

      empty1.length == 0    is true
      empty1 == ""          is true
      empty2.length == 0    is true
      empty2 == ""          is true
      empty3.length == 0    is true
      empty3 == ""          is true

 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.
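
The distinction under dispute can be sketched as follows. This assumes the behaviour of current D compilers, which may differ from the 2004 compiler being tested in the thread, and `isEmpty` is a hypothetical helper. Length and `==` treat a null array and an empty literal alike; only identity (`is`) tells them apart.

```d
/// True when the array has no elements, whether or not its pointer is null.
/// (Hypothetical helper for illustration.)
bool isEmpty(const(char)[] s)
{
    return s.length == 0;
}

void main()
{
    char[] unset;           // default-initialised: null pointer, length 0
    string empty = "";      // literal: non-null pointer, length 0

    assert(isEmpty(unset) && isEmpty(empty));
    assert(unset == empty); // == compares contents, so both count as empty
    assert(unset is null);  // `is` compares the array reference itself
    assert(empty !is null); // only identity reveals the difference
}
```

Code that follows the advice above would rely on `isEmpty`-style checks only and never on the `is null` result.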

Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

 D arrays simply do not work that way.

In that case we need an array specialisation for strings, so I'll have to 
write my own. This defeats the purpose of char[] in the first place, which 
was, to be a better, more consistent string handling method than is 
possible in c/c++.

Could you please make some real world examples, where you need empty strings and
null-strings?

-- Matthias Becker

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 15:39:15 +0000 (UTC), Matthias Becker 
<Matthias_member pathlink.com> wrote:
 A 'null array' is a completely arbitrary concept that has been
 extrapolated from undefined behaviour. :)

 It may be undefined, but I believe it is required.

 Why?  C++ gets along without them just fine, and every C derivant I know
 of gets along fine without allowing primitive type returns to signify
 nonexistence.

 Functions which return structs cannot return null either.

 Thus why just about no-one ever does this (in C). They all return a
 pointer to a struct.

 Because copying a struct costs much more than just copying a pointer to 
 it. In
 C++ you have references for things like this, which can't be NULL.

Thus why I don't use references either when I need the ability to say it's 
NULL.

 The soln IMO is either to make the current behaviour official and
 consistent, or to change the behaviour, make that official and provide
 another way to tell null apart from an empty string.

 Farmer's test reports pretty consistent results if you suppose that
 comparing arrays to null is ill-formed:

      empty1.length == 0    is true
      empty1 == ""          is true
      empty2.length == 0    is true
      empty2 == ""          is true
      empty3.length == 0    is true
      empty3 == ""          is true

 Don't compare arrays to null.  Don't try to differentiate between empty
 and nonexistent.

 Fine and dandy EXCEPT we *need* to differentiate between empty and
 non-existent strings.

 D arrays simply do not work that way.

 In that case we need an array specialisation for strings, so I'll have 
 to
 write my own. This defeats the purpose of char[] in the first place, 
 which
 was to be a better, more consistent string handling method than is
 possible in C/C++.

 Could you please make some real world examples, where you need empty 
 strings and
 null-strings?

Sure thing, please see my reply to Andy's post. There has to be an easy way 
to direct you to a post, but I don't know how. I posted it 3 or 4 posts ago 
if you sort flat and by date.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 29 2004

Sean Kelly <sean f4.ca> writes:

In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

Why?  It seems to me that this behavior would also require arrays to be
initialized with new rather than resizing from zero using the .length parameter.
And this would result in a ton of extra coding--either in clauses that errored
on null arrays or initialization code to handle both cases.  No thanks.  If this
happened I'd stil using built-in arrays and write a class for the purpose.

Sean

Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Sean Kelly <sean f4.ca> wrote in news:cbs4ju$26aj$1 digitaldaemon.com:

 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.

 
 Why?  It seems to me that this behavior would also require arrays to be
 initialized with new rather than resizing from zero using the .length
 parameter. And this would result in a ton of extra coding--either in
 clauses that errored on null arrays or initialization code to handle
 both cases. [...]

The .length parameter would still work with null-arrays (as they currently 
do). 
But why would you want to initialize an array to null/empty and then resize 
it, instead of 'newing' it with the correct size in first place? 
My CPU gets hot enough, no need for extra heat-up cycles :-)

Extra coding is not required if you don't need null-arrays: if some user 
passes a null-array, the user gets a nice access violation/array bounds 
exception and will quickly learn to not pass null-arrays to such functions. A 
quick check in the DbC section of your function would do the job, too. (But I 
suppose, the user might not adapt that fast that way :-)

If your function should deal with both null-arrays and empty-arrays, no extra 
code is required, since the .length property can be accessed for both 
null-arrays and empty-arrays.
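
A minimal sketch of the two options described above, written for a current D compiler rather than the 2004 DMD under discussion. The function names countSpaces and isBlank are made up for illustration; the first rejects null arrays in its contract, the second treats null and empty alike by looking only at .length.

```d
import std.stdio : writeln;

// Hypothetical helper: a null array violates the precondition.
size_t countSpaces(const(char)[] text)
in
{
    assert(text !is null, "countSpaces: null array not allowed");
}
do
{
    size_t n;
    foreach (c; text)
        if (c == ' ')
            ++n;
    return n;
}

// Hypothetical helper: null and empty are indistinguishable here,
// because only the length is inspected.
bool isBlank(const(char)[] text)
{
    return text.length == 0;
}

void main()
{
    char[] missing = null;
    writeln(isBlank(missing));      // true
    writeln(isBlank(""));           // true
    writeln(countSpaces("a b c"));  // 2
}
```

Note that contracts are compiled out of release builds, so the check in countSpaces costs nothing there, but it also protects nothing.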


[...] No thanks.  If this happened I'd stil using built-in arrays
 and write a class for the purpose. 

I came to the same conclusion: wrapping a built-in array in a class or struct 
to adapt its behaviour to the specific needs is one (if not the) way to go.
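
One way such a wrapper might look, as a sketch only: the struct name OptionalText and its members are invented here, and it assumes a current D compiler. The point is that "was a value supplied?" is stored in its own flag instead of being inferred from the array's data pointer.

```d
import std.stdio : writeln;

// Hypothetical wrapper: existence is tracked separately from the text.
struct OptionalText
{
    private char[] text;
    private bool present;

    // No value at all.
    static OptionalText none() { return OptionalText(null, false); }

    // A value, possibly of length zero.
    static OptionalText of(const(char)[] s) { return OptionalText(s.dup, true); }

    bool exists() const { return present; }
    size_t length() const { return text.length; }

    const(char)[] value() const
    {
        assert(present, "no value present");
        return text;
    }
}

void main()
{
    auto absent = OptionalText.none();
    auto blank = OptionalText.of("");

    writeln(absent.exists); // false
    writeln(blank.exists);  // true
    writeln(blank.length);  // 0
}
```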


Farmer.

Jun 29 2004

Sean Kelly <sean f4.ca> writes:

In article <Xns9517F3F654C29itsFarmer 63.105.9.61>, Farmer says...
The .length parameter would still work with null-arrays (as they currently 
do). 
But why would you want to initialize an array to null/empty and then resize 
it, instead of 'newing' it with the correct size in first place?

Consider the following:

char[] str = new char[100];
str.length = 0; // A
str.length = 5; // B
str = new char[10]; // C

In A, AFAIK it's legal for the compiler to retain the memory and merely change
the length parameter for the string.  B then just changes the length parameter
again, and no reallocation is performed.  C forces a reallocation even if the
array already has the (hidden) capacity in place.  Lacking allocators, this is a
feature I consider rather nice in D.
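
Whether step A really keeps the memory is implementation-specific, so the following sketch only prints what a given compiler and runtime do instead of asserting it. It assumes a current D compiler; report is a made-up helper, and no particular output is claimed.

```d
import std.stdio : writefln;

// Debugging aid: shows where a slice points, its length, and how far it
// could grow in place according to the runtime.
void report(string label, char[] a)
{
    writefln("%-14s ptr=%s length=%s capacity=%s",
             label, cast(void*) a.ptr, a.length, a.capacity);
}

void main()
{
    char[] str = new char[100];
    report("new char[100]", str);

    str.length = 0;              // A: shrink to empty
    report("length = 0", str);

    str.length = 5;              // B: grow again; may or may not reuse the block
    report("length = 5", str);

    str = new char[10];          // C: always a fresh allocation
    report("new char[10]", str);
}
```

Comparing the ptr column between the first three lines shows whether the runtime in use retained the original block.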

Extra coding is not required if you don't need null-arrays: if some user 
passes a null-array, the user gets a nice access violation/array bounds 
exception and will quickly learn to not pass null-arrays to such functions. A 
quick check in the DbC section of your function would do the job, too. (But I 
suppose, the user might not adapt that fast that way :-)

I originally thought D worked the way you describe and added DBC clauses to all
my functions to check for null array parameters.  After some testing I realized
I'd been mistaken and happily removed most of these clauses.  The result IMO was
tighter, cleaner code that was easier to understand.  I suppose it's really a
matter of opinion.  I like that arrays work the same as the other primitive
types.  

If your function should deal with both null-arrays and empty-arrays, no extra 
code is required, since the .length property can be accessed for both 
null-arrays and empty-arrays.

Could it?  I suppose so, but the concept seems a tad odd.  I kind of expect none
of the parameters (besides sizeof, perhaps) to work for dynamic types that have
not been initialized.  Though perhaps that's the C way of thinking.


Sean

Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Sean Kelly <sean f4.ca> wrote in news:cbsqnf$547$1 digitaldaemon.com:

 In article <Xns9517F3F654C29itsFarmer 63.105.9.61>, Farmer says...
The .length parameter would still work with null-arrays (as they
currently do). 
But why would you want to initialize an array to null/empty and then
resize it, instead of 'newing' it with the correct size in first place?

 
 Consider the following:
 
 char[] str = new char[100];
 str.length = 0; // A
 str.length = 5; // B
 str = new char[10]; // C
 
 In A, AFAIK it's legal for the compiler to retain the memory and merely
 change the length parameter for the string.  B then just changes the
 length parameter again, and no reallocation is performed.  C forces a
 reallocation even if the array already has the (hidden) capacity in
 place.  Lacking allocators, this is a feature I consider rather nice in
 D. 

I agree with you that this feature is quite useful.
The problem with (A) is that DMD doesn't do that; the function 
'arraysetlength' explicitly checks whether the new length is zero, and if so 
destroys the data pointer. Furthermore, it seems that it is not allowed to 
call the .length property for null-arrays.
How do I know? Well the function in the phobos file internal\gc.d
    	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
contains this assertion
    	assert(!p.length || p.data);

Ironically, this assertion permits the data pointer to be null while the 
length is greater than 0. 



 
Extra coding is not required if you don't need null-arrays: if some user
passes a null-array, the user gets a nice access violation/array bounds 
exception and will quickly learn to not pass null-arrays to such
functions. A quick check in the DbC section of your function would do
the job, too. (But I suppose, the user might not adapt that fast that
way :-) 

 
 I originally thought D worked the way you describe and added DBC clauses
 to all my functions to check for null array parameters.  After some
 testing I realized I'd been mistaken and happily removed most of these
 clauses.  The result IMO was tighter, cleaner code that was easier to
 understand.  I suppose it's really a matter of opinion.  I like that
 arrays work the same as the other primitive types.  

I always love it when this happens. Code that isn't written, is bug-free, 
maintainable, and super-fast ;-)



 
If your function should deal with both null-arrays and empty-arrays, no
extra code is required, since the .length property can be accessed for
both null-arrays and empty-arrays.

 
 Could it?  I suppose so, but the concept seems a tad odd.  I kind of
 expect none of the parameters (besides sizeof, perhaps) to work for
 dynamic types that have not been initialized.  Though perhaps that's the
 C way of thinking. 

Yes, I think it is a bit odd, too. For reading the length property it makes 
sense, but for resizing it is more questionable. But I am definitely thinking 
the C way here.


Farmer.

Jun 30 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 22:57:02 +0000 (UTC), Farmer <itsFarmer. freenet.de> 
wrote:
 Sean Kelly <sean f4.ca> wrote in news:cbsqnf$547$1 digitaldaemon.com:

 In article <Xns9517F3F654C29itsFarmer 63.105.9.61>, Farmer says...
 The .length parameter would still work with null-arrays (as they
 currently do).
 But why would you want to initialize an array to null/empty and then
 resize it, instead of 'newing' it with the correct size in first place?

 Consider the following:

 char[] str = new char[100];
 str.length = 0; // A
 str.length = 5; // B
 str = new char[10]; // C

 In A, AFAIK it's legal for the compiler to retain the memory and merely
 change the length parameter for the string.  B then just changes the
 length parameter again, and no reallocation is performed.  C forces a
 reallocation even if the array already has the (hidden) capacity in
 place.  Lacking allocators, this is a feature I consider rather nice in
 D.

 I agree with you that this feature is quite useful.
 The problem with (A) is, that DMD doesn't do that; the function
 'arraysetlength' explicitly checks whether the new length is null, and 
 if so
 destroys the data pointer.

Provably correct. :)

--[test.d]--
struct array { int length; void *data; }
void main() {
	char[] p = new char[100];
	array *s = cast(array *)&p;
	
	printf("%d\n",s.length);
	printf("%08x\n",s.data);
	p.length = 0;
	printf("%d\n",s.length);
	printf("%08x\n",s.data);
}

prints

100
007d2f80
0
00000000

 Furthermore it seems that it is not allowed to
 call the .length property for null-arrays.

I can go:

p.length = 0;
p.length = 0;
p.length = 0;
p.length = 0;

no problem. Is that what you meant?

 How do I know? Well the function in the phobos file internal\gc.d
     	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
 contains this assertion
     	assert(!p.length || p.data);

perhaps this function is not called if (p.length == 0 && newlength == 0) 
one level higher?

 Ironically, this assertion permits, that the data pointer is null, but 
 the
 length is greater than 0.

which is technically impossible.

Regan

 Extra coding is not required if you don't need null-arrays: if some 
 user
 passes a null-array, the user gets a nice access violation/array bounds
 exception and will quickly learn to not pass null-arrays to such
 functions. A quick check in the DbC section of your function would do
 the job, too. (But I suppose, the user might not adapt that fast that
 way :-)

 I originally thought D worked the way you describe and added DBC clauses
 to all my functions to check for null array parameters.  After some
 testing I realized I'd been mistaken and happily removed most of these
 clauses.  The result IMO was tighter, cleaner code that was easier to
 understand.  I suppose it's really a matter of opinion.  I like that
 arrays work the same as the other primitive types.

 I always love it when this happens. Code that isn't written, is bug-free,
 maintainable, and super-fast ;-)



 If your function should deal with both null-arrays and empty-arrays, no
 extra code is required, since the .length property can be accessed for
 both null-arrays and empty-arrays.

 Could it?  I suppose so, but the concept seems a tad odd.  I kind of
 expect none of the parameters (besides sizeof, perhaps) to work for
 dynamic types that have not been initialized.  Though perhaps that's the
 C way of thinking.

 Yes, I think it is a bit odd, too. For reading the length property it makes
 sense, but for resizing it is more questionable. But I am definitely 
 thinking
 the C way here.


 Farmer.




Jun 30 2004

Farmer <itsFarmer. freenet.de> writes:

Sorry, I've posted rubbish.

Farmer.

Jul 01 2004

Sean Kelly <sean f4.ca> writes:

In article <Xns95199C928F73itsFarmer 63.105.9.61>, Farmer says...
How do I know? Well the function in the phobos file internal\gc.d
    	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
contains this assertion
    	assert(!p.length || p.data);

Ironically, this assertion permits, that the data pointer is null, but the 
length is greater than 0. 

I read it that the assertion requires either the length to be zero or the length
to be nonzero and the data to be non-null.  This seems to correspond to my
assumption that D allows for zero length arrays to retain allocated memory.
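
The reading above can be checked by enumerating all four combinations. This is a sketch for a current D compiler; invariantHolds is an invented name that simply restates the quoted assertion as a function.

```d
import std.stdio : writefln;

// Same predicate as assert(!p.length || p.data), as a plain function.
bool invariantHolds(size_t length, const(void)* data)
{
    return !length || data !is null;
}

void main()
{
    int dummy;
    const(void)* somewhere = &dummy;

    writefln("length 0, null data:     %s", invariantHolds(0, null));      // true
    writefln("length 0, non-null data: %s", invariantHolds(0, somewhere)); // true
    writefln("length 5, null data:     %s", invariantHolds(5, null));      // false
    writefln("length 5, non-null data: %s", invariantHolds(5, somewhere)); // true
}
```

The only rejected state is a nonzero length with a null pointer; a zero length with a non-null pointer passes.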

Sean

Jun 30 2004

Regan Heath <regan netwin.co.nz> writes:

On Thu, 1 Jul 2004 04:37:37 +0000 (UTC), Sean Kelly <sean f4.ca> wrote:

 In article <Xns95199C928F73itsFarmer 63.105.9.61>, Farmer says...
 How do I know? Well the function in the phobos file internal\gc.d
    	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
 contains this assertion
    	assert(!p.length || p.data);

 Ironically, this assertion permits, that the data pointer is null, but 
 the
 length is greater than 0.

 I read it that the assertion requires either the length to be zero or 
 the length
 to be nonzero and the data to be non-null.  This seems to correspond to 
 my
 assumption that D allows for zero length arrays to retain allocated 
 memory.

It may very well allow it (in this code, at this level), but how do you do 
it?

Regan


Jun 30 2004

Farmer <itsFarmer. freenet.de> writes:

Sean Kelly <sean f4.ca> wrote in news:cc04eh$2l5e$1 digitaldaemon.com:

 In article <Xns95199C928F73itsFarmer 63.105.9.61>, Farmer says...
How do I know? Well the function in the phobos file internal\gc.d
         byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array
         *p) 
contains this assertion
         assert(!p.length || p.data);

Ironically, this assertion permits, that the data pointer is null, but
the length is greater than 0. 


Rubbish.

 
 I read it that the assertion requires either the length to be zero or
 the length to be nonzero and the data to be non-null.
 This seems to
 correspond to my assumption that D allows for zero length arrays to
 retain allocated memory. 
 
 Sean
 

I blush for shame, this is too embarrassing. What a wimp I am, I can't do 
simple boolean algebra. What must years of Java(TM) programming have done to 
me? 

On the upside, it means that I was wrong. No assertion discourages null or 
empty-arrays. 

Yes, memory for zero length arrays is retained, if the array is sliced.

Jul 01 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 16:15:58 +0000 (UTC), Sean Kelly <sean f4.ca> wrote:

 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 Fine and dandy EXCEPT we *need* to differentiate between empty and
 non-existent strings.

 Why?  It seems to me that this behavior would also require arrays to be
 initialized with new rather than resizing from zero using the .length 
 parameter.

Nope. It already works, except for 2 inconsistencies (see the original 
post)

 And this would result in a ton of extra coding--either in clauses that 
 errored
 on null arrays or initialization code to handle both cases.  No thanks.

Not true. You can/could still simply check the length vs 0 if you want to 
treat null and empty the same.

 If this
 happened I'd stil using built-in arrays and write a class for the 
 purpose.

? 'stil' == 'stop' ?

Regan


Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Andy Friesen <andy ikagames.com> wrote in
news:cbpsi6$1u7d$1 digitaldaemon.com: 

[snip]
 C++ containers cannot represent null either.  D will (and does) get 
 along just fine if its array type works the same way.

[snip]

And probably that is one reason why programmers don't use std::vector.

<rant>
If I wanted to use something like std::vector, I'd simply use it in D. But if I 
want to get to the *bare metal*, I want the *bare metal*. No less. I don't 
want something that is similar to std::vector (just better tuned for performance), 
tightly integrated (or coupled, in my book) with the language, with some odd 
syntax and superfluous but still incomplete properties like array.sort. Even 
if that means that I have to code a bubble sort myself all the time ;-)
<end of rant>


Farmer.

Jun 29 2004

Andy Friesen <andy ikagames.com> writes:

Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com: 
 
 [snip]
 
C++ containers cannot represent null either.  D will (and does) get 
along just fine if its array type works the same way.

 
 [snip]
 
 And probably that is one reason why programmers don't use std::vector.

They don't?  Do you have a source to back that up?  As far as I've ever 
noticed, bigwig C++ people have always made it clear that std::vector is 
preferable over an array and that std::string is preferable to a char*.

The concern for distinguishing empty vs null has quite honestly never 
even occurred to me until it was mentioned here.  Think about expressing 
the distinction a different way and move on.

I do apologize if I sound naive, (I'll assume that comment was directed 
at me :) ) but I honestly can't comprehend a situation in which the 
distinction is going to have any measurable cost on clarity, let alone 
performance.

  -- andy

Jun 29 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 29 Jun 2004 18:16:25 -0700, Andy Friesen <andy ikagames.com> wrote:
 Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com: [snip]

 C++ containers cannot represent null either.  D will (and does) get 
 along just fine if its array type works the same way.

 [snip]

 And probably that is one reason why programmers don't use std::vector.

 They don't?  Do you have a source to back that up?  As far as I've ever 
 noticed, bigwig C++ people have always made it clear that std::vector is 
 preferable over an array and that std::string is preferable to a char*.

 The concern for distinguishing empty vs null has quite honestly never 
 even occurred to me until it was mentioned here.  Think about expressing 
 the distinction a different way and move on.

Sure, can you show me how? I am having trouble doing it; it must be my 
C-fixated brain.
Please use the example in the post I made to you earlier today.

 I do apologize if I sound naive, (I'll assume that comment was directed 
 at me :) )

LOL.. I thought it was me..

 but I honestly can't comprehend a situation in which the distinction is 
 going to have any measurable cost on clarity, let alone performance.

I think my example in my previous post does show a cost on either or both.
Basically I think a reference type allows me to *express* more than a 
value type does.

Regan.


Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Andy Friesen <andy ikagames.com> wrote in
news:cbt41t$i1n$1 digitaldaemon.com: 

 Farmer wrote:
 
 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com: 
 
 [snip]
 
C++ containers cannot represent null either.  D will (and does) get 
along just fine if its array type works the same way.

 
 [snip]
 
 And probably that is one reason why programmers don't use std::vector.

 
 They don't?  Do you have a source to back that up?  As far as I've ever 
 noticed, bigwig C++ people have always made it clear that std::vector is
 preferable over an array and that std::string is preferable to a char*.

Sorry, my statement was badly expressed. I meant it more like "And probably 
that is another reason why programmers often refrain from using std::vector."

Of course, programmers use std::vector, otherwise I'd have said that I am not a 
programmer ;-)


 
 The concern for distinguishing empty vs null has quite honestly never 
 even occurred to me until it was mentioned here.  Think about expressing
 the distinction a different way and move on.

I expect that this concern will rarely come up, and that's exactly why I 
brought it up.
I would move on, but I see no compelling reason to express it in a different 
way.


 
 I do apologize if I sound naive, (I'll assume that comment was directed 
 at me :) ) but I honestly can't comprehend a situation in which the 
 distinction is going to have any measurable cost on clarity, let alone 
 performance.
 
   -- andy

I was naive in believing that it is obvious what posts I referred to.
I was thinking e.g. of post
    	http://www.digitalmars.com/drn-bin/wwwnews/23126  
Btw, the author of this post, happens to use the term naive, so he shouldn't 
take offense. 

But in fact, this post doesn't really advocate 'NaN' for ints, rather
    	http://www.digitalmars.com/drn-bin/wwwnews/23100
does so.

Sorry, andy and sorry Regan. 
You didn't suggest 'NaN' for ints. So no f(l)ame(s) for you...


Farmer.

Jun 30 2004

"Bent Rasmussen" <exo bent-rasmussen.info> writes:

I hope you're not referring to the quick hack I posted. It was meant to
express the *conceptual* problem of returning a null value for a value
type -- *not* a practical one. It was mentioned in the context of the ML
option type.

ps. Both links are broken.

Jun 30 2004

Farmer <itsFarmer. freenet.de> writes:

"Bent Rasmussen" <exo bent-rasmussen.info> wrote in 
news:cbvk1g$1r9b$1 digitaldaemon.com:

 I hope you're not referring to the quick hack I posted. It was meant to
 express the *conceptual* problem of returning a null value for a value
 type -- *not* a practical one. It was mentioned in the context of the ML
 option type.
 
 ps. Both links are broken.
 
 

You suggested none's for int's but you don't use the term naive in your 
posts. 
So no f(l)ame(s) for you, either.


Try these:
http://www.digitalmars.com/drn-bin/wwwnews?D/29213
http://www.digitalmars.com/drn-bin/wwwnews?D/23120



Farmer.

Jul 01 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 30 Jun 2004 22:57:04 +0000 (UTC), Farmer <itsFarmer. freenet.de> 
wrote:
 Andy Friesen <andy ikagames.com> wrote in
 news:cbt41t$i1n$1 digitaldaemon.com:

 Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com:

 [snip]

 C++ containers cannot represent null either.  D will (and does) get
 along just fine if its array type works the same way.

 [snip]

 And probably that is one reason why programmers don't use std::vector.

 They don't?  Do you have a source to back that up?  As far as I've ever
 noticed, bigwig C++ people have always made it clear that std::vector is
 preferable over an array and that std::string is preferable to a char*.

 Sorry, my statement was badly expressed. I meant it more like "And 
 probably
 that is another reason why programmers often refrain from using 
 std::vector."

 Of course, programmers use std::vector, otherwise I'd said that I am not 
 a
 programmer ;-)


 The concern for distinguishing empty vs null has quite honestly never
 even occurred to me until it was mentioned here.  Think about expressing
 the distinction a different way and move on.

 I expect that this concern will rarely come up, and that's exactly why I
 brought it up.
 I would move on, but I see no compelling reason to express it in a 
 different
 way.


 I do apologize if I sound naive, (I'll assume that comment was directed
 at me :) ) but I honestly can't comprehend a situation in which the
 distinction is going to have any measurable cost on clarity, let alone
 performance.

   -- andy

 I was naive in believing that it is obvious what posts I referred to.
 I was thinking e.g. of post
     	http://www.digitalmars.com/drn-bin/wwwnews/23126
 Btw, the author of this post, happens to use the term naive, so he 
 shouldn't
 take offense.

Was it me.. these links don't work for me :(

 But in fact, this post doesn't really advocate 'NaN' for ints, rather
     	http://www.digitalmars.com/drn-bin/wwwnews/23100
 does so.

linky no worky :(

 Sorry, andy and sorry Regan.
 You didn't suggest 'NaN' for ints. So no f(l)ame(s) for you...

Aww.. AFAIKS we either need a NaN value for all value types, OR, we use 
reference types instead.

Arrays in D act just like reference types (except for the inconsistencies 
you have shown) even though they technically aren't. What I want to know is: 
what effect will changes to those inconsistencies actually have on people 
who do not need to be able to tell a null array from an empty one?

Regan.


Jun 30 2004

Farmer <itsFarmer. freenet.de> writes:

Regan Heath <regan netwin.co.nz> wrote in 
news:opsafmvd1m5a2sq9 digitalmars.com:

[snip]

 Arrays in D act just like reference types (except for the inconsistencies 
 you have shown) even though they technically aren't. What I want to know is: 
 what effect will changes to those inconsistencies actually have on people 
 who do not need to be able to tell a null array from an empty one?

The impact for code that doesn't need to distinguish between null arrays and 
empty arrays depends on
1) the semantics of null arrays regarding the .length property and the opCat 
operator.
2) whether null arrays are disallowed by a function interface contract.
3) whether a function should treat null arrays and empty arrays in the 
same way. 


Regarding item 1) I assume these semantics:
- Reading and writing of the length property is allowed. Write access to the 
length property always returns an array of the given size. So
   nullarray.length=0
turns the null array 'nullarray' into an empty array
- The opCat operator allows null arrays for both arguments. So
   nullarray.opcat(nullarray2)
creates an empty array.
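
A small sketch of the two operations assumed above, for a current D compiler. It only checks lengths; whether the results compare as null is left open, since that is exactly the implementation detail under debate.

```d
import std.stdio : writeln;

void main()
{
    char[] a = null;
    char[] b = null;

    a.length = 0;             // writing .length on a null array is accepted
    writeln(a.length);        // 0

    char[] joined = a ~ b;    // concatenating two null arrays is accepted
    writeln(joined.length);   // 0

    b.length = 3;             // growing from null allocates storage
    writeln(b.length);        // 3
}
```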


Regarding item 2):
Non-local arrays should be initialized to an empty array.
Local arrays should be initialized to an empty array instead of a null array. 

Note that local arrays that are not explicitly initialized are not permitted, 
anyway (see section 'Local Variables' in function.htm of the D spec). (But as 
DMD doesn't enforce this, yet, such illegal D code might be quite common.)


As with all reference types that are passed to a function, putting an 
assertion to check for the disallowed null-case is a good idea.

If the D language permits that different objects that are physically never 
changed, are allocated only once, then there is almost no performance penalty 
for using empty arrays instead of null. 



Regarding item 3):
Code needs the same changes as described for item 2. Additionally, any array 
parameters must be checked against null and, if necessary, converted to empty 
arrays. E.g.
   if (array is null)
      array=new char[0];
Of course, a templated function could do that.
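
Such a templated function might be sketched as follows for a current D compiler. The name orEmpty is invented. One assumption to flag: this sketch takes a zero-length slice of a one-element allocation rather than relying on new T[0], because whether new T[0] yields a non-null pointer is up to the runtime.

```d
import std.stdio : writeln;

// Hypothetical helper: maps a null array to an empty array with a
// non-null data pointer and leaves every other array untouched.
T[] orEmpty(T)(T[] array)
{
    if (array is null)
        array = (new T[1])[0 .. 0];
    return array;
}

void main()
{
    char[] nothing = null;
    auto empty = orEmpty(nothing);

    writeln(empty is null);  // false
    writeln(empty.length);   // 0

    char[] text = "abc".dup;
    writeln(orEmpty(text));  // abc
}
```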

For many "low-level" functions null arrays can be treated as empty arrays, 
without any additional checks, since the length property can still be 
accessed. But 'high-level' functions typically have to deal with null arrays 
explicitly, because they would depend on functions that disallow null-arrays. 




Farmer.

Jul 03 2004

Farmer <itsFarmer. freenet.de> writes:

Andy Friesen <andy ikagames.com> wrote in
news:cbpsi6$1u7d$1 digitaldaemon.com: 

 Regan Heath wrote:
 
 ... I could return existance and
 fill a passed char[]...  so my code now looks like...
 
 char[] s;
 if (getValue("foo",s))

 
 I like this.  It's simple and obvious.

An expression like
    	if (getValue("foo",s) == true)
doesn't tell much to the maintainer. An enumeration is needed to fully 
express the intent.
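
As a sketch of what that could look like in current D: the enum Lookup, the three-argument getValue signature and the form table below are all assumptions made for illustration, not the API discussed earlier in the thread.

```d
import std.stdio : writeln;

// Hypothetical result type: says what happened instead of true/false.
enum Lookup { missing, found }

// Hypothetical lookup: fills `value` only when the label exists.
Lookup getValue(string[string] form, string label, out string value)
{
    if (auto p = label in form)
    {
        value = *p;
        return Lookup.found;
    }
    return Lookup.missing;
}

void main()
{
    string[string] form = ["Setting1": "Regan Heath", "Setting2": ""];
    string s;

    writeln(getValue(form, "Setting2", s)); // found
    writeln(getValue(form, "Setting3", s)); // missing

    if (getValue(form, "Setting1", s) == Lookup.found)
        writeln(s);                         // Regan Heath
}
```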


[snip]

 
 Exposing POST data as an associative array seems like a win to me; it's 
 faster and can be iterated over conveniently.  Also, as a language 
 intrinsic, it's a bit more likely to plug into other APIs easily.
 
 If you *really* need to, you could probably get away with doing 
 something like:
 
      const char[] nadda = "nadda";
      if (s is not nadda) { ... }
 
   -- andy

I see one issue with associative arrays here.
It would break up the encapsulation of the class. The internal data would be 
revealed. If your internal data structure is different you must convert the 
internal data to the associative array. At best, a call of .dup would be needed 
as safety-practice.
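
The .dup safety practice could be sketched like this in current D. The class FormData and its snapshot method are hypothetical names; the internal table stays private and callers only ever receive a copy.

```d
import std.stdio : writeln;

// Hypothetical class: owns the table and never hands out the original.
class FormData
{
    private string[string] values;

    this(string[string] initial)
    {
        values = initial.dup;
    }

    // Returns a copy, so callers cannot alter the internal table.
    string[string] snapshot()
    {
        return values.dup;
    }
}

void main()
{
    auto form = new FormData(["Setting1": "Regan Heath"]);

    auto copy = form.snapshot();
    copy["Setting1"] = "changed";
    copy["Extra"] = "x";

    writeln(form.snapshot()["Setting1"]); // Regan Heath
    writeln(form.snapshot().length);      // 1
}
```

The cost is one copy of the table per call, which is the trade-off the paragraph above points at.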


Farmer.

Jun 30 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
Why are there (almost) no complaints about D's support for empty arrays?

Actually, I think that D has got it right here. At least mostly. I'm happy with
the fact that null counts as an empty array. But I do have SOME gripes. These
are:

(1) given that a is an array of length n, the expression a[n..n] gives an array
bounds exception, and I don't believe it should. I would prefer that it simply
evaluated to an empty string. I've lost count of the number of times I've had to
put a special test for this case in various bits of code. It's a fairly normal
thing to do, to have a pointer (or index in this case) to the first element
BEYOND the last one in which you're interested, and to slice against it.
Currently you get the assert if n == a.length. I don't believe it should assert
unless n > a.length.

(2) I think it is wrong that the test (a == null) will return true if and only
if BOTH the length AND the address are zero. I think, if we're going to have a
model in which the statement a = null; will create an empty array, then (a ==
null) should return true if a /is/ an empty array. That is, only the length
should be tested, not the address. (If you want to test both parts, well there's
always a === null).
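
For readers on a current D compiler, where the identity operator is spelled `is` rather than `===`, the difference between comparing contents and comparing identity can be sketched as follows. This reflects present-day behaviour, not necessarily the 2004 DMD discussed in the thread.

```d
import std.stdio : writeln;

void main()
{
    char[] none = null;
    char[] text = "abc".dup;
    char[] tail = text[$ .. $];  // zero length, points just past "abc"

    writeln(tail.length);   // 0
    writeln(tail == none);  // true:  == compares contents, both are empty
    writeln(tail is none);  // false: is compares pointer and length
    writeln(none is null);  // true
    writeln(tail is null);  // false
}
```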

Arcane Jill

Jun 27 2004

Regan Heath <regan netwin.co.nz> writes:

On Sun, 27 Jun 2004 18:58:50 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:
 In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
 Why are there (almost) no complaints about D's support for empty arrays?

 Actually, I think that D has got it right here. At least mostly. I'm 
 happy with
 the fact that null counts as an empty array. But I do have SOME gripes. 
 These
 are:

 (1) given that a is an array of length n, the expression a[n..n] gives 
 an array
 bounds exception, and I don't believe it should. I would prefer that it 
 simply
 evaluated to an empty string. I've lost count of the number of times 
 I've had to
 put a special test for this case in various bits of code. It's a fairly 
 normal
 thing to do, to have a pointer (or index in this case) to the first 
 element
 BEYOND the last one in which you're interested, and to slice against it.
 Currently you get the assert if n == a.length. I don't believe it should 
 assert
 unless n > a.length.

This (now?) works.

void main()
{
	char[] a;
	
	a ~= "1";
	a ~= "2";
	a ~= "3";
	printf("%.*s\n",a[3..3]);
	printf("%.*s\n",a[2..3]);
	printf("%.*s\n",a[1..3]);
	printf("%.*s\n",a[0..3]);
}

 (2) I think it is wrong that the test (a == null) will return true if 
 and only
 if BOTH the length AND the address are zero.

I think this is correct.

 I think, if we're going to have a
 model in which the statement a = null; will create an empty array,

I think this is wrong. a = null should set the data to null and length to 0.
It should *not* create an empty array.

 then (a == null) should return true if a /is/ an empty array. That is, only the
 length should be tested, not the address. (If you want to test both parts, well
 there's always a === null).

We *need* to have *both* null and empty arrays. The reason is pretty 
simple:
   - null means does not exist
   - empty means exists, but has no value (or empty value)

This matters in situations like the one the original poster mentioned. In 
my experience, for example, when reading POST input from a web page, you 
get a string like so:

   Setting1=Regan+Heath&Setting2=&&

when requesting items you might have a function like:

   char[] getFormValue(char[] label);

the code to get the values for the above form might go:

   char[] s;

   s = getFormValue("Setting1"); // s is "Regan Heath"
   s = getFormValue("Setting2"); // s is ""
   s = getFormValue("Setting3"); // s is null

It is important the above code can tell that Setting3 was not passed in 
the form, so it can decide not to overwrite whatever current value that 
setting has, whereas it can tell Setting2 was passed and will overwrite 
the current value with a new blank one.


I think the problem with arrays is that a null array should not compare 
equal to an empty array. In other words the original post test(s)
   null1 == ""
   null1 == empty1

should be false.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 27 2004
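
Regan's form-value scenario can be sketched in current D. This is only an illustration under stated assumptions: a D2 compiler (where `is` replaces the `===` operator used in this thread), a two-argument `getFormValue` that takes the raw query string (the version in the post takes only the label), and no URL decoding, so `+` is left as-is.

```d
import std.algorithm.iteration : splitter;
import std.string : indexOf;

/// Hypothetical helper: returns null when `label` is absent from `query`,
/// and a zero-length but non-null slice when it is present with a blank value.
const(char)[] getFormValue(const(char)[] query, const(char)[] label)
{
    foreach (pair; query.splitter('&'))
    {
        immutable eq = pair.indexOf('=');
        if (eq < 0)
            continue;                  // skips empty segments such as the trailing "&&"
        if (pair[0 .. eq] == label)
            return pair[eq + 1 .. $];  // a slice of `query`, so its pointer is never null
    }
    return null;                       // label was not submitted at all
}
```

With the query string from the post, the result for "Setting3" satisfies `is null` while the result for "Setting2" does not, yet both compare equal to `""` under `==`. That is the behaviour being debated here.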

Derek <derek psyc.ward> writes:

On Mon, 28 Jun 2004 10:06:18 +1200, Regan Heath wrote:

[snip]

 
 We *need* to have *both* null and empty arrays. The reason is pretty 
 simple:
    - null means does not exist
    - empty means exists, but has no value (or empty value)
 

Agreed. A nonexistent array is not the same as an array with no elements.

-- 
Derek
Melbourne, Australia

Jun 27 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <opr99w0st25a2sq9 digitalmars.com>, Regan Heath says...
 (1) given that a is an array of length n, the expression a[n..n] gives an array
 bounds exception,


This (now?) works.

Indeed, I think it has always worked. It was just me misremembering the problem.
I'll start again. What I MEANT was...

Given that a is an array of length n, the expression &a[n] gives an array bounds
exception. And I don't believe it should. Taking the address of the first byte
beyond the end of an array can be a very useful thing to do. 

In particular, if a is an empty array, then &a[0] asserts, which means that code
like this:




intended to fill an array from a FILE*-type stream, will fall over if a is
empty. And there's no reason why it should - fread is quite happy to be passed a
length of zero. Same goes for functions like memset() and so on.

The fact of not being able to take &a[a.length] creates an awkwardness that we
have to code around. The above example would have to be encased in an if test in
order not to assert - and you might think: So what? This is no big deal. But
having to make that explicit test time and time again can start to get annoying.

It should not, in my opinion, be an error to evaluate &a[a.length];

Arcane Jill

Jun 28 2004

Sean Kelly <sean f4.ca> writes:

In article <cbpkes$1ip0$1 digitaldaemon.com>, Arcane Jill says...
Indeed, I think it has always worked. It was just me misremembering the problem.
I'll start again. What I MEANT was...

Given that a is an array of length n, the expression &a[n] gives an array bounds
exception. And I don't believe it should. Taking the address of the first byte
beyond the end of an array can be a very useful thing to do. 

Yes it is.  But I think it's the syntax that's the problem in this case.  IIRC
using the subscript operator (ie. [n]) dereferences the element.  So what you're
doing when you call &a[n] is calculating the address of the element at position
n.  Since no such element exists, the call fails.  In C the correct thing to do
would be to use (a+n) instead.

Just to make sure I was right, I dug this quote out of the C++ standard (5.2.1):
"The expression E1[E2] is identical (by definition) to *((E1)+(E2))."

The fact of not being able to take &a[a.length] creates an awkwardness that we
have to code around.

A possibility would be to have the compiler treat &a[n] as a special case...
since the address-of operator is present, it could treat this expression as
equivalent to: "a+n" rather than "&*(a+n)"


Sean

Jun 28 2004

Andy Friesen <andy ikagames.com> writes:

Arcane Jill wrote:

 The fact of not being able to take &a[a.length] creates an awkwardness that we
 have to code around. The above example would have to be encased in an if test in
 order not to assert - and you might think: So what? This is no big deal. But
 having to make that explicit test time and time again can start to get annoying.
 
 It should not, in my opinion, be an error to evaluate &a[a.length];

Something which just occurred to me that would resolve this issue would 
be to add two properties to array types: begin and end.  These 
properties would be pointer types which point to the beginning and end 
of the array's contents.  (exactly like C++ iterators)

     T[] buffer = ...;
     // buffer.length makes more sense than end-begin in this case.
     // Bear with me: it's an example :)
     fread(buffer.begin, T.sizeof, buffer.end - buffer.begin, fileHandle);

  -- andy

Jun 28 2004
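
Andy's proposed `begin` and `end` properties can be approximated in current D with free functions called through uniform function call syntax. A minimal sketch, assuming a D2 compiler and the array `.ptr` property; the names `begin`, `end` and `sum` are illustrative only and not part of any library:

```d
/// Pointer to the first element (the same value as `a.ptr`).
T* begin(T)(T[] a)
{
    return a.ptr;
}

/// Pointer one past the last element. It may be compared against
/// other pointers into the array but must not be dereferenced.
T* end(T)(T[] a)
{
    return a.ptr + a.length;
}

/// Sums the elements by walking the pointer pair, C++-iterator style.
int sum(const(int)[] a)
{
    int total = 0;
    for (auto p = a.begin; p != a.end; ++p)
        total += *p;
    return total;
}
```

For an empty or null array both functions return the same pointer, so the loop body never runs and no bounds check is involved.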

Sean Kelly <sean f4.ca> writes:

In article <cbprfd$1sq9$1 digitaldaemon.com>, Andy Friesen says...
Something which just occurred to me that would resolve this issue would 
be to add two properties to array types: begin and end.  These 
properties would be pointer types which point to the beginning and end 
of the array's contents.  (exactly like C++ iterators)

This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
parameters as well though.  Plus, it raises the question of what they return for
associative arrays.

Sean

Jun 28 2004

Sam McCall <tunah.d tunah.net> writes:

Sean Kelly wrote:

 In article <cbprfd$1sq9$1 digitaldaemon.com>, Andy Friesen says...
 
Something which just occurred to me that would resolve this issue would 
be to add two properties to array types: begin and end.  These 
properties would be pointer types which point to the beginning and end 
of the array's contents.  (exactly like C++ iterators)

 
 
 This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
 parameters as well though.

Huh? They're pointers... wouldn't rbegin == end and rend == begin?
I think I missed the point...

 Plus, it raises the question of what they return for
 associative arrays.

The concept doesn't apply to associative arrays afaics, so they wouldn't 
exist.
Sam

Jun 29 2004

Sean Kelly <sean f4.ca> writes:

In article <cbrhd9$1a0o$1 digitaldaemon.com>, Sam McCall says...
 This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
 parameters as well though.

Huh? They're pointers... wouldn't rbegin == end and rend == begin?
I think I missed the point...

Actually, rbegin == end-1 and rend == begin-1.

 Plus, it raises the question of what they return for
 associative arrays.

The concept doesn't apply to associative arrays afaics, so they wouldn't 
exist.

It does apply to associative arrays IMO.  I iterate through the contents of such
containers quite regularly in C++.  I've done something similar with an iterator
wrapper for associative arrays in D, but it would be nice to have this built-in
if we move towards the iterator methodology.

Sean

Jun 29 2004

Sam McCall <tunah.d tunah.net> writes:

Sean Kelly wrote:
 In article <cbrhd9$1a0o$1 digitaldaemon.com>, Sam McCall says...
 
This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
parameters as well though.

Huh? They're pointers... wouldn't rbegin == end and rend == begin?
I think I missed the point...

 
 
 Actually, rbegin == end-1 and rend == begin-1.

Oops. Yeah, this would be useful.

Plus, it raises the question of what they return for
associative arrays.

The concept doesn't apply to associative arrays afaics, so they wouldn't 
exist.

 
 It does apply to associative arrays IMO.  I iterate through the contents of such
 containers quite regularly in C++.  I've done something similar with an iterator
 wrapper for associative arrays in D, but it would be nice to have this built-in
 if we move towards the iterator methodology.

We're talking about pointers for low level iteration, this doesn't apply 
to associative arrays, whose data structure is opaque. I don't think 
we're moving towards iterators, just talking about pointers. The fact 
that iterators pretend to be pointers in their syntax is neither here 
nor there ;)
If you really want "official" iterators, there's always (or will always 
be) the DTL...
Sam

Jun 29 2004

Sean Kelly <sean f4.ca> writes:

In article <cbt5vu$kdb$1 digitaldaemon.com>, Sam McCall says...
We're talking about pointers for low level iteration, this doesn't apply 
to associative arrays, whose data structure is opaque. I don't think 
we're moving towards iterators, just talking about pointers. The fact 
that iterators pretend to be pointers in their syntax is neither here 
nor there ;)

This is easy enough to do with free functions anyway.  Something like:

alias char[][char[]] StrMap;
StrMap map;
Iterator!(Pair!(char[],char[])) i = begin!(StrMap)( map );

I'm sure the syntax could be improved but you get the idea.  I've already
experimented with such iterators for associative arrays and they work just fine.



Sean

Jun 30 2004
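
For comparison, current D's built-in associative arrays can be walked pair by pair with the `byKeyValue` property, without a separate iterator wrapper. A small sketch, assuming a D2 compiler; `countBlank` is a made-up example function, not something from the thread:

```d
/// Counts the entries whose value has zero length,
/// visiting each key/value pair of the associative array once.
size_t countBlank(string[string] map)
{
    size_t n = 0;
    foreach (kv; map.byKeyValue)
    {
        // kv.key and kv.value give access to the current pair
        if (kv.value.length == 0)
            ++n;
    }
    return n;
}
```

The iteration order is unspecified, which fits Sam's point that the underlying data structure is opaque.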

Farmer <itsFarmer. freenet.de> writes:

Arcane Jill <Arcane_member pathlink.com> wrote in
news:cbpkes$1ip0$1 digitaldaemon.com: 


 Given that a is an array of length n, the expression &a[n] gives an
 array bounds exception. And I don't believe it should. Taking the
 address of the first byte beyond the end of an array can be a very
 useful thing to do. 

The expression  cast(elementtype*)a+n  does that.

E.g. to get rid of annoying bounds-checking you could write:
    // given ubyte[] a;
    fread(cast(ubyte*)a+0, ubyte.size, a.length, fp);


Farmer.

Jun 28 2004
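
In current D, the pointer Farmer obtains with a cast is also available through the array's `.ptr` property, which is not bounds-checked. A minimal sketch, assuming a D2 compiler; `fill` is a hypothetical helper name, not something from the thread:

```d
import core.stdc.stdio;

/// Reads up to a.length bytes from fp into a and returns the number read.
/// Unlike &a[0], a.ptr performs no bounds check, so a zero-length
/// slice is passed straight through and fread simply reads nothing.
size_t fill(ubyte[] a, FILE* fp)
{
    return fread(a.ptr, ubyte.sizeof, a.length, fp);
}
```

This keeps the call free of an explicit `if (a.length)` test, which is the convenience Arcane Jill was asking for.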

Regan Heath <regan netwin.co.nz> writes:

On Mon, 28 Jun 2004 17:27:56 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opr99w0st25a2sq9 digitalmars.com>, Regan Heath says...
 (1) given that a is an array of length n, the expression a[n..n] gives an array
 bounds exception,


 This (now?) works.

 Indeed, I think it has always worked. It was just me misremembering the problem.
 I'll start again. What I MEANT was...

 Given that a is an array of length n, the expression &a[n] gives an array bounds
 exception. And I don't believe it should. Taking the address of the first byte
 beyond the end of an array can be a very useful thing to do.

 In particular, if a is an empty array, then &a[0] asserts, which means that code
 like this:




 intended to fill an array from a FILE*-type stream, will fall over if a is
 empty. And there's no reason why it should - fread is quite happy to be passed a
 length of zero. Same goes for functions like memset() and so on.

 The fact of not being able to take &a[a.length] creates an awkwardness that we
 have to code around. The above example would have to be encased in an if test in
 order not to assert - and you might think: So what? This is no big deal. But
 having to make that explicit test time and time again can start to get annoying.

 It should not, in my opinion, be an error to evaluate &a[a.length];

Interestingly..

void main()
{
	char[] p,s;
	s.length = 10;
	printf("%08x\n",&s[0]);
	printf("%08x\n",&p[0]);
}

D:\D\src\build\temp>dmd arr.d
d:\D\dmd\bin\..\..\dm\bin\link.exe arr,,,user32+kernel32/noi;

D:\D\src\build\temp>arr
007d0fd0
Error: ArrayBoundsError arr.d(6)

So it seems Sean is indeed correct about what p[0] is doing 
(dereferencing the element).

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 28 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Arcane Jill wrote:

 In article <opr99w0st25a2sq9 digitalmars.com>, Regan Heath says...
 (1) given that a is an array of length n, the expression a[n..n] gives an array
 bounds exception,


 
This (now?) works.

 
 Indeed, I think it has always worked. It was just me misremembering the
 problem. I'll start again. What I MEANT was...
 
 Given that a is an array of length n, the expression &a[n] gives an array
 bounds exception. And I don't believe it should. Taking the address of the
 first byte beyond the end of an array can be a very useful thing to do.

No, I disagree here. In general, that address would point to nothing.
Reading there is pointless, writing is dangerous. If you want to append to
a string by doing a low-level write to memory, then increment length first
and write then.

The way you could phrase it: In some cases it would be convenient if it were
not an error to take that address, if it is then not used afterward.

But still, I don't see that coding around that "limitation" is that much of
an effort. It gives you a few if-clauses around expressions, so what?

Jun 29 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cbr57k$p0m$1 digitaldaemon.com>, Norbert Nemec says...
 Given that a is an array of length n, the expression &a[n] gives an array
 bounds exception. And I don't believe it should. Taking the address of the
 first byte beyond the end of an array can be a very useful thing to do.

No, I disagree here. In general, that address would point to nothing.
Reading there is pointless, writing is dangerous.

Such a pointer is never used for reading OR writing. It /is/, however, used in
pointer comparison expressions, and in such context, is perfectly meaningful,
and safe.

But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm happy.



If you want to append to
a string by doing a low-level write to memory,

I never said I wanted to do any such thing.


Arcane Jill

Jun 29 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Arcane Jill wrote:

 Such a pointer is never used for reading OR writing. It /is/, however,
 used in pointer comparison expressions, and in such context, is perfectly
 meaningful, and safe.

True, you have a point there - I really don't know what to think about it.

 But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm
 happy.

Well - that's a workaround but not a clean solution.

Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Norbert Nemec <Norbert.Nemec gmx.de> wrote in 
news:cbrogr$1jp7$1 digitaldaemon.com:

 Arcane Jill wrote:
 

[snip]
 
 But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm
 happy.

 
 Well - that's a workaround but not a clean solution.
 

In Jill's example, a *C* function expects a pointer to anything, not a D 
array. So, I think, it makes perfect sense to convert the D array to the 
pointer type first, and then do pointer arithmetic as in C. (If you need the 
behaviour of a pointer, use one.)


Farmer.

Jun 29 2004

Farmer <itsFarmer. freenet.de> writes:

Regan Heath <regan netwin.co.nz> wrote in 
news:opr99w0st25a2sq9 digitalmars.com:
[snip]
 
 I think the problem with arrays is that a null array should not compare 
 equal to an empty array. In other words the original post test(s)
    null1 == ""
    null1 == empty1
 
 should be false.
 

Exactly, otherwise the equals() method would not be transitive. 
(Of course, we could also make  (empty1 == null)  evaluate to true by 
completely banning empty-arrays from the D sphere.)


Regards,
   Farmer.

Jun 28 2004

Farmer <itsFarmer. freenet.de> writes:

Arcane Jill <Arcane_member pathlink.com> wrote in
news:cbn5da$vu1$1 digitaldaemon.com: 

 In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
Why are there (almost) no complaints about D's support for empty arrays?

 
 Actually, I think that D has got it right here. At least mostly. I'm
 happy with the fact that null counts as an empty array. But I do have
 SOME gripes. These are:
 
 (1) given that a is an array of length n, the expression a[n..n] gives
 an array bounds exception, and I don't believe it should. I would prefer
 that it simply evaluated to an empty string. I've lost count of the
 number of times I've had to put a special test for this case in various
 bits of code. It's a fairly normal thing to do, to have a pointer (or
 index in this case) to the first element BEYOND the last one in which
 you're interested, and to slice against it. Currently you get the assert
 if n == a.length. I don't believe it should assert unless n > a.length.

I'm a bit confused, since in my sample the array 'empty2' is created from a 
slice that points behind the array and it didn't cause an array bounds 
exception. Or did you need empty slices that point at arbitrary memory 
locations?




 (2) I think it is wrong that the test (a == null) will return true if
 and only if BOTH the length AND the address are zero. I think, if we're
 going to have a model in which the statement a = null; will create an
 empty array, then (a == null) should return true if a /is/ an empty
 array. That is, only the length should be tested, not the address. (If
 you want to test both parts, well there's always a === null).

I guess the rule here is simple: For value types (as the array handle is one) 
==/equals() is exactly the same as ===/is.

But why should we model arrays in a way that makes them less powerful and 
requires *additional* code to make the model work correctly?



Regards,
   Farmer.

Jun 27 2004

Farmer <itsFarmer. freenet.de> writes:

Farmer <itsFarmer. freenet.de> wrote in
news:Xns951699362221itsFarmer 63.105.9.61: 

 Arcane Jill <Arcane_member pathlink.com> wrote in
 news:cbn5da$vu1$1 digitaldaemon.com: 
 
 In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...


 (2) I think it is wrong that the test (a == null) will return true if
 and only if BOTH the length AND the address are zero. I think, if we're
 going to have a model in which the statement a = null; will create an
 empty array, then (a == null) should return true if a /is/ an empty
 array. That is, only the length should be tested, not the address. (If
 you want to test both parts, well there's always a === null).

 
 I guess the rule here is simple: For value types (as the array handle is
 one) ==/equals() is exactly the same as ===/is.

 
My mistake, forget about this sentence; it is utter rubbish.
For primitive value types,  ===/is  behaves like  ==/equals() , rather than 
the other way round. Furthermore, array handles aren't primitive types.

Jun 28 2004
