digitalmars.D - If T[new] is the container for T[], then what is the container for

Andrei Alexandrescu (10/10) Apr 25 2009 It looks we can't make it with only T[]. We need a genuine container

Stewart Gordon (4/6) Apr 25 2009

Andrei Alexandrescu (3/9) Apr 25 2009 One with capacity and probably value semantics.

Robert Jacques (14/22) Apr 25 2009 First, capacity is only static (i.e. can be included in the array as you...

dsimcha (7/23) Apr 25 2009 But in a pointer bumping GC, the right way to do dynamic arrays would be...

Robert Jacques (8/36) Apr 25 2009 1) There are multiple types of pointer bumping GCs and checking/exending...

Unknown W. Brackets (9/12) Apr 25 2009 What about simply using the const/invariant information?

Robert Jacques (12/19) Apr 25 2009 No, immutability really applies to the element and not just the array

Unknown W. Brackets (11/36) Apr 25 2009 I'm not talking about invariant(char)[], I'm talking about

Robert Jacques (11/24) Apr 25 2009 Okay I think we're on different pages, so I'll reiterate:

Jason House (2/32) Apr 25 2009 invariant(char[]) is not rebindable. There are bugs in bugzilla that sho...
Unknown W. Brackets (34/64) Apr 25 2009 Let me say it a different way:

Robert Jacques (10/43) Apr 25 2009 Okay, I understand where you're coming from. Here are some counter-point...

Unknown W. Brackets (25/79) Apr 25 2009 1. Well, I think that immutable(T[]) ~ immutable(T[]) should be

Unknown W. Brackets (5/93) Apr 25 2009 (and I know I said "char[] should _be_ a StringBuilder", which probably

Rainer Deyke (15/18) Apr 26 2009 There are two types of concatenation: in-place (operator ~=) and not

Unknown W. Brackets (8/14) Apr 26 2009 So, how often do you have an int array that you do not intend to append
Robert Jacques (12/28) Apr 26 2009 Actually, both ~= and ~ use capacity to avoid new allocation.

Leandro Lucarella (12/28) Apr 25 2009 I don't think I will, not at least for my thesis (I will be concentratin...

Robert Jacques (8/16) Apr 25 2009 If you add capacity to arrays, why do need seperate slice and container ...

Michel Fortin (11/22) Apr 25 2009 I always disliked T[new]. Here's a better candidate:
Robert Jacques (13/23) Apr 25 2009 No, it's perfectly possible to have T[] be the same type for the contain...

Unknown W. Brackets (9/39) Apr 25 2009 The only downside to this is, it's a Bad Thing to have the size of an
Andrei Alexandrescu (4/34) Apr 25 2009 Yes, but that way you take the owner (i.e. the memory block) completely

Robert Jacques (6/38) Apr 25 2009 1) In what way would it help to give the owner an identity?

Andrei Alexandrescu (20/21) Apr 25 2009 For example, on an implementation that doesn't want gc, they'd need the

Michel Fortin (13/30) Apr 25 2009 You say there must be two entities. But the two entities already exist,

Andrei Alexandrescu (6/36) Apr 25 2009 Yah, and the question is where is the range that the associative array

Robert Jacques (22/42) Apr 25 2009 Why is a single owner needed for deterministic destruction?

Andrei Alexandrescu (6/15) Apr 25 2009 My understanding is that you don't know how C++ and D value semantics

Robert Jacques (8/23) Apr 25 2009 I know how value semantics work. But

Christopher Wright (9/63) Apr 25 2009 Simple solution: put the array definition in object.d and try

Andrei Alexandrescu (12/21) Apr 25 2009 There are three schemes that are good starting points:

Robert Jacques (6/26) Apr 25 2009 Or, use (begin, length, store*) for both arrays and slices and have the ...
bearophile (7/12) Apr 26 2009 If you add such two other ways to manage memory, you end with different ...
Michel Fortin (17/31) Apr 26 2009 1 and 2 certainly look useful.
Christopher Wright (5/29) Apr 26 2009 Well, a slice will be begin, end, owner-ref-count*. The owner could move...
Craig Black (8/8) Apr 26 2009 Just my 2 cents. I use a C++ array template that optimizes memory usage...

Steve Schveighoffer (8/22) Apr 25 2009 You are confusing the difference between T[] and T[U]. T[U] is a strang...

Unknown W. Brackets (8/33) Apr 25 2009 Well, a range of an associative array is a possibility (not a slice, but...

Steven Schveighoffer (15/21) Apr 27 2009 Sure, but why should a range type of an AA be a special builtin type? I...

Christopher Wright (5/6) Apr 25 2009 Citation needed. If this is a continuation of a previous discussion, a
Danny Wilson (6/10) Apr 25 2009 T[new U] ?
Daniel Keep (35/48) Apr 25 2009 I've got a few problems understanding this thread.

Robert Jacques (16/64) Apr 25 2009 From a Model-View-Container standpoint, D's arrays mix both View and

Max Samukha (23/33) Apr 26 2009 T![] T![U] (kidding)

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

It looks we can't make it with only T[]. We need a genuine container 
type, and T[new] was suggested. It would probably have value semantics.

T[U] seems to have the same problem. If T[U] is the range, then how do 
you call the container?

If we follow through with a no-garbage-collected option for the core 
types, then we also need to distinguish between a slice and its 
container. The fact that we (almost) got away with T[] being at the same 
time the container and the slice was that the garbage collector would 
collect all unused slices.


Andrei

Apr 25 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested.

<snip>

What do you mean by a "genuine container type"?

Stewart.

Apr 25 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested.

 <snip>
 
 What do you mean by a "genuine container type"?

One with capacity and probably value semantics.

Andrei

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested.

 <snip>
  What do you mean by a "genuine container type"?

 One with capacity and probably value semantics.

 Andrei

First, capacity is only static (i.e. can be included in the array as you  
suggest) for free-list based allocation. Which is _only_ used by malloc  
and mark-sweep GCs. And I'm really hoping Leandro gives us something  
better than mark-sweep.

Second, if people want value semantics, they can already use .dup or a[] =  
... when needed. Adding value semantics to arrays seems to me like a  
recipe for unintended poor performance by new to D programmers (who don't  
know) and experienced ones (who make a hard to find typo) a like. (I can  
see that value semantics for static arrays might be useful i.e.  
foo(real[4] vec), but I don't see a major need for dynamic vectors to have  
value semantics and every reason they shouldn't, i.e. foo(real[]  
big_image). Counter examples are desired and welcome.)

Apr 25 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Robert Jacques (sandford jhu.edu)'s article
 On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested.

 <snip>
  What do you mean by a "genuine container type"?

 One with capacity and probably value semantics.

 Andrei

 First, capacity is only static (i.e. can be included in the array as you
 suggest) for free-list based allocation. Which is _only_ used by malloc
 and mark-sweep GCs. And I'm really hoping Leandro gives us something
 better than mark-sweep.

But in a pointer bumping GC, the right way to do dynamic arrays would be to
allocate some extra space for the array beyond the length requested, to allow it
to be appended to without being reallocated.  In this case, the extra space is
being allocated under the hood by the array functions, not the garbage
collector,
but it's still there and you still need a capacity field to track it without
expensive GC queries.

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 11:57:25 -0400, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Robert Jacques (sandford jhu.edu)'s article
 On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested.

 <snip>
  What do you mean by a "genuine container type"?

 One with capacity and probably value semantics.

 Andrei

 First, capacity is only static (i.e. can be included in the array as you
 suggest) for free-list based allocation. Which is _only_ used by malloc
 and mark-sweep GCs. And I'm really hoping Leandro gives us something
 better than mark-sweep.

 But in a pointer bumping GC, the right way to do dynamic arrays would be  
 to
 allocate some extra space for the array beyond the length requested, to  
 allow it
 to be appended to without being reallocated.  In this case, the extra  
 space is
 being allocated under the hood by the array functions, not the garbage  
 collector,
 but it's still there and you still need a capacity field to track it  
 without
 expensive GC queries.

1) There are multiple types of pointer bumping GCs and checking/exending  
an array can be pretty darn cheap. (You only have to find the memory page)
2) Pointer bumping is only half a GC, generally the other half is going to  
move/compact the array, at which point the capacity *should* be changed.
3) Over-allocating all arrays is a wonderful waste of memory and  
performance (re chache lines, etc). A cleaner answer is an array builder  
class, which only pays for this overhead when it's actually needed.

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

What about simply using the const/invariant information?

After all, an "array builder" is a mutable array.  If you don't want to 
extend it, it should be invariant or const - e.g. invariant(string).

I've always thought the separation of strings and string builders, 
arrays and array builders, etc. was flawed.  It's just like strcmp; 
sure, it works, but is it really the right way to compare strings? 
Can't I just have my == and not worry about it?

-[Unknown]


Robert Jacques wrote:
 3) Over-allocating all arrays is a wonderful waste of memory and 
 performance (re chache lines, etc). A cleaner answer is an array builder 
 class, which only pays for this overhead when it's actually needed.

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 13:30:22 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:
 What about simply using the const/invariant information?

 After all, an "array builder" is a mutable array.  If you don't want to  
 extend it, it should be invariant or const - e.g. invariant(string).

No, immutability really applies to the element and not just the array  
length. Besides, what about "hello" ~ "world"? Essentially, an "array  
builder" is an array with a cheaply, extend-able length and it's often  
used for immutable strings, etc. Also, while a concatenate strings all the  
time I never end up concatenating non-char arrays. And they almost always  
end up being mutable.

 I've always thought the separation of strings and string builders,  
 arrays and array builders, etc. was flawed.  It's just like strcmp;  
 sure, it works, but is it really the right way to compare strings? Can't  
 I just have my == and not worry about it?

Well, you can just use ~= or ~ and not worry about it. Array builders are  
a performance enhancement for one special case at the cost of a general  
performance degradation. (There was a long discussion previously about  
these trade-offs)

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I'm not talking about invariant(char)[], I'm talking about 
invariant(char[]).  That cannot be extended in length.  Really, what I 
want is something that is a "invariant(variant(char)[])", I suppose, but 
I don't even think that's necessary.

When the default string type was made invariant, many people were 
non-plussed at best.  That doesn't mean it was wrong.

How crazy would it be if, for example, std.string.split returned 
invariant(string[]) instead of string[]?  I doubt it would hurt many 
people, in actuality, and you could always get a mutable copy.

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 13:30:22 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:
 What about simply using the const/invariant information?

 After all, an "array builder" is a mutable array.  If you don't want 
 to extend it, it should be invariant or const - e.g. invariant(string).

 
 No, immutability really applies to the element and not just the array 
 length. Besides, what about "hello" ~ "world"? Essentially, an "array 
 builder" is an array with a cheaply, extend-able length and it's often 
 used for immutable strings, etc. Also, while a concatenate strings all 
 the time I never end up concatenating non-char arrays. And they almost 
 always end up being mutable.
 
 I've always thought the separation of strings and string builders, 
 arrays and array builders, etc. was flawed.  It's just like strcmp; 
 sure, it works, but is it really the right way to compare strings? 
 Can't I just have my == and not worry about it?

 
 Well, you can just use ~= or ~ and not worry about it. Array builders 
 are a performance enhancement for one special case at the cost of a 
 general performance degradation. (There was a long discussion previously 
 about these trade-offs)

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:
 I'm not talking about invariant(char)[], I'm talking about  
 invariant(char[]).  That cannot be extended in length.  Really, what I  
 want is something that is a "invariant(variant(char)[])", I suppose, but  
 I don't even think that's necessary.

 When the default string type was made invariant, many people were  
 non-plussed at best.  That doesn't mean it was wrong.

Okay I think we're on different pages, so I'll reiterate:
 Robert Jacques wrote:
  No, immutability really applies to the element and not just the array  
 length.


1) Immutability is a bad way to express 'this array doesn't change  
length'. Consider arrays of numbers instead of characters and you'll see  
what I mean.
2) A large portion of the time immutable arrays are concatenated (i.e.  
"hello" ~ "world")

 How crazy would it be if, for example, std.string.split returned  
 invariant(string[]) instead of string[]?  I doubt it would hurt many  
 people, in actuality, and you could always get a mutable copy.

 -[Unknown]

1) At first glance, I don't see any reason why split couldn't or  
shouldn't, besides string[] being shorter than immutable string[].
2) I don't see how this is relevant to array capacities / builders.

Apr 25 2009

Jason House <jason.james.house gmail.com> writes:

Robert Jacques Wrote:

 On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets  
 <unknown simplemachines.org> wrote:
 I'm not talking about invariant(char)[], I'm talking about  
 invariant(char[]).  That cannot be extended in length.  Really, what I  
 want is something that is a "invariant(variant(char)[])", I suppose, but  
 I don't even think that's necessary.

 When the default string type was made invariant, many people were  
 non-plussed at best.  That doesn't mean it was wrong.

 
 Okay I think we're on different pages, so I'll reiterate:
 Robert Jacques wrote:
  No, immutability really applies to the element and not just the array  
 length.


 
 1) Immutability is a bad way to express 'this array doesn't change  
 length'. Consider arrays of numbers instead of characters and you'll see  
 what I mean.
 2) A large portion of the time immutable arrays are concatenated (i.e.  
 "hello" ~ "world")
 
 How crazy would it be if, for example, std.string.split returned  
 invariant(string[]) instead of string[]?  I doubt it would hurt many  
 people, in actuality, and you could always get a mutable copy.

 -[Unknown]

 
 1) At first glance, I don't see any reason why split couldn't or  
 shouldn't, besides string[] being shorter than immutable string[].
 2) I don't see how this is relevant to array capacities / builders.

invariant(char[]) is not rebindable. There are bugs in bugzilla that show holes
in the usage of invariant(char)[] instead of a rebindable invariant(char[]).

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

Let me say it a different way:

split() returns results that are, often, stored or compared.  It is not 
often that these results are concatenated to or modified.  Possibly, 
they may not be sliced (further) often either.

Such a case represents a perfect example of an array (or string) that 
need not have a greater capacity than its length.  Indeed; no property 
of the array is likely to change - its length, contents, capacity, or 
even ptr.  All are likely to remain constant until the object is deleted 
from memory.

So, it would seem to me, this return value (as a quick example) could 
easily be changed to one that fits the following:

1. Is an array of data.
2. Does not have mutable contents (data.)
3. Does not allow its length or capacity to be extended (excuse me for 
considering this mutation.)
4. Is a reference, rather than a static array.

My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" 
should _be_ an ArrayBuilder, and we should adopt a separate syntax for 
something that _isn't_ one of those.

That said, I did just browse some of my D source files - an XML 1.0 
parser/tree and a custom rfc-compliant ftp server - and nearly the only 
concatenation I use is building exception strings.  I'm a little 
surprised.  Only a couple times do I do anything a builder would be 
useful for.

One time is on XML child nodes; since I'm building a list of children, 
it is a "builder" type thing.  However, my XML tree is fully incremental 
- that means you could traverse it before all of the data has been read 
from the net.  It would not be practical for me to use a builder that 
didn't have proper read access for this case.

I really think using a "StringBuilder" class is just like the solution 
of using a "String" class.  D, in my opinion, proved the latter was not 
necessary; I hold that the former isn't either.

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:
 I'm not talking about invariant(char)[], I'm talking about 
 invariant(char[]).  That cannot be extended in length.  Really, what I 
 want is something that is a "invariant(variant(char)[])", I suppose, 
 but I don't even think that's necessary.

 When the default string type was made invariant, many people were 
 non-plussed at best.  That doesn't mean it was wrong.

 
 Okay I think we're on different pages, so I'll reiterate:
 Robert Jacques wrote:
  No, immutability really applies to the element and not just the 
 array length.


 
 1) Immutability is a bad way to express 'this array doesn't change 
 length'. Consider arrays of numbers instead of characters and you'll see 
 what I mean.
 2) A large portion of the time immutable arrays are concatenated (i.e. 
 "hello" ~ "world")
 
 How crazy would it be if, for example, std.string.split returned 
 invariant(string[]) instead of string[]?  I doubt it would hurt many 
 people, in actuality, and you could always get a mutable copy.

 -[Unknown]

 
 1) At first glance, I don't see any reason why split couldn't or 
 shouldn't, besides string[] being shorter than immutable string[].
 2) I don't see how this is relevant to array capacities / builders.

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:

 Let me say it a different way:

 split() returns results that are, often, stored or compared.  It is not  
 often that these results are concatenated to or modified.  Possibly,  
 they may not be sliced (further) often either.

 Such a case represents a perfect example of an array (or string) that  
 need not have a greater capacity than its length.  Indeed; no property  
 of the array is likely to change - its length, contents, capacity, or  
 even ptr.  All are likely to remain constant until the object is deleted  
 from memory.

 So, it would seem to me, this return value (as a quick example) could  
 easily be changed to one that fits the following:

 1. Is an array of data.
 2. Does not have mutable contents (data.)
 3. Does not allow its length or capacity to be extended (excuse me for  
 considering this mutation.)
 4. Is a reference, rather than a static array.

 My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]"  
 should _be_ an ArrayBuilder, and we should adopt a separate syntax for  
 something that _isn't_ one of those.

 That said, I did just browse some of my D source files - an XML 1.0  
 parser/tree and a custom rfc-compliant ftp server - and nearly the only  
 concatenation I use is building exception strings.  I'm a little  
 surprised.  Only a couple times do I do anything a builder would be  
 useful for.

 One time is on XML child nodes; since I'm building a list of children,  
 it is a "builder" type thing.  However, my XML tree is fully incremental  
 - that means you could traverse it before all of the data has been read  
 from the net.  It would not be practical for me to use a builder that  
 didn't have proper read access for this case.

 I really think using a "StringBuilder" class is just like the solution  
 of using a "String" class.  D, in my opinion, proved the latter was not  
 necessary; I hold that the former isn't either.

 -[Unknown]

Okay, I understand where you're coming from. Here are some counter-points
1) Immutable strings are often concatenated, which you don't address
2) int[], real[], and basically anything not a string rarely concatenates,  
but often mutable, which you don't address
3) Rather large thread a while ago concluded the builder overhead was too  
high for general use.
4) You're proposal makes immutable char[] and char[] have different  
underlying representations.

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

1. Well, I think that immutable(T[]) ~ immutable(T[]) should be 
possible.  I thought it was.  Just that you cannot _append_ to 
immutable(T[]) which completely makes sense.

2. I don't know that it rarely concatenates.  I took Calc, but I am not 
a mathematician... mostly I use char and object references in my arrays. 
  Occasionally int[], and maybe real[] but it's not likely.

So, yes, I probably am not properly addressing this.  Frankly I do not 
know the use-cases where you would mutate such an array.

For objects, if you don't change the array length, it's unlikely you'll 
change its contents (pointers.)  Unfortunately, the referenced object 
would need to be mutable in many cases (although this is probably less 
than one would think offhand for the case of arrays of 
already-initialized objects.)

3. Okay.  I think for a multi-purpose builder, that's true.  However, 
speaking only of having an extended capacity (as is currently and as I 
was quoting), and not other features of a builder, I think what I say 
does make sense.

4. I don't think so.  Again, I'm not talking about a full-fledged 
builder (although I think more features of one would be possible without 
too much difference except fake methods), just about features.

So, to recap, I'm really talking about not over-allocating immutable 
arrays, but I'm not talking about changing current arrays.  Does that 
make my comments make more sense?

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:
 
 Let me say it a different way:

 split() returns results that are, often, stored or compared.  It is 
 not often that these results are concatenated to or modified.  
 Possibly, they may not be sliced (further) often either.

 Such a case represents a perfect example of an array (or string) that 
 need not have a greater capacity than its length.  Indeed; no property 
 of the array is likely to change - its length, contents, capacity, or 
 even ptr.  All are likely to remain constant until the object is 
 deleted from memory.

 So, it would seem to me, this return value (as a quick example) could 
 easily be changed to one that fits the following:

 1. Is an array of data.
 2. Does not have mutable contents (data.)
 3. Does not allow its length or capacity to be extended (excuse me for 
 considering this mutation.)
 4. Is a reference, rather than a static array.

 My suggestion is that "char[]" should _be_ a StringBuilder, and 
 "int[]" should _be_ an ArrayBuilder, and we should adopt a separate 
 syntax for something that _isn't_ one of those.

 That said, I did just browse some of my D source files - an XML 1.0 
 parser/tree and a custom rfc-compliant ftp server - and nearly the 
 only concatenation I use is building exception strings.  I'm a little 
 surprised.  Only a couple times do I do anything a builder would be 
 useful for.

 One time is on XML child nodes; since I'm building a list of children, 
 it is a "builder" type thing.  However, my XML tree is fully 
 incremental - that means you could traverse it before all of the data 
 has been read from the net.  It would not be practical for me to use a 
 builder that didn't have proper read access for this case.

 I really think using a "StringBuilder" class is just like the solution 
 of using a "String" class.  D, in my opinion, proved the latter was 
 not necessary; I hold that the former isn't either.

 -[Unknown]

 
 Okay, I understand where you're coming from. Here are some counter-points
 1) Immutable strings are often concatenated, which you don't address
 2) int[], real[], and basically anything not a string rarely 
 concatenates, but often mutable, which you don't address
 3) Rather large thread a while ago concluded the builder overhead was 
 too high for general use.
 4) You're proposal makes immutable char[] and char[] have different 
 underlying representations.

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

(and I know I said "char[] should _be_ a StringBuilder", which probably 
caused the confusion - sorry.  I was trying to differentiate the two, I 
worded it poorly for what I actually meant.)

-[Unknown]


Unknown W. Brackets wrote:
 1. Well, I think that immutable(T[]) ~ immutable(T[]) should be 
 possible.  I thought it was.  Just that you cannot _append_ to 
 immutable(T[]) which completely makes sense.
 
 2. I don't know that it rarely concatenates.  I took Calc, but I am not 
 a mathematician... mostly I use char and object references in my arrays. 
  Occasionally int[], and maybe real[] but it's not likely.
 
 So, yes, I probably am not properly addressing this.  Frankly I do not 
 know the use-cases where you would mutate such an array.
 
 For objects, if you don't change the array length, it's unlikely you'll 
 change its contents (pointers.)  Unfortunately, the referenced object 
 would need to be mutable in many cases (although this is probably less 
 than one would think offhand for the case of arrays of 
 already-initialized objects.)
 
 3. Okay.  I think for a multi-purpose builder, that's true.  However, 
 speaking only of having an extended capacity (as is currently and as I 
 was quoting), and not other features of a builder, I think what I say 
 does make sense.
 
 4. I don't think so.  Again, I'm not talking about a full-fledged 
 builder (although I think more features of one would be possible without 
 too much difference except fake methods), just about features.
 
 So, to recap, I'm really talking about not over-allocating immutable 
 arrays, but I'm not talking about changing current arrays.  Does that 
 make my comments make more sense?
 
 -[Unknown]
 
 
 Robert Jacques wrote:
 On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:

 Let me say it a different way:

 split() returns results that are, often, stored or compared.  It is 
 not often that these results are concatenated to or modified.  
 Possibly, they may not be sliced (further) often either.

 Such a case represents a perfect example of an array (or string) that 
 need not have a greater capacity than its length.  Indeed; no 
 property of the array is likely to change - its length, contents, 
 capacity, or even ptr.  All are likely to remain constant until the 
 object is deleted from memory.

 So, it would seem to me, this return value (as a quick example) could 
 easily be changed to one that fits the following:

 1. Is an array of data.
 2. Does not have mutable contents (data.)
 3. Does not allow its length or capacity to be extended (excuse me 
 for considering this mutation.)
 4. Is a reference, rather than a static array.

 My suggestion is that "char[]" should _be_ a StringBuilder, and 
 "int[]" should _be_ an ArrayBuilder, and we should adopt a separate 
 syntax for something that _isn't_ one of those.

 That said, I did just browse some of my D source files - an XML 1.0 
 parser/tree and a custom rfc-compliant ftp server - and nearly the 
 only concatenation I use is building exception strings.  I'm a little 
 surprised.  Only a couple times do I do anything a builder would be 
 useful for.

 One time is on XML child nodes; since I'm building a list of 
 children, it is a "builder" type thing.  However, my XML tree is 
 fully incremental - that means you could traverse it before all of 
 the data has been read from the net.  It would not be practical for 
 me to use a builder that didn't have proper read access for this case.

 I really think using a "StringBuilder" class is just like the 
 solution of using a "String" class.  D, in my opinion, proved the 
 latter was not necessary; I hold that the former isn't either.

 -[Unknown]

 Okay, I understand where you're coming from. Here are some counter-points
 1) Immutable strings are often concatenated, which you don't address
 2) int[], real[], and basically anything not a string rarely 
 concatenates, but often mutable, which you don't address
 3) Rather large thread a while ago concluded the builder overhead was 
 too high for general use.
 4) You're proposal makes immutable char[] and char[] have different 
 underlying representations.

Apr 25 2009

Rainer Deyke <rainerd eldwood.com> writes:

Robert Jacques wrote:
 1) Immutable strings are often concatenated, which you don't address

There are two types of concatenation: in-place (operator ~=) and not
(operator ~).  The former can take advantage of capacity to avoid new
allocation, the latter can not.

When concatenating a lot of immutable strings, it might make sense to
make the left side mutable:

char[] tmp = string1.dup;
tmp ~= string2;
tmp ~= string3;
// ...
tmp ~= string99;
string result = tmp.idup;

 2) int[], real[], and basically anything not a string rarely
 concatenates, but often mutable, which you don't address

I frequently append individual elements to non-string arrays.


-- 
Rainer Deyke - rainerd eldwood.com

Apr 26 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

So, how often do you have an int array that you do not intend to append 
elements to, but yet intend to change the values within it?  Excluding 
preallocation, of course.

Every time I use an int[], or honestly practically any array, I'm either 
creating it or reading it.  Nonetheless, I don't do much math with D 
(I'm a server guy), so it's interesting for me what uses other people have.

-[Unknown]


Rainer Deyke wrote:
 2) int[], real[], and basically anything not a string rarely
 concatenates, but often mutable, which you don't address

 
 I frequently append individual elements to non-string arrays.

Apr 26 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 26 Apr 2009 03:21:08 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 1) Immutable strings are often concatenated, which you don't address

 There are two types of concatenation: in-place (operator ~=) and not
 (operator ~).  The former can take advantage of capacity to avoid new
 allocation, the latter can not.

Actually, both ~= and ~ use capacity to avoid new allocation.

 When concatenating a lot of immutable strings, it might make sense to
 make the left side mutable:

 char[] tmp = string1.dup;
 tmp ~= string2;
 tmp ~= string3;
 // ...
 tmp ~= string99;
 string result = tmp.idup;

Why make 2 extra copies if you don't need too?

 2) int[], real[], and basically anything not a string rarely
 concatenates, but often mutable, which you don't address

 I frequently append individual elements to non-string arrays.

Cool. Care to elaborate on your use case?
I was thinking of scientific computing (linear algebra, optimization) and  
game programming.

Since we're off the topic of T[new] and onto a capacity field in general,  
I'd like to point out the older newsgroup posts:
http://www.digitalmars.com/d/archives/digitalmars/D/Array_Capacity_Field_71119.html
http://www.digitalmars.com/d/archives/digitalmars/D/Array_append_performance_75410.html
http://www.digitalmars.com/d/archives/digitalmars/D/Array_append_performance_2_75811.html

Apr 26 2009

Leandro Lucarella <llucax gmail.com> writes:

Robert Jacques, el 25 de abril a las 10:58 me escribiste:
 On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 
Stewart Gordon wrote:
Andrei Alexandrescu wrote:
It looks we can't make it with only T[]. We need a genuine container type, and
T[new] was suggested.

<snip>
 What do you mean by a "genuine container type"?

One with capacity and probably value semantics.

Andrei

 
 First, capacity is only static (i.e. can be included in the array as you
 suggest) for free-list based allocation. Which is _only_ used by malloc
 and mark-sweep GCs. And I'm really hoping Leandro gives us something
 better than mark-sweep.

I don't think I will, not at least for my thesis (I will be concentrating
in make the current GC concurrent and making pauses as small as possible).

But maybe Keith will:
http://www.dsource.org/projects/tango/forums/topic/743

8-)

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
The diference is simple: hackers build things, crackers break them.

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested.

 <snip>
  What do you mean by a "genuine container type"?

 One with capacity and probably value semantics.

 Andrei

If you add capacity to arrays, why do need seperate slice and container  
types?
i.e.
.capacity  > 0 -> container
.capacity == 0 -> slice

Note: this use of a capacity field has been mentioned before.

Apr 25 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-04-25 09:07:52 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested. It would probably have value semantics.
 
 T[U] seems to have the same problem. If T[U] is the range, then how do 
 you call the container?
 
 If we follow through with a no-garbage-collected option for the core 
 types, then we also need to distinguish between a slice and its 
 container. The fact that we (almost) got away with T[] being at the 
 same time the container and the slice was that the garbage collector 
 would collect all unused slices.

I always disliked T[new]. Here's a better candidate:

      | range  | container
----- | ------ | ---------
array | T[]    | T.[]
assoc | T[U]   | T.[U]

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do  
 you call the container?

 If we follow through with a no-garbage-collected option for the core  
 types, then we also need to distinguish between a slice and its  
 container. The fact that we (almost) got away with T[] being at the same  
 time the container and the slice was that the garbage collector would  
 collect all unused slices.


 Andrei

No, it's perfectly possible to have T[] be the same type for the container  
and the slice with reference counting instead of a full GC. All you need  
to do is store a pointer to the memory block's node. Then it's trivial to  
1) get the capacity and 2) increase / decrease the reference count. You  
could even add an 'allocated length' parameter to the node, so you'd avoid  
some of the slicing bugs in the current implementation.

struct T[] {
     T*     ptr;
     size_t length;
     void*  memory_block;
}

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

The only downside to this is, it's a Bad Thing to have the size of an 
array be different between GC's, but I think it's probably not entirely 
avoidable.

But, I really think it's the only good way to do it, imho.  Maybe the 
exact implementation is different, but creating a new type just seems 
even messier... the beauty of slices in D is that they're Not Different 
(TM.)

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do 
 you call the container?

 If we follow through with a no-garbage-collected option for the core 
 types, then we also need to distinguish between a slice and its 
 container. The fact that we (almost) got away with T[] being at the 
 same time the container and the slice was that the garbage collector 
 would collect all unused slices.


 Andrei

 
 No, it's perfectly possible to have T[] be the same type for the 
 container and the slice with reference counting instead of a full GC. 
 All you need to do is store a pointer to the memory block's node. Then 
 it's trivial to 1) get the capacity and 2) increase / decrease the 
 reference count. You could even add an 'allocated length' parameter to 
 the node, so you'd avoid some of the slicing bugs in the current 
 implementation.
 
 struct T[] {
     T*     ptr;
     size_t length;
     void*  memory_block;
 }

Apr 25 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do 
 you call the container?

 If we follow through with a no-garbage-collected option for the core 
 types, then we also need to distinguish between a slice and its 
 container. The fact that we (almost) got away with T[] being at the 
 same time the container and the slice was that the garbage collector 
 would collect all unused slices.


 Andrei

 
 No, it's perfectly possible to have T[] be the same type for the 
 container and the slice with reference counting instead of a full GC. 
 All you need to do is store a pointer to the memory block's node. Then 
 it's trivial to 1) get the capacity and 2) increase / decrease the 
 reference count. You could even add an 'allocated length' parameter to 
 the node, so you'd avoid some of the slicing bugs in the current 
 implementation.
 
 struct T[] {
     T*     ptr;
     size_t length;
     void*  memory_block;
 }

Yes, but that way you take the owner (i.e. the memory block) completely 
out of the picture. It would help to give the owner an identity.

Andrei

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 16:16:07 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do  
 you call the container?

 If we follow through with a no-garbage-collected option for the core  
 types, then we also need to distinguish between a slice and its  
 container. The fact that we (almost) got away with T[] being at the  
 same time the container and the slice was that the garbage collector  
 would collect all unused slices.


 Andrei

  No, it's perfectly possible to have T[] be the same type for the  
 container and the slice with reference counting instead of a full GC.  
 All you need to do is store a pointer to the memory block's node. Then  
 it's trivial to 1) get the capacity and 2) increase / decrease the  
 reference count. You could even add an 'allocated length' parameter to  
 the node, so you'd avoid some of the slicing bugs in the current  
 implementation.
  struct T[] {
     T*     ptr;
     size_t length;
     void*  memory_block;
 }

 Yes, but that way you take the owner (i.e. the memory block) completely  
 out of the picture. It would help to give the owner an identity.

 Andrei

1) In what way would it help to give the owner an identity?

2) How does reference counting arrays "take the owner completely out of  
the picture"? Isn't this simply a switch from single ownership to shared  
ownership?

Apr 25 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 1) In what way would it help to give the owner an identity?

For example, on an implementation that doesn't want gc, they'd need the 
owner for deterministic destruction.

Again, slices could be conflated with arrays mainly because the gc took 
care of the litter left after rebinding slices. If you want to offer the 
option to do without a gc, then arrays and slices (= ranges) must be 
separate entities. Then you hold on to the arrays and pass slices 
around. When you destroy the arrays, the slices originating from them 
become invalid (which can be checked or not).

With hashes it's even more interesting. We need two types: the hash and 
the hash range that crawls over that hash. Right now they are conflated 
into one V[K]. If we want to define a range crawling over a hash (and 
consequently do away with opApply) we'd need the range to store a little 
stack of breadcrumbs so the range knows how to iterate. So definitely 
there must be two entities. Traditionally we've been using V[K] both as 
a range and as a container, but the usage so far was closer to a range 
(I think without being sure). So one issue is to find a good type 
literal to express "type for which the corresponding range is V[K]".

Makes sense?


Andrei

Apr 25 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-04-25 18:04:32 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Again, slices could be conflated with arrays mainly because the gc took 
 care of the litter left after rebinding slices. If you want to offer 
 the option to do without a gc, then arrays and slices (= ranges) must 
 be separate entities. Then you hold on to the arrays and pass slices 
 around. When you destroy the arrays, the slices originating from them 
 become invalid (which can be checked or not).
 
 With hashes it's even more interesting. We need two types: the hash and 
 the hash range that crawls over that hash. Right now they are conflated 
 into one V[K]. If we want to define a range crawling over a hash (and 
 consequently do away with opApply) we'd need the range to store a 
 little stack of breadcrumbs so the range knows how to iterate. So 
 definitely there must be two entities. Traditionally we've been using 
 V[K] both as a range and as a container, but the usage so far was 
 closer to a range (I think without being sure). So one issue is to find 
 a good type literal to express "type for which the corresponding range 
 is V[K]".

You say there must be two entities. But the two entities already exist, 
only the container is stored by reference. T[] is a reference to a 
portion of a fixed-size array (static array or heap array). T[U] is a 
reference to an associative array. If you could call "new" and "delete" 
on these two types to delete the referenced container, or if the 
container were reference-counted, you wouldn't need a garbage 
collector. It's pretty much the same as for classes in fact.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 25 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2009-04-25 18:04:32 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Again, slices could be conflated with arrays mainly because the gc 
 took care of the litter left after rebinding slices. If you want to 
 offer the option to do without a gc, then arrays and slices (= ranges) 
 must be separate entities. Then you hold on to the arrays and pass 
 slices around. When you destroy the arrays, the slices originating 
 from them become invalid (which can be checked or not).

 With hashes it's even more interesting. We need two types: the hash 
 and the hash range that crawls over that hash. Right now they are 
 conflated into one V[K]. If we want to define a range crawling over a 
 hash (and consequently do away with opApply) we'd need the range to 
 store a little stack of breadcrumbs so the range knows how to iterate. 
 So definitely there must be two entities. Traditionally we've been 
 using V[K] both as a range and as a container, but the usage so far 
 was closer to a range (I think without being sure). So one issue is to 
 find a good type literal to express "type for which the corresponding 
 range is V[K]".

 
 You say there must be two entities. But the two entities already exist, 
 only the container is stored by reference. T[] is a reference to a 
 portion of a fixed-size array (static array or heap array).

Yah, and the question is where is the entire array.

 T[U] is a 
 reference to an associative array.

Yah, and the question is where is the range that the associative array 
uses for iteration.

 If you could call "new" and "delete" 
 on these two types to delete the referenced container, or if the 
 container were reference-counted, you wouldn't need a garbage collector. 
 It's pretty much the same as for classes in fact.

It might be wise to make those types values, not references.


Andrei

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 1) In what way would it help to give the owner an identity?

 For example, on an implementation that doesn't want gc, they'd need the  
 owner for deterministic destruction.

Why is a single owner needed for deterministic destruction?

 Again, slices could be conflated with arrays mainly because the gc took  
 care of the litter left after rebinding slices. If you want to offer the  
 option to do without a gc, then arrays and slices (= ranges) must be  
 separate entities. Then you hold on to the arrays and pass slices  
 around. When you destroy the arrays, the slices originating from them  
 become invalid (which can be checked or not).

What is wrong with reference counting arrays just like other objects?  
(which is what I've been proposing)

Also, I see a major logical problem with your proposal:
Given     A single owner is required for deterministic destruction.
Then      The array's owner is responsible for the deterministic  
destruction.
Therefore When the array's owner goes out of scope, it must destruct the  
array.
But Then  How does a function return an array?

Your original post mentioned value semantics, but that results in a lot of  
needless allocation and copying.

 With hashes it's even more interesting. We need two types: the hash and  
 the hash range that crawls over that hash. Right now they are conflated  
 into one V[K]. If we want to define a range crawling over a hash (and  
 consequently do away with opApply) we'd need the range to store a little  
 stack of breadcrumbs so the range knows how to iterate. So definitely  
 there must be two entities. Traditionally we've been using V[K] both as  
 a range and as a container, but the usage so far was closer to a range  
 (I think without being sure). So one issue is to find a good type  
 literal to express "type for which the corresponding range is V[K]".

While separating hashes and ranges is important, it seems orthogonal to  
the topic at hand.

 Makes sense?

Yes and no. What I wanted was some high-level reason why have T[new] and  
T[] was fundamentally better than only having T[]. What I got was a low  
level assertion that they are needed to do -nogc. However, since earlier  
in this thread I provided a counter example, i.e. reference counted  
arrays, your assertion stands as false. And a language change shouldn't be  
based on a faulty assumption.

Apr 25 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Also, I see a major logical problem with your proposal:
 Given     A single owner is required for deterministic destruction.
 Then      The array's owner is responsible for the deterministic 
 destruction.
 Therefore When the array's owner goes out of scope, it must destruct the 
 array.
 But Then  How does a function return an array?

My understanding is that you don't know how C++ and D value semantics 
work. It would be impossible to answer your (albeit good) question 
without first writing a tutorial on the topic, which unfortunately I 
don't have the time for.


Andrei

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 19:11:57 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Also, I see a major logical problem with your proposal:
 Given     A single owner is required for deterministic destruction.
 Then      The array's owner is responsible for the deterministic  
 destruction.
 Therefore When the array's owner goes out of scope, it must destruct  
 the array.
 But Then  How does a function return an array?

 My understanding is that you don't know how C++ and D value semantics  
 work. It would be impossible to answer your (albeit good) question  
 without first writing a tutorial on the topic, which unfortunately I  
 don't have the time for.

I know how value semantics work. But
1) You haven't mentioned that T[new] has value semantics, only that you  
were thinking about it.
2) Struct destructors are always called and don't get optimized away even  
if the memory copy does. (i.e. I know there's a nice optimization to get  
out of the extra copy, but I don't know how you'd do it in DMD currently).

 Andrei

Apr 25 2009

Christopher Wright <dhasenan gmail.com> writes:

Robert Jacques wrote:
 On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 1) In what way would it help to give the owner an identity?

 For example, on an implementation that doesn't want gc, they'd need 
 the owner for deterministic destruction.

 
 Why is a single owner needed for deterministic destruction?
 
 Again, slices could be conflated with arrays mainly because the gc 
 took care of the litter left after rebinding slices. If you want to 
 offer the option to do without a gc, then arrays and slices (= ranges) 
 must be separate entities. Then you hold on to the arrays and pass 
 slices around. When you destroy the arrays, the slices originating 
 from them become invalid (which can be checked or not).

 
 What is wrong with reference counting arrays just like other objects? 
 (which is what I've been proposing)
 
 Also, I see a major logical problem with your proposal:
 Given     A single owner is required for deterministic destruction.
 Then      The array's owner is responsible for the deterministic 
 destruction.
 Therefore When the array's owner goes out of scope, it must destruct the 
 array.
 But Then  How does a function return an array?
 
 Your original post mentioned value semantics, but that results in a lot 
 of needless allocation and copying.
 
 With hashes it's even more interesting. We need two types: the hash 
 and the hash range that crawls over that hash. Right now they are 
 conflated into one V[K]. If we want to define a range crawling over a 
 hash (and consequently do away with opApply) we'd need the range to 
 store a little stack of breadcrumbs so the range knows how to iterate. 
 So definitely there must be two entities. Traditionally we've been 
 using V[K] both as a range and as a container, but the usage so far 
 was closer to a range (I think without being sure). So one issue is to 
 find a good type literal to express "type for which the corresponding 
 range is V[K]".

 
 While separating hashes and ranges is important, it seems orthogonal to 
 the topic at hand.
 
 Makes sense?

 
 Yes and no. What I wanted was some high-level reason why have T[new] and 
 T[] was fundamentally better than only having T[]. What I got was a low 
 level assertion that they are needed to do -nogc. However, since earlier 
 in this thread I provided a counter example, i.e. reference counted 
 arrays, your assertion stands as false. And a language change shouldn't 
 be based on a faulty assumption.

Simple solution: put the array definition in object.d and try 
implementing arrays with reference counting or manual memory management.

I think stored slices break manual memory management, even with a 
dedicated slice type; but they should work just fine with refcounting. 
If you don't want to change the language, object.Array will have to 
implement the logic for slices and for allocated arrays. It's a bit 
ugly, and it makes the Array type larger. Also, Array's reference count 
would need to be accessed by reference.

Apr 25 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Christopher Wright wrote:
 Simple solution: put the array definition in object.d and try 
 implementing arrays with reference counting or manual memory management.
 
 I think stored slices break manual memory management, even with a 
 dedicated slice type; but they should work just fine with refcounting. 
 If you don't want to change the language, object.Array will have to 
 implement the logic for slices and for allocated arrays. It's a bit 
 ugly, and it makes the Array type larger. Also, Array's reference count 
 would need to be accessed by reference.

There are three schemes that are good starting points:

1. As right now :o).

2. Using refcounting. Arrays will be 4 words long (begin, end, 
end-of-store, refcount*) and slices will be 3 words long (begin, end, 
and owner*)

3. Using manual management. Arrays will be 3 words long (begin, end, 
end-of-store) and slices will be 2 words long (begin, end). This is 
close to C++/STL. This case is less safe but very efficient.

If we manage to integrate them all... that's quite the holy grail. And I 
think it's entirely possible with only a few changes to the language.


Andrei

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 19:44:18 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Christopher Wright wrote:
 Simple solution: put the array definition in object.d and try  
 implementing arrays with reference counting or manual memory management.
  I think stored slices break manual memory management, even with a  
 dedicated slice type; but they should work just fine with refcounting.  
 If you don't want to change the language, object.Array will have to  
 implement the logic for slices and for allocated arrays. It's a bit  
 ugly, and it makes the Array type larger. Also, Array's reference count  
 would need to be accessed by reference.

 There are three schemes that are good starting points:

 1. As right now :o).

 2. Using refcounting. Arrays will be 4 words long (begin, end,  
 end-of-store, refcount*) and slices will be 3 words long (begin, end,  
 and owner*)

Or, use (begin, length, store*) for both arrays and slices and have the  
store contain (start, capacity, refcount).

 3. Using manual management. Arrays will be 3 words long (begin, end,  
 end-of-store) and slices will be 2 words long (begin, end). This is  
 close to C++/STL. This case is less safe but very efficient.

Why store end-of-store? It's trivial to compute from the length. (If you  
have separate slice and array types, that is)

 If we manage to integrate them all... that's quite the holy grail. And I  
 think it's entirely possible with only a few changes to the language.


 Andrei

Apr 25 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 1. As right now :o).
 2. Using refcounting. [...]
 3. Using manual management. [...]
 If we manage to integrate them all... that's quite the holy grail. And I 
 think it's entirely possible with only a few changes to the language.

If you add such two other ways to manage memory, you end with different code.
Isn't this going to split the D codebase in three? (If I look for an already
written D1 module now I have to see if it's for Tango or Phobos. In such
situation I have to see what kind of memory does it use?)

And in big programs isn't the usage of a GC inevitable? So are the other two
options only for small programs?

When you see code written by other people you must be able to understand and
know how to use the refcounting too. So this adds some more things to know. D2
is getting more and more complex than D1. Eventually D2 may risk getting too
much complex for normal people. Both C and Java are simple enough languages,
but a cross of them may be too much complex for normal people.
 
Bye,
bearophile

Apr 26 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-04-25 19:44:18 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 There are three schemes that are good starting points:
 
 1. As right now :o).
 
 2. Using refcounting. Arrays will be 4 words long (begin, end, 
 end-of-store, refcount*) and slices will be 3 words long (begin, end, 
 and owner*)
 
 3. Using manual management. Arrays will be 3 words long (begin, end, 
 end-of-store) and slices will be 2 words long (begin, end). This is 
 close to C++/STL. This case is less safe but very efficient.
 
 If we manage to integrate them all... that's quite the holy grail. And 
 I think it's entirely possible with only a few changes to the language.

1 and 2 certainly look useful.

I've never found 3 to be efficient in the C++/STL. Making a copy every 
time you need to store a string in a struct isn't something I call 
efficient. Especially in a language that supports immutable strings, 
reference counting seems immencely preferable. And reference counting 
has the advantage that it does not require any change or addition to 
the language syntax or semantics.

Andrei, do you have a use case for 3 that would end up worse using 
reference counting? Because I only see appending, and even then, 
appending when the refcount is 1 could be done in-place which would be 
pretty efficient.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 26 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Simple solution: put the array definition in object.d and try 
 implementing arrays with reference counting or manual memory management.

 I think stored slices break manual memory management, even with a 
 dedicated slice type; but they should work just fine with refcounting. 
 If you don't want to change the language, object.Array will have to 
 implement the logic for slices and for allocated arrays. It's a bit 
 ugly, and it makes the Array type larger. Also, Array's reference 
 count would need to be accessed by reference.

 
 There are three schemes that are good starting points:
 
 1. As right now :o).
 
 2. Using refcounting. Arrays will be 4 words long (begin, end, 
 end-of-store, refcount*) and slices will be 3 words long (begin, end, 
 and owner*)

Well, a slice will be begin, end, owner-ref-count*. The owner could move 
about the stack or get passed between classes on the heap or something 
like that.

 3. Using manual management. Arrays will be 3 words long (begin, end, 
 end-of-store) and slices will be 2 words long (begin, end). This is 
 close to C++/STL. This case is less safe but very efficient.
 
 If we manage to integrate them all... that's quite the holy grail. And I 
 think it's entirely possible with only a few changes to the language.

Agreed.

Apr 26 2009

"Craig Black" <craigblack2 cox.net> writes:

Just my 2 cents.  I use a C++ array template that optimizes memory usage by 
only storing a pointer.  It stores the capacity and the size in the heap 
allocation.  The benefit here is that empty arrays only require one word.  I 
am the primary author of a simulation engine (similar to a game engine but 
less graphics-focused).  In practice I have found about half of all arrays 
are empty, so this approach provides much better memory efficiency for my 
purposes.

-Craig

Apr 26 2009

Steve Schveighoffer <schveiguy yahoo.com> writes:

On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu wrote:

 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.
 
 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?
 
 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.
 
 
 Andrei

You are confusing the difference between T[] and T[U].  T[U] is a strange 
beast because it does not need to be new'd, but it acts completely like a 
reference type.  But T[U] *is* the container type.  What you should be 
asking is what is the *slice* type for T[U].  My answer would be that a 
slice type for a hashtable doesn't make sense.  We don't need AAs to be 
completely on par with normal arrays, so just leave them alone ;)

-Steve

Apr 25 2009

"Unknown W. Brackets" <unknown simplemachines.org> writes:

Well, a range of an associative array is a possibility (not a slice, but 
e.g. from std.algorithm.)

I think it'd be a mistake to discount that.  Being able to treat 
associative arrays like arrays, in some cases (e.g. with count(), etc.) 
is nice.  That said, i don't even know that std.algorithm currently 
supports this.

-[Unknown]


Steve Schveighoffer wrote:
 On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu wrote:
 
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?

 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.


 Andrei

 
 You are confusing the difference between T[] and T[U].  T[U] is a strange 
 beast because it does not need to be new'd, but it acts completely like a 
 reference type.  But T[U] *is* the container type.  What you should be 
 asking is what is the *slice* type for T[U].  My answer would be that a 
 slice type for a hashtable doesn't make sense.  We don't need AAs to be 
 completely on par with normal arrays, so just leave them alone ;)
 
 -Steve

Apr 25 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 25 Apr 2009 12:56:15 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:

 Well, a range of an associative array is a possibility (not a slice, but  
 e.g. from std.algorithm.)

 I think it'd be a mistake to discount that.  Being able to treat  
 associative arrays like arrays, in some cases (e.g. with count(), etc.)  
 is nice.  That said, i don't even know that std.algorithm currently  
 supports this.

Sure, but why should a range type of an AA be a special builtin type?  It  
makes sense with array slices because they are valuable to use in many  
different ways, can be useful as sub-ranges, and keep their value long  
after the array has been changed.  An AA range would be useful only for  
iterating over all the elements.  Once the AA changes, the AA range is  
invalid.  Making a subrange is pretty much useless, since there's no real  
order to the data.  When a range is necessarily the range of the entire  
container, it becomes a trivial addition to the typesystem just to meet  
interface requirements.  Array slices are way more useful than that.

I'd say A[U] is the container type, and AARange!(A, U) is the range if you  
wish to have a type name.  I see no reason to add a builtin type to deal  
with AA ranges.

-Steve

Apr 27 2009

Christopher Wright <dhasenan gmail.com> writes:

Andrei Alexandrescu wrote:
 It looks we can't make it with only T[].

Citation needed. If this is a continuation of a previous discussion, a 
link to the previous discussion will suffice.

I grant that appending to an array is expensive. Make Array a 
user-defined type and we can see what's appropriate.

Apr 25 2009

"Danny Wilson" <bluezenix gmail.com> writes:

Op Sat, 25 Apr 2009 15:07:52 +0200 schreef Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do  
 you call the container?

T[new U] ?
T[new(U)] ?
T[U new] ? you knew...?


still, reusing new like that looks pretty akward to me.

Apr 25 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.
 
 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?
 
 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.
 
 Andrei

I've got a few problems understanding this thread.

1. What exactly do you mean by "genuine container type"?  That says to
me there's some ISO standard or something arrays don't conform to.  I
saw earlier you specify the lack of a capacity field; is that really it?

   If so, why not call it a capacity-aware array (or capacitive array :P).

2. I don't see why you want these to have value semantics.  What's the
motivation here.  Note that "because we need it" isn't a valid answer; a
link to an explanation would be great, if one exists.  :)

3. I'm really uncomfortable with the idea that the semantics of slices
changes when you switch off the GC.  That means you have a valid program
before, you throw -gc, and suddenly you get inscrutable crashes.

As for the names, maybe we've got this backwards.  What about these:

T[]      -- a freshly created array
T[..]    -- a slice of an array
T[U]     -- a freshly created associative array
T[U..]   -- a slice of an associative array

Otherwise, creating a new array would be (new T[new]) which looks
stupid.  That, or it's (new T[x]) and the result type is (T[new]) which
is inconsistent with every other type in the language.

Also, this means static arrays stay as T[n].

On another topic, you mentioned that (using your notation again) T[new]
would have four words: begin, end (but I liked length! :( ),
end-of-capacity and refcount*.  Could you pull a COM trick and stick the
capacity and refcount out the front of the array data?

  [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...]
    ^     ^     ^           ^
    |     |     |           |
    |     |   begin        end
    |  capacity
 refcount

That allows T[new] to stay 8 bytes, and it also avoids needing a second
allocation for the refcount (which should have the same lifetime as the
array itself).

  -- Daniel

Apr 25 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 25 Apr 2009 21:14:28 -0400, Daniel Keep  
<daniel.keep.lists gmail.com> wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?

 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.

 Andrei

 I've got a few problems understanding this thread.

 1. What exactly do you mean by "genuine container type"?  That says to
 me there's some ISO standard or something arrays don't conform to.  I
 saw earlier you specify the lack of a capacity field; is that really it?

    If so, why not call it a capacity-aware array (or capacitive array  
 :P).

 From a Model-View-Container standpoint, D's arrays mix both View and  
Container. Many think this is a good thing, but there is an argument for  
separating the views (slices) from the container (arrays). The whole  
capacity thing is separate from this issue and exists because 1) it's  
fixed in a free-list based allocator. 2) it's somewhat slow to query from  
the GC and 3) concatenating arrays, mainly when building strings, does it  
a lot.
P.S. Great names.

 2. I don't see why you want these to have value semantics.  What's the
 motivation here.  Note that "because we need it" isn't a valid answer; a
 link to an explanation would be great, if one exists.  :)

It hasn't been explained so far. (Although there are some arguments for  
more general application of value semantics in older threads)

 3. I'm really uncomfortable with the idea that the semantics of slices
 changes when you switch off the GC.  That means you have a valid program
 before, you throw -gc, and suddenly you get inscrutable crashes.

Actually, I'm pretty sure Andrei is proposing changing the semantics for  
both GC and no GC.

 As for the names, maybe we've got this backwards.  What about these:

 T[]      -- a freshly created array
 T[..]    -- a slice of an array
 T[U]     -- a freshly created associative array
 T[U..]   -- a slice of an associative array

 Otherwise, creating a new array would be (new T[new]) which looks
 stupid.  That, or it's (new T[x]) and the result type is (T[new]) which
 is inconsistent with every other type in the language.

 Also, this means static arrays stay as T[n].

 On another topic, you mentioned that (using your notation again) T[new]
 would have four words: begin, end (but I liked length! :( ),
 end-of-capacity and refcount*.  Could you pull a COM trick and stick the
 capacity and refcount out the front of the array data?

   [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...]
     ^     ^     ^           ^
     |     |     |           |
     |     |   begin        end
     |  capacity
  refcount

 That allows T[new] to stay 8 bytes, and it also avoids needing a second
 allocation for the refcount (which should have the same lifetime as the
 array itself).

Or alternatively, both slices and arrays could store a pointer to the  
(refount, capacity) so you wouldn't need separate types.

Apr 25 2009

Max Samukha <samukha voliacable.com.removethis> writes:

On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

It looks we can't make it with only T[]. We need a genuine container 
type, and T[new] was suggested. It would probably have value semantics.

T[U] seems to have the same problem. If T[U] is the range, then how do 
you call the container?

If we follow through with a no-garbage-collected option for the core 
types, then we also need to distinguish between a slice and its 
container. The fact that we (almost) got away with T[] being at the same 
time the container and the slice was that the garbage collector would 
collect all unused slices.


Andrei

T![] T![U] (kidding)
T[:] T[:U] 

FWIW, Qt uses by-value containers with COW and it looks like they are
ok for many usages. I'd like to see a std.containers module defining
containers with semantics based on a policy:

enum CopyPolicy
{
   Always,  // by value
   OnWrite, // by value with COW
   Never      // by reference
}

Then we could use the most appropriate policy for built-in arrays (I
think it should be by-value with COW).

module object;
import std.containers;

template Array(T)
{
    alias CoolArray!(T, CopyPolicy.OnWrite) Array;
}

and still be able to use containers with different policies when
necessary.

Apr 26 2009

D Programming

C/C++ Programming

Other

digitalmars.D - If T[new] is the container for T[], then what is the container for