digitalmars.D - Paralysis of analysis

Andrei Alexandrescu (50/50) Dec 14 2010 I kept on literally losing sleep about a number of issues involving

so (4/4) Dec 14 2010 Could you please elaborate the disadvantages part?

Andrei Alexandrescu (19/21) Dec 14 2010 Consider the empty() property. A struct using a pointer internally can

Walter Bright (4/5) Dec 14 2010 I don't think that overhead is a problem. For small numbers of values, o...

Steven Schveighoffer (10/15) Dec 14 2010 The place I see it being an issue is for things like a map where the val...

Bruno Medeiros (9/15) Jan 27 2011 Phew... That seems quite acceptable. For a moment, when I read your

Steven Schveighoffer (30/68) Dec 14 2010 I agree.

Nick Sabalausky (19/44) Dec 14 2010 (First of all, Disclaimer: I might not even know what the hell I'm talki...

Simon Buerger (11/61) Dec 14 2010 I continue to believe that containers should be value-types. In order

Andrei Alexandrescu (9/18) Dec 14 2010 Coming from an STL background I was also very comfortable with the

Simon Buerger (19/26) Dec 14 2010 True thing, C++/STL does much work to prevent the copy-mechanism, but

spir (17/21) Dec 15 2010 There is not any good technical answer.

spir (42/50) Dec 15 2010

Andrei Alexandrescu (24/40) Dec 15 2010 Optimization (or pessimization) is a concern, but not my primary one. My...

spir (56/80) Dec 15 2010

Andrei Alexandrescu (31/50) Dec 15 2010 Both are good. The question is which should be the "default" one and

spir (25/73) Dec 15 2010 I agree this is also an issue, but this is not the one I had in mind (so...

Bruno Medeiros (21/53) Jan 27 2011 As someone who takes conceptual cleanliness very seriously, I had to

Jonathan Schmidt-Dominé (7/11) Dec 14 2010 From my point of view reference counting is not very elegant. The compil...

Jonathan Schmidt-Dominé (3/8) Dec 14 2010 However, maybe reference counting is a feasible way to go before better

Dmitry Olshansky (14/64) Dec 14 2010 What challenges do we face with this approach? Can you please outline

Andrei Alexandrescu (9/43) Dec 14 2010 Usage of wrappers, yes. Essentially you'd use e.g. RBTree as a class or

Kagamin (16/22) Dec 14 2010 Thinking about this I've found an interesting issue:

Andrei Alexandrescu (4/26) Dec 14 2010 Yah, this has been discussed many times. Essentially AAs have class-like...
Steven Schveighoffer (8/30) Dec 14 2010 That's been discussed very much in the past. There is no good solution,...

Jonathan M Davis (9/74) Dec 14 2010 One concern that I would have would be inlining. Containers need to be e...
Michel Fortin (11/12) Dec 14 2010 I'd prefer them to be value types.
Craig Black (16/17) Dec 14 2010 I feel like the odd man out here since my perspective is so different. ...

Jonathan M Davis (3/24) Dec 14 2010 Dynamic arrays are already on the GC heap...

Craig Black (5/35) Dec 14 2010 Using built-in D arrays, yes. Using a templated struct, they don't have...
Dmitry Olshansky (9/32) Dec 15 2010 Hm,

Steven Schveighoffer (8/48) Dec 15 2010 You can append them. The append code will recognize that it's not a GC ...

Dmitry Olshansky (6/56) Dec 15 2010 Right, this is very important! I just checked, and luckily I did this

Steven Schveighoffer (13/26) Dec 15 2010 I should also note, if you do this:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I kept on literally losing sleep about a number of issues involving 
containers, sealing, arbitrary-cost copying vs. reference counting and 
copy-on-write, and related issues. This stops me from making rapid 
progress on defining D containers and other artifacts in the standard 
library.

Clearly we need to break this paralysis, and just as clearly whatever 
decision taken now will influence the prevalent D style going forward. 
So a decision needs to be made soon, just not hastily. Easier said than 
done!

I continue to believe that containers should have reference semantics, 
just like classes. Copying a container wholesale is not something you 
want to be automatic.

I also continue to believe that controlled lifetime (i.e. 
reference-counted implementation) is important for a container. 
Containers tend to be large compared to other objects, so exercising 
strict control over their allocated storage makes a lot of sense. What 
has recently shifted in my beliefs is that we should attempt to 
implement controlled lifetime _outside_ the container definition, by 
using introspection. (Currently some containers use reference counting 
internally, which makes their implementation more complicated than it 
could be.)

Finally, I continue to believe that sealing is worthwhile. In brief, a 
sealing container never gives out addresses of its elements so it has 
great freedom in controlling the data layout (e.g. pack 8 bools in one 
ubyte) and in controlling the lifetime of its own storage. Currently I'm 
not sure whether that decision should be taken by the container, by the 
user of the container, or by an introspection-based wrapper around an 
unsealed container.

* * *

That all being said, I'd like to make a motion that should simplify 
everyone's life - if only for a bit. I'm thinking of making all 
containers classes (either final classes or at a minimum classes with 
only final methods). Currently containers are implemented as structs 
that are engineered to have reference semantics. Some collections use 
reference counting to keep track of the memory used.

Advantages of the change:

- Clear, self-documented reference semantics

- Uses the right tool (classes) for the job (define a type with 
reference semantics)

- Pushes deterministic lifetime issues outside the containers 
(simplifying them) and factors such issues into reusable wrappers a la 
RefCounted.

Disadvantages:

- Containers must be dynamically allocated to do anything - even calling 
empty requires allocation.

- There's a two-words overhead associated with any class object.

- Containers cannot do certain optimizations that depend on container's 
control over its own storage.


What say you?

Andrei

Dec 14 2010

so <so so.do> writes:

Could you please elaborate the disadvantages part?

Thanks!

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/

Dec 14 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/14/10 1:19 PM, so wrote:
 Could you please elaborate the disadvantages part?

 Thanks!

Consider the empty() property. A struct using a pointer internally can 
return true from empty if the pointer is null. A class cannot do that.

struct Array {
     Impl * p;
     @property bool empty() { return !p || p.empty; }
}

vs.

class Array {
     @property final bool empty() { ... }
}

Whatever empty() does, it must be called against an already-allocated 
reference to an Array.

The two words overhead comes from the vtable and the mutex.

A struct that has control over its own storage can be more aggressive 
about releasing unused memory. A class does not have that kind of 
control because it doesn't know how many references are out there to the 
same object.
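
A minimal runnable sketch of the empty() contrast described above. The names Impl, ArrayHandle and ArrayObject are hypothetical stand-ins (the post leaves Impl undefined), so this is an illustration under those assumptions rather than the Phobos implementation.

```d
import std.stdio;

// Hypothetical payload type; the post refers to Impl without defining it.
struct Impl
{
    int[] items;
    @property bool empty() const { return items.length == 0; }
}

// Struct handle: usable even when no payload has been allocated yet.
struct ArrayHandle
{
    Impl* p;
    @property bool empty() const { return p is null || p.empty; }
}

// Class version: empty() needs an allocated object to be called on.
final class ArrayObject
{
    int[] items;
    @property bool empty() const { return items.length == 0; }
}

void main()
{
    ArrayHandle h;          // no allocation has happened
    assert(h.empty);        // fine: the null check lives inside the struct

    ArrayObject o;          // a null reference
    assert(o is null);      // the only safe question to ask before `new`
    o = new ArrayObject;    // allocation is required before calling empty
    assert(o.empty);
    writeln("both containers report empty");
}
```

The struct answers empty without any allocation, while the class reference has to be allocated (or checked against null by the caller) first.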


Andrei

Dec 14 2010

Walter Bright <newshound2 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 The two words overhead comes from the vtable and the mutex.

I don't think that overhead is a problem. For small numbers of values, one 
should use an array. The more complex containers are for larger numbers of 
values, where 2 words is insignificant.

Dec 14 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 14 Dec 2010 15:13:36 -0500, Walter Bright  
<newshound2 digitalmars.com> wrote:

 Andrei Alexandrescu wrote:
 The two words overhead comes from the vtable and the mutex.

 I don't think that overhead is a problem. For small numbers of values,  
 one should use an array. The more complex containers are for larger  
 numbers of values, where 2 words is insignificant.

The place I see it being an issue is for things like a map where the value  
type is a linked list.  It's quite conceivable that a map type like this  
could have thousands of one-element linked lists, with a few scattered 2  
or more element linked lists.

I don't think it's a common thing, but I think there can be solutions that  
work around this issue.  I agree the 2 words are not enough to dissuade  
using classes for container types.

-Steve

Dec 14 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 14/12/2010 19:30, Andrei Alexandrescu wrote:
 On 12/14/10 1:19 PM, so wrote:
 Could you please elaborate the disadvantages part?

 Thanks!


 Whatever empty() does, it must be called against an already-allocated
 reference to an Array.

Phew... That seems quite acceptable. For a moment, when I read your 
original words:
"- Containers must be dynamically allocated to do anything - even 
calling empty requires allocation. "
it almost reads as you saying that any and each call to empty requires 
an allocation... :S

-- 
Bruno Medeiros - Software Engineer

Jan 27 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 14 Dec 2010 14:02:34 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I continue to believe that containers should have reference semantics,  
 just like classes. Copying a container wholesale is not something you  
 want to be automatic.

I agree.

 I also continue to believe that controlled lifetime (i.e.  
 reference-counted implementation) is important for a container.  
 Containers tend to be large compared to other objects, so exercising  
 strict control over their allocated storage makes a lot of sense. What  
 has recently shifted in my beliefs is that we should attempt to  
 implement controlled lifetime _outside_ the container definition, by  
 using introspection. (Currently some containers use reference counting  
 internally, which makes their implementation more complicated than it  
 could be.)

I think ref counting needs to be fleshed out more before we use it.  I'm  
not of the mind that phobos should use concepts that are not properly  
implementable based on the current compiler/runtime design in hopes that  
the design gets better.  I'd rather design it to work now, and redesign  
later if the opportunity becomes available.

 Finally, I continue to believe that sealing is worthwhile. In brief, a  
 sealing container never gives out addresses of its elements so it has  
 great freedom in controlling the data layout (e.g. pack 8 bools in one  
 ubyte) and in controlling the lifetime of its own storage. Currently I'm  
 not sure whether that decision should be taken by the container, by the  
 user of the container, or by an introspection-based wrapper around an  
 unsealed container.

I agree that a sealed container is worthwhile.  I think it needs to be the  
container's decision (for instance, the pack bools into bits must be a  
container decision).

 That all being said, I'd like to make a motion that should simplify  
 everyone's life - if only for a bit. I'm thinking of making all  
 containers classes (either final classes or at a minimum classes with  
 only final methods). Currently containers are implemented as structs  
 that are engineered to have reference semantics. Some collections use  
 reference counting to keep track of the memory used.

I think this is the right move.  Responding to pros/cons below:

 Advantages of the change:

 - Clear, self-documented reference semantics

 - Uses the right tool (classes) for the job (define a type with  
 reference semantics)

 - Pushes deterministic lifetime issues outside the containers  
 (simplifying them) and factors such issues into reusable wrappers a la  
 RefCounted.

- exposes the issue of default initialization by disallowing that.  This  
is the problem of passing an uninitialized struct into a function and  
having the function not be able to affect the original.  A class has a  
more defined and better understood lifetime cycle -- nothing exists until  
new is used.

- no more need to "check if it's valid" in every member function.
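
The default-initialization problem mentioned above can be sketched as follows. RefList, ClassList, Payload and fill are hypothetical names chosen for illustration; the struct mimics a container "engineered to have reference semantics" that allocates lazily.

```d
import std.stdio;

struct Payload { int[] items; }

// Struct with reference semantics and lazy allocation (hypothetical).
struct RefList
{
    Payload* p;

    void insert(int x)
    {
        if (p is null) p = new Payload;   // the "check if it's valid" step
        p.items ~= x;
    }

    @property size_t length() const { return p is null ? 0 : p.items.length; }
}

final class ClassList
{
    int[] items;
    void insert(int x) { items ~= x; }
    @property size_t length() const { return items.length; }
}

void fill(RefList r)   { r.insert(1); }
void fill(ClassList c) { c.insert(1); }

void main()
{
    RefList a;               // default-initialized: p is null
    fill(a);                 // the callee allocates into its own copy of the handle
    assert(a.length == 0);   // the caller never sees the element

    RefList b;
    b.insert(0);             // the payload now exists
    fill(b);                 // the copied handle shares that payload
    assert(b.length == 2);

    auto c = new ClassList;  // nothing exists until new is used
    fill(c);
    assert(c.length == 1);
    writeln("done");
}
```

The same call behaves differently depending on whether the struct happened to be initialized before it was passed, which is the inconsistency a class avoids.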

 Disadvantages:

 - Containers must be dynamically allocated to do anything - even calling  
 empty requires allocation.

Can't emplace work to fix this?  At least for cases where you don't need  
the container to live beyond the scope of a function.

 - There's a two-words overhead associated with any class object.

I assume this is in response to containers of containers?  It's actually  
96 bits, because the minimal memory block size is 16 bytes.  Therefore, a  
container which could potentially have a 1-word footprint must have 4  
words.  For 64-bit, I'm unsure of the proposed GC implementation.

I have some ideas to solve this, but they are abstract in my head, I  
haven't solidified them enough to start a discussion yet.  Short story --  
I think if we clearly separate the implementation from the container, we  
might be able to combine implementations in a minimal way.
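
A rough sketch of the arithmetic above. It assumes the usual D class layout (a vtable pointer and a monitor word ahead of the fields) and takes the 16-byte minimum block size from the post; roundUpToBlock, Empty and OneWord are hypothetical names.

```d
import std.stdio;

final class Empty {}
final class OneWord { void* payload; }

// Hypothetical helper: round a size up to a multiple of the block size.
size_t roundUpToBlock(size_t bytes, size_t block = 16)
{
    return (bytes + block - 1) / block * block;
}

void main()
{
    // Two hidden words (vtable pointer and monitor) precede the fields.
    static assert(__traits(classInstanceSize, Empty) == 2 * size_t.sizeof);
    static assert(__traits(classInstanceSize, OneWord) == 3 * size_t.sizeof);

    // 32-bit figures from the post: a 4-byte payload plus two 4-byte words
    // is 12 bytes, which lands in a 16-byte block (4 words).
    enum payload32 = 4;
    enum object32 = 3 * 4;
    assert(roundUpToBlock(object32) == 16);
    assert((roundUpToBlock(object32) - payload32) * 8 == 96); // 96 bits overhead

    writefln("instance size on this target: %s bytes",
             __traits(classInstanceSize, OneWord));
}
```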

 - Containers cannot do certain optimizations that depend on container's  
 control over its own storage.

Can you explain this further?

-Steve

Dec 14 2010

"Nick Sabalausky" <a a.a> writes:

"Steven Schveighoffer" <schveiguy yahoo.com> wrote in message 
news:op.vnpx3oioeav7ka steve-laptop...
 On Tue, 14 Dec 2010 14:02:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Advantages of the change:

 - Clear, self-documented reference semantics

 - Uses the right tool (classes) for the job (define a type with 
 reference semantics)

 - Pushes deterministic lifetime issues outside the containers 
 (simplifying them) and factors such issues into reusable wrappers a la 
 RefCounted.

 - exposes the issue of default initialization by disallowing that.  This 
 is the problem of passing an uninitialized struct into a function and 
 having the function not be able to affect the original.  A class has a 
 more defined and better understood lifetime cycle -- nothing exists until 
 new is used.

 - no more need to "check if it's valid" in every member function.

 Disadvantages:

 - Containers must be dynamically allocated to do anything - even calling 
 empty requires allocation.


(First of all, Disclaimer: I might not even know what the hell I'm talking 
about...)

I'd be surprised if typical usage would cause this to be an issue. It's not 
like people are going to be allocating a new container *every* time 
something like "empty" is called. And there's always checking if the 
reference itself is null - that doesn't require allocation.

 - There's a two-words overhead associated with any class object.


Since, like you said, containers usually carry a fair amount of data, I'd be 
surprised if this would really be an issue. If there is ever a need for lots 
of containers with very small data one could still just do it all manually 
with structs. And maybe some wrappers could be used to get it to play nice 
with existing stuff that expects a standard class-based container?

Although, if this overhead would be associated with, for instance, every 
node in a tree or graph, then it might be more of an issue.

 - Containers cannot do certain optimizations that depend on container's 
 control over its own storage.


This seems like it could be more of an issue than the other two drawbacks. I 
wonder how often those optimizations would be needed? If only on occasion, 
then forcing a manual solution in those cases might be worth it for the 
rather compelling advantages.

Dec 14 2010

Simon Buerger <krox gmx.net> writes:

On 14.12.2010 20:02, Andrei Alexandrescu wrote:
 I kept on literally losing sleep about a number of issues involving
 containers, sealing, arbitrary-cost copying vs. reference counting and
 copy-on-write, and related issues. This stops me from making rapid
 progress on defining D containers and other artifacts in the standard
 library.

 Clearly we need to break this paralysis, and just as clearly whatever
 decision taken now will influence the prevalent D style going forward.
 So a decision needs to be made soon, just not hastily. Easier said
 than done!

 I continue to believe that containers should have reference semantics,
 just like classes. Copying a container wholesale is not something you
 want to be automatic.

 I also continue to believe that controlled lifetime (i.e.
 reference-counted implementation) is important for a container.
 Containers tend to be large compared to other objects, so exercising
 strict control over their allocated storage makes a lot of sense. What
 has recently shifted in my beliefs is that we should attempt to
 implement controlled lifetime _outside_ the container definition, by
 using introspection. (Currently some containers use reference counting
 internally, which makes their implementation more complicated than it
 could be.)

 Finally, I continue to believe that sealing is worthwhile. In brief, a
 sealing container never gives out addresses of its elements so it has
 great freedom in controlling the data layout (e.g. pack 8 bools in one
 ubyte) and in controlling the lifetime of its own storage. Currently
 I'm not sure whether that decision should be taken by the container,
 by the user of the container, or by an introspection-based wrapper
 around an unsealed container.

 * * *

 That all being said, I'd like to make a motion that should simplify
 everyone's life - if only for a bit. I'm thinking of making all
 containers classes (either final classes or at a minimum classes with
 only final methods). Currently containers are implemented as structs
 that are engineered to have reference semantics. Some collections use
 reference counting to keep track of the memory used.

 Advantages of the change:

 - Clear, self-documented reference semantics

 - Uses the right tool (classes) for the job (define a type with
 reference semantics)

 - Pushes deterministic lifetime issues outside the containers
 (simplifying them) and factors such issues into reusable wrappers a la
 RefCounted.

 Disadvantages:

 - Containers must be dynamically allocated to do anything - even
 calling empty requires allocation.

 - There's a two-words overhead associated with any class object.

 - Containers cannot do certain optimizations that depend on
 container's control over its own storage.


 What say you?

 Andrei

I continue to believe that containers should be value-types. In order 
to prevent useless copying you can use something like "Impl * impl" 
and reference-counting. Then you only do a copy on actual change. This 
is the way I'm currently implementing it in my own container-classes.

But I see the point in making them reference-types, because copying is 
so rare in the real world. Though I find the expression "new Set()" most 
strange, you are definitely right in the following: if you make them 
reference-types, they should be classes, not structs (and final, to 
prevent strange overloading).

Krox

Dec 14 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/14/10 1:38 PM, Simon Buerger wrote:
 I continue to believe that containers should be value-types. In order to
 prevent useless copying you can use something like "Impl * impl" and
 reference-counting. Then you only do a copy on actual change. This is
 the way I'm currently implementing it in my own container-classes.

 But I see the point in making them reference-types, because copying is
 so rare in the real world. Though I find the expression "new Set()" most
 strange, you are definitely right in the following: if you make them
 reference-types, they should be classes, not structs (and final, to
 prevent strange overloading).

Coming from an STL background I was also very comfortable with the 
notion of value. Walter pointed to me that in the STL what you worry 
about most of the time is to _undo_ the propensity of objects getting 
copied at the drop of a hat. For example, think of the common n00b error 
of passing containers by value.

So since we have the opportunity to decide now for eternity the right 
thing, I think reference semantics works great with containers.


Andrei

Dec 14 2010

Simon Buerger <krox gmx.net> writes:

On 14.12.2010 20:53, Andrei Alexandrescu wrote:
 Coming from an STL background I was also very comfortable with the
 notion of value. Walter pointed to me that in the STL what you worry
 about most of the time is to _undo_ the propensity of objects getting
 copied at the drop of a hat. For example, think of the common n00b
 error of passing containers by value.

True, C++/STL does much work to prevent the copy-mechanism, but 
it can be circumvented by using the indirection+refCount trick. Then 
it doesn't matter how you pass it; it gets copied lazily when the 
first actual change occurs. That places some overhead:
1) increasing/decreasing the refcount on every argument-passing
2) checking for refCount>1 on every modifying method-call (not on the 
reading methods)

I'm pretty sure (1) is insignificant. (2) I'm not sure about. For a 
very simple list-container it might be a problem, but for 
sophisticated structures like hashtables or trees this one check is 
probably insignificant.
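
The indirection+refCount trick can be sketched roughly as below. CowArray and firstAfterEdit are hypothetical names, the code is single-threaded only, and it uses the postblit (this(this)) discussed elsewhere in this thread; it is an illustration of the idea, not anyone's actual container.

```d
import std.stdio;

struct CowArray
{
    private struct Impl { int[] data; size_t refs; }
    private Impl* impl;

    this(int[] initial)
    {
        impl = new Impl;
        impl.data = initial.dup;
        impl.refs = 1;
    }

    // Overhead (1): adjust the count whenever the handle is copied or dies.
    this(this) { if (impl !is null) ++impl.refs; }
    ~this()    { if (impl !is null) --impl.refs; }

    // Reading never copies.
    int opIndex(size_t i) const { return impl.data[i]; }

    // Overhead (2): every modifying call checks whether the data is shared.
    void opIndexAssign(int value, size_t i)
    {
        if (impl.refs > 1)
        {
            auto fresh = new Impl;        // detach: copy on first write
            fresh.data = impl.data.dup;
            fresh.refs = 1;
            --impl.refs;
            impl = fresh;
        }
        impl.data[i] = value;
    }
}

// Passing by value is cheap; the copy happens only because of the write.
int firstAfterEdit(CowArray c)
{
    c[0] = 99;
    return c[0];
}

void main()
{
    auto a = CowArray([1, 2, 3]);
    auto b = a;                       // shares storage with a
    b[0] = 42;                        // b detaches here
    assert(a[0] == 1 && b[0] == 42);
    assert(firstAfterEdit(a) == 99);
    assert(a[0] == 1);                // the caller's data is untouched
    writeln("copy-on-write behaved like a value type");
}
```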


 So since we have the opportunity to decide now for eternity the right
 thing, I think reference semantics works great with containers.

Indeed. Whichever way to go, you need a good reason. I hope a similar 
discussion will take place for the actual interface of the 
container-lib. (Which template-params should there be? T, Allocator, 
Comp are the three most classic ones, but more or fewer are possible. 
And what kinds of containers should be there at all? Anyway, that 
doesn't belong here now.)

Krox

Dec 14 2010

spir <denis.spir gmail.com> writes:

On Tue, 14 Dec 2010 22:11:59 +0100
Simon Buerger <krox gmx.net> wrote:

 So since we have the opportunity to decide now for eternity the right
 thing, I think reference semantics works great with containers.

 Indeed. Whichever way to go, you need a good reason.

There is not any good technical answer.
The only answer is semantic, on a per-application basis: it depends on
what the collection actually represents. Every container, just like a
composite element (struct vs class (*)), can be required in both a value
and a ref version. That's why we cannot decide: there will always be
people on both sides, based on personal preferences and previous
experiences.
Value vs ref has nothing to do with the data type. We could go on arguing
on this question until the end of time ;-)

Denis

(*) That's why I regret that D structs (unlike e.g. Oberon's records) do
not have the full expressiveness of classes (they lack
extension/inheritance and method dispatch according to runtime type).
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

Dec 15 2010

spir <denis.spir gmail.com> writes:

On Tue, 14 Dec 2010 13:53:39 -0600
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 Coming from an STL background I was also very comfortable with the
 notion of value. Walter pointed to me that in the STL what you worry
 about most of the time is to _undo_ the propensity of objects getting
 copied at the drop of a hat. For example, think of the common n00b error
 of passing containers by value.

 So since we have the opportunity to decide now for eternity the right
 thing, I think reference semantics works great with containers.

The issue for me in your reasoning is that what you are talking about
here, and what your choice is based on, is _not_ reference _semantics_,
but something like "indirection efficiency". This is an optimization that
clearly belongs to implementation and has nothing to do with semantics.
Now, I totally agree it is very important (esp. avoiding useless copies).
Reference semantics has something to do with semantics, namely that an
element in the program represents a "thing", some kind of entity in the
model that has a proper "identity" (selfsameness), left unchanged in time
however its form changes, and that can be multiply referenced.
Confusion arises (esp. in languages of the C line) because pointers used
for implementation (of variable-size elements like dynamic arrays) and
performance (avoiding copies) are sometimes called "references"; and
references themselves are most commonly implemented as pointers.
The choice of whether a program element should be made plain value/data
or thing/entity/ref depends, from the semantic point of view, only on
what it represents in the model. In languages of the C line, which expose
many implementation issues to the programmer, other considerations may
then enter the dance and contradict semantics in some cases. In other
words, the value/ref criterion is orthogonal to the common notion of type.
We may paradoxically be forced to "ref" elements that represent plain
information, like color values, just to avoid useless copies, for
instance because the compiler won't ref them under the hood when possible.

I am convinced this efficiency can be automagic in the compiler, and the
programmer would not have to care about that. Actually, the only
problematic case is the one of (input-only) _value_ parameters. The aim
is for the compiler to pass them by ref for efficiency, when (1) they are
heavy and (2) they are left unchanged.
In an ideal world, parameters would be read-only e basta! But since this
seems to be impossible in a C-like language, the compiler would have to
check whether a value parameter is changed (1 per thousand of all
cases?), and copy it only in this case. I do not know how complicated
this is; anyway, it is certainly doable.

On the other hand, if these arguments leave your position unchanged that
containers must behave like refs, then I fully agree they should be
implemented as classes.


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

Dec 15 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/15/10 8:34 AM, spir wrote:
 On Tue, 14 Dec 2010 13:53:39 -0600 Andrei
 Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:

 Coming from an STL background I was also very comfortable with the
 notion of value. Walter pointed to me that in the STL what you
 worry about most of the time is to _undo_ the propensity of objects
 getting copied at the drop of a hat. For example, think of the
 common n00b error of passing containers by value.

 So since we have the opportunity to decide now for eternity the
 right thing, I think reference semantics works great with
 containers.

 The issue for me in your reasoning is that what you are here talking
 about, and what your choice is based on, is _not_ reference
 _semantics_, but something like "indirection efficiency". This is
 optimization that clearly belongs to implementation and has nothing
 to do with semantics.

Optimization (or pessimization) is a concern, but not my primary one. My 
concern is: most of the time, do you want to work on a container or on a 
copy of the container? Consider this path-of-least-resistance code:

void fun(Container!int c) {
     ...
     c[5] += 42;
     ...
}

Question is, what's the most encountered activity? Should fun operate on 
whatever container it was passed, or on a copy of it? Based on extensive 
experience with the STL, I can say that in the overwhelming majority of 
cases you want the function to mess with the container, or look without 
touch (by means of const). It is so overwhelming, any code reviewer in 
an STL-based environment will raise a flag when seeing the C++ 
equivalent to the code above - ironically, even if fun actually does 
need a copy of its input! (The common idiom is to pass the container by 
constant reference and then create a copy of it inside fun, which is 
suboptimal.)
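
To make the question concrete, here is a small sketch of the three ways fun could receive its argument. ValueBox, RefBox, bump and bumpInPlace are hypothetical names; a fixed-size array inside a struct stands in for a value-type container.

```d
import std.stdio;

struct ValueBox { int[8] slots; }        // value type: copied wholesale
final class RefBox { int[8] slots; }     // reference type

void bump(ValueBox c)            { c.slots[5] += 42; }  // mutates a copy
void bump(RefBox c)              { c.slots[5] += 42; }  // mutates the original
void bumpInPlace(ref ValueBox c) { c.slots[5] += 42; }  // opt-in aliasing

void main()
{
    ValueBox v;
    bump(v);
    assert(v.slots[5] == 0);     // the path of least resistance changed nothing

    bumpInPlace(v);
    assert(v.slots[5] == 42);    // the caller had to ask for this with `ref`

    auto r = new RefBox;
    bump(r);
    assert(r.slots[5] == 42);    // with a class, the same signature just works
    writeln("done");
}
```

With value semantics the shortest signature silently works on a copy; with reference semantics the shortest signature does what the post argues is wanted most of the time.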

In contrast, most of the time you want to work on a copy of a string, so 
strings are commonly not containers. (This is nicely effected by string 
being defined as arrays of immutable characters.) However, you sometimes 
do need to mutate a string, which is why char[] is useful on occasion.


Andrei

Dec 15 2010

spir <denis.spir gmail.com> writes:

On Wed, 15 Dec 2010 09:56:36 -0600
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 Optimization (or pessimization) is a concern, but not my primary one. My=

=20
 concern is: most of the time, do you want to work on a container or on a=

=20
 copy of the container? Consider this path-of-least-resistance code:
=20
 void fun(Container!int c) {
      ...
      c[5] +=3D 42;
      ...
 }
=20
 Question is, what's the most encountered activity? Should fun operate on=

=20
 whatever container it was passed, or on a copy of it? Based on extensive=

=20
 experience with the STL, I can say that in the overwhelming majority of=20
 cases you want the function to mess with the container, or look without=20
 touch (by means of const). It is so overwhelming, any code reviewer in=20
 an STL-based environment will raise a flag when seeing the C++=20
 equivalent to the code above - ironically, even if fun actually does=20
 need a copy of its input! (The common idiom is to pass the container by=20
 constant reference and then create a copy of it inside fun, which is=20
 suboptimal.)

I do agree.

When a container is passed as parameter
* either it is a value in meaning and should be left unchanged (--> so that=
 the compiler can pass it as "constant reference")
* or it means an entity with identity, it makes sense to change it, and it =
should be implemented as a ref.

What I'm trying to fight is beeing forced to implement semantics values as =
concrete ref elements. This is very bad, a kind of conceptual distortion (t=
he author of XL calls this semantic mismatch) that leads to much confusion.
Example of semantic distinction:
Take a palette of predefined colors (red, green,..) used to draw visual wid=
gets. In the simple case, colors are plain information (=3Dvalues), and the=
 palette (a collection) as well. In this case, every widget holds its own s=
ubset of colors used for each part of itself. Meaning copies. Chenging a gi=
ven color assigned to a widget should & does not affect others.
Now, imagine this palette can be edited "live" by the user, meaning redefin=
ing the components of re, green,... This time, the semantics may well be th=
at such changes should aaffect all widgets, including already defined ones.=
 For this, the palette must be implemented as an "entity", and each as well=
. But the reason for this is that the palette does not mean the same thing =
at all: instead of information about an aspect (color) of every widget, we =
have now a kind of container of color _sources_. Instead of color values, t=
he widget fields point to kinds of paint pots; these fields should not be c=
alled "color".
[It is not always that simple to find real-world metaphors helping us and c=
orrectly understand what we have to model and write into programs. A progra=
m's world is not at all reality, not even similar to it, even in the (minor=
ity of) cases where it models reality. In this case, "color" is misleading.]

In the first case, palette must be a value, in the second case it must be a=
 ref. There is no way to escape the dilemma about having value or ref colle=
ctions. Conceptually, we absolutely need both. Again the ref/value semantic=
 duality is independant from data types. If the language provides one kind =
only, we have to hack, to cheat with it.

There is a special case in non-OO-only circumstances: sometimes an element
is passed as a parameter while it is conceptually the "object" (in the
common sense) on which an operation applies (~ OO receiver). In OO, it
would be passed by ref precisely to allow it being changed, even if it is a
plain value (this prevents creating a new value at every tiny change, as
opposed to immutability). But this relevant distinction between the object
of an operation (what) and true parameters (how) does not exist in plain
function-based style:
	func(object, param1, param12);
So we have to pass the object by ref when the operation is precisely there
to modify it. But conceptually it is not a parameter.

 In contrast, most of the time you want to work on a copy of a string, so
 strings are commonly not containers. (This is nicely effected by string
 being defined as arrays of immutable characters.) However, you sometimes
 do need to mutate a string, which is why char[] is useful on occasion.

I agree with this as well. D does the right thing for strings.


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

Dec 15 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/15/10 11:05 AM, spir wrote:
 What I'm trying to fight is being forced to implement semantic
 values as concrete ref elements. This is very bad, a kind of
 conceptual distortion (the author of XL calls this semantic mismatch)
 that leads to much confusion.

[snip example]

 Conceptually, we absolutely need both.
 Again the ref/value semantic duality is independent from data types.
 If the language provides one kind only, we have to hack, to cheat
 with it.

Both are good. The question is which should be the "default" one and 
what should be the "other" one.

Your example is from a class of examples that basically say: a mutable 
reference object in a struct with value semantics is trouble. That is:

struct Widget // value type
{
     ...
     Array!Color colorMap; // oops, undue aliasing
}

That is correct. There are two solutions I envision:

1. Define a Value wrapper in std.container or std.typecons:

struct Widget // value type
{
     ...
     Value!(Array!Color) colorMap; // clones upon copying
}

2. Define this(this)

struct Widget // value type
{
     ...
     Array!Color colorMap; // manually cloned upon copying
     this(this) {
         colorMap = colorMap.clone;
     }
}

Note that if Widget is a class there is no such problem. The entire 
issue applies to designing value types.
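
Both solutions can be tried out in a small self-contained sketch. Assumptions: a built-in slice stands in for Array!Color, .dup stands in for the clone operation, and the Value template below is a hypothetical wrapper written for this example, not an existing std.container or std.typecons type.

```d
struct Color { ubyte r, g, b; }

// Solution 2: clone by hand in the postblit.
struct Widget
{
    Color[] colorMap;
    this(this) { colorMap = colorMap.dup; }
}

// Solution 1: a hypothetical Value wrapper that clones its payload
// whenever the enclosing object is copied.
struct Value(T)
{
    T payload;
    this(this) { payload = payload.dup; }
    alias payload this;                  // forward indexing etc. to the payload
}

struct Widget2
{
    Value!(Color[]) colorMap;            // clones upon copying
}

void main()
{
    auto a = Widget([Color(255, 0, 0)]);
    auto b = a;                          // postblit runs: b gets its own array
    b.colorMap[0] = Color(0, 0, 255);
    assert(a.colorMap[0] == Color(255, 0, 0));

    auto c = Widget2(Value!(Color[])([Color(0, 255, 0)]));
    auto d = c;                          // Value's postblit clones the payload
    d.colorMap[0] = Color(9, 9, 9);
    assert(c.colorMap[0] == Color(0, 255, 0));
}
```

The wrapper version keeps the cloning policy in one place, at the price of one more type in every field declaration.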

 There is a special case in non-OO-only circumstances: sometimes an
 element is passed as a parameter while it is conceptually the "object"
 (in the common sense) on which an operation applies (~ OO receiver). In
 OO, it would be passed by ref precisely to allow it being changed,
 even if it is a plain value (this prevents creating a new value at
 every tiny change, as opposed to immutability). But this relevant
 distinction between the object of an operation (what) and true parameters
 (how) does not exist in plain function-based style: func(object,
 param1, param12); So we have to pass the object by ref when the
 operation is precisely there to modify it. But conceptually it is not
 a parameter.

I don't understand this part.


Andrei

Dec 15 2010

spir <denis.spir gmail.com> writes:

On Wed, 15 Dec 2010 11:57:32 -0600
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 12/15/10 11:05 AM, spir wrote:
 What I'm trying to fight is being forced to implement semantic
 values as concrete ref elements. This is very bad, a kind of
 conceptual distortion (the author of XL calls this semantic mismatch)
 that leads to much confusion.

 [snip example]

 Conceptually, we absolutely need both.
 Again the ref/value semantic duality is independent from data types.
 If the language provides one kind only, we have to hack, to cheat
 with it.

 Both are good. The question is which should be the "default" one and
 what should be the "other" one.

 Your example is from a class of examples that basically say: a mutable
 reference object in a struct with value semantics is trouble. That is:

 struct Widget // value type
 {
      ...
      Array!Color colorMap; // oops, undue aliasing
 }

 That is correct. There are two solutions I envision:
 That is correct. There are two solutions I envision:

I agree this is also an issue, but this is not the one I had in mind
(sorry for the unclear expression).

 1. Define a Value wrapper in std.container or std.typecons:

 struct Widget // value type
 {
      ...
      Value!(Array!Color) colorMap; // clones upon copying
 }

 2. Define this(this)

 struct Widget // value type
 {
      ...
      Array!Color colorMap; // manually cloned upon copying
      this(this) {
          colorMap = colorMap.clone;
      }
 }

 Note that if Widget is a class there is no such problem. The entire
 issue applies to designing value types.

Actually, that is not what I meant. The actual "nature" (class/struct) of
Widget is not the problem I tried to point out. Rather, it is having
colorMap's type defined as a class when its meaning (in the model) is that
of a plain value (often to avoid useless copies); or conversely (e.g. to
avoid the cost of instantiation on the heap).
Ideally, I would like to have the ref vs value distinction orthogonal to
the whole type system, meaning "entity with identity" vs "plain data". For
this, the language must
(1) properly cope with implementation issues (read: code efficiency),
(2) provide a "ref-ing" syntax similar to "pointing".
I won't dream of the latter, but I guess the first feature can well be done
in D. It requires the compiler to detect when a value parameter is not
touched in a func body, then pass it by ref behind the scenes. This would
allow defining conceptual values as instances of value types without
fearing inefficiency.
For the converse issue, I have no idea.

 [...]

 I don't understand this part.

Not that important [a bit off-topic].

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

Dec 15 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 15/12/2010 17:05, spir wrote:
 On Wed, 15 Dec 2010 09:56:36 -0600
 Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:

 Optimization (or pessimization) is a concern, but not my primary one. My
 concern is: most of the time, do you want to work on a container or on a
 copy of the container? Consider this path-of-least-resistance code:

 void fun(Container!int c) {
       ...
       c[5] += 42;
       ...
 }

 Question is, what's the most encountered activity? Should fun operate on
 whatever container it was passed, or on a copy of it? Based on extensive
 experience with the STL, I can say that in the overwhelming majority of
 cases you want the function to mess with the container, or look without
 touch (by means of const). It is so overwhelming, any code reviewer in
 an STL-based environment will raise a flag when seeing the C++
 equivalent to the code above - ironically, even if fun actually does
 need a copy of its input! (The common idiom is to pass the container by
 constant reference and then create a copy of it inside fun, which is
 suboptimal.)

 I do agree.

 When a container is passed as parameter
 * either it is a value in meaning and should be left unchanged (--> so
 that the compiler can pass it as "constant reference")
 * or it means an entity with identity, it makes sense to change it, and it
 should be implemented as a ref.

 What I'm trying to fight is being forced to implement semantic values as
 concrete ref elements. This is very bad, a kind of conceptual distortion
 (the author of XL calls this semantic mismatch) that leads to much
 confusion.

As someone who takes conceptual cleanliness very seriously, I had to 
chime in, as I don't quite agree with your points.

 Example of semantic distinction:
 Take a palette of predefined colors (red, green,..) used to draw visual
 widgets. In the simple case, colors are plain information (=values), and
 the palette (a collection) as well. In this case, every widget holds its
 own subset of colors used for each part of itself. Meaning copies.
 Changing a given color assigned to a widget should & does not affect
 others.
 Now, imagine this palette can be edited "live" by the user, meaning
 redefining the components of red, green,... This time, the semantics may
 well be that such changes should affect all widgets, including already
 defined ones. For this, the palette must be implemented as an "entity",
 and each color as well. But the reason for this is that the palette does
 not mean the same thing at all: instead of information about an aspect
 (color) of every widget, we now have a kind of container of color
 _sources_. Instead of color values, the widget fields point to kinds of
 paint pots; these fields should not be called "color".
 [It is not always that simple to find real-world metaphors that help us
 correctly understand what we have to model and write into programs. A
 program's world is not at all reality, not even similar to it, even in the
 (minority of) cases where it models reality. In this case, "color" is
 misleading.]

 In the first case, the palette must be a value, in the second case it must
 be a ref. There is no way to escape the dilemma about having value or ref
 collections. Conceptually, we absolutely need both. Again the ref/value
 semantic duality is independent from data types. If the language provides
 one kind only, we have to hack, to cheat with it.

The discussion here is simply what should be the common, default case. 
Who said we can't have both? What's the justification for "If the 
language provides one kind only, we have to hack, to cheat with it." ??


Also, what you say about collections being ref or value is badly
worded. First of all, considering a collection on its own, there is no
right answer to whether the collection /should/ have value or reference
semantics. The statement is meaningless.
Only when a collection is associated with some other object does this
question make sense. Is the collection part-of/owned-by the object, or
is it merely referenced by it? So taking your example, does each Widget
have its own Palette of Colors, or is there only one common
Palette? The answer depends on your domain; this is a modeling/design
problem, not a language design one.
The only thing the language should strive for is being able to
represent/code both possible designs as well as possible (in a clear
way, less bug-prone, etc.).

-- 
Bruno Medeiros - Software Engineer

Jan 27 2011

Jonathan =?UTF-8?B?U2NobWlkdC1Eb21pbsOp?= <devel the-user.org> writes:

 I continue to believe that containers should be value-types. In order
 to prevent useless copying you can use something like "Impl * impl"
 and reference-counting. Then you only do a copy on actual change. This
 is the way I'm currently implementing in my own container-classes.

From my point of view reference counting is not very elegant. The compiler
should take care (or give possibilities to take care!!) that no unnecessary
copies are made. It could be much simpler than reference counting, but D is
simply not currently powerful enough to allow this in a generic way. As I
said in another thread some minutes ago: I agree that containers should
definitely be by value.
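
For reference, the quoted "Impl * impl" plus reference counting scheme might look roughly like the sketch below. It is not the quoted poster's actual code: CowArray and its members are invented for the illustration, it handles int elements only, and it assumes a nonzero length.

```d
import core.stdc.stdlib : free, malloc;

// Copy-on-write int array: copies share one reference-counted block, and
// the block is duplicated only when a shared instance is written to.
struct CowArray
{
    private static struct Impl { size_t refs; size_t length; int* data; }
    private Impl* impl;

    this(size_t n)
    {
        impl = cast(Impl*) malloc(Impl.sizeof);
        impl.refs = 1;
        impl.length = n;
        impl.data = cast(int*) malloc(n * int.sizeof);
        impl.data[0 .. n] = 0;
    }

    this(this) { if (impl) ++impl.refs; }      // cheap copy: share the block

    ~this()
    {
        if (impl && --impl.refs == 0) { free(impl.data); free(impl); }
        impl = null;
    }

    int opIndex(size_t i) const { return impl.data[i]; }

    void opIndexAssign(int value, size_t i)
    {
        if (impl.refs > 1) detach();           // copy only on actual change
        impl.data[i] = value;
    }

    private void detach()
    {
        auto old = impl;
        auto fresh = CowArray(old.length);
        fresh.impl.data[0 .. old.length] = old.data[0 .. old.length];
        --old.refs;                            // still held by the other copies
        impl = fresh.impl;
        fresh.impl = null;                     // ownership moved to this
    }

    bool sharesStorageWith(ref const CowArray other) const
    {
        return impl is other.impl;
    }
}

void main()
{
    auto a = CowArray(4);
    auto b = a;                                // no element copy yet
    assert(a.sharesStorageWith(b));
    b[0] = 42;                                 // b detaches here
    assert(!a.sharesStorageWith(b));
    assert(a[0] == 0 && b[0] == 42);
}
```

Every write pays for a reference-count check, which is part of why the scheme is debated in this thread.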

The User

Dec 14 2010

Jonathan =?UTF-8?B?U2NobWlkdC1Eb21pbsOp?= <devel the-user.org> writes:

Jonathan Schmidt-Dominé wrote:

 I continue to believe that containers should be value-types. In order
 to prevent useless copying you can use something like "Impl * impl"
 and reference-counting. Then you only do a copy on actual change. This
 is the way I'm currently implementing in my own container-classes.

 From my point of view reference counting is not very elegant.

However, maybe reference counting is a feasible way to go until better
times arrive.

Dec 14 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 14.12.2010 22:02, Andrei Alexandrescu wrote:
 I kept on literally losing sleep about a number of issues involving 
 containers, sealing, arbitrary-cost copying vs. reference counting and 
 copy-on-write, and related issues. This stops me from making rapid 
 progress on defining D containers and other artifacts in the standard 
 library.

 Clearly we need to break this paralysis, and just as clearly whatever 
 decision taken now will influence the prevalent D style going forward. 
 So a decision needs to be made soon, just not hastily. Easier said 
 than done!

 I continue to believe that containers should have reference semantics, 
 just like classes. Copying a container wholesale is not something you 
 want to be automatic.

Sure thing.
 I also continue to believe that controlled lifetime (i.e. 
 reference-counted implementation) is important for a container. 
 Containers tend to be large compared to other objects, so exercising 
 strict control over their allocated storage makes a lot of sense. What 
 has recently shifted in my beliefs is that we should attempt to 
 implement controlled lifetime _outside_ the container definition, by 
 using introspection. (Currently some containers use reference counting 
 internally, which makes their implementation more complicated than it 
 could be.)

What challenges do we face with this approach? Can you please outline 
the mechanics of that controlled lifetime outside the container part, 
e.g. is it by usage of some tricky wrappers?

 Finally, I continue to believe that sealing is worthwhile. In brief, a 
 sealing container never gives out addresses of its elements so it has 
 great freedom in controlling the data layout (e.g. pack 8 bools in one 
 ubyte) and in controlling the lifetime of its own storage. Currently 
 I'm not sure whether that decision should be taken by the container, 
 by the user of the container, or by an introspection-based wrapper 
 around an unsealed container.

Your change looks like going with the third option, am I correct?

 * * *

 That all being said, I'd like to make a motion that should simplify 
 everyone's life - if only for a bit. I'm thinking of making all 
 containers classes (either final classes or at a minimum classes with 
 only final methods). Currently containers are implemented as structs 
 that are engineered to have reference semantics. Some collections use 
 reference counting to keep track of the memory used.

 Advantages of the change:

 - Clear, self-documented reference semantics

 - Uses the right tool (classes) for the job (define a type with 
 reference semantics)

 - Pushes deterministic lifetime issues outside the containers 
 (simplifying them) and factors such issues into reusable wrappers a la 
 RefCounted.

 Disadvantages:

 - Containers must be dynamically allocated to do anything - even 
 calling empty requires allocation.

I was under the impression that you could allocate class instances almost
anywhere (with the help of emplace); it's just that the heap is the safe
default.
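
A minimal sketch of that idea, using std.conv.emplace to construct a class instance in memory obtained from malloc instead of new. IntStack is a made-up class for the example; it deliberately holds no GC pointers, since the collector does not scan malloc memory.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;

final class IntStack
{
    private int[8] items;                  // fixed storage, no GC references
    private size_t count;

    bool empty() const { return count == 0; }
    void push(int x) { assert(count < items.length); items[count++] = x; }
    int top() const { assert(count > 0); return items[count - 1]; }
}

int demo()
{
    enum size = __traits(classInstanceSize, IntStack);
    void[] mem = malloc(size)[0 .. size];  // raw storage outside the GC heap
    assert(mem.ptr !is null);
    scope (exit) free(mem.ptr);

    IntStack s = emplace!IntStack(mem);    // construct in place, no `new`
    scope (exit) destroy(s);               // run the destructor by hand

    assert(s.empty);
    s.push(7);
    return s.top;
}

void main()
{
    assert(demo() == 7);
}
```

The caller now owns both the storage and the object's lifetime, which is the bookkeeping that a plain new avoids.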

 - There's a two-words overhead associated with any class object.

 - Containers cannot do certain optimizations that depend on 
 container's control over its own storage.

That must have something to do with sealed containers being wrappers over
unsealed ones, so as I observe, your change implies more than a change to
final classes. Clearly something is missing in your post; can you please
be more specific about that change?

 What say you?

 Andrei


-- 
Dmitry Olshansky

Dec 14 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/14/10 1:42 PM, Dmitry Olshansky wrote:
 On 14.12.2010 22:02, Andrei Alexandrescu wrote:
 I also continue to believe that controlled lifetime (i.e.
 reference-counted implementation) is important for a container.
 Containers tend to be large compared to other objects, so exercising
 strict control over their allocated storage makes a lot of sense. What
 has recently shifted in my beliefs is that we should attempt to
 implement controlled lifetime _outside_ the container definition, by
 using introspection. (Currently some containers use reference counting
 internally, which makes their implementation more complicated than it
 could be.)

 What challenges do we face with this approach? Can you please outline
 the mechanics of that controlled lifetime outside the container part,
 e.g. is it by usage of some tricky wrappers?

Usage of wrappers, yes. Essentially you'd use e.g. RBTree as a class or 
RefCounted!RBTree, which calls clear() against the object when the 
reference count goes down to zero.
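
A rough sketch of the wrapper approach with std.typecons.RefCounted. Two assumptions: RefCounted as it exists in Phobos wraps non-class payloads, so a struct named Storage (invented here) stands in for the container, and the payload's destructor plays the role of the clear() call mentioned above.

```d
import std.typecons : RefCounted;

// Stand-in payload; a real container would release its storage here.
struct Storage
{
    static bool released;                  // set when a live payload is destroyed
    int[] items;
    ~this() { if (items !is null) released = true; }
}

size_t[3] counts()
{
    size_t[3] seen;
    auto a = RefCounted!Storage([1, 2, 3]);
    seen[0] = a.refCountedStore.refCount;      // one owner
    {
        auto b = a;                            // copy shares the payload
        b.items[0] = 99;                       // visible through a as well
        seen[1] = a.refCountedStore.refCount;  // two owners
    }
    seen[2] = a.refCountedStore.refCount;      // b is gone, one owner again
    assert(a.items[0] == 99);
    return seen;
}

void main()
{
    size_t[3] expected = [1, 2, 1];
    assert(counts() == expected);
    assert(Storage.released);                  // payload destroyed with last owner
}
```

The container itself contains no counting logic; the lifetime policy sits entirely in the wrapper.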

 Finally, I continue to believe that sealing is worthwhile. In brief, a
 sealing container never gives out addresses of its elements so it has
 great freedom in controlling the data layout (e.g. pack 8 bools in one
 ubyte) and in controlling the lifetime of its own storage. Currently
 I'm not sure whether that decision should be taken by the container,
 by the user of the container, or by an introspection-based wrapper
 around an unsealed container.

 Your change looks like going with the third option, am I correct?

Steve correctly pointed out that sealing must belong in the container.

 - Containers must be dynamically allocated to do anything - even
 calling empty requires allocation.

 I was under the impression that you could allocate class instances almost
 anywhere (with the help of emplace); it's just that the heap is the safe
 default.

Most people would simply call new.

 - There's a two-words overhead associated with any class object.

 - Containers cannot do certain optimizations that depend on
 container's control over its own storage.

 That must have something to do with sealed containers being wrappers over
 unsealed ones, so as I observe, your change implies more than a change to
 final classes. Clearly something is missing in your post; can you please
 be more specific about that change?

I withdraw that comment because I don't have good examples aside from 
deterministic memory release, which I already discussed.


Andrei

Dec 14 2010

Kagamin <spam here.lot> writes:

Andrei Alexandrescu Wrote:

 That all being said, I'd like to make a motion that should simplify 
 everyone's life - if only for a bit. I'm thinking of making all 
 containers classes (either final classes or at a minimum classes with 
 only final methods). Currently containers are implemented as structs 
 that are engineered to have reference semantics. Some collections use 
 reference counting to keep track of the memory used.

Thinking about this I've found an interesting issue:

---
void foo(int[int] aa)
{
	aa[2]=2;
}

int main()
{
	int[int] aa;
	//aa[1]=1; //uncomment this and it will work
	foo(aa);
	assert(aa[2]==2);
	return 0;
}
---

Dec 14 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/14/10 1:56 PM, Kagamin wrote:
 Andrei Alexandrescu Wrote:

 That all being said, I'd like to make a motion that should simplify
 everyone's life - if only for a bit. I'm thinking of making all
 containers classes (either final classes or at a minimum classes with
 only final methods). Currently containers are implemented as structs
 that are engineered to have reference semantics. Some collections use
 reference counting to keep track of the memory used.

 Thinking about this I've found an interesting issue:

 ---
 void foo(int[int] aa)
 {
 	aa[2]=2;
 }

 int main()
 {
 	int[int] aa;
 	//aa[1]=1; //uncomment this and it will work
 	foo(aa);
 	assert(aa[2]==2);
 	return 0;
 }
 ---

Yah, this has been discussed many times. Essentially AAs have class-like 
semantics with null in tow.
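
The behaviour and the usual workaround can be shown side by side. The function names below are invented for the example.

```d
void insertByValue(int[int] aa) { aa[2] = 2; }
void insertByRef(ref int[int] aa) { aa[2] = 2; }

void main()
{
    int[int] a;                 // null: no table allocated yet
    insertByValue(a);           // the callee allocates a table the caller never sees
    assert(2 !in a);

    int[int] b;
    b[1] = 1;                   // forces allocation, b now refers to a real table
    insertByValue(b);           // both handles refer to the same table
    assert(b[2] == 2);

    int[int] c;
    insertByRef(c);             // ref passes the handle itself, null or not
    assert(c[2] == 2);
}
```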

Andrei

Dec 14 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 14 Dec 2010 14:56:55 -0500, Kagamin <spam here.lot> wrote:

 Andrei Alexandrescu Wrote:

 That all being said, I'd like to make a motion that should simplify
 everyone's life - if only for a bit. I'm thinking of making all
 containers classes (either final classes or at a minimum classes with
 only final methods). Currently containers are implemented as structs
 that are engineered to have reference semantics. Some collections use
 reference counting to keep track of the memory used.

 Thinking about this I've found an interesting issue:

 ---
 void foo(int[int] aa)
 {
 	aa[2]=2;
 }

 int main()
 {
 	int[int] aa;
 	//aa[1]=1; //uncomment this and it will work
 	foo(aa);
 	assert(aa[2]==2);
 	return 0;
 }
 ---

That's been discussed very much in the past.  There is no good solution,  
and it's one of the good reasons to make collections classes with clearly  
defined lifetimes.

I can't find the thread that talks about this, but I think it was over a  
year ago that I brought up this subtlety when people were wondering what  
the correct implementation for containers should be -- struct or class.

-Steve

Dec 14 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, December 14, 2010 11:02:34 Andrei Alexandrescu wrote:
 I kept on literally losing sleep about a number of issues involving
 containers, sealing, arbitrary-cost copying vs. reference counting and
 copy-on-write, and related issues. This stops me from making rapid
 progress on defining D containers and other artifacts in the standard
 library.
 
 Clearly we need to break this paralysis, and just as clearly whatever
 decision taken now will influence the prevalent D style going forward.
 So a decision needs to be made soon, just not hastily. Easier said than
 done!
 
 I continue to believe that containers should have reference semantics,
 just like classes. Copying a container wholesale is not something you
 want to be automatic.
 
 I also continue to believe that controlled lifetime (i.e.
 reference-counted implementation) is important for a container.
 Containers tend to be large compared to other objects, so exercising
 strict control over their allocated storage makes a lot of sense. What
 has recently shifted in my beliefs is that we should attempt to
 implement controlled lifetime _outside_ the container definition, by
 using introspection. (Currently some containers use reference counting
 internally, which makes their implementation more complicated than it
 could be.)
 
 Finally, I continue to believe that sealing is worthwhile. In brief, a
 sealing container never gives out addresses of its elements so it has
 great freedom in controlling the data layout (e.g. pack 8 bools in one
 ubyte) and in controlling the lifetime of its own storage. Currently I'm
 not sure whether that decision should be taken by the container, by the
 user of the container, or by an introspection-based wrapper around an
 unsealed container.
 
 * * *
 
 That all being said, I'd like to make a motion that should simplify
 everyone's life - if only for a bit. I'm thinking of making all
 containers classes (either final classes or at a minimum classes with
 only final methods). Currently containers are implemented as structs
 that are engineered to have reference semantics. Some collections use
 reference counting to keep track of the memory used.
 
 Advantages of the change:
 
 - Clear, self-documented reference semantics
 
 - Uses the right tool (classes) for the job (define a type with
 reference semantics)
 
 - Pushes deterministic lifetime issues outside the containers
 (simplifying them) and factors such issues into reusable wrappers a la
 RefCounted.
 
 Disadvantages:
 
 - Containers must be dynamically allocated to do anything - even calling
 empty requires allocation.
 
 - There's a two-words overhead associated with any class object.
 
 - Containers cannot do certain optimizations that depend on container's
 control over its own storage.
 
 
 What say you?

One concern that I would have would be inlining. Containers need to be
efficient, and if their functions can't be inlined, that could be
problematic. I expect that if a container is a class and its functions are
final (and possibly the class itself), then the functions wouldn't be
virtual, and then the inliner can do its job. But if the container's
functions are virtual, then inlining won't work. How much of a problem that
would be in practice, I don't know, but I think that it's something that
needs to be considered.
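
Whether a method is virtual can at least be checked at compile time. The sketch below uses made-up class names and only verifies virtual-ness through __traits; whether a given compiler then actually inlines the calls is not something it can show.

```d
class OpenContainer
{
    private int[] items;
    size_t count() const { return items.length; }        // virtual by default
}

class SealedApi
{
    private int[] items;
    final size_t count() const { return items.length; }  // cannot be overridden
}

final class FinalContainer
{
    private int[] items;
    size_t count() const { return items.length; }
}

static assert( __traits(isVirtualMethod, OpenContainer.count));
static assert(!__traits(isVirtualMethod, SealedApi.count));
static assert( __traits(isFinalClass, FinalContainer));

void main()
{
    auto c = new FinalContainer;
    assert(c.count == 0);       // direct call, no vtable lookup needed
}
```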

- Jonathan M Davis

Dec 14 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-12-14 14:02:34 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 What say you?

I'd prefer them to be value types.

And I agree that if you want to give them reference semantics it's much 
cleaner if they're implemented as a class. I fear the null pointers 
however...

I understand your paralysis.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Dec 14 2010

"Craig Black" <craigblack2 cox.net> writes:

 What say you?

I feel like the odd man out here since my perspective is so different.  I 
use custom container classes even in C++, partly because I can usually get 
better performance that way, and because I can customize the container 
however I like.  So I will probably be doing my own containers if/when I use 
D.

Beyond that, my own personal preferences seem so different that I hesitate 
to mention them.  I use dynamic arrays by far the most out of all container 
classes.  I use them so much that I cringe at the thought of allocating them 
on the GC heap.  My code is very high performance and I would like to keep 
it that way.

Also, my usage of arrays is such that most of them are empty, so it is 
important to me that the empty arrays are stored efficiently.  Using my 
custom container class, an empty array does not require a heap allocation, 
and only requires a single pointer to be allocated.

Not sure if these requirements are important to anyone else, but I don't 
mind making my own custom containers if I need to.

Dec 14 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, December 14, 2010 16:35:34 Craig Black wrote:
 What say you?

 
 I feel like the odd man out here since my perspective is so different.  I
 use custom container classes even in C++, partly because I can usually get
 better performance that way, and because I can customize the container
 however I like.  So I will probably be doing my own containers if/when I
 use D.
 
 Beyond that, my own personal preferences seem so different that I hesitate
 to mention them.  I use dynamic arrays by far the most out of all container
 classes.  I use them so much that I cringe at the thought of allocating
 them on the GC heap.  My code is very high performance and I would like to
 keep it that way.
 
 Also, my usage of arrays is such that most of them are empty, so it is
 important to me that the empty arrays are stored efficiently.  Using my
 custom container class, an empty array does not require a heap allocation,
 and only requires a single pointer to be allocated.
 
 Not sure if these requirements are important to anyone else, but I don't
 mind making my own custom containers if I need to.

Dynamic arrays are already on the GC heap...

- Jonathan M Davis

Dec 14 2010

"Craig Black" <craigblack2 cox.net> writes:

"Jonathan M Davis" <jmdavisProg gmx.com> wrote in message 
news:mailman.1005.1292374292.21107.digitalmars-d puremagic.com...
 On Tuesday, December 14, 2010 16:35:34 Craig Black wrote:
 What say you?

 I feel like the odd man out here since my perspective is so different.  I
 use custom container classes even in C++, partly because I can usually get
 better performance that way, and because I can customize the container
 however I like.  So I will probably be doing my own containers if/when I
 use D.

 Beyond that, my own personal preferences seem so different that I hesitate
 to mention them.  I use dynamic arrays by far the most out of all container
 classes.  I use them so much that I cringe at the thought of allocating
 them on the GC heap.  My code is very high performance and I would like to
 keep it that way.

 Also, my usage of arrays is such that most of them are empty, so it is
 important to me that the empty arrays are stored efficiently.  Using my
 custom container class, an empty array does not require a heap allocation,
 and only requires a single pointer to be allocated.

 Dynamic arrays are already on the GC heap...

 - Jonathan M Davis

Using built-in D arrays, yes.  Using a templated struct, they don't have to 
be.   C++ std::vector works just fine without GC.  The same can be done in 
D.
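
A sketch of such a templated struct, under stated assumptions: it is meant for plain-data element types only, copying is disabled to keep ownership trivial, and the name Vec and its layout (length and capacity in a header at the front of the heap block) are one possible design, not a description of anyone's actual container.

```d
import core.stdc.stdlib : free, realloc;

// Growable array outside the GC heap. The handle is a single pointer, and
// an empty array owns no heap memory at all.
struct Vec(T)
{
    private static struct Header { size_t length, capacity; }
    private Header* head;                      // null means empty

    @disable this(this);                       // unique ownership in this sketch

    ~this() { free(head); }

    size_t length() const { return head ? head.length : 0; }

    private inout(T)* data() inout
    {
        return cast(inout(T)*)(head + 1);      // elements follow the header
    }

    void put(T value)
    {
        if (head is null || head.length == head.capacity)
        {
            const cap = head ? head.capacity * 2 : 4;
            const len = length();
            head = cast(Header*) realloc(head, Header.sizeof + cap * T.sizeof);
            assert(head !is null, "out of memory");
            head.length = len;
            head.capacity = cap;
        }
        data()[head.length++] = value;
    }

    ref inout(T) opIndex(size_t i) inout
    {
        assert(i < length());
        return data()[i];
    }
}

static assert(Vec!int.sizeof == (void*).sizeof);

void main()
{
    Vec!int v;                                 // empty: nothing allocated yet
    assert(v.length == 0);
    foreach (i; 0 .. 10) v.put(i * i);
    assert(v.length == 10 && v[9] == 81);
}
```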

Dec 14 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 15.12.2010 3:50, Jonathan M Davis wrote:
 On Tuesday, December 14, 2010 16:35:34 Craig Black wrote:
 What say you?

 I feel like the odd man out here since my perspective is so different.  I
 use custom container classes even in C++, partly because I can usually get
 better performance that way, and because I can customize the container
 however I like.  So I will probably be doing my own containers if/when I
 use D.

 Beyond that, my own personal preferences seem so different that I hesitate
 to mention them.  I use dynamic arrays by far the most out of all container
 classes.  I use them so much that I cringe at the thought of allocating
 them on the GC heap.  My code is very high performance and I would like to
 keep it that way.

 Also, my usage of arrays is such that most of them are empty, so it is
 important to me that the empty arrays are stored efficiently.  Using my
 custom container class, an empty array does not require a heap allocation,
 and only requires a single pointer to be allocated.

 Not sure if these requirements are important to anyone else, but I don't
 mind making my own custom containers if I need to.

 Dynamic arrays are already on the GC heap...

 - Jonathan M Davis

Hm,
(cast(T*)malloc(1024*T.sizeof))[0..size];
works. Just needs careful initialization of each field, since they are 
filled with trash ...
And you can even do slicing. Just don't append to them and keep track of 
the initial reference ;)

-- 
Dmitry Olshansky

Dec 15 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 15 Dec 2010 14:18:20 -0500, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 15.12.2010 3:50, Jonathan M Davis wrote:
 On Tuesday, December 14, 2010 16:35:34 Craig Black wrote:
 What say you?

 I feel like the odd man out here since my perspective is so different.  I
 use custom container classes even in C++, partly because I can usually get
 better performance that way, and because I can customize the container
 however I like.  So I will probably be doing my own containers if/when I
 use D.

 Beyond that, my own personal preferences seem so different that I hesitate
 to mention them.  I use dynamic arrays by far the most out of all container
 classes.  I use them so much that I cringe at the thought of allocating
 them on the GC heap.  My code is very high performance and I would like to
 keep it that way.

 Also, my usage of arrays is such that most of them are empty, so it is
 important to me that the empty arrays are stored efficiently.  Using my
 custom container class, an empty array does not require a heap allocation,
 and only requires a single pointer to be allocated.

 Not sure if these requirements are important to anyone else, but I don't
 mind making my own custom containers if I need to.

 Dynamic arrays are already on the GC heap...

 - Jonathan M Davis

 Hm,
 (cast(T*)malloc(1024*T.sizeof))[0..size];
 works. Just needs careful initialization of each field, since they are  
 filled with trash ...
 And you can even do slicing. Just don't append to them and keep track of  
 the initial reference ;)

You can append to them.  The append code will recognize that it's not a GC  
block and reallocate.

More importantly, depending on the type of T, you may need to register the  
block as a root in the GC.  Otherwise, if T contains GC references, those  
could be collected prematurely.
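
Both points can be sketched together. Assumptions: Node is an invented element type holding a GC-managed slice, and GC.addRange from core.memory is used as the registration call, since it asks the collector to scan a whole memory range.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

struct Node { int id; int[] children; }     // holds a GC reference

void main()
{
    enum n = 8;
    auto raw = cast(Node*) malloc(n * Node.sizeof);
    assert(raw !is null);
    Node[] nodes = raw[0 .. n];
    nodes[] = Node.init;                    // malloc memory starts as garbage

    // Node contains a GC-managed slice, so ask the collector to scan the
    // block; otherwise arrays reachable only from here could be freed.
    GC.addRange(raw, n * Node.sizeof);
    scope (exit) { GC.removeRange(raw); free(raw); }

    nodes[0].children = [1, 2, 3];

    assert(nodes.capacity == 0);            // not a GC block: no room to grow
    auto grown = nodes;
    grown ~= Node(99);                      // appending reallocates on the GC heap
    assert(grown.ptr !is raw);
    assert(nodes.ptr is raw);               // the malloc block is untouched
    assert(grown[0].children == [1, 2, 3]);
}
```

After the append, grown lives on the GC heap while nodes still points at the malloc block, so the initial reference must still be kept for the final free.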

-Steve

Dec 15 2010

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 15.12.2010 22:52, Steven Schveighoffer wrote:
 On Wed, 15 Dec 2010 14:18:20 -0500, Dmitry Olshansky 
 <dmitry.olsh gmail.com> wrote:

 On 15.12.2010 3:50, Jonathan M Davis wrote:
 On Tuesday, December 14, 2010 16:35:34 Craig Black wrote:
 What say you?

 I feel like the odd man out here since my perspective is so different.  I
 use custom container classes even in C++, partly because I can usually get
 better performance that way, and because I can customize the container
 however I like.  So I will probably be doing my own containers if/when I
 use D.

 Beyond that, my own personal preferences seem so different that I hesitate
 to mention them.  I use dynamic arrays by far the most out of all container
 classes.  I use them so much that I cringe at the thought of allocating
 them on the GC heap.  My code is very high performance and I would like to
 keep it that way.

 Also, my usage of arrays is such that most of them are empty, so it is
 important to me that the empty arrays are stored efficiently.  Using my
 custom container class, an empty array does not require a heap allocation,
 and only requires a single pointer to be allocated.

 Not sure if these requirements are important to anyone else, but I don't
 mind making my own custom containers if I need to.

 Dynamic arrays are already on the GC heap...

 - Jonathan M Davis

 Hm,
 (cast(T*)malloc(1024*T.sizeof))[0..size];
 works. Just needs careful initialization of each field, since they 
 are filled with trash ...
 And you can even do slicing. Just don't append to them and keep track 
 of the initial reference ;)

 You can append to them.  The append code will recognize that it's not a 
 GC block and reallocate.

Good to know.
 More importantly, depending on the type of T, you may need to register 
 the block as a root in the GC.  Otherwise, if T contains GC references, 
 the memory they refer to could be collected prematurely.

Right, this is very important! I just checked, and luckily I did this 
only with plain data structs.

 -Steve


-- 
Dmitry Olshansky

Dec 15 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 15 Dec 2010 15:19:34 -0500, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 15.12.2010 22:52, Steven Schveighoffer wrote:
 On Wed, 15 Dec 2010 14:18:20 -0500, Dmitry Olshansky  
 <dmitry.olsh gmail.com> wrote:
 Hm,
 (cast(T*)malloc(1024*T.sizeof))[0..size];
 works. Just needs careful initialization of each field, since they are  
 filled with trash ...
 And you can even do slicing. Just don't append to them and keep track  
 of the initial reference ;)

 You can append to them.  The append code will recognize that it's not a GC  
 block and reallocate.

 Good to know.

I should also note, if you do this:

auto x = (cast(T*)malloc(1024*T.sizeof))[0..size];
x ~= T.init;

You have now lost the original reference to the data (because x now points  
to the GC allocated block), so it will leak!  So while appending does  
work, you have to take care to still keep track of the original data.  I'd  
recommend something like this if it's a temporary:

auto x = (cast(T*)malloc(1024*T.sizeof))[0..size];
auto origdata = x.ptr; // mutable pointer, since free() takes a void*
scope(exit) free(origdata);
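
Spelled out as a complete program, the pattern might look like the sketch below. It assumes T is plain data (int here), so no GC registration is involved. appendRelocates is a hypothetical helper name, and the saved pointer is declared mutable so it can be passed to free:

```d
import core.stdc.stdlib : malloc, free;

// Returns true when appending moved the slice off the malloc'd block.
bool appendRelocates(size_t size)
{
    alias T = int; // plain data, so the GC does not need to scan the block

    auto x = (cast(T*) malloc(1024 * T.sizeof))[0 .. size];
    auto origdata = x.ptr;       // remember the malloc'd block
    scope (exit) free(origdata); // released even though x gets reassigned

    x[] = T.init; // malloc'd memory starts out uninitialized
    x ~= T.init;  // the runtime copies the data into a new GC block

    // x now refers to GC memory; only origdata still points at the malloc'd block.
    return x.ptr !is origdata && x.length == size + 1;
}

void main()
{
    assert(appendRelocates(16));
}
```

The scope(exit) guard frees the original block on every exit path, including when an exception is thrown.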

-Steve

Dec 15 2010
