
digitalmars.D - Linus with some good observations on garbage collection

reply Walter Bright <newshound2 digitalmars.com> writes:
http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
Apr 22 2011
next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Just a reminder: that post is 9 years old.
Apr 22 2011
prev sibling parent reply Alvaro <alvaro.segura gmail.com> writes:
On 22/04/2011 19:36, Walter Bright wrote:
 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
I've always been surprised that discussions usually present garbage collection as the only alternative to explicit manual memory management. I imagined it as a garbage truck that has its own schedule and may let a lot of trash pile up before passing by. I always naively thought: why not just free immediately when an object has no references left?

I'm not an expert, so there may be reasons I don't see, but now that Linus says something along those lines, I'll ask. Why not? Isn't it much easier to do refcount++ and refcount--, and if refcount==0, immediately free()? Memory becomes available to other uses faster, there is no need for an additional thread, no large amount of memory piles up before the advanced garbage truck decides to come by, and no slight pauses happen when collecting trash (maybe only in old implementations), and the implementation is much simpler...

OK, I knew about the "cyclic references" problem. But Linus doesn't seem to see a big problem there, and solutions can be found with care...
Apr 22 2011
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 22 Apr 2011 14:32:06 -0400, Alvaro <alvaro.segura gmail.com> wrote:

 On 22/04/2011 19:36, Walter Bright wrote:
 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
I've always been surprised when discussions usually just bring garbage collection as the only alternative to explicit manual memory management. I imagined it as a garbage truck that has its own schedule and may let a lot of trash pile up before passing by. I always naively thought, why not just free immediately when an object gets no references?
Because you then have to update potentially two reference counts every time you assign a pointer. GCs save you from doing that. I know way, way less than Torvalds, but my naive brain says GCs still win, because oftentimes slightly noticeable drops in performance are worth having code that doesn't corrupt memory. This may not be true for kernel development, but then again, we aren't all developing kernels ;) -Steve
Apr 22 2011
prev sibling next sibling parent reply Michael Stover <michael.r.stover gmail.com> writes:
This sort of reference count with cyclic dependency detector is how a lot of
scripting languages do it, or did it in the past.  The problem was that lazy
generational GCs are believed to have better throughput in general.

I'd like to say "were proved" rather than "are believed", but I don't
actually know where to go for such evidence.  However, I do believe many
scripting languages, such as Python, eventually ditched the reference
counting technique for generational GC, and Java has very fast GC, so I am
inclined to trust those real-life solutions over Linus.

Mike

On Fri, Apr 22, 2011 at 2:32 PM, Alvaro <alvaro.segura gmail.com> wrote:

 On 22/04/2011 19:36, Walter Bright wrote:

  http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

 I've always been surprised when discussions usually just bring garbage
 collection as the only alternative to explicit manual memory management. I
 imagined it as a garbage truck that has its own schedule and may let a lot
 of trash pile up before passing by. I always naively thought, why not just
 free immediately when an object gets no references?

 Not an expert, so there may be reasons I don't see, but now that Linus says
 something along the lines, I'll ask. Why not? Isn't it much easier to do
 refcount++ and refcount--, and if refcount==0 immediately "free()"? Memory
 will be available to other needs faster, no need for an additional thread,
 or a lot of memory consumed before the advanced garbage truck decides to
 come in, or slight pauses when collecting trash (maybe only in old
 implementations), and the implementation is much simpler...

 OK, I knew about that "cyclic references" problem. But Linus doesn't seem
 to see a big problem and solutions can be found with care...
Apr 22 2011
parent "Jérôme M. Berger" <jeberger free.fr> writes:
Michael Stover wrote:
 This sort of reference count with cyclic dependency detector is how a
 lot of scripting languages do it, or did it in the past.  The problem
 was that lazy generational GCs are believed to have better throughput in
 general.

 I'd like to say "were proved" rather than "are believed", but I don't
 actually know where to go for such evidence.  However, I do believe many
 scripting languages, such as python, eventually ditched the reference
 counting technique for generational, and Java has very fast GC, so I am
 inclined to believe those real-life solutions than Linus.
This is a single benchmark on a single obscure language, so take it with a grain of salt, but look at the "Nimrod" and "Nimrod Boehm" lines here: https://bitbucket.org/jmb/shootout/wiki/Home

By default, Nimrod uses a reference counter, but there is a compiler switch to use the Boehm GC instead. Of particular interest is the benchmark "Binary trees", since it is specially designed to exercise memory management. On that benchmark, the Boehm version is about twice as fast as the refcounting version, while on the other benchmarks performance is comparable.

Jerome
--
mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Apr 23 2011
prev sibling parent reply Brad Roberts <braddr puremagic.com> writes:
Also add to it that in many cases you're dealing with a threaded environment,
so those refcounts have to be updated with locked (either mutex-protected
or, more commonly, atomic) operations, which are far more expensive than
non-atomic ones. More so when there's actual contention for the refcounted
resource.

On 4/22/2011 11:53 AM, Michael Stover wrote:
 This sort of reference count with cyclic dependency detector is how a lot
 of scripting languages do it, or did it in the past.  The problem was that
 lazy generational GCs are believed to have better throughput in general.

 I'd like to say "were proved" rather than "are believed", but I don't
 actually know where to go for such evidence.  However, I do believe many
 scripting languages, such as python, eventually ditched the reference
 counting technique for generational, and Java has very fast GC, so I am
 inclined to believe those real-life solutions than Linus.

 Mike

 On Fri, Apr 22, 2011 at 2:32 PM, Alvaro <alvaro.segura gmail.com> wrote:

     On 22/04/2011 19:36, Walter Bright wrote:

         http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

     I've always been surprised when discussions usually just bring garbage
     collection as the only alternative to explicit manual memory
     management. I imagined it as a garbage truck that has its own schedule
     and may let a lot of trash pile up before passing by. I always naively
     thought, why not just free immediately when an object gets no
     references?

     Not an expert, so there may be reasons I don't see, but now that Linus
     says something along the lines, I'll ask. Why not? Isn't it much easier
     to do refcount++ and refcount--, and if refcount==0 immediately
     "free()"? Memory will be available to other needs faster, no need for
     an additional thread, or a lot of memory consumed before the advanced
     garbage truck decides to come in, or slight pauses when collecting
     trash (maybe only in old implementations), and the implementation is
     much simpler...

     OK, I knew about that "cyclic references" problem. But Linus doesn't
     seem to see a big problem and solutions can be found with care...
Apr 22 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Brad Roberts wrote:
 Also add to it that in many cases you're dealing with a threaded
 environment, so those refcounts have to be locked (either via mutexes, or
 more commonly just atomic) operations which are far more expensive than
 non-atomic. More so when there's actual contention for the refcounted
 resource.
That is only a problem if the reference count of that resource changes at a very high frequency. The described problem also implies that the threads would not need any form of synchronization for the data itself (otherwise the reference count certainly would not be the bottleneck). I cannot, at the moment, think of a real-world example where this would not imply bad design. Can you help me out?

Michael Stover wrote:
 I'd like to say "were proved" rather than "are believed", but I don't
 actually know where to go for such evidence.  However, I do believe many
 scripting languages, such as python, eventually ditched the reference
 counting technique for generational, and Java has very fast GC, so I am
 inclined to believe those real-life solutions than Linus.

 Mike
Well, the GC may be fast when compared to other GCs, but it has to be designed to run on general data whose reference structure can be arbitrary. Often, the objects/references have a somewhat specialized structure that a smart programmer can exploit, especially if the references point to private data. But that only matters if performance is of prime interest, and the gains may not be very big.

But, as pointed out by Linus, the prime performance problem is _not_ the GC, but the mindset that comes with it. Most programmers that "grew up" in a managed environment tend to use very many "new" keywords all over their code, instead of allocating in bulk (Java does not even allow you to do this.) When they then try to write a C++ program, they do the same. The resulting memory bugs are then blamed on the lack of a GC (you can have GC in C/C++, but most of the people I have talked to do not know this.) They then happily change back to Java, which has a "very fast GC".

The important thing to note here is that the work required to deallocate all these many memory locations does not magically disappear; it is delegated to the GC, which will mostly do it faster and more reliably than a programmer who has to do it manually. But the problem does not lie in the deallocations, it lies in the allocations. Consider this analogy:

Scenario 1: Many people like candy. Candy comes wrapped in colorful little pieces of paper. Every day, every person buys one piece of candy in the candy shop (new Candy()!) and on the way back home they throw away the wrapping paper somewhere on the street (reassign reference). That is garbage. In the evening, some creepy guy searches the whole street for those small pieces of paper. Would you call that guy a garbage collector? He collects garbage, but in the real world garbage collectors work more like this:

Scenario 2: Still, many people like candy. Every person buys a bag of candy in the candy shop once a year (new Candy[365]). When all the candy is eaten, they put all the garbage in one bag and put it at their front door (reassign reference to whole array). A very handsome guy collects all those bags. He is very much more efficient than the guy in scenario 1. (Arguably, memory usage is bigger in that particular example, but in computer programs the allocating process can reuse the memory. The analogy breaks down here.)

Note that I am not saying that garbage collection is bad. If the references form a very complicated structure, or if a reference to the containing object is not necessarily kept (i.e. array slicing), it can be very useful as well as faster than manual memory management. Also, custom allocators can reduce the lots-of-small-allocations problem, but they have some overhead too. Advanced GCs may do this automatically, I don't know.

The reason Java is garbage collected is _not_ performance, but primarily reliability. In big programming projects, it can be hard to know where a reference to an object may be kept, whether there are circular references, etc., as programs keep expanding without the programmers understanding the whole project in detail. GC also removes bugs related to memory management. I think true and reliable OO is not possible without a GC. The downside is that many Java/... programmers don't concern themselves much with the internals of memory allocations, and are _very_ proud of it.

This is also the reason I think it is a bad idea to deprecate D's 'delete'.

-Timon
Apr 22 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Timon Gehr:

 But, as pointed out by Linus, the prime performance problem is _not_ the
 GC, but the mindset that comes with it. Most programmers that "grew up" in
 a managed environment tend to use very many "new" keywords all over their
 code, instead of allocating in bulk (Java does not even allow you to do
 this.)
In C99 (and Ada) you avoid the allocation of some dynamic arrays with new thanks to variable length arrays.
 This is also the reason I think it is a bad idea to deprecate D's 'delete'.
D used to have scoped class instances, scoped classes, and delete; their replacements are not good enough yet. In Common Lisp you have hints for the GC; they are safe and they help you speed up the work of the GC. Such hints probably need to be integrated with the type system, so they may need to be built-ins as scope/delete were. I am not seeing enough discussion about this.

Bye,
bearophile
Apr 22 2011
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Timon Gehr:
 But, as pointed out by Linus, the prime performance problem is _not_ the
 GC, but the mindset that comes with it. Most programmers that "grew up" in
 a managed environment tend to use very many "new" keywords all over their
 code, instead of allocating in bulk (Java does not even allow you to do
 this.)

 In C99 (and Ada) you avoid the allocation of some dynamic arrays with new
 thanks to variable length arrays.

Variable length arrays are just sugary syntax for a call to alloca.
 This is also the reason I think it is a bad idea to deprecate D's 'delete'.
 D used to have scoped class instances, scoped classes, and delete, their
 replacements are not good enough yet. In CommonLisp you have hints for the
 GC, they are safe and they help you speed up the work of the GC. Such hints
 probably need to be integrated with the type system, so they may need to be
 built-ins as scope/delete were. I am not seeing enough discussion about
 this.
 Bye,
 bearophile
I've always felt that Vala's system, which is incidentally based on reference counting, is better thought out. This makes destructors in Vala deterministic, and they can be used to implement an RAII pattern for resource management. To get around the common pitfalls of reference counting systems, Vala introduces two keywords that alter the relationship between the allocated object and the GC: 'weak' and 'unowned'. Rather than bore you with the gritty details here, see this link: http://live.gnome.org/Vala/ReferenceHandling
Apr 22 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Iain Buclaw:

 Variable length arrays are just sugary syntax for a call to alloca.
I have an enhancement request in Bugzilla on VLAs, with longer discussions. Just two comments:

- It seems alloca() can be implemented with two different semantics: deallocate at the end of the function, or deallocate at the end of the scope. Usually alloca() deallocates at the end of the function, but that semantic confusion is dangerous. VLAs deallocate at the end of the scope, just like any other static array.
- To use alloca you need to use pointers, maybe even slices; it's not DRY, etc. So syntax sugar helps.

In the meantime I've changed my mind a little. Now I prefer something better than C99 VLAs. I'd like D-VLAs with the same syntax as C99 VLAs but with a safer semantics, closer to this one (but the "alloca" used here must deallocate at the end of the scope):

enum size_t MAX_VLA_SIZE = 1024;
static assert (is(typeof(size) == size_t));
T* ptr = null;
if ((size * T.sizeof) < MAX_VLA_SIZE)
    ptr = cast(T*)alloca(size * T.sizeof);
T[] array = (ptr == null) ? new T[size] : ptr[0 .. size];
array[] = T.init;

This has some advantages: when alloca returns null, or when the array is large, it uses the GC. This allows one both to avoid some stack overflows and to reduce the risk of the stack memory becoming cold.
 Rather than bore you with the gritty details here, see link:
 http://live.gnome.org/Vala/ReferenceHandling
This is interesting. Bye, bearophile
Apr 22 2011
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
 Now I prefer something better than C99 VLAs. I'd like D-VLAs with the same
 syntax as C99 VLAs but with a safer semantics,
Never mind, that semantics can't use that syntax, otherwise you have hidden heap allocations... The syntax has to change (because the D-VLA semantics seems OK to me). Bye, bearophile
Apr 22 2011
prev sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Iain Buclaw:
 Variable length arrays are just sugary syntax for a call to alloca.
 I have an enhancement request in Bugzilla on VLA, with longer discussions.
 Just two comments:
 - It seems alloca() can be implemented with two different semantics: to
 deallocate at the end of the function or to deallocate at the end of the
 scope. Usually alloca() deallocates at the end of the function, but that
 semantic confusion is dangerous. VLA deallocate at the end of the scope,
 just like any other static array.

Yep, my understanding of the way C99 goes about it (I might be wrong here, so don't hurt me :) is that the stack is saved for the lifetime of the alloca'd array and restored once it goes out of scope. ie:

int sz = 42;
{
    void * saved_stack = __stack_save ();  // start of scope
    int array[0 : D.range];
    uint D.range;      // automatic variable
    void * D.ptr;      // automatic variable
    D.range = sz + -1;
    D.ptr = __builtin_alloca (sz * 4);
    array = (int[0 : D.range] *) D.ptr;
    // later...
    __stack_restore (saved_stack);  // end of scope
}
Apr 23 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/22/11 5:21 PM, Iain Buclaw wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
  Timon Gehr:
   But, as pointed out by Linus, the prime performance problem is _not_
   the GC, but the mindset that comes with it. Most programmers that
   "grew up" in a managed environment tend to use very many "new" keywords
   all over their code, instead of allocating in bulk (Java does not even
   allow you to do this.)

  In C99 (and Ada) you avoid the allocation of some dynamic arrays with
  new thanks to variable length arrays.

 Variable length arrays are just sugary syntax for a call to alloca.
Well, in fairness they can be a fair amount more elaborate. The trouble with alloca is that it's impossible to compose with. So in general you need to write something like:

bool onStack = smallEnough(length * sizeof(T));
auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
scope(exit) if (!onStack) free(a.ptr);
initialize(enforce(a));
scope(exit) clear(a);

This block is difficult to factor out because of alloca.

Andrei
Apr 23 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 4/23/2011 9:48 AM, Andrei Alexandrescu wrote:
 On 4/22/11 5:21 PM, Iain Buclaw wrote:
  == Quote from bearophile (bearophileHUGS lycos.com)'s article
   Timon Gehr:
    But, as pointed out by Linus, the prime performance problem is _not_
    the GC, but the mindset that comes with it. Most programmers that
    "grew up" in a managed environment tend to use very many "new"
    keywords all over their code, instead of allocating in bulk (Java
    does not even allow you to do this.)

   In C99 (and Ada) you avoid the allocation of some dynamic arrays with
   new thanks to variable length arrays.

  Variable length arrays are just sugary syntax for a call to alloca.

 Well, in fairness they can be a fair amount more elaborate. The trouble
 with alloca is that it's impossible to compose with. So in general you
 need to write something like:

 bool onStack = smallEnough(length * sizeof(T));
 auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
 scope(exit) if (!onStack) free(a.ptr);
 initialize(enforce(a));
 scope(exit) clear(a);

 This block is difficult to factor out because of alloca.

 Andrei
Right. This is exactly the kind of thing TempAlloc was meant to solve. (Actually, to give credit, the initial rough idea for TempAlloc was proposed by Andrei, though I fleshed out the details and implemented it.) With TempAlloc, you have a segmented stack (i.e. one that allocates a new segment instead of crashing the program if it's full) that is independent of the call stack (so you can return TempAlloc-allocated memory from functions). BTW, since when does the ternary operator work with functions, as opposed to variables?
Apr 23 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 Right.  This is exactly the kind of thing TempAlloc was meant to solve. 
I think TempAlloc is meant for larger allocations. D-VLAs are meant to allocate little arrays (like 1 KB or less) on the normal stack very quickly. Bye, bearophile
Apr 23 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
On 4/23/2011 10:24 AM, bearophile wrote:
 dsimcha:

 Right.  This is exactly the kind of thing TempAlloc was meant to solve.
I think TempAlloc is meant for larger allocations. D-VLAs are meant to allocate little arrays (like 1 KB or less) on the normal stack very quickly. Bye, bearophile
Why would TempAlloc not work for this? It's slightly slower than VLAs, but the difference is negligible since it's still almost never a threading bottleneck and you still have to do something with the array you allocated.
Apr 23 2011
parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 It's slightly slower than VLAs, but the difference is negligible since
 it's still almost never a threading bottleneck and you still have to do
 something with the array you allocated.
I have to compare the performance of both for my use cases (fast allocations of very small arrays, even just a small number of chars). I think an advantage of stack allocations is that most times the top of the current stack is already in the L1 cache, so it's warm memory.

Bye,
bearophile
Apr 23 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. Andrei
Apr 23 2011
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.
 They are converted to pointers to functions. Not something I'd recommend
 because it makes the call slower.

 Andrei
Hum, I didn't know that. Why does it make the call slower (other than the actual ternary operation)? Is it because it has to load the function address from a register/variable, instead of it being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact on performance, almost negligible. -- Bruno Medeiros - Software Engineer
Apr 29 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.
 They are converted to pointers to functions. Not something I'd recommend
 because it makes the call slower.

 Andrei
Hum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
It's an indirect call instead of a direct call. Andrei
Apr 29 2011
parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.
 They are converted to pointers to functions. Not something I'd recommend
 because it makes the call slower.

 Andrei
Hum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
It's an indirect call instead of a direct call. Andrei
Well, yes, that's kind of what I was thinking already (although I was referring to it in more low-level, assembler terms). But my question remains: why is an indirect function call slower than a direct one, and is that difference significant? (excluding of course the possibilities of inlining and other static analysis that direct invocations offer) There is no extra overhead other than loading the function address from a register/variable, right? -- Bruno Medeiros - Software Engineer
May 03 2011
parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
  On 4/29/11 10:40 AM, Bruno Medeiros wrote:
   On 23/04/2011 15:45, Andrei Alexandrescu wrote:
    On 4/23/11 8:57 AM, dsimcha wrote:
     BTW, since when does the ternary operator work with functions, as
     opposed to variables?

   And that's since the C days btw. The function call is just an
   operation, the callee an operand, so any expression can be there.

    They are converted to pointers to functions. Not something I'd
    recommend because it makes the call slower.

    Andrei

  Hum, I didn't know that. Why does it make the call slower (other than
  the actual ternary operation), is it because it has to load the function
  address from a register/variable, instead of being a constant value?
  And/or not being able to inline the call?
  I mean, the first aspect seems like a very minor impact in performance,
  almost negligible.

 It's an indirect call instead of a direct call.

 Andrei

 Well, yes, that's kinda of what I was thinking already (although I
 referred to more low-level, assembler terms).
 But my question remains, why is an indirect function call slower than a
 direct one, and is that difference significant? (excluding of course the
 possibilities of inlining and other static analysis that direct
 invocations offer) There is no extra overhead other than loading the
 function address from a register/variable, right?

AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns.

Jerome
--
mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
May 03 2011
next sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 03/05/2011 22:41, "Jérôme M. Berger" wrote:
 Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.
 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei
Hum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
It's an indirect call instead of a direct call. Andrei
Well, yes, that's kinda of what I was thinking already (although I referred to more low-level, assembler terms). But my question remains, why is an indirect function call slower than a direct one, and is that difference significant? (excluding of course the possibilities of inlining and other static analysis that direct invocations offer) There is no extra overhead other than loading the function address from a register/variable, right?
AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns. Jerome
Ah right, that makes sense, forgot about that one (I see it's the same issue with branch prediction). -- Bruno Medeiros - Software Engineer
May 04 2011
prev sibling parent reply Don <nospam nospam.com> writes:
Jérôme M. Berger wrote:
 Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.
 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei
Hum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
It's an indirect call instead of a direct call. Andrei
Well, yes, that's kinda of what I was thinking already (although I referred to more low-level, assembler terms). But my question remains, why is an indirect function call slower than a direct one, and is that difference significant? (excluding of course the possibilities of inlining and other static analysis that direct invocations offer) There is no extra overhead other than loading the function address from a register/variable, right?
AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns. Jerome
That was true ten years ago, but not any more. Modern CPUs can do branch prediction of indirect jumps.
May 04 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 5/4/2011 7:38 PM, Don wrote:
 Jérôme M. Berger wrote:
 Bruno Medeiros wrote:
 On 29/04/2011 17:30, Andrei Alexandrescu wrote:
 On 4/29/11 10:40 AM, Bruno Medeiros wrote:
 On 23/04/2011 15:45, Andrei Alexandrescu wrote:
 On 4/23/11 8:57 AM, dsimcha wrote:
 BTW, since when does the ternary operator work with functions, as
 opposed to variables?
And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.
 They are converted to pointers to functions. Not something I'd
 recommend
 because it makes the call slower.

 Andrei
Hum, I didn't know that. Why does it make the call slower (other than the actual ternary operation)? Is it because it has to load the function address from a register/variable instead of using a constant value? And/or because the call cannot be inlined? I mean, the first aspect seems like a very minor impact on performance, almost negligible.
It's an indirect call instead of a direct call. Andrei
Well, yes, that's kind of what I was thinking already (although I was thinking of it in more low-level, assembler terms). But my question remains: why is an indirect function call slower than a direct one, and is that difference significant? (Excluding, of course, the possibilities of inlining and other static analysis that direct invocations offer.) There is no extra overhead other than loading the function address from a register/variable, right?
AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns. Jerome
That was true ten years ago, but not any more. Modern CPUs can do branch prediction of indirect jumps.
Yeah, I tested this a while back and can dig up/recreate the benchmark if need be. A virtual function call that always resolves to the same address (so the branch is highly predictable) is not measurably slower than a _non-inlined_ direct function call. However, if the branch is not predictable, it's substantially slower, and virtual functions usually can't be inlined.
May 04 2011
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 Well in fairness they can be a fair amount more elaborate. The trouble 
 with alloca is that it's impossible to compose with. So in general you 
 need to write something like:
 
 bool onStack = smallEnough(length * sizeof(T));
 auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
 scope(exit) if (!onStack) free(a);
 initialize(enforce(a));
 scope(exit) clear(a);
 
 This block is difficult to factor out because of alloca.
See this too: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=135208 Keep in mind that alloca() returns null more often than malloc(), and when alloca() returns null it is still meaningful to try malloc(). So the semantics you have written are not good enough yet (neither are mine; a merge of the two seems better). My idea of D-style VLAs is something built-in that encapsulates that semantics (which is more complex than C99 VLA semantics). I am suggesting a built-in because a good implementation has to take some more semantics into account: - Probably you want arrays created on the stack to have an immutable length (otherwise you have to switch to a normal, fully GC-heap dynamic array, which I don't like). - You have to return such arrays from functions and pass them as function arguments. As fixed-sized arrays they probably need to be managed by value (unless you use "ref"), so they are different from dynamic arrays and closer to fixed-sized ones. Bye, bearophile
Apr 23 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 bool onStack = smallEnough(length * sizeof(T));
 auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T)));
 scope(exit) if (!onStack) free(a);
 initialize(enforce(a));
 scope(exit) clear(a);
 
 This block is difficult to factor out because of alloca.
As I have explained in my old enhancement request about VLAs and in a recent post, this code is dangerous, because alloca() in some implementations deallocates at the end of the function, while in others it deallocates at the end of the scope. A well-implemented D-VLA needs to be sure to free the stack space at the end of the scope. Bye, bearophile
Apr 23 2011
prev sibling next sibling parent reply Ulrik Mikaelsson <ulrik.mikaelsson gmail.com> writes:
2011/4/22 Timon Gehr <timon.gehr gmx.ch>:
 That is only a problem if the reference count of that resource changes at a
 very high frequency. The described problem also implies that the threads
 would not need any form of synchronization for the data (otherwise the
 reference count certainly would not be a bottleneck.)
....
 Michael Stover wrote:
 I'd like to say "were proved" rather than "are believed", but I don't
 actually know where to go for such evidence.
 However, I do believe many scripting languages, such as python, eventually
 ditched the reference counting technique for generational GC, and Java has
 very fast GC, so I am more inclined to believe those real-life solutions
 than Linus.
 
 Well, the GC may be fast when compared to other GCs, but it has to be
 designed to run on general data whose reference structure can be arbitrary.
 Often, the objects/references have a somewhat specialized structure that a
 smart programmer can exploit, especially if the references point to private
 data. But that only matters if performance is of prime interest, and the
 gains may be not very big.
All in all, I think the best approach is a pragmatic one, where different types of resources can be handled according to different schemes. I.e., default to GC-managing everything. After profiling, determining what resources are mostly used, and where, optimize allocation for those resources, preferably to scoped allocation or, if not possible, reference counting. Premature optimization is a root of much evil; for instance, the malloc-paranoid might very well resort to abuse of structs, leading either to lots of manual pointers or to excessive memory copying. Incidentally, this was the main thing that attracted me to D: be lazy/productive where performance doesn't matter much, and focus optimization on where it does.
Apr 22 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
Ulrik Mikaelsson wrote:
 All in all, I think the best approach is a pragmatic one, where
 different types of resources can be handled according to different
 schemes.

 I.E. default to GC-manage everything. After profiling, determining
 what resources are mostly used, and where, optimize allocation for
 those resources, preferably to scoped allocation, or if not possible,
 reference-counted.

 Premature optimization is a root of much evil, for instance, the
 malloc-paranoid might very well resort to abuse of structs, leading
 either to lots of manual pointers, or excessive memory copying.

 Incidentally, this was the main thing that attracted me to D. Be
 lazy/productive where performance doesn't matter much, and focus
 optimization on where it does.
That is very true. GC is almost always fast enough, or even faster. And it is clearly the most convenient. And yes, identify bottlenecks first, optimize later. But I also think programs that have some concern for efficient memory allocation (with GC or without GC) tend to be better designed in general. This actually increases productivity. Plus, it reduces the need for complicated optimizations later on. This in turn increases maintainability. -Timon
Apr 22 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/22/11 4:04 PM, Timon Gehr wrote:
[snip]
 This is also the reason I think it is a bad idea to deprecate D's
 'delete'.
The functionality is not going away. Andrei
Apr 23 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
 On 4/22/11 4:04 PM, Timon Gehr wrote:
 [snip]
 This is also the reason I think it is a bad idea to deprecate D's
 'delete'.
The functionality is not going away. Andrei
Yes, I realize that. It is a matter of syntax and aesthetics. Handling allocation and deallocation non-uniformly renders the language ugly (or are you considering removing the 'new' keyword too?). And it discourages people from learning about custom allocators etc. I mean, what will custom allocators look like? The cost of having the delete keyword in the language is low. Before removing that, why not consider the 'body' keyword instead? Timon
Apr 23 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/23/11 11:33 AM, Timon Gehr wrote:
 On 4/22/11 4:04 PM, Timon Gehr wrote:
 [snip]
 This is also the reason I think it is a bad idea to deprecate D's
 'delete'.
The functionality is not going away. Andrei
Yes, I realize that. It is a matter of syntax and aesthetics. Handling allocation and deallocation non-uniformly renders the language ugly (or are you considering removing the 'new' keyword too?).
Allocation and deallocation are not symmetric and as such handling them in a uniform way would be a mistake that perpetuates a poor understanding of the underlying realities of memory allocation and object creation. I suggest you reconsider.
 And it discourages people from learning about
 custom allocators etc.
I don't see the relationship here.
 I mean, what will custom allocators look like? The cost of
 having the delete keyword in the language is low. Before removing that, why not
 consider the 'body' keyword instead?
The cost of keeping "delete" in the language is making a rarely-used, dangerous facility with loosely-defined semantics straight inside the language. It reflects poor language design for many reasons. Andrei
Apr 23 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Andrei Alexandrescu wrote:
 Allocation and deallocation are not symmetric and as such handling them
 in a uniform way would be a mistake that perpetuates a poor
 understanding of the underlying realities of memory allocation and
 object creation. I suggest you reconsider.
'char' is completely orthogonal to memory allocation, and it still gets handled uniformly with it syntactically. So I do not really get the point here.
 And it discourages people to get informed about
 custom allocators etc.
I don't see the relationship here.
I cannot imagine that the syntax for them will be particularly nice after the removal of 'delete'. Who likes features with unnecessarily clumsy syntax?
The cost of keeping "delete" in the language is making a rarely-used,
dangerous facility with loosely-defined semantics straight inside the
language. It reflects poor language design for many reasons.

Andrei
I think we are talking about different things. I do _not_ want to use the keyword to somehow try to deallocate GC allocated memory. (who would actually want to do this anyways?) I want it for custom allocators as in http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would be completely specified by me. Removing the keyword means ruining that use case. You can also find a nice, clean and strict semantic specification for delete when called on GC memory that works on all platforms and with all GC implementations: "do nothing, GC memory is managed automatically." As it is done now, I agree it is not particularly nice. Timon
Apr 23 2011
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Allocation and deallocation are not symmetric and as such handling them
 in a uniform way would be a mistake that perpetuates a poor
 understanding of the underlying realities of memory allocation and
 object creation. I suggest you reconsider.
'char' is completely orthogonal to memory allocation and it still gets handled uniformly to it syntactically. So I do not really get the point here.
 And it discourages people to get informed about
 custom allocators etc.
I don't see the relationship here.
I cannot imagine that the syntax for them will be particularly nice after the removal of 'delete'. Who likes features with unnecessarily clumsy syntax?
 The cost of keeping "delete" in the language is making a rarely-used,
 dangerous facility with loosely-defined semantics straight inside the
 language. It reflects poor language design for many reasons.

 Andrei
I think we are talking about different things. I do _not_ want to use the keyword to somehow try to deallocate GC allocated memory. (who would actually want to do this anyways?) I want it for custom allocators as in http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would be completely specified by me. Removing the keyword means ruining that use case. You can also find a nice, clean and strict semantic specification for delete when called on GC memory that works on all platforms and with all GC implementations: "do nothing, GC memory is managed automatically." As it is done now, I agree it is not particularly nice. Timon
Why not allow "delete" only in classes with custom allocators? AFAIK they could be determined at compile time. Please let me know if I'm wrong.
Apr 23 2011
parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Piotr Szturmaj wrote:
 Why not allow "delete" only in classes with custom allocators? AFAIK
 they could be determined at compile time. Please let me know if I'm wrong.
Never mind. Andrei informed me that class allocators are being deprecated, so my question is no longer relevant.
Apr 24 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/23/11 1:05 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 Allocation and deallocation are not symmetric and as such handling them
 in a uniform way would be a mistake that perpetuates a poor
 understanding of the underlying realities of memory allocation and
 object creation. I suggest you reconsider.
'char' is completely orthogonal to memory allocation and it still gets handled uniformly to it syntactically. So I do not really get the point here.
The point is that some naively believe "new" and "delete" are some simple do and undo actions and as such they somehow deserve the same level of support. In reality they have dramatically different issues and are profoundly asymmetric.
 And it discourages people to get informed about
 custom allocators etc.
I don't see the relationship here.
I cannot imagine that the syntax for them will be particularly nice after the removal of 'delete'. Who likes features with unnecessarily clumsy syntax?
 The cost of keeping "delete" in the language is making a rarely-used,
 dangerous facility with loosely-defined semantics straight inside the
 language. It reflects poor language design for many reasons.

 Andrei
I think we are talking about different things. I do _not_ want to use the keyword to somehow try to deallocate GC allocated memory. (who would actually want to do this anyways?) I want it for custom allocators as in http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would be completely specified by me. Removing the keyword means ruining that use case. You can also find a nice, clean and strict semantic specification for delete when called on GC memory that works on all platforms and with all GC implementations: "do nothing, GC memory is managed automatically." As it is done now, I agree it is not particularly nice. Timon
The class-specific allocators are an exceedingly bad design carried over from C++. They will be deprecated from D as well. Even C++, where the features you are talking about have existed for many years, is shunning their use. Good applications make very sparing use of "delete", and virtually none uses class-specific allocators. Experience with C++ is the best proof that the delete keyword and class-specific allocators have no place in the D programming language. Andrei
Apr 23 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Andrei Alexandrescu wrote:
 The point is that some naively believe "new" and "delete" are some
 simple do and undo actions and as such they somehow deserve the same
 level of support. In reality they have dramatically different issues and
 are profoundly asymmetric.
 The class-specific allocators are an exceedingly bad design carried over
 from C++. They will be deprecated from D as well.
Does that imply that the feature is going away after all?
 Even C++, where the features you are talking about have existed for many
 years, is shunning their use. Good applications make very sparing use
 of "delete" and virtually none uses class-specific allocators.
 Experience with C++ is the best proof that the delete keyword and
 class-specific allocators have no place in the D programming language.


 Andrei
Virtually no good application uses them? The _DMD compiler implementation_ uses them. Classes 'Token' and 'Scope' have class-specific optimized allocators. I don't think Walter would have done that if it did not give quite good benefits. D is a systems programming language after all. It is not Java or some scripting language. I don't think D should/can get rid of class-specific allocators if there is no good alternative for sane uses. Even if they are not used often and should not be used often. C++ is not garbage collected by default. Implementing a GC root class for example is another legitimate use case for class-specific allocators. In D, things are somewhat reverse, class-specific allocators would be used for manual memory management. D should support that. So if class-specific allocators go away, what is the alternative? Sure, it is always possible to allocate using custom functions (for D structs, at least.) But then, if allocations of one struct/class turn out to be a bottleneck somewhere late in the development, the whole code will have to be changed instead of just the addition of some simple custom class-specific allocator. (This is even worse when developing a library). Also, just testing the impact of a custom allocator will be a pain for the same reason. Why is that better? Timon
Apr 24 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/24/2011 01:21 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 The point is that some naively believe "new" and "delete" are some
 simple do and undo actions and as such they somehow deserve the same
 level of support. In reality they have dramatically different issues and
 are profoundly asymmetric.
 The class-specific allocators are an exceedingly bad design carried over
 from C++. They will be deprecated from D as well.
Does that imply that the feature is going away after all?
 Even C++, where the features you are talking about have existed for many
 years, is shunning their use. Good applications make very sparing use
 of "delete" and virtually none uses class-specific allocators.
 Experience with C++ is the best proof that the delete keyword and
 class-specific allocators have no place in the D programming language.


 Andrei
Virtually no good application uses them? The _DMD compiler implementation_ uses them. Classes 'Token' and 'Scope' have class-specific optimized allocators. I don't think Walter would have done that if it did not give quite good benefits.
A class-specific freelist-based allocator in fact stands in the way of a good general-purpose allocator. At Facebook we did extensive measurements on a variety of applications and workloads and the best approach was to just disable all STL class-specific allocators (by defining GLIBCXX_FORCE_NEW) and let jemalloc take care of all allocations, to great effect. Today's good general-purpose allocators are as good or usually considerably better than custom allocators. The one category of custom allocators that is still significantly better are regions, but they have quite specific and restricted semantics.
 D
 is a systems programming language after all. It is not Java or some scripting
 language. I don't think D should/can get rid of class-specific allocators if
there
 is no good alternative for sane uses. Even if they are not used often and
should
 not be used often.

 C++ is not garbage collected by default. Implementing a GC root class for
example
 is another legitimate use case for class-specific allocators. In D, things are
 somewhat reverse, class-specific allocators would be used for manual memory
 management. D should support that.

 So if class-specific allocators go away, what is the alternative? Sure, it is
 always possible to allocate using custom functions (for D structs, at least.)
But
 then, if allocations of one struct/class turn out to be a bottleneck somewhere
 late in the development, the whole code will have to be changed instead of just
 the addition of some simple custom class-specific allocator. (This is even
worse
 when developing a library). Also, just testing the impact of a custom allocator
 will be a pain for the same reason. Why is that better?


 Timon
It is much better in very many ways, in particular in D. For one simple argument, consider: auto obj = new Widget; Simple modularity considerations would require this operator to have semantics independent of Widget, such that modular and generic code can use it properly. Custom class allocators ruin all that, because they would require the use of delete paired with new, whereas the general operator requires no action. Memory allocation and object construction are difficult topics, particularly in high-performance settings. A seasoned engineer understands that and does not naively believe that a smattering of well-intended but very badly executed language support will make that all easier. (Contrast that with e.g. reference counting semantics using RAII, which relies on a much better part of the language and does offer reasonable guarantees.) I am sorry, you simply have no case - each and every argument you put forth has no strength or is just wrong. We could spend time on debating each, but I suspect that would do little toward convincing you of what is after all a simple fact, but one with many subtle facets. It might not be a good use of our time to further engage in a diatribe on this. The delete keyword will go, as will class-specific new and delete. Andrei
Apr 24 2011
next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Since this is definite we should remove them from the docs as well.
Want me to file it to bugzilla?
Apr 24 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/24/2011 03:35 PM, Andrej Mitrovic wrote:
 Since this is definite we should remove them from the docs as well.
 Want me to file it to bugzilla?
Please do. BTW I just fixed the core-related documentation bug on d-programming-language.org. Andrei
Apr 24 2011
parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 4/24/11, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 04/24/2011 03:35 PM, Andrej Mitrovic wrote:
 Since this is definite we should remove them from the docs as well.
 Want me to file it to bugzilla?
Please do. BTW I just fixed the core-related documentation bug on d-programming-language.org. Andrei
Fantastic, thank you.
Apr 24 2011
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 It might not be a good use of our time to further engage in a diatribe on this.
Just a note: you have not wasted your time discussing this, because I am not an expert on such matters, so I learn something from informed discussions :-) If a book that explains the C language is just 200 pages long, a book that explains the design purpose, design constraints and design compromises of every feature of the C language requires a much longer book (800 pages?), and it's probably also much more interesting :-) Bye, bearophile
Apr 24 2011
parent Francisco Almeida <francisco.m.almeida gmail.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Andrei:
 It might not be a good use of our time to further engage in a diatribe on this.
Just a note: you have not wasted your time discussing this, because I am not
expert on such matters, so I learn something from informed discussions :-)
 If a book that explains C language is just 200 pages long, a book that explains
the design purpose, design constraints and design compromises of every feature of C language requires a much longer book (800 pages?), and it's probably also much more interesting :-)
 Bye,
 bearophile
As one of those who were the most reluctant about abandoning delete, I have come to realize that it is better to make D classes strictly garbage collected, and drop delete. It should be noted that D supports RAII, as long as one uses struct (thus avoiding dynamic polymorphism, which shouldn't make a big difference once you think about it), and this gives D an important semantic separation lacking in C++ (or even those languages on the other side of the fence such as Java, for that matter). Making a habit of declaring struct instead of class when I intend to create "objects" on the stack instead of the heap, while resorting to garbage collected classes when it makes sense to do so, has made my life simpler. I do believe, however, that there is a recurrent communication error going on here. I would kindly ask the language designers to please update the documentation and write an article explaining the advantages of the approach (without making the mistake of turning it into a blind praise of garbage collection, that is). Newcomers need to know how to obtain garbage collected or RAII data structures from D, whenever they want to.
Apr 24 2011
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not be a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei
OK, let's stop. Custom allocators and delete should probably be removed from D. Some of my facts:
* freelist is faster if most allocation and deallocation concentrates on only one class (see attachment). I think this is also the reason it increases DMD's speed; Token and Scope are quite obvious bottlenecks.
* such bottlenecks will most of the time only be detected when development is almost finished. (Almost never for library code, so I don't get why STL uses custom allocators at all.)
* premature optimization is evil (see STL custom allocators)
* late optimization: it is easy if there are custom allocators. In C++ it is even trivial. Nice design. Still no clue why STL is abusing it.
* that form of custom allocators does not play nicely with D's garbage collector though, because it changes the semantics of 'new'; ergo, built-in custom allocators should probably indeed be removed from D.
* this makes this form of optimization painful in D without custom allocators. This is what I was instinctively scared about. Fact is: it is also painful with them, because you don't get the 'delete's for free as in C++. I was somewhat inconsiderate on this.
* it is possible that hinting the GC is only about twice as bad (again, see attachment).
And some notes:
* the fact that you think all my points were worthless or wrong implies that you either did not read all of them or have not considered them yourself when making your decision. I think that is scary.
* There are many ways in which I am always right. The most dominant being that any sufficiently smart guy with a PhD in rocket science would totally agree with all my statements. No pun/offense intended.
* this discussion would have been a waste of your time, had it not been public.
Thanks for discussing. It was helpful.
Timon

PS: The current GC dies without hints on my benchmark, so it is indeed a nice idea to keep hints:
version=GC: 1m45s
version=GC_hints: ~6.8s
version=freelist: ~3.7s
version=c_heap: ~4.5s

[uuencoded attachment "benchmark.d" omitted]
Apr 24 2011
next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
== Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei
Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?
 Ok, let's stop. Custom allocators and delete should probably be removed from D:
 Some of my facts:
 * freelist is faster if most allocation and deallocation concentrates on only one class. (see attachment) I think this is also the reason it increases DMD's speed. Token and Scope are quite obvious bottlenecks.
 * such bottlenecks will most of the time only be detected when development is almost finished. (Almost never for library code, so I don't get why STL uses custom allocators at all.)
 * premature optimization is evil (see STL custom allocators)
 * late optimization: it is easy if there are custom allocators. In C++ it is even trivial. Nice design. Still no clue why STL is abusing it.
 * that form of custom allocators does not play nicely with D's garbage collector though, because it changes the semantics of 'new', ergo builtin custom allocators should probably indeed be removed from D.
 * this makes this form of optimization painful in D without custom allocators. This is what I was instinctively scared about. Fact is: it is also painful with them, because you don't get the 'delete's for free as in C++. I was somewhat inconsiderate on this.
 * it is possible that hinting the GC is only about twice as bad (again, see attachment).
 And some notes:
 * the fact that you think all my points were worthless or wrong implies that you either did not read all of them or have not considered them yourself when making your decision. I think that is scary.
I find it nuts that someone considers blaming/shunning language features just to mask bad runtime semantics they refuse to fix - but maybe it's just my ignorance on the subject that professes that view.
Apr 24 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/24/2011 08:08 PM, Iain Buclaw wrote:
 == Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 I am sorry, you simply have no case - each and every argument you put
 forth has no strength or is just wrong. We could spend time on debating
 each, but I suspect that would do little toward convincing you of what
 is after all a simple fact, but one with many subtle facets. It might
 not be a good use of our time to further engage in a diatribe on this. The
 delete keyword will go, as will class-specific new and delete.


 Andrei
Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?
I don't know. If it does, an additional level of freelists only hurts.
 [...]
I find it nuts that someone considers blaming/shunning language features just to mask bad runtime semantics they refuse to fix - but maybe it's just my ignorance on the subject that professes that view.
I'd appreciate if you provided more detail on that. Far as I can tell the problem is with the feature under discussion, not some alleged unwillingness to do work. Andrei
Apr 24 2011
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 24 Apr 2011 21:13:43 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 04/24/2011 08:08 PM, Iain Buclaw wrote:
 == Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 [...]
Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?
I don't know. If it does, an additional level of freelists only hurts.
To the best of my knowledge, for 'small' allocations, the current GC uses free-lists of power-of-2 sized blocks. (I think small is defined as anything under 4k)
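The bucketing Robert describes can be made concrete with a small sketch. The 16-byte minimum class is an assumption for illustration; the actual GC's bucket layout may differ:

```cpp
#include <cassert>
#include <cstddef>

// Map a small request (anything up to 4096 bytes, per the description
// above) to its power-of-2 size class: 16, 32, 64, ..., 4096.
// Each class would be served by its own free-list.
std::size_t size_class(std::size_t n) {
    std::size_t cls = 16;       // assumed minimum block size
    while (cls < n) cls <<= 1;  // round up to the next power of 2
    return cls;
}
```

Rounding every small request up to a power of 2 trades some internal fragmentation for O(1) free-list lookup per class.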
Apr 24 2011
parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 24, 2011, at 7:25 PM, Robert Jacques wrote:
 To the best of my knowledge, for 'small' allocations, the current GC uses free-lists of power-of-2 sized blocks. (I think small is defined as anything under 4k)
That's correct.
Apr 25 2011
prev sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 04/24/2011 08:08 PM, Iain Buclaw wrote:
 == Quote from Timon Gehr (timon.gehr gmx.ch)'s article
 Andrei Alexandrescu wrote:
 [...]
Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?
I don't know. If it does, an additional level of freelists only hurts.
 [...]
I find it nuts that someone considers blaming/shunning language features just to mask bad runtime semantics they refuse to fix - but maybe it's just my ignorance on the subject that professes that view.
I'd appreciate if you provided more detail on that. Far as I can tell the problem is with the feature under discussion, not some alleged unwillingness to do work. Andrei
I was thinking of where one implements an alternative runtime that may not necessarily interface with a GC, or has different semantics on the use of the keyword (the compiler, after all, doesn't know/care what exactly 'delete' does, it just does its job at emitting the correct action we expect from it). Though it was late night and I perhaps didn't quite put it right anyway. Having literally slept on it though, a runtime with a good GC is one that doesn't offer a choice to 'delete' memory - so I am on your side. Though reflecting from my experience with the D2 GC (thinking from a compiler implementer's POV), there isn't much I can really praise about it. Custom (de)allocators I can also see a very good point for removal too, however much a stab in the back for those who can use them in the - perhaps 'one-and-only' - case for a small amount of speed gain. Iain
Apr 25 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/25/2011 10:06 AM, Iain Buclaw wrote:
 I was thinking of where one implements an alternative runtime that may not necessarily interface with a GC, or has different semantics on the use of the keyword (the compiler, after all, doesn't know/care what exactly 'delete' does, it just does its job at emitting the correct action we expect from it). Though it was late night and I perhaps didn't quite put it right anyway.

 Having literally slept on it though, a runtime with a good GC is one that doesn't offer a choice to 'delete' memory - so I am on your side. Though reflecting from my experience with the D2 GC (thinking from a compiler implementer's POV), there isn't much I can really praise about it.

 Custom (de)allocators I can also see a very good point for removal too, however much a stab in the back for those who can use them in the - perhaps 'one-and-only' - case for a small amount of speed gain.

 Iain
I understand. The right response to this is defining a good custom allocator API and providing good implementations of it. Andrei
Apr 25 2011
parent reply "Kim D" <eonwe linux.nu> writes:
 On 04/25/2011 10:06 AM, Iain Buclaw wrote:
 [...]
I understand. The right response to this is defining a good custom allocator API and providing good implementations of it. Andrei
Sounds like a good way forward. However, certain operations are currently not supported unless the GC is enabled, for instance 'Associative Arrays of Strings'... do you imagine the future allocator API being flexible enough to facilitate this use-case in a GC-less environment? Or at least allowing more of the functionality which is currently unsupported without GC? Kim
Apr 25 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/25/11 4:20 PM, Kim D wrote:
 [...]
Sounds like a good way forward. However, certain operations are currently not supported unless the GC is enabled, for instance 'Associative Arrays of Strings'... do you imagine the future allocator API being flexible enough to facilitate this use-case in a GC-less environment? Or at least allowing more of the functionality which is currently unsupported without GC? Kim
An entity relying on GC will by definition have more freedom and more safety than one relying on an alternative allocator. It is possible to make that transparent to some extent (e.g. define a type to be either refcounted or not depending on the allocator used). Total transparency would be difficult to achieve. Andrei
Apr 25 2011
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 04/24/2011 07:10 PM, Timon Gehr wrote:
 Andrei Alexandrescu wrote:
 [...]
Ok, lets stop. Custom allocators and delete should probably be removed from D:
Thank you for agreeing.
 Some of my facts:
 * freelist is faster if most allocation and deallocation concentrates on only one class. (see attachment) I think this is also the reason it increases DMD's speed. Token and Scope are quite obvious bottlenecks.
I agree. There are two other things that I don't agree with:

1. This particular point does nothing to help the original argument, which was to keep the delete keyword (and in particular class allocators) in the D programming language.

2. Freelists are not all smiles - they have big problems too. When deciding to go with a freelist the implementer must also decide on a strategy for releasing that memory back to the backend allocator on some schedule. Choosing a "good" schedule is already a full-blown problem that is best left to the specialists implementing the backend allocator. My colleagues discovered with horror that unless the environment variable GLIBCXX_FORCE_NEW was defined, the STL node-based containers would _never_ give back memory once allocated. Memory allocation is difficult, and for any allocator some workloads exist that make it worse than all others. Nevertheless, recent developments in general-purpose memory allocators (e.g. the Low-Fragmentation Heap on Windows and a slew of allocators on Unix including ptmalloc, tcmalloc, jemalloc, and others) have shown that it's best to just use a good general-purpose allocator instead of fiddling with custom allocation.
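The release-schedule problem can be made concrete with a sketch: a freelist that caps how many blocks it caches and hands the surplus straight back to the backend. The cap of 8 is an arbitrary assumption, and choosing it well is exactly the hard part; all requests are assumed to belong to one size class.

```cpp
#include <cassert>
#include <cstdlib>

// Freelist with a naive release schedule: keep at most kMaxCached
// blocks; anything beyond that goes straight back to the backend.
// Assumes all requests come from a single size class.
struct Node { Node* next; };

struct CappedFreelist {
    static constexpr std::size_t kMaxCached = 8;  // arbitrary threshold
    Node* head = nullptr;
    std::size_t cached = 0;

    void* allocate(std::size_t n) {
        if (head) {                     // reuse a cached block
            Node* r = head;
            head = r->next;
            --cached;
            return r;
        }
        if (n < sizeof(Node)) n = sizeof(Node);
        return std::malloc(n);          // backend allocation
    }
    void deallocate(void* p) {
        if (!p) return;
        if (cached >= kMaxCached) {     // cache full: give memory back
            std::free(p);
            return;
        }
        Node* node = static_cast<Node*>(p);
        node->next = head;              // otherwise cache it for reuse
        head = node;
        ++cached;
    }
};
```

Roughly speaking, GLIBCXX_FORCE_NEW, mentioned above, forces libstdc++'s pooling allocator to bypass its cache entirely - the degenerate case of a cap of zero.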
 * such bottlenecks will most of the time only be detected when development is almost finished. (Almost never for library code, so I don't get why STL uses custom allocators at all.)
 * premature optimization is evil (see STL custom allocators)
 * late optimization: it is easy if there are custom allocators. In C++ it is even trivial. Nice design. Still no clue why STL is abusing it.
 * that form of custom allocators does not play nicely with D's garbage collector though, because it changes the semantics of 'new', ergo builtin custom allocators should probably indeed be removed from D.
 * this makes this form of optimization painful in D without custom allocators. This is what I was instinctively scared about. Fact is: it is also painful with them, because you don't get the 'delete's for free as in C++. I was somewhat inconsiderate on this.
 * it is possible that hinting the GC is only about twice as bad (again, see attachment).
OK. Allow me to add a speculation here. I believe that one mistake that both class-level new/delete and STL allocators have done was to associate allocators with types. That just seems to force the proverbial square peg through the round hole. General-purpose allocators just focus on size classes and do awfully well. In addition, they do that in an integrated, non-modular way (as work on HeapLayers has shown, modularizing allocators is difficult and not very intuitive).
 And some notes:
 * the fact that you think all my points were worthless or wrong implies that you either did not read all of them or have not considered them yourself when making your decision. I think that is scary.
Well I think I owe you an apology. It is very difficult to use limited time resources to convince someone of rather subtle matters, particularly if the discussion starts not with a question (e.g. "In light of what I said, I wonder whether removing delete is the right way to go."), but with an assertion. At that point I need to make a sort of a cold estimate - do I have a reasonable chance of convincing someone of something within a reasonable time frame? It's difficult to make and convey that decision without seeming impolite, although please believe me being curt pains me as much as anyone. To make matters more difficult, I'm not an expert in memory allocation and garbage collection. Probably I'm just at the point I can recognise good work when I see it. Yet, after a book chapter on memory allocation, a couple of articles written on memory allocation, a course I'm teaching on memory allocation, and a good amount of work on STL's interaction with jemalloc while at Facebook - I must admit I've become a bit jaded and as such less sanguine about arguing my points.
 * There are many ways in which I am always right. The most dominant being that any sufficiently smart guy with a PhD in rocket science would totally agree with all my statements. No pun/offense intended.
Not sure how you mean that. I mean, nobody here is discussing your competence or any personal trait of yours, and in particular I'm not keen on drawing any sweeping conclusion from any one discussion. You may as well know better than everybody about allocation, but so far you didn't seem to have found the right words to convey that. The simple fact is that here all anybody knows about you is what you write. And what you wrote failed to produce (now by your own admission as far as I can understand) a viable case for keeping the delete operator and class-level allocators in the D programming language.
 * this discussion would have been a waste of your time, had it not been public.

 Thanks for discussing. It was helpful.
I hope that's not in jest :o). Andrei
Apr 24 2011
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Apr 24, 2011, at 1:17 PM, Andrei Alexandrescu wrote:
 It is much better in very many ways, in particular in D. For one simple argument, consider:

 auto obj = new Widget;

 Simple modularity considerations would require this operator to have semantics independent of Widget, such that modular and generic code can use it properly. Custom class allocators ruin all that, because they would require the use of delete paired with new, whereas the general operator requires no action.
Agreed, so long as there's some form of support for placement new. But that could easily be an explicit __ctor() call into the allocated space, which works in D but I don't believe it works in C++ (thus the explicit placement new syntax).
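Sean's point can be illustrated with a small C++ sketch: placement new separates obtaining memory from constructing into it, which is what an explicit __ctor() call would do in D. Widget and make_widget are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Placement new: construct an object into storage obtained from any
// allocator, decoupling allocation from construction.
struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};

Widget* make_widget(void* storage, int id) {
    return new (storage) Widget(id);  // construct in place; no allocation
}

void unmake_widget(Widget* w) {
    w->~Widget();  // explicit destructor call pairs with placement new
}
```

Because the storage came from an arbitrary allocator, the caller stays responsible for releasing it after the explicit destructor call.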
Apr 25 2011
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/22/2011 2:04 PM, Timon Gehr wrote:
 The reason Java is garbage collected is _not_ performance, but primarily
 reliability.
I believe a GC is required if one wishes to prove memory safety, which is definitely a design goal of Java.
Apr 23 2011