digitalmars.D - Linus with some good observations on garbage collection
- Walter Bright (1/1) Apr 22 2011 http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
- Andrej Mitrovic (1/1) Apr 22 2011 Just a reminder: that post is 9 years old.
- Alvaro (16/17) Apr 22 2011 I've always been surprised when discussions usually just bring garbage
- Steven Schveighoffer (8/15) Apr 22 2011 Because you then have to update potentially two reference counts every
- Michael Stover (18/35) Apr 22 2011 This sort of reference count with cyclic dependency detector is how a lo...
- "Jérôme M. Berger" (17/26) Apr 23 2011
- Brad Roberts (4/35) Apr 22 2011 Also add to it that in many cases you're dealing with a threaded environ...
- Timon Gehr (62/69) Apr 22 2011 expensive than non-atomic. More so
- bearophile (5/10) Apr 22 2011 D used to have scoped class instances, scoped classes, and delete, their...
- Iain Buclaw (15/25) Apr 22 2011 to variable length arrays.
- bearophile (16/19) Apr 22 2011 I have an enhancement request in Bugzilla on VLA, with longer discussion...
- bearophile (3/4) Apr 22 2011 Never mind, that semantics can't use that syntax, otherwise you have hid...
- Iain Buclaw (22/26) Apr 23 2011 deallocate at the end of the function or to deallocate at the end of the...
- Andrei Alexandrescu (11/20) Apr 23 2011 Well in fairness they can be a fair amount more elaborate. The trouble
- dsimcha (10/36) Apr 23 2011 Right. This is exactly the kind of thing TempAlloc was meant to solve.
- bearophile (4/5) Apr 23 2011 I think TempAlloc is meant for larger allocations. D-VLAs are meant to a...
- dsimcha (5/10) Apr 23 2011 Why would TempAlloc not work for this? It's slightly slower than VLAs,
- bearophile (5/9) Apr 23 2011 I have to compare the performance of both for my use cases (fast allocat...
- Andrei Alexandrescu (4/6) Apr 23 2011 They are converted to pointers to functions. Not something I'd recommend...
- Bruno Medeiros (11/17) Apr 29 2011 And that's since the C days btw. The function call is just an operation,...
- Andrei Alexandrescu (3/20) Apr 29 2011 It's an indirect call instead of a direct call.
- Bruno Medeiros (10/33) May 03 2011 Well, yes, that's kinda of what I was thinking already (although I
- "Jérôme M. Berger" (17/53) May 03 2011
- Bruno Medeiros (5/50) May 04 2011 Ah right, that makes sense, forgot about that one (I see it's the same
- Don (3/44) May 04 2011 That was true ten years ago, but not any more. Modern CPUs can do branch...
- dsimcha (7/54) May 04 2011 Yeh, I tested this a while back and can dig up/recreate the benchmark if...
- bearophile (10/21) Apr 23 2011 See this too:
- bearophile (4/11) Apr 23 2011 As I have explained in my old enhancement request about VLAs and in a re...
- Ulrik Mikaelsson (26/42) Apr 22 2011 not need
- Timon Gehr (8/21) Apr 22 2011 That is very true. GC is almost always fast enough or even faster. And i...
- Andrei Alexandrescu (4/6) Apr 23 2011 The functionality is not going away.
- Timon Gehr (7/13) Apr 23 2011 Yes I realize that. It is a matter of syntax and aesthetics. Handling al...
- Andrei Alexandrescu (10/26) Apr 23 2011 Allocation and deallocation are not symmetric and as such handling them
- Timon Gehr (15/26) Apr 23 2011 'char' is completely orthogonal to memory allocation and it still gets h...
- Piotr Szturmaj (3/32) Apr 23 2011 Why not allow "delete" only in classes with custom allocators? AFAIK
- Piotr Szturmaj (3/5) Apr 24 2011 Nevermind. Andrei informed that class allocators are being deprecated,
- Andrei Alexandrescu (13/41) Apr 23 2011 The point is that some naively believe "new" and "delete" are some
- Timon Gehr (21/33) Apr 24 2011 Virtually no good application uses them? The _DMD compiler implementatio...
- Andrei Alexandrescu (33/69) Apr 24 2011 A class-specific freelist-based allocator in fact stands in the way of a...
- Andrej Mitrovic (2/2) Apr 24 2011 Since this is definite we should remove them from the docs as well.
- Andrei Alexandrescu (4/6) Apr 24 2011 Please do. BTW I just fixed the core-related documentation bug on
- Andrej Mitrovic (2/8) Apr 24 2011 Fantastic, thank you.
- bearophile (5/6) Apr 24 2011 Just a note: you have not wasted your time discussing this, because I am...
- Francisco Almeida (21/27) Apr 24 2011 the design purpose, design constraints and design compromises of every f...
- Timon Gehr (74/81) Apr 24 2011 Ok, lets stop. Custom allocators and delete should probably be removed f...
- Iain Buclaw (6/40) Apr 24 2011 Is it correct that the current D GC implementation puts delete'd allocat...
- Andrei Alexandrescu (6/46) Apr 24 2011 I'd appreciate if you provided more detail on that. Far as I can tell
- Robert Jacques (5/24) Apr 24 2011 To the best of my knowledge, for 'small' allocations, the current GC use...
- Sean Kelly (4/6) Apr 25 2011 uses free-lists of power-of-2 sized blocks. (I think small is defined as...
- Iain Buclaw (14/62) Apr 25 2011 I was thinking of where one implements an alternative runtime that may n...
- Andrei Alexandrescu (4/17) Apr 25 2011 I understand. The right response to this is defining a good custom
- Kim D (6/35) Apr 25 2011 Sounds like a good way forward. However, certain operations are currentl...
- Andrei Alexandrescu (7/44) Apr 25 2011 An entity relying on GC will by definition have more freedom and more
- Andrei Alexandrescu (56/95) Apr 24 2011 I agree. There are two other things that I don't agree with:
- Sean Kelly (10/16) Apr 25 2011 semantics independent of Widget, such that modular and generic code can ...
- Walter Bright (3/5) Apr 23 2011 I believe a GC is required if one wishes to prove memory safety, which i...
http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html
Apr 22 2011
Just a reminder: that post is 9 years old.
Apr 22 2011
On 22/04/2011 19:36, Walter Bright wrote:
http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

I've always been surprised that discussions usually present garbage collection as the only alternative to explicit manual memory management. I imagined it as a garbage truck that has its own schedule and may let a lot of trash pile up before passing by. I always naively wondered: why not just free immediately when an object has no references left?

I'm not an expert, so there may be reasons I don't see, but now that Linus says something along these lines, I'll ask. Why not? Isn't it much easier to do refcount++ and refcount--, and if refcount==0, immediately call "free()"? Memory becomes available for other needs faster, there's no need for an additional thread, no large amount of memory consumed before the advanced garbage truck decides to come by, and no slight pauses when collecting trash (maybe only in old implementations), and the implementation is much simpler...

OK, I knew about that "cyclic references" problem. But Linus doesn't seem to see a big problem there, and solutions can be found with care...
Apr 22 2011
On Fri, 22 Apr 2011 14:32:06 -0400, Alvaro <alvaro.segura gmail.com> wrote:
I always naively thought, why not just free immediately when an object gets no references?

Because you then have to update potentially two reference counts every time you assign a pointer. GCs save you from doing that.

I know way, way less than Torvalds, but my naive brain says GCs still win, because often slightly noticeable drops in performance are worth having code that doesn't corrupt memory. This may not be true for kernel development, but then again, we aren't all developing kernels ;)

-Steve
Apr 22 2011
This sort of reference count with cyclic dependency detector is how a lot of scripting languages do it, or did it in the past. The problem was that lazy generational GCs are believed to have better throughput in general.

I'd like to say "were proved" rather than "are believed", but I don't actually know where to go for such evidence. However, I do believe many scripting languages, such as Python, eventually ditched the reference counting technique for generational GC, and Java has very fast GC, so I am more inclined to believe those real-life solutions than Linus.

Mike

On Fri, Apr 22, 2011 at 2:32 PM, Alvaro <alvaro.segura gmail.com> wrote:
Isn't it much easier to do refcount++ and refcount--, and if refcount==0 immediately "free()"? [...] But Linus doesn't seem to see a big problem and solutions can be found with care...
Apr 22 2011
Michael Stover wrote:
I'd like to say "were proved" rather than "are believed", but I don't actually know where to go for such evidence. However, I do believe many scripting languages, such as python, eventually ditched the reference counting technique for generational, and Java has very fast GC, so I am inclined to believe those real-life solutions than Linus.

This is a single benchmark on a single obscure language, so take it with a grain of salt, but look at the "Nimrod" and "Nimrod Boehm" lines here:
https://bitbucket.org/jmb/shootout/wiki/Home

By default, Nimrod uses a reference counter, but there is a compiler switch to use the Boehm GC instead. Of particular interest is the benchmark "Binary trees", since it is specially designed to exercise memory management. On that benchmark, the Boehm version is about twice as fast as the refcounting version, while on other benchmarks performance is comparable.

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Apr 23 2011
Also add to it that in many cases you're dealing with a threaded environment, so those refcounts have to be updated with locked operations (either via mutexes, or more commonly just atomics), which are far more expensive than non-atomic ones. More so when there's actual contention for the refcounted resource.

On 4/22/2011 11:53 AM, Michael Stover wrote:
This sort of reference count with cyclic dependency detector is how a lot of scripting languages do it, or did it in the past. The problem was that lazy generational GCs are believed to have better throughput in general. [...] I am inclined to believe those real-life solutions than Linus.
Apr 22 2011
Brad Roberts wrote:
Those refcounts have to be updated with locked operations, which are far more expensive than non-atomic ones. More so when there's actual contention for the refcounted resource.

That is only a problem if the reference count of that resource changes at a very high frequency. The described problem also implies that the threads would not need any form of synchronization for the data (otherwise the reference count certainly would not be the bottleneck). I cannot, at the moment, think of a real-world example where this would not imply bad design. Can you help me out?

Michael Stover wrote:
I'd like to say "were proved" rather than "are believed", but I don't actually know where to go for such evidence. However, I do believe many scripting languages, such as python, eventually ditched the reference counting technique for generational, and Java has very fast GC, so I am inclined to believe those real-life solutions than Linus.

Well, the GC may be fast when compared to other GCs, but it has to be designed to run on general data whose reference structure can be arbitrary. Often, the objects/references have a somewhat specialized structure that a smart programmer can exploit, especially if the references point to private data. But that only matters if performance is of prime interest, and the gains may not be very big.

But, as pointed out by Linus, the prime performance problem is _not_ the GC, but the mindset that comes with it. Most programmers that "grew up" in a managed environment tend to use very many "new" keywords all over their code (the environment encourages you to do this). When they then try to write a C++ program, they do the same. The resulting memory bugs are then blamed on the lack of a GC (you can have GC in C/C++, but most of the people I have talked to do not know this). They then happily change back to Java, which has a "very fast GC".

The important thing to note here is that the work required to deallocate all these many memory locations does not magically disappear; it is delegated to the GC, which will mostly do it faster and more reliably than a programmer who has to do it manually. But the problem does not lie in the deallocations, it's in the allocations. Consider this analogy:

Scenario 1: Many people like candy. Candies are wrapped in colorful little pieces of paper. Every day, every person buys one piece of candy in the candy shop (new Candy()!) and on the way back home they throw away the wrapping paper somewhere on the street (reassign reference). Those are garbage. In the evening, some creepy guy comes to search the whole street for those small pieces of paper. Would you call that guy a garbage collector? He collects garbage, but in the real world garbage collectors work more like this:

Scenario 2: Still, many people like candy. Every person buys a bag of candy in the candy shop once a year (new Candy[365]). When all the candy is eaten, they put all the garbage in one bag and put it by their front door (reassign reference to whole array). A very handsome guy collects all those bags. He is very much more efficient than the guy in example 1. (Arguably, memory usage is bigger in that particular example, but in computer programs the allocating process can reuse the memory. The analogy breaks down here.)

Note that I am not saying that garbage collection is bad. If the references form a very complicated structure, or if a reference to the containing object is not necessarily kept (i.e. array slicing), it can be very useful as well as faster than manual memory management. Also, custom allocators can reduce the lots-of-small-allocations problem, but they have some overhead too. Advanced GCs may do this automatically, I don't know.

The reason Java is garbage collected is _not_ performance, but primarily reliability. In big programming projects, it can be hard to know where a reference to an object may be kept, whether there are circular references, etc., as programs keep expanding without the programmers understanding the whole project in detail. GC also removes bugs related to memory management. I think true and reliable OO is not possible without a GC. The downside is that many Java/... programmers don't concern themselves much with the internals of memory allocation, and are _very_ proud of it.

This is also the reason I think it is a bad idea to deprecate D's 'delete'.

-Timon
Apr 22 2011
Timon Gehr:
But, as pointed out by Linus, the prime performance problem is _not_ the GC, but the mindset that comes with it. Most programmers that "grew up" in a managed environment tend to use very many "new" keywords all over their code.

In C99 (and Ada) you avoid the allocation of some dynamic arrays with new thanks to variable length arrays.

This is also the reason I think it is a bad idea to deprecate D's 'delete'.

D used to have scoped class instances, scoped classes, and delete; their replacements are not good enough yet. In Common Lisp you have hints for the GC; they are safe and they help speed up the work of the GC. Such hints probably need to be integrated with the type system, so they may need to be built-ins as scope/delete were. I am not seeing enough discussion about this.

Bye,
bearophile
Apr 22 2011
== Quote from bearophile (bearophileHUGS lycos.com)'s article
In C99 (and Ada) you avoid the allocation of some dynamic arrays with new thanks to variable length arrays.

Variable length arrays are just sugary syntax for a call to alloca.

D used to have scoped class instances, scoped classes, and delete; their replacements are not good enough yet. [...] I am not seeing enough discussion about this.

I've always felt that Vala's system is better thought out, which is incidentally based on a reference counting system. This makes destructors in Vala deterministic, and they can be used to implement an RAII pattern for resource management. To get around the common pitfalls of reference counting systems, Vala introduces two keywords which alter the relationship between the allocated object and the GC: 'weak' and 'unowned'. Rather than bore you with the gritty details here, see link:
http://live.gnome.org/Vala/ReferenceHandling
Apr 22 2011
Iain Buclaw:
Variable length arrays are just sugary syntax for a call to alloca.

I have an enhancement request in Bugzilla on VLAs, with longer discussions. Just two comments:

- It seems alloca() can be implemented with two different semantics: deallocate at the end of the function, or deallocate at the end of the scope. Usually alloca() deallocates at the end of the function, but that semantic confusion is dangerous. VLAs deallocate at the end of the scope, just like any other static array.
- To use alloca you need to use pointers, maybe even slices; it's not DRY, etc. So syntax sugar helps.

In the meantime I've changed my mind a little. Now I prefer something better than C99 VLAs. I'd like D-VLAs with the same syntax as C99 VLAs but with a safer semantics, closer to this one (but the "alloca" used here must deallocate at the end of the scope):

    enum size_t MAX_VLA_SIZE = 1024;
    static assert (is(typeof(size) == size_t));
    T* ptr = null;
    if ((size * T.sizeof) < MAX_VLA_SIZE)
        ptr = cast(T*)alloca(size * T.sizeof);
    T[] array = (ptr == null) ? new T[size] : ptr[0 .. size];
    array[] = T.init;

This has some advantages: when alloca returns null, or when the array is large, it uses the GC. This allows both avoiding some stack overflows and reducing the risk of the stack memory becoming too cold.

Rather than bore you with the gritty details here, see link:
http://live.gnome.org/Vala/ReferenceHandling

This is interesting.

Bye,
bearophile
Apr 22 2011
Now I prefer something better than C99 VLAs. I'd like D-VLAs with the same syntax as C99 VLAs but with a safer semantics.

Never mind, that semantics can't use that syntax, otherwise you have hidden heap allocations... The syntax has to change (because the D-VLA semantics seems OK to me).

Bye,
bearophile
Apr 22 2011
== Quote from bearophile (bearophileHUGS lycos.com)'s article
It seems alloca() can be implemented with two different semantics: to deallocate at the end of the function or to deallocate at the end of the scope. Usually alloca() deallocates at the end of the function, but that semantic confusion is dangerous. VLAs deallocate at the end of the scope, just like any other static array.

Yep. My understanding of the way C99 goes about it (I might be wrong here, so don't hurt me :) is that the stack pointer is saved for the lifetime of the alloca'd array, and restored once it goes out of scope. ie:

    int sz = 42;
    {
        void *saved_stack = __stack_save ();   // start of scope
        int array[0 : D.range];
        uint D.range;                          // automatic variable
        void *D.ptr;                           // automatic variable
        D.range = sz + -1;
        D.ptr = __builtin_alloca (sz * 4);
        array = (int[0 : D.range] *) D.ptr;
        // later...
        __stack_restore (saved_stack);         // end of scope
    }
Apr 23 2011
On 4/22/11 5:21 PM, Iain Buclaw wrote:
Variable length arrays are just sugary syntax for a call to alloca.

Well, in fairness they can be a fair amount more elaborate. The trouble with alloca is that it's impossible to compose with. So in general you need to write something like:

    bool onStack = smallEnough(length * T.sizeof);
    auto a = (cast(T*) (onStack ? alloca : malloc)(length * T.sizeof))[0 .. length];
    scope(exit) if (!onStack) free(a.ptr);
    initialize(enforce(a));
    scope(exit) clear(a);

This block is difficult to factor out because of alloca.

Andrei
Apr 23 2011
On 4/23/2011 9:48 AM, Andrei Alexandrescu wrote:
This block is difficult to factor out because of alloca.

Right. This is exactly the kind of thing TempAlloc was meant to solve. (Actually, to give credit, the initial rough idea for TempAlloc was proposed by Andrei, though I fleshed out the details and implemented it.) With TempAlloc, you have a segmented stack (i.e. one that allocates a new segment instead of crashing the program if it's full) that is independent of the call stack (so you can return TempAlloc-allocated memory from functions).

BTW, since when does the ternary operator work with functions, as opposed to variables?
Apr 23 2011
dsimcha:Right. This is exactly the kind of thing TempAlloc was meant to solve.I think TempAlloc is meant for larger allocations. D-VLAs are meant to allocate little arrays (like 1 KB or less) on the normal stack very quickly. Bye, bearophile
Apr 23 2011
On 4/23/2011 10:24 AM, bearophile wrote:
I think TempAlloc is meant for larger allocations. D-VLAs are meant to allocate little arrays (like 1 KB or less) on the normal stack very quickly.

Why would TempAlloc not work for this? It's slightly slower than VLAs, but the difference is negligible, since it's still almost never the bottleneck and you still have to do something with the array you allocated.
Apr 23 2011
dsimcha:
It's slightly slower than VLAs, but the difference is negligible.

I have to compare the performance of both for my use cases (fast allocations of very small arrays, even just a small number of chars). I think an advantage of stack allocation is that most of the time the top of the current stack is already in the L1 cache, so it's warm memory.

Bye,
bearophile
Apr 23 2011
On 4/23/11 8:57 AM, dsimcha wrote:BTW, since when does the ternary operator work with functions, as opposed to variables?They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. Andrei
Apr 23 2011
On 23/04/2011 15:45, Andrei Alexandrescu wrote:On 4/23/11 8:57 AM, dsimcha wrote:And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.BTW, since when does the ternary operator work with functions, as opposed to variables?They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. AndreiHum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible. -- Bruno Medeiros - Software Engineer
Apr 29 2011
On 4/29/11 10:40 AM, Bruno Medeiros wrote:On 23/04/2011 15:45, Andrei Alexandrescu wrote:It's an indirect call instead of a direct call. AndreiOn 4/23/11 8:57 AM, dsimcha wrote:And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.BTW, since when does the ternary operator work with functions, as opposed to variables?They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. AndreiHum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
Apr 29 2011
On 29/04/2011 17:30, Andrei Alexandrescu wrote:
It's an indirect call instead of a direct call.

Well, yes, that's kind of what I was thinking already (although I referred to more low-level, assembler terms). But my question remains: why is an indirect function call slower than a direct one, and is that difference significant? (Excluding, of course, the possibilities of inlining and other static analysis that direct invocations offer.) There is no extra overhead other than loading the function address from a register/variable, right?

-- 
Bruno Medeiros - Software Engineer
May 03 2011
Bruno Medeiros wrote:
But my question remains: why is an indirect function call slower than a direct one, and is that difference significant? (Excluding, of course, the possibilities of inlining and other static analysis that direct invocations offer.) There is no extra overhead other than loading the function address from a register/variable, right?

AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns.

Jerome

-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
May 03 2011
On 03/05/2011 22:41, "Jérôme M. Berger" wrote:Bruno Medeiros wrote:Ah right, that makes sense, forgot about that one (I see it's the same issue with branch prediction). -- Bruno Medeiros - Software EngineerOn 29/04/2011 17:30, Andrei Alexandrescu wrote:AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns. JeromeOn 4/29/11 10:40 AM, Bruno Medeiros wrote:Well, yes, that's kinda of what I was thinking already (although I referred to more low-level, assembler terms). But my question remains, why is an indirect function call slower than a direct one, and is that difference significant? (excluding of course the possibilities of inlining and other static analysis that direct invocations offer) There is no extra overhead other than loading the function address from a register/variable, right?On 23/04/2011 15:45, Andrei Alexandrescu wrote:It's an indirect call instead of a direct call. AndreiOn 4/23/11 8:57 AM, dsimcha wrote:And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.BTW, since when does the ternary operator work with functions, as opposed to variables?They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. AndreiHum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
May 04 2011
Jérôme M. Berger wrote:Bruno Medeiros wrote:That was true ten years ago, but not any more. Modern CPUs can do branch prediction of indirect jumps.On 29/04/2011 17:30, Andrei Alexandrescu wrote:AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns. JeromeOn 4/29/11 10:40 AM, Bruno Medeiros wrote:Well, yes, that's kinda of what I was thinking already (although I referred to more low-level, assembler terms). But my question remains, why is an indirect function call slower than a direct one, and is that difference significant? (excluding of course the possibilities of inlining and other static analysis that direct invocations offer) There is no extra overhead other than loading the function address from a register/variable, right?On 23/04/2011 15:45, Andrei Alexandrescu wrote:It's an indirect call instead of a direct call. AndreiOn 4/23/11 8:57 AM, dsimcha wrote:And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.BTW, since when does the ternary operator work with functions, as opposed to variables?They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. AndreiHum, I didn't know that. Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
May 04 2011
On 5/4/2011 7:38 PM, Don wrote:Jérôme M. Berger wrote:Yeh, I tested this a while back and can dig up/recreate the benchmark if need be. A virtual function call that's always resolving to the same address (so the branch is highly predictable) is not measurably slower than a _non-inlined_ direct function call. However, if the branch is not predictable, it's substantially slower and virtual functions usually can't be inlined.Bruno Medeiros wrote:That was true ten years ago, but not any more. Modern CPUs can do branch prediction of indirect jumps.On 29/04/2011 17:30, Andrei Alexandrescu wrote:AIUI, with a direct call the CPU can anticipate the call (because it knows the target address) and start filling the pipeline immediately. OTOH, with an indirect call the CPU cannot know the target until it reads the register, which means that the pipeline will empty itself before the call is executed. With the pipeline length of some modern CPUs (Pentium 4 being amongst the worst), this can result in very large slowdowns. JeromeOn 4/29/11 10:40 AM, Bruno Medeiros wrote:Well, yes, that's kinda of what I was thinking already (although I referred to more low-level, assembler terms). But my question remains, why is an indirect function call slower than a direct one, and is that difference significant? (excluding of course the possibilities of inlining and other static analysis that direct invocations offer) There is no extra overhead other than loading the function address from a register/variable, right?On 23/04/2011 15:45, Andrei Alexandrescu wrote:It's an indirect call instead of a direct call. AndreiOn 4/23/11 8:57 AM, dsimcha wrote:And that's since the C days btw. The function call is just an operation, the callee an operand, so any expression can be there.BTW, since when does the ternary operator work with functions, as opposed to variables?They are converted to pointers to functions. Not something I'd recommend because it makes the call slower. AndreiHum, I didn't know that. 
Why does it make the call slower (other than the actual ternary operation), is it because it has to load the function address from a register/variable, instead of being a constant value? And/or not being able to inline the call? I mean, the first aspect seems like a very minor impact in performance, almost negligible.
May 04 2011
Andrei:Well in fairness they can be a fair amount more elaborate. The trouble with alloca is that it's impossible to compose with. So in general you need to write something like: bool onStack = smallEnough(length * sizeof(T)); auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T))); scope(exit) if (!onStack) free(a.ptr); initialize(enforce(a)); scope(exit) clear(a); This block is difficult to factor out because of alloca.See this too: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=135208 Keep in mind that alloca() returns null more often than malloc, and when alloca() return null, it's still meaningful to try to use malloc(). So the semantics you have written is not good enough yet (as mine. A merge of the two seems better). My idea of D-style VLAs is something built-in that encapsulates that semantics (that's more complex than C99 VLA semantics). I am suggesting a built-in because there is some more semantics to keep into account for a good implementation: - Probably you want the arrays created on the stack to have an immutable length (otherwise you have to switch to a normal fully GC-heap dynamic array, but I don't like this). - You have to return such arrays from functions, and to give them to function arguments. As fixed-sized arrays they probably need to be managed by value (unless you use "ref"), so they are different from dynamic arrays, and closer to fixed-sized ones. Bye, bearophile
Apr 23 2011
Andrei:bool onStack = smallEnough(length * sizeof(T)); auto a = (cast(T*) (onStack ? alloca : malloc)(length * sizeof(T))); scope(exit) if (!onStack) free(a.ptr); initialize(enforce(a)); scope(exit) clear(a); This block is difficult to factor out because of alloca.As I have explained in my old enhancement request about VLAs and in a recent post, this code is dangerous, because alloca() in some implementations deallocates at the end of the function, while in other cases it deallocates at the end of the scope. A well implemented D-VLA needs be sure to free the stack space at the end of the scope. Bye, bearophile
Apr 23 2011
2011/4/22 Timon Gehr <timon.gehr gmx.ch>:That is only a problem if the reference count of that resource changes at a very high frequency. The described problem also implies that the threads would not need any form of synchronization for the data (otherwise the reference count certainly would not be a bottleneck.)....Michael Stover wrote:I'd like to say "were proved" rather than "are believed", but I don't actually know where to go for such evidence. However, I do believe many scripting languages, such as python, eventually ditched the reference counting technique for generational, and Java has very fast GC, so I am inclined to believe those are better real-life solutions than Linus.Well, the GC may be fast when compared to other GCs, but it has to be designed to run on general data whose reference structure can be arbitrary. Often, the objects/references have a somewhat specialized structure that a smart programmer can exploit, especially if the references point to private data. But that only matters if performance is of prime interest, and the gains may be not very big. All in all, I think the best approach is a pragmatic one, where different types of resources can be handled according to different schemes. I.E. default to GC-manage everything. After profiling, determining what resources are mostly used, and where, optimize allocation for those resources, preferably to scoped allocation, or if not possible, reference-counted. Premature optimization is a root of much evil, for instance, the malloc-paranoid might very well resort to abuse of struct:s, leading either to lots of manual pointers, or excessive memory copying. Incidentally, this was the main thing that attracted me to D. Be lazy/productive where performance doesn't matter much, and focus optimization on where it does.
Apr 22 2011
Ulrik Mikaelsson wrote:All in all, I think the best approach is a pragmatic one, where different types of resources can be handled according to different schemes. I.E. default to GC-manage everything. After profiling, determining what resources are mostly used, and where, optimize allocation for those resources, preferably to scoped allocation, or if not possible, reference-counted. Premature optimization is a root of much evil, for instance, the malloc-paranoid might very well resort to abuse of struct:s, leading either to lots of manual pointers, or excessive memory copying. Incidentally, this was the main thing that attracted me to D. Be lazy/productive where performance doesn't matter much, and focus optimization on where it does.That is very true. GC is almost always fast enough or even faster. And it is clearly most convenient. And yes, identify bottlenecks first, optimize later. But I also think programs that have some concern about efficient memory allocation (with GC or without GC) tend to be better designed in general. This actually increases productivity. Plus, it reduces the need for complicated optimizations later on. This in turn increases maintainability. -Timon
Apr 22 2011
On 4/22/11 4:04 PM, Timon Gehr wrote: [snip]This is also the reason I think it is a bad idea to deprecate D's 'delete'.The functionality is not going away. Andrei
Apr 23 2011
On 4/22/11 4:04 PM, Timon Gehr wrote: [snip]Yes I realize that. It is a matter of syntax and aesthetics. Handling allocation and deallocation non-uniformly renders the language ugly (or are you considering to remove the 'new' keyword too?). And it discourages people to get informed about custom allocators etc. I mean, how will custom allocators look like? The cost of having the delete keyword in the language is low. Before removing that, why not consider the 'body' keyword instead? TimonThis is also the reason I think it is a bad idea to deprecate D's 'delete'.The functionality is not going away. Andrei
Apr 23 2011
On 4/23/11 11:33 AM, Timon Gehr wrote:Allocation and deallocation are not symmetric and as such handling them in a uniform way would be a mistake that perpetuates a poor understanding of the underlying realities of memory allocation and object creation. I suggest you reconsider.On 4/22/11 4:04 PM, Timon Gehr wrote: [snip]Yes I realize that. It is a matter of syntax and aesthetics. Handling allocation and deallocation non-uniformly renders the language ugly (or are you considering to remove the 'new' keyword too?).This is also the reason I think it is a bad idea to deprecate D's 'delete'.The functionality is not going away. AndreiAnd it discourages people to get informed about custom allocators etc.I don't see the relationship here.I mean, how will custom allocators look like? The cost of having the delete keyword in the language is low. Before removing that, why not consider the 'body' keyword instead?The cost of keeping "delete" in the language is making a rarely-used, dangerous facility with loosely-defined semantics straight inside the language. It reflects poor language design for many reasons. Andrei
Apr 23 2011
Andrei Alexandrescu wrote:Allocation and deallocation are not symmetric and as such handling them in a uniform way would be a mistake that perpetuates a poor understanding of the underlying realities of memory allocation and object creation. I suggest you reconsider.'char' is completely orthogonal to memory allocation and it still gets handled uniformly to it syntactically. So I do not really get the point here.I cannot imagine that the syntax for them will be particularly nice after the removal of 'delete'. Who likes features with unnecessarily clumsy syntax?And it discourages people to get informed about custom allocators etc.I don't see the relationship here.The cost of keeping "delete" in the language is making a rarely-used, dangerous facility with loosely-defined semantics straight inside the language. It reflects poor language design for many reasons. AndreiI think we are talking about different things. I do _not_ want to use the keyword to somehow try to deallocate GC allocated memory. (who would actually want to do this anyways?) I want it for custom allocators as in http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would be completely specified by me. Removing the keyword means ruining that use case. You can also find a nice, clean and strict semantic specification for delete when called on GC memory that works on all platforms and with all GC implementations: "do nothing, GC memory is managed automatically." As it is done now, I agree it is not particularly nice. Timon
Apr 23 2011
Timon Gehr wrote:Andrei Alexandrescu wrote:Why not allow "delete" only in classes with custom allocators? AFAIK they could be determined at compile time. Please let me know if I'm wrong.Allocation and deallocation are not symmetric and as such handling them in a uniform way would be a mistake that perpetuates a poor understanding of the underlying realities of memory allocation and object creation. I suggest you reconsider.'char' is completely orthogonal to memory allocation and it still gets handled uniformly to it syntactically. So I do not really get the point here.I cannot imagine that the syntax for them will be particularly nice after the removal of 'delete'. Who likes features with unnecessarily clumsy syntax?And it discourages people to get informed about custom allocators etc.I don't see the relationship here.The cost of keeping "delete" in the language is making a rarely-used, dangerous facility with loosely-defined semantics straight inside the language. It reflects poor language design for many reasons. AndreiI think we are talking about different things. I do _not_ want to use the keyword to somehow try to deallocate GC allocated memory. (who would actually want to do this anyways?) I want it for custom allocators as in http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would be completely specified by me. Removing the keyword means ruining that use case. You can also find a nice, clean and strict semantic specification for delete when called on GC memory that works on all platforms and with all GC implementations: "do nothing, GC memory is managed automatically." As it is done now, I agree it is not particularly nice. Timon
Apr 23 2011
Piotr Szturmaj wrote:Why not allow "delete" only in classes with custom allocators? AFAIK they could be determined at compile time. Please let me know if I'm wrong.Nevermind. Andrei informed that class allocators are being deprecated, so my question is no longer relevant.
Apr 24 2011
On 4/23/11 1:05 PM, Timon Gehr wrote:Andrei Alexandrescu wrote:The point is that some naively believe "new" and "delete" are some simple do and undo actions and as such they somehow deserve the same level of support. In reality they have dramatically different issues and are profoundly asymmetric.Allocation and deallocation are not symmetric and as such handling them in a uniform way would be a mistake that perpetuates a poor understanding of the underlying realities of memory allocation and object creation. I suggest you reconsider.'char' is completely orthogonal to memory allocation and it still gets handled uniformly to it syntactically. So I do not really get the point here.The class-specific allocators are an exceedingly bad design carried over from C++. They will be deprecated from D as well. Even C++, where the features you are talking about have existed for many years, is shunning their use. Good applications make very sparingly use of "delete" and virtually none uses class-specific allocators. Experience with C++ is the best proof that the delete keyword and class-specific allocators have no place in the D programming language. AndreiI cannot imagine that the syntax for them will be particularly nice after the removal of 'delete'. Who likes features with unnecessarily clumsy syntax?And it discourages people to get informed about custom allocators etc.I don't see the relationship here.The cost of keeping "delete" in the language is making a rarely-used, dangerous facility with loosely-defined semantics straight inside the language. It reflects poor language design for many reasons. AndreiI think we are talking about different things. I do _not_ want to use the keyword to somehow try to deallocate GC allocated memory. (who would actually want to do this anyways?) I want it for custom allocators as in http://www.digitalmars.com/d/2.0/memory.html. In that case, the semantics would be completely specified by me. Removing the keyword means ruining that use case. 
You can also find a nice, clean and strict semantic specification for delete when called on GC memory that works on all platforms and with all GC implementations: "do nothing, GC memory is managed automatically." As it is done now, I agree it is not particularly nice. Timon
Apr 23 2011
Andrei Alexandrescu wrote:The point is that some naively believe "new" and "delete" are some simple do and undo actions and as such they somehow deserve the same level of support. In reality they have dramatically different issues and are profoundly asymmetric.The class-specific allocators are an exceedingly bad design carried over from C++. They will be deprecated from D as well.Does that imply that the feature is going away after all?Even C++, where the features you are talking about have existed for many years, is shunning their use. Good applications make very sparingly use of "delete" and virtually none uses class-specific allocators. Experience with C++ is the best proof that the delete keyword and class-specific allocators have no place in the D programming language. AndreiVirtually no good application uses them? The _DMD compiler implementation_ uses them. Classes 'Token' and 'Scope' have class-specific optimized allocators. I don't think Walter would have done that if it did not give quite good benefits. D is a systems programming language after all. It is not Java or some scripting language. I don't think D should/can get rid of class-specific allocators if there is no good alternative for sane uses. Even if they are not used often and should not be used often. C++ is not garbage collected by default. Implementing a GC root class for example is another legitimate use case for class-specific allocators. In D, things are somewhat reverse, class-specific allocators would be used for manual memory management. D should support that. So if class-specific allocators go away, what is the alternative? Sure, it is always possible to allocate using custom functions (for D structs, at least.) But then, if allocations of one struct/class turn out to be a bottleneck somewhere late in the development, the whole code will have to be changed instead of just the addition of some simple custom class-specific allocator. (This is even worse when developing a library). 
Also, just testing the impact of a custom allocator will be a pain for the same reason. Why is that better? Timon
Apr 24 2011
On 04/24/2011 01:21 PM, Timon Gehr wrote:Andrei Alexandrescu wrote:A class-specific freelist-based allocator in fact stands in the way of a good general-purpose allocator. At Facebook we did extensive measurements on a variety of applications and workloads and the best approach was to just disable all STL class-specific allocators (by defining GLIBCXX_FORCE_NEW) and let jemalloc take care of all allocations, to great effect. Today's good general-purpose allocators are as good or usually considerably better than custom allocators. The one category of custom allocators that is still significantly better are regions, but they have quite specific and restricted semantics.The point is that some naively believe "new" and "delete" are some simple do and undo actions and as such they somehow deserve the same level of support. In reality they have dramatically different issues and are profoundly asymmetric.The class-specific allocators are an exceedingly bad design carried over from C++. They will be deprecated from D as well.Does that imply that the feature is going away after all?Even C++, where the features you are talking about have existed for many years, is shunning their use. Good applications make very sparingly use of "delete" and virtually none uses class-specific allocators. Experience with C++ is the best proof that the delete keyword and class-specific allocators have no place in the D programming language. AndreiVirtually no good application uses them? The _DMD compiler implementation_ uses them. Classes 'Token' and 'Scope' have class-specific optimized allocators. I don't think Walter would have done that if it did not give quite good benefits.D is a systems programming language after all. It is not Java or some scripting language. I don't think D should/can get rid of class-specific allocators if there is no good alternative for sane uses. Even if they are not used often and should not be used often. C++ is not garbage collected by default. 
Implementing a GC root class for example is another legitimate use case for class-specific allocators. In D, things are somewhat reverse, class-specific allocators would be used for manual memory management. D should support that. So if class-specific allocators go away, what is the alternative? Sure, it is always possible to allocate using custom functions (for D structs, at least.) But then, if allocations of one struct/class turn out to be a bottleneck somewhere late in the development, the whole code will have to be changed instead of just the addition of some simple custom class-specific allocator. (This is even worse when developing a library). Also, just testing the impact of a custom allocator will be a pain for the same reason. Why is that better? TimonIt is much better in very many ways, in particular in D. For one simple argument, consider: auto obj = new Widget; Simple modularity considerations would require this operator to have semantics independent of Widget, such that modular and generic code can use it properly. Custom class allocators ruin all that, because they would require the use of delete paired with new, whereas the general operator requires no action. Memory allocation and object construction are difficult topics, particularly in high-performance settings. A seasoned engineer understands that and does not naively believe that a smattering of well-intended but very badly executed language support will make that all easier. (Contrast that with e.g. reference counting semantics using RAII, which relies on a much better part of the language and does offer reasonable guarantees.) I am sorry, you simply have no case - each and every argument you put forth has no strength or is just wrong. We could spend time on debating each, but I suspect that would do little toward convincing you of what is after all a simple fact, but one with many subtle facets. It might not a good use of our time to further engage in a diatribe on this. 
The delete keyword will go, as will class-specific new and delete. Andrei
Apr 24 2011
Since this is definite we should remove them from the docs as well. Want me to file it to bugzilla?
Apr 24 2011
On 04/24/2011 03:35 PM, Andrej Mitrovic wrote:Since this is definite we should remove them from the docs as well. Want me to file it to bugzilla?Please do. BTW I just fixed the core-related documentation bug on d-programming-language.org. Andrei
Apr 24 2011
On 4/24/11, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 04/24/2011 03:35 PM, Andrej Mitrovic wrote:Fantastic, thank you.Since this is definite we should remove them from the docs as well. Want me to file it to bugzilla?Please do. BTW I just fixed the core-related documentation bug on d-programming-language.org. Andrei
Apr 24 2011
Andrei:It might not a good use of our time to further engage in a diatribe on this.Just a note: you have not wasted your time discussing this, because I am not expert on such matters, so I learn something from informed discussions :-) If a book that explains C language is just 200 pages long, a book that explains the design purpose, design constraints and design compromises of every feature of C language requires a much longer book (800 pages?), and it's probably also much more interesting :-) Bye, bearophile
Apr 24 2011
== Quote from bearophile (bearophileHUGS lycos.com)'s articleAndrei:expert on such matters, so I learn something from informed discussions :-)It might not a good use of our time to further engage in a diatribe on this.Just a note: you have not wasted your time discussing this, because I am notIf a book that explains C language is just 200 pages long, a book that explainsthe design purpose, design constraints and design compromises of every feature of C language requires a much longer book (800 pages?), and it's probably also much more interesting :-)Bye, bearophileAs one of those who were the most reluctant about abandoning delete, I have come to realize that it is better to make D classes strictly garbage collected, and drop delete. It should be noted that D supports RAII, as long as one uses struct (thus avoiding dynamic polymorphism, which shouldn't make a big difference once you think about it), and this gives D an important semantic separation lacking in C++ (or even those languages on the other side of the fence such as Java, for that matter). Making a habit of declaring struct instead of class when I intend to create "objects" on the stack instead of the heap, while resorting to garbage collected classes when it makes sense to do so, has made my life simpler. I do believe, however, that there is a recurrent communication error going on here. I would kindly ask the language designers to please update the documentation and write an article explaining the advantages of the approach (without making the mistake of turning it into a blind praise of garbage collection, that is). Newcomers need to know how to obtain garbage collected or RAII data structures from D, whenever they want to.
Apr 24 2011
Andrei Alexandrescu wrote:I am sorry, you simply have no case - each and every argument you put forth has no strength or is just wrong. We could spend time on debating each, but I suspect that would do little toward convincing you of what is after all a simple fact, but one with many subtle facets. It might not a good use of our time to further engage in a diatribe on this. The delete keyword will go, as will class-specific new and delete. AndreiOk, lets stop. Custom allocators and delete should probably be removed from D: Some of my facts: * freelist is faster if most allocation and deallocation concentrates on only one class. (see attachment) I think this is also the reason it increases DMDs speed. Token and Scope are quite obvious bottlenecks.. * such bottlenecks will most of the time only be detected when development is almost finished. (Almost never for library code, so I don't get why STL uses custom allocators at all.) * premature optimization is evil (see STL custom allocators) * late optimization: it is easy if there are custom allocators. In C++ it is even trivial. Nice design. Still no clue why STL is abusing it. * that form of custom allocators does not play nicely with Ds Garbage collector though, because it changes the semantics of 'new', ergo builtin custom allocators should probably indeed be removed from D. * this makes this form of optimization painful in D without custom allocators. This is what I was instinctively scared about. Fact is: it is also painful with them, because you don't get the 'delete's for free as in C++. I was somewhat inconsiderate on this. * it is possible that hinting the GC is only about twice as bad (again, see attachment). And some notes: * the fact that you think all my points were worthless or wrong implies that you either did not read all of them or have not considered them yourself when making your decision. I think that is scary. * There are many ways in which I am always right. 
The most dominant being that any sufficiently smart guy with a PhD in rocket science would totally agree with all my statements. No pun/offense intended. * this discussion would have been a waste of your time, had it not been public. Thanks for discussing. It was helpful. Timon PS: The current GC dies without hints on my benchmark, so it is indeed a nice idea to keep hints: version=GC: 1m45s version=GC_hints: ~6.8s version=freelist: ~3.7s version=c_heap: ~4.5s [uuencoded attachment benchmark.d omitted]
Apr 24 2011
== Quote from Timon Gehr (timon.gehr gmx.ch)'s articleAndrei Alexandrescu wrote:Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?I am sorry, you simply have no case - each and every argument you put forth has no strength or is just wrong. We could spend time on debating each, but I suspect that would do little toward convincing you of what is after all a simple fact, but one with many subtle facets. It might not a good use of our time to further engage in a diatribe on this. The delete keyword will go, as will class-specific new and delete. AndreiOk, lets stop. Custom allocators and delete should probably be removed from D: Some of my facts: * freelist is faster if most allocation and deallocation concentrates on only one class. (see attachment) I think this is also the reason it increases DMDs speed. Token and Scope are quite obvious bottlenecks.. * such bottlenecks will most of the time only be detected when development is almost finished. (Almost never for library code, so I don't get why STL uses custom allocators at all.) * premature optimization is evil (see STL custom allocators) * late optimization: it is easy if there are custom allocators. In C++ it is even trivial. Nice design. Still no clue why STL is abusing it. * that form of custom allocators does not play nicely with Ds Garbage collector though, because it changes the semantics of 'new', ergo builtin custom allocators should probably indeed be removed from D. * this makes this form of optimization painful in D without custom allocators. This is what I was instinctively scared about. Fact is: it is also painful with them, because you don't get the 'delete's for free as in C++. I was somewhat inconsiderate on this. * it is possible that hinting the GC is only about twice as bad (again, see attachment). 
And some notes:
* the fact that you think all my points were worthless or wrong implies that you either did not read all of them or have not considered them yourself when making your decision. I think that is scary.
I find it nuts that someone considers blaming/shunning language features just to mask bad runtime semantics they refuse to fix - but maybe it's just my ignorance on the subject that professes that view.
Apr 24 2011
On 04/24/2011 08:08 PM, Iain Buclaw wrote:
Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?
I don't know. If it does, an additional level of freelists only hurts.
[snip]
I find it nuts that someone considers blaming/shunning language features just to mask bad runtime semantics they refuse to fix - but maybe it's just my ignorance on the subject that professes that view.
I'd appreciate if you provided more detail on that. Far as I can tell the problem is with the feature under discussion, not some alleged unwillingness to do work.
Andrei
Apr 24 2011
On Sun, 24 Apr 2011 21:13:43 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
Is it correct that the current D GC implementation puts delete'd allocations into a freelist (or bucket) to be reused when 'new' is invoked again?
I don't know. If it does, an additional level of freelists only hurts.
To the best of my knowledge, for 'small' allocations, the current GC uses free-lists of power-of-2 sized blocks. (I think small is defined as anything under 4k)
Apr 24 2011
On Apr 24, 2011, at 7:25 PM, Robert Jacques wrote:
To the best of my knowledge, for 'small' allocations, the current GC uses free-lists of power-of-2 sized blocks. (I think small is defined as anything under 4k)
That's correct.
Apr 25 2011
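The small-object strategy Robert describes can be modeled as follows. This is an assumed sketch of a power-of-two size-class scheme, not the actual druntime code; the 16-byte minimum block size is a guess:

```cpp
#include <cassert>
#include <cstddef>

// Requests at or below this limit are "small" and served from
// per-size-class freelists; larger requests go to the page allocator.
constexpr std::size_t SMALL_LIMIT = 4096;

// Round a small request up to the next power of two. The caller would
// index an array of freelists by this class.
std::size_t sizeClass(std::size_t request) {
    std::size_t c = 16;          // assumed minimum block size
    while (c < request) c <<= 1; // next power of two >= request
    return c;
}

bool isSmall(std::size_t request) {
    return request <= SMALL_LIMIT;
}
```

Rounding to power-of-two classes wastes up to half a block per allocation but keeps the number of freelists tiny, which is the trade-off general-purpose allocators in this style accept.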
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
I'd appreciate if you provided more detail on that. Far as I can tell the problem is with the feature under discussion, not some alleged unwillingness to do work. Andrei
I was thinking of where one implements an alternative runtime that may not necessarily interface with a GC, or has different semantics on the use of the keyword (the compiler, after all, doesn't know/care what exactly 'delete' does, it just does its job at emitting the correct action we expect from it). Though it was late night and I perhaps didn't quite put it right anyway. Having literally slept on it though, a runtime with a good GC is one that doesn't offer a choice to 'delete' memory - so I am on your side. Though reflecting from my experience with the D2 GC (thinking from a compiler implementer's POV), there isn't much I can really praise about it. Custom (de)allocators I can also see a very good point for removal too, however much a stab in the back for those who can use them in the - perhaps 'one-and-only' - case for a small amount of speed gain.
Iain
Apr 25 2011
On 04/25/2011 10:06 AM, Iain Buclaw wrote:
[snip]
I understand. The right response to this is defining a good custom allocator API and providing good implementations of it.
Andrei
Apr 25 2011
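For comparison, one minimal shape such an allocator API could take - purely illustrative, not a design proposed in the thread - is a value type handing out untyped blocks, with sized deallocation so implementations can route requests by size class rather than by type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// A hypothetical minimal allocator concept: allocate/deallocate on raw
// memory. Mallocator is the trivial model backed by the C heap.
struct Mallocator {
    void* allocate(std::size_t n) {
        return std::malloc(n);
    }
    void deallocate(void* p, std::size_t /*n*/) {
        std::free(p);   // malloc ignores the size hint; a size-class
                        // allocator would use it to pick a freelist
    }
};
```

Keeping the interface untyped and size-driven is one way to avoid the type-association mistake Andrei criticizes later in the thread.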
On 04/25/2011 10:06 AM, Iain Buclaw wrote:
[snip]
I understand. The right response to this is defining a good custom allocator API and providing good implementations of it. Andrei
Sounds like a good way forward. However, certain operations are currently not supported unless the GC is enabled, for instance 'Associative Arrays of Strings'... do you imagine the future allocator API being flexible enough to facilitate this use-case in a GC-less environment? Or at least allowing more of the functionality which is currently unsupported without GC?
Kim
Apr 25 2011
On 4/25/11 4:20 PM, Kim D wrote:
Sounds like a good way forward. However, certain operations are currently not supported unless the GC is enabled, for instance 'Associative Arrays of Strings'... do you imagine the future allocator API being flexible enough to facilitate this use-case in a GC-less environment? Or at least allowing more of the functionality which is currently unsupported without GC? Kim
An entity relying on GC will by definition have more freedom and more safety than one relying on an alternative allocator. It is possible to make that transparent to some extent (e.g. define a type to be either refcounted or not depending on the allocator used). Total transparency would be difficult to achieve.
Andrei
Apr 25 2011
On 04/24/2011 07:10 PM, Timon Gehr wrote:
Ok, let's stop. Custom allocators and delete should probably be removed from D:
Thank you for agreeing.
* freelist is faster if most allocation and deallocation concentrates on only one class. (see attachment) I think this is also the reason it increases DMD's speed. Token and Scope are quite obvious bottlenecks.
I agree. There are two other things that I don't agree with: 1. This particular point does nothing to help the original argument, which was to keep the delete keyword (and in particular class allocators) in the D programming language. 2. Freelists are not all smiles - they have their big problems too. When deciding to go with a freelist, the implementer must also decide on a strategy for releasing that memory back to the backend allocator on some schedule. Choosing a "good" schedule is already a full-blown problem that is best left to the specialists implementing the backend allocator. My colleagues discovered with horror that unless the environment variable GLIBCXX_FORCE_NEW was defined, the STL node-based containers would _never_ give back memory once allocated. Memory allocation is difficult, and for any allocator some workloads exist that make it worse than all others. Nevertheless, recent developments in general-purpose memory allocators (e.g. the Low-Fragmentation Heap on Windows and a slew of allocators on Unix including ptmalloc, tcmalloc, jemalloc, and others) have shown that it's best to just use a good general-purpose allocator instead of fiddling with custom allocation.
[snip]
OK. Allow me to add a speculation here. I believe that one mistake that both class-level new/delete and STL allocators have made was to associate allocators with types. That just seems to force the proverbial square peg through the round hole. General-purpose allocators just focus on size classes and do awfully well. In addition, they do that in an integrated, non-modular way (as work on HeapLayers has shown, modularizing allocators is difficult and not very intuitive).
* the fact that you think all my points were worthless or wrong implies that you either did not read all of them or have not considered them yourself when making your decision. I think that is scary.
Well I think I owe you an apology.
It is very difficult to use limited time resources to convince someone of rather subtle matters, particularly if the discussion starts not with a question (e.g. "In light of what I said, I wonder whether removing delete is the right way to go.") but with an assertion. At that point I need to make a sort of a cold estimate - do I have a reasonable chance of convincing someone of something within a reasonable time frame? It's difficult to make and convey that decision without seeming impolite, although please believe me, being curt pains me as much as anyone. To make matters more difficult, I'm not an expert in memory allocation and garbage collection. Probably I'm just at the point where I can recognise good work when I see it. Yet, after a book chapter on memory allocation, a couple of articles written on memory allocation, a course I'm teaching on memory allocation, and a good amount of work on STL's interaction with jemalloc while at Facebook - I must admit I've become a bit jaded and as such less sanguine about arguing my points.
* There are many ways in which I am always right. The most dominant being that any sufficiently smart guy with a PhD in rocket science would totally agree with all my statements. No pun/offense intended.
Not sure how you mean that. I mean, nobody here is discussing your competence or any personal trait of yours, and in particular I'm not keen on drawing any sweeping conclusion from any one discussion. You may as well know better than everybody about allocation, but so far you didn't seem to have found the right words to convey that. The simple fact is that here all anybody knows about you is what you write. And what you wrote failed to produce (now by your own admission, as far as I can understand) a viable case for keeping the delete operator and class-level allocators in the D programming language.
* this discussion would have been a waste of your time, had it not been public. Thanks for discussing. It was helpful.
I hope that's not in jest :o).
Andrei
Apr 24 2011
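The "release schedule" problem Andrei describes above can be made concrete: a naive freelist hoards every freed block forever (the GLIBCXX_FORCE_NEW behavior he mentions), while any policy that gives memory back has to pick an arbitrary retention limit. A hedged sketch, with the cap policy invented purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// A freelist that retains at most `cap` freed blocks and returns the
// surplus to the backend allocator. Choosing `cap` well is exactly the
// scheduling problem the thread argues should be left to the backend.
struct CappedFreelist {
    struct Node { Node* next; };

    Node*       head = nullptr;
    std::size_t length = 0;       // blocks currently retained
    std::size_t cap;
    std::size_t blockSize;

    CappedFreelist(std::size_t block, std::size_t c)
        : cap(c), blockSize(block < sizeof(Node) ? sizeof(Node) : block) {}

    void* allocate() {
        if (head) {               // reuse a retained block if possible
            Node* n = head;
            head = n->next;
            --length;
            return n;
        }
        return std::malloc(blockSize);
    }

    void deallocate(void* p) {
        if (!p) return;
        if (length < cap) {       // retain up to cap blocks
            Node* n = static_cast<Node*>(p);
            n->next = head;
            head = n;
            ++length;
        } else {
            std::free(p);         // surplus goes back to the backend
        }
    }
};
```

With cap = 0 this degenerates to plain malloc/free; with an unbounded cap it is the hoarding freelist from the benchmark. Neither extreme is right for all workloads, which is the argument for leaving the decision to a good general-purpose allocator.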
On Apr 24, 2011, at 1:17 PM, Andrei Alexandrescu wrote:
It is much better in very many ways, in particular in D. For one simple argument, consider:
auto obj = new Widget;
Simple modularity considerations would require this operator to have semantics independent of Widget, such that modular and generic code can use it properly. Custom class allocators ruin all that, because they would require the use of delete paired with new, whereas the general operator requires no action.
Agreed, so long as there's some form of support for placement new. But that could easily be an explicit __ctor() call into the allocated space, which works in D but I don't believe it works in C++ (thus the explicit placement new syntax).
Apr 25 2011
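The pattern being discussed - obtain raw storage first, then run the constructor into it explicitly - is what C++ spells as placement new. A small sketch, where Widget is just an example type echoing Andrei's snippet and the static buffer stands in for memory from any allocator:

```cpp
#include <cassert>
#include <new>

struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};

// Raw storage obtained separately from construction; a real allocator
// would hand this out instead of a static buffer.
alignas(Widget) unsigned char buf[sizeof(Widget)];

Widget* makeInPlace(int i) {
    // Placement new: no allocation happens here, only the constructor
    // call into the existing storage.
    return new (buf) Widget(i);
}
```

The symmetric teardown is an explicit destructor call (w->~Widget()) followed by returning the storage to whichever allocator provided it; D's equivalent would be the explicit __ctor call the poster mentions.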
On 4/22/2011 2:04 PM, Timon Gehr wrote:
The reason Java is garbage collected is _not_ performance, but primarily reliability.
I believe a GC is required if one wishes to prove memory safety, which is definitely a design goal of Java.
Apr 23 2011