
digitalmars.D - Allocation policy in Phobos - Was: Vote for std.process

Manu <turkeyman gmail.com> writes:
On 12 April 2013 21:00, Vladimir Panteleev <vladimir thecybershadow.net> wrote:

 Consider the following hypothetical decisions and outcomes:

 1. std.process is left as is. One user is angry / turned away because it
 performs 0.1% slower than it could.

 2. std.process is rewritten to minimize allocations. Code complexity goes
 up, new improvements are challenging to add; bugs pop up and go unfixed for
 a while because fewer programmers are qualified or willing to commit the
 effort of making correct fixes. More people are angry / turned away from D
 because its standard library is buggy.

 Of course, the above is an exaggerated illustration. But would optimizing
 all code left and right really make more D users happier?
Just to be clear, I'm not arguing optimisation for performance here, I'm arguing intolerance for __unnecessary__ allocations as a policy, or at least a habit. There's a whole separate thread on the topic of fighting unnecessary garbage, and having the ability to use D with strict control over the GC and/or allocation in general. If std functions have no reason to allocate, why should they?

 There's also the question of priorities. Would you rather that effort is
 spent on optimizing std.process (and dealing with all the fallout from any
 such optimizations), or working on something that is acutely missing and
 hurting D?
If it's somehow hard to put a string on the stack, then there may be a hole in phobos. I'm not suggesting changes that are somehow hard to implement, or obscure in some way... they should be utterly trivial.

 D is a systems programming language, there is hope that it will penetrate
 a wide range of systems and environments - sure in many cases a little bit
 of memory use or performance loss is unimportant, but for many it will be
 the decisive factor which makes D unusable there.

 This is surely an exaggeration. D does not attempt to please everyone out there who is choosing a programming language for their next project. There is no such language, nor can one exist. One has to accept that D has a number of goals, none of which are absolute, but which merely point towards a certain, but not overly specific, point in the multidimensional matrix of trade-offs. D never was about achieving maximum performance in all possible cases.
And I never suggested we scrap phobos and rewrite it so it maximises performance at all costs. I highlighted, and suggested trivial changes that would make a big difference and don't hurt anyone. If it were a habit of phobos devs to generally consider and try to avoid unnecessary allocations (almost all of which could be handled by using the stack wherever applicable), the situation would be much better in general. End-users can write D code however they want, but phobos should strive to be usable in as many types of software as possible; otherwise, what good is a standard library?
Apr 12 2013
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Friday, 12 April 2013 at 12:52:39 UTC, Manu wrote:
 If it's somehow hard to put a string on the stack, then there may be a hole
 in phobos. I'm not suggesting changes that are somehow hard to implement,
 or obscure in some way... they should be utterly trivial.
Well, ironically or not, it is not something utterly trivial. The main issue is that the stack can't hold a lot of data. This is not a problem with the heap, which is limited by the amount of memory and address space; these (usually abundant) limits are usually the user's concern, not the programmer's.

Did you know that Linux does not impose a limit on the size of the environment? The default stack size seems to be 8MB... Now, what would happen on a machine that, for one reason or another, has an environment larger than that, if std.process did not account for it?

So, to perform the task correctly, std.process would need to make its allocations on the stack when they are up to a certain size, and on the heap otherwise. What would be a good limit for stack allocations? You may want to choose a value based on the default stack size of today's Linux versions (after all, std.process is near the "leaf" parts of call stacks). However, certain applications create a lot of stacks, for example for use in lightweight threads (fibers). When restricted by a small address space (32-bit architecture), the stacks need to be much smaller than usual...
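To make the size-threshold idea concrete, here is a minimal sketch in D (not std.process code): copy the string into a fixed stack buffer when it fits, otherwise fall back to malloc/free from `core.stdc.stdlib`. The helper name `withCString` and the 1 KB `stackLimit` are assumptions for illustration; picking that limit is exactly the judgment call described above, and code running on small fiber stacks might want it lower.

```d
import core.stdc.stdlib : malloc, free;
import core.exception : onOutOfMemoryError;

// Hypothetical threshold; choosing it is the hard part (fiber stacks may be small).
enum size_t stackLimit = 1024;

/// Hypothetical helper: makes a null-terminated copy of `s` and passes it to `dg`.
/// Short strings live in a stack buffer; longer ones go to the C heap and are
/// freed as soon as `dg` returns. `dg` must not keep the pointer.
void withCString(const(char)[] s, scope void delegate(const(char)*) dg)
{
    char[stackLimit] buf = void; // uninitialized: only the bytes we write are read
    char* p;
    if (s.length < buf.length)
    {
        p = buf.ptr;             // fits, including the terminator
    }
    else
    {
        p = cast(char*) malloc(s.length + 1);
        if (p is null)
            onOutOfMemoryError();
    }
    scope(exit) if (p !is buf.ptr) free(p);

    p[0 .. s.length] = s[];
    p[s.length] = '\0';
    dg(p);
}
```

Because the buffer is released when `withCString` returns, the callback must not hold on to the pointer; that restriction is the price of avoiding a GC allocation per call.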
 I highlighted, and suggested trivial changes that
 would make a big difference and don't hurt anyone.
Well, why do you think they would make a big difference in std.process? I don't think any of the Phobos developers are against improving performance when the cost is low. So, it's not that I think you're wrong in general, but that the std.process scapegoat (for lack of a better word) was not the best choice.

I suggest that you file enhancement requests on Bugzilla for each specific component of Phobos / Druntime, improving the allocation behavior of which would result in a real-world benefit for you.
Apr 12 2013
Manu <turkeyman gmail.com> writes:
On 12 April 2013 23:32, Vladimir Panteleev <vladimir thecybershadow.net> wrote:

 On Friday, 12 April 2013 at 12:52:39 UTC, Manu wrote:

 If it's somehow hard to put a string on the stack, then there may be a hole
 in phobos. I'm not suggesting changes that are somehow hard to implement,
 or obscure in some way... they should be utterly trivial.
 Well, ironically or not, it is not something utterly trivial. The main issue is that the stack can't hold a lot of data. This is not a problem with the heap, which is limited by the amount of memory and address space; these (usually abundant) limits are usually the user's concern, not the programmer's.

 Did you know that Linux does not impose a limit on the size of the environment? The default stack size seems to be 8MB... Now, what would happen on a machine that, for one reason or another, has an environment larger than that, if std.process did not account for it?

 So, to perform the task correctly, std.process would need to make its allocations on the stack when they are up to a certain size, and on the heap otherwise. What would be a good limit for stack allocations? You may want to choose a value based on the default stack size of today's Linux versions (after all, std.process is near the "leaf" parts of call stacks). However, certain applications create a lot of stacks, for example for use in lightweight threads (fibers). When restricted by a small address space (32-bit architecture), the stacks need to be much smaller than usual...
Yes, you're right, there's an 'if' required here to catch unreasonably large environment blocks, but I still consider that within the realm of 'trivial'. This is processed in an appending loop: just check that the next bit fits, and if it would overflow 1kb or so of stack string, fall back to the heap and continue. I reckon helpers could be written to assist with common cases of this (which would have to be mixin template based, I guess?)... And I really like the variable-length static array idea!

 I highlighted, and suggested trivial changes that
 would make a big difference and don't hurt anyone.
 Well, why do you think they would make a big difference in std.process? I don't think any of the Phobos developers are against improving performance when the cost is low. So, it's not that I think you're wrong in general, but that the std.process scapegoat (for lack of a better word) was not the best choice.

Fuck, I've repeated myself so many times now. The point I'm making is about a general issue I have with phobos; I think addressing it should be made policy (irrespective of the module being considered), and std.process came into question right at the moment I thought to make the point. It may not be the strongest case for the principle; it's just the one that appeared.

 I suggest that you file enhancement requests on Bugzilla for each specific
 component of Phobos / Druntime, improving the allocation behavior of which
 would result in a real-world benefit for you.
I'll start doing it myself, but I also suggest it be made a policy, and carefully considered when considering acceptance of ANY new module. That way, new code that suffers the unpredictable/"surprise!" allocation problems won't be introduced.
Apr 12 2013
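Manu's appending-loop approach from the post above (append into a stack buffer; when the next piece would not fit, move to the heap and carry on) could be packaged as a reusable helper. The following is a sketch under assumptions: `SmallBuffer` is a hypothetical name, not a Phobos type, and it is written as an ordinary struct template rather than the mixin template Manu guessed might be necessary.

```d
import core.stdc.stdlib : malloc, realloc, free;
import core.exception : onOutOfMemoryError;

/// Hypothetical helper: appends into an N-byte stack buffer and moves to the
/// C heap the first time an append would not fit.
struct SmallBuffer(size_t N)
{
    static assert(N > 0);

    private char[N] stackBuf = void;
    private char* heap;        // null while the data still lives in stackBuf
    private size_t cap = N;
    private size_t len;

    @disable this(this);       // a copy would double-free `heap`

    ~this() { free(heap); }    // free(null) is a no-op

    // Recomputed on each call, so moving the struct never leaves a stale pointer.
    private char* buf() { return heap !is null ? heap : stackBuf.ptr; }

    void put(const(char)[] s)
    {
        if (len + s.length > cap)
        {
            size_t newCap = cap;
            while (newCap < len + s.length)
                newCap *= 2;
            if (heap is null)
            {
                // First overflow: copy what we have off the stack.
                heap = cast(char*) malloc(newCap);
                if (heap is null)
                    onOutOfMemoryError();
                heap[0 .. len] = stackBuf[0 .. len];
            }
            else
            {
                auto p = cast(char*) realloc(heap, newCap);
                if (p is null)
                    onOutOfMemoryError();
                heap = p;
            }
            cap = newCap;
        }
        buf()[len .. len + s.length] = s[];
        len += s.length;
    }

    void put(char c) { put((&c)[0 .. 1]); }

    const(char)[] data() { return buf()[0 .. len]; }

    bool onHeap() const { return heap !is null; }
}
```

Appending each `name=value` entry followed by a `'\0'` builds an environment-style block that only touches the heap once it outgrows the N-byte stack buffer. Copying is disabled because the heap pointer has a single owner; the stack address is recomputed on every access, so moving the struct is safe.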
Walter Bright <newshound2 digitalmars.com> writes:
On 4/12/2013 7:05 AM, Manu wrote:
 I'll start doing it myself, but I also suggest it be made a policy, and
 carefully considered when considering acceptance of ANY new module. That way,
 new code that suffers the unpredictable/"surprise!" allocation problems won't
 be introduced.
I would also expect that Phobos modules that know the lifetimes of their allocated data use malloc/free rather than the gc. Of course, that entails more effort in coding the modules to ensure no leaks, but we can certainly expect that of phobos developers.
Apr 12 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 12 Apr 2013 13:41:57 -0400, Walter Bright <newshound2 digitalmars.com> wrote:

 On 4/12/2013 7:05 AM, Manu wrote:
 I'll start doing it myself, but I also suggest it be made a policy, and
 carefully considered when considering acceptance of ANY new module. That way,
 new code that suffers the unpredictable/"surprise!" allocation problems won't
 be introduced.
 I would also expect that Phobos modules that know the lifetimes of their allocated data use malloc/free rather than the gc.
I would like a better solution. Allocating things with malloc/free means either no GC references can be stored in that memory, or clunky addRoot/removeRoot calls are needed. That is dangerous, to say the least. What about dsimcha's region allocator?

-Steve
Apr 12 2013
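Steven's concern can be illustrated with a minimal sketch. Memory from `malloc` is not scanned by the collector, so if it holds references to GC-allocated objects, the GC has to be told about it, or it may free objects that are still referenced from there. The helper names `mallocScannedArray` and `freeScannedArray` are hypothetical; they use `GC.addRange`/`GC.removeRange` from `core.memory`, which register a whole memory block for scanning (the `addRoot`/`removeRoot` pair Steven mentions instead keeps alive the GC block a single pointer refers to).

```d
import core.stdc.stdlib : malloc, free;
import core.memory : GC;
import core.exception : onOutOfMemoryError;

/// Hypothetical helper: a malloc'd array of class references that the GC
/// still scans, so the objects it points to are not collected early.
T[] mallocScannedArray(T)(size_t n) if (is(T == class))
{
    assert(n > 0);
    immutable bytes = n * T.sizeof;  // T.sizeof is the size of one reference
    auto p = cast(T*) malloc(bytes);
    if (p is null)
        onOutOfMemoryError();
    auto arr = p[0 .. n];
    arr[] = null;                    // no garbage values for the GC to follow
    GC.addRange(p, bytes);           // without this, the GC never looks here
    return arr;
}

/// Unregister the range before freeing it; the order matters.
void freeScannedArray(T)(T[] arr) if (is(T == class))
{
    GC.removeRange(arr.ptr);
    free(arr.ptr);
}
```

Forgetting either call is the danger Steven describes: without `addRange` the referenced objects can be collected, and freeing without `removeRange` leaves the collector scanning memory that no longer belongs to the program.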
Walter Bright <newshound2 digitalmars.com> writes:
On 4/12/2013 10:56 AM, Steven Schveighoffer wrote:
 On Fri, 12 Apr 2013 13:41:57 -0400, Walter Bright <newshound2 digitalmars.com> wrote:

 On 4/12/2013 7:05 AM, Manu wrote:
 I'll start doing it myself, but I also suggest it be made a policy, and
 carefully considered when considering acceptance of ANY new module. That way,
 new code that suffers the unpredictable/"surprise!" allocation problems won't
 be introduced.
 I would also expect that Phobos modules that know the lifetimes of their allocated data use malloc/free rather than the gc.
 I would like a better solution. Allocating things with malloc/free means either no GC references can be stored in that memory, or clunky addRoot/removeRoot calls are needed. That is dangerous, to say the least.
Yes, it takes some engineering work to do it right.
 What about dsimcha's region allocator?
Seems like overkill for a small issue.
Apr 12 2013
"deadalnix" <deadalnix gmail.com> writes:
On Friday, 12 April 2013 at 17:41:59 UTC, Walter Bright wrote:
 On 4/12/2013 7:05 AM, Manu wrote:
 I'll start doing it myself, but I also suggest it be made a policy, and
 carefully considered when considering acceptance of ANY new module. That way,
 new code that suffers the unpredictable/"surprise!" allocation problems won't
 be introduced.
 I would also expect that Phobos modules that know the lifetimes of their allocated data use malloc/free rather than the gc.
Why not use GC.free? Memory from malloc is invisible to the GC, so nothing GC-allocated can safely be stored there.
Apr 12 2013
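deadalnix's alternative, sketched in D: a block from `GC.malloc` is scanned by the collector by default (no `BlkAttr.NO_SCAN`), so it can safely point at other GC allocations, and `GC.free` can still return it early once its lifetime is known. The `withArgv` helper is hypothetical; an argv-style array of `toStringz` copies is just an example of the short-lived temporary a process-spawning function needs before calling into the OS.

```d
import core.memory : GC;
import core.exception : onOutOfMemoryError;
import std.string : toStringz;

alias CStr = const(char)*;

/// Hypothetical helper: builds a null-terminated, argv-style array of C
/// strings, passes it to `use`, and frees the array itself right afterwards.
void withArgv(string[] args, scope void delegate(CStr*) use)
{
    immutable bytes = (args.length + 1) * CStr.sizeof;
    // No BlkAttr.NO_SCAN here: the collector scans this block, so the
    // C strings it points to stay alive while `use` runs.
    auto argv = cast(CStr*) GC.malloc(bytes);
    if (argv is null)
        onOutOfMemoryError();
    scope(exit) GC.free(argv);   // eager release of the pointer array

    foreach (i, a; args)
        argv[i] = toStringz(a);
    argv[args.length] = null;
    use(argv);
}
```

Only the pointer array is freed eagerly; the strings themselves are left to the collector.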
"Kagamin" <spam here.lot> writes:
On Friday, 12 April 2013 at 17:41:59 UTC, Walter Bright wrote:
 I would also expect that Phobos modules that know the lifetimes 
 of their allocated data use malloc/free rather than the gc.
Or, even better, use new/delete. Delete also nulls the pointer.
Apr 12 2013
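The pointer-nulling behaviour Kagamin mentions can be mimicked for a single GC allocation with a small helper that takes the pointer by `ref`. `freeAndNull` is a hypothetical name; it frees the block immediately with `GC.free` from `core.memory` and then clears the caller's pointer.

```d
import core.memory : GC;

/// Hypothetical helper: free a GC block now and null the caller's pointer,
/// so a later accidental use is a null dereference rather than a dangling one.
void freeAndNull(T)(ref T* p)
{
    GC.free(p);   // GC.free(null) does nothing
    p = null;
}
```

Only the reference passed in is cleared; any other copy of the same pointer still dangles, so this narrows the risk rather than removing it.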
Walter Bright <newshound2 digitalmars.com> writes:
On 4/12/2013 5:52 AM, Manu wrote:
 Just to be clear, I'm not arguing optimisation for performance here, I'm
 arguing intolerance for __unnecessary__ allocations as a policy, or at least a habit.
 There's a whole separate thread on the topic of fighting unnecessary garbage,
 and having the ability to use D with strict control over the GC and/or
 allocation in general.

 If std functions have no reason to allocate, why should they?
Absolutely right. All phobos functions should not allocate unless absolutely necessary.
Apr 12 2013
"Andrej Mitrovic" <andrej.mitrovich gmail.com> writes:
On Friday, 12 April 2013 at 17:37:52 UTC, Walter Bright wrote:
 Absolutely right. All phobos functions should not allocate 
 unless absolutely necessary.
Well, that came out of nowhere. When has this rule ever been defined anywhere?
Apr 12 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 12 Apr 2013 13:37:50 -0400, Walter Bright <newshound2 digitalmars.com> wrote:

 On 4/12/2013 5:52 AM, Manu wrote:
 Just to be clear, I'm not arguing optimisation for performance here, I'm
 arguing intolerance for __unnecessary__ allocations as a policy, or at least a habit.
 There's a whole separate thread on the topic of fighting unnecessary garbage,
 and having the ability to use D with strict control over the GC and/or
 allocation in general.

 If std functions have no reason to allocate, why should they?
 Absolutely right. All phobos functions should not allocate unless absolutely necessary.
Define "absolutely." For example, there was an objection to accepting an AA as an "environment" map to std.process.spawnX functions because even though reading the AA would not require allocation, allocation would certainly be required to build the AA. Is that acceptable? Certainly we could invent a new non-allocating map type and accept that instead.

I think we need clearer lines drawn here, if they are to be respected.

-Steve
Apr 12 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 4/12/2013 10:51 AM, Steven Schveighoffer wrote:
 Define "absolutely."  For example, there was an objection to accepting an AA as
 an "environment" map to std.process.spawnX functions because even though
 reading the AA would not require allocation, allocation would certainly be
 required to build the AA.  Is that acceptable?  Certainly we could invent a new
 non-allocating map type and accept that instead.

 I think we need clearer lines drawn here, if they are to be respected.
I think this is best done on a case-by-case basis using best engineering judgement.
Apr 12 2013
Manu <turkeyman gmail.com> writes:
On 13 April 2013 03:51, Steven Schveighoffer <schveiguy yahoo.com> wrote:

 On Fri, 12 Apr 2013 13:37:50 -0400, Walter Bright <newshound2 digitalmars.com> wrote:

 On 4/12/2013 5:52 AM, Manu wrote:
 Just to be clear, I'm not arguing optimisation for performance here, I'm
 arguing intolerance for __unnecessary__ allocations as a policy, or at least a habit.
 There's a whole separate thread on the topic of fighting unnecessary garbage,
 and having the ability to use D with strict control over the GC and/or
 allocation in general.

 If std functions have no reason to allocate, why should they?
 Absolutely right. All phobos functions should not allocate unless absolutely necessary.
 Define "absolutely." For example, there was an objection to accepting an AA as an "environment" map to std.process.spawnX functions because even though reading the AA would not require allocation, allocation would certainly be required to build the AA. Is that acceptable? Certainly we could invent a new non-allocating map type and accept that instead.

 I think we need clearer lines drawn here, if they are to be respected.
Great! I was raising the issue with the intent of opening it for discussion. I never said an AA was intrinsically bad, only that it was impossible to call the function with an environment without allocating, i.e., there is no way to pass a literal. Given that the environment is just parsed and piped straight through to a system call, that seems redundant to me.
Apr 12 2013
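One possible answer to Manu's point that there is no way to pass a literal: accept any input range of name/value pairs instead of requiring an associative array, so a caller can pass a fixed-size array declared on the stack. This sketch only computes the size of the resulting `name=value\0...\0` block; `EnvVar` and `envBlockLength` are hypothetical names, not part of std.process.

```d
import std.range.primitives : isInputRange, ElementType;
import std.typecons : Tuple;

/// Hypothetical pair type for one environment variable.
alias EnvVar = Tuple!(string, "name", string, "value");

/// Hypothetical: size in bytes of a "name=value\0...\0" environment block.
/// Any input range of EnvVar is accepted, so no AA has to be built first.
size_t envBlockLength(R)(R vars)
    if (isInputRange!R && is(ElementType!R : EnvVar))
{
    size_t n;
    foreach (v; vars)
        n += v.name.length + 1 + v.value.length + 1; // "name=value\0"
    return n + 1;                                     // terminating '\0'
}
```

A caller that already has an AA could still adapt it, for example with `aa.byKeyValue.map!(kv => EnvVar(kv.key, kv.value))` using `map` from `std.algorithm.iteration`.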