digitalmars.D - RFC: moving forward with @nogc Phobos

Andrei Alexandrescu (71/71) Sep 29 2014 Back when I've first introduced RCString I hinted that we have a larger

Daniel Kozak via Digitalmars-d (9/106) Sep 29 2014 On Mon, 29 Sep 2014 03:49:52 -0700

Andrei Alexandrescu (13/19) Sep 29 2014 (please don't overquote!)

Daniel N (19/34) Sep 29 2014 How about having something like ResourceManagementPolicy.infer,
eles (3/4) Sep 29 2014 Finally!

eles (15/19) Sep 29 2014 Sorry, enthusiasm. I really think this is the key for doing the

Vladimir Panteleev (7/17) Sep 29 2014 Is this practically feasible without blowing up Phobos several

Andrei Alexandrescu (6/22) Sep 29 2014 I believe so. For the most part implementations will be identical - just...

Dicebot (12/12) Sep 29 2014 Any assumption that library code can go away with some set of

Andrei Alexandrescu (8/18) Sep 29 2014 That's making exactly the confusion I was - that memory allocation

Dicebot (9/36) Sep 29 2014 Yes but neither decision belongs to library code except for very

Andrei Alexandrescu (13/40) Sep 29 2014 You just assert it, so all I can say is "I understand you believe this"....

Dicebot (20/28) Sep 29 2014 I probably have missed the part with arguments :) Your reasoning

Andrei Alexandrescu (14/45) Sep 29 2014 =================

Dicebot (10/13) Sep 29 2014 Resisting to go on meaningless argument on other points, this

Andrei Alexandrescu (2/13) Sep 29 2014 I trust you'll be. -- Andrei

Chris Williams (12/18) Sep 29 2014 I think the key to this sort of issue is to try and get as much

Paulo Pinto (8/18) Sep 29 2014 Personally, I would go just for (b) with compiler support for

Andrei Alexandrescu (3/6) Sep 29 2014 Compiler already knows (after inlining) that ++i and --i cancel each

Marco Leise (10/17) Sep 30 2014 That helps with very small, inlined functions until Marc
Manu via Digitalmars-d (6/13) Sep 30 2014 The compiler doesn't know that MyLibrary_AddRef(Thing *t); and

deadalnix (4/23) Sep 30 2014 Even with simply i++ and i--, the information that they always go

Jacob Carlborg (5/36) Sep 29 2014 How does allocators fit in this? Will it be an additional argument to

Andrei Alexandrescu (9/11) Sep 29 2014 There would be one allocator per thread (changeable) deferring to a

Johannes Pfau (11/31) Sep 30 2014 So you propose RC + global/thread local allocators as the solution for

Peter Alexander (10/13) Sep 30 2014 Agreed. This is the common case we need to solve for, but this is

Andrei Alexandrescu (8/17) Sep 30 2014 There would be no possibility to do that. I mean it's not there but it

Jacob Carlborg (4/5) Sep 30 2014 Weren't all methods in Object supposed to be lifted out from Object anyw...

Jonathan M Davis via Digitalmars-d (6/9) Oct 28 2014 Yes, but not much work has been done on it, and the little work that has...

Johannes Pfau (8/15) Sep 30 2014 Passing buffers or sink delegates (like we already do for toString) is

Vladimir Panteleev (5/13) Sep 30 2014 I don't understand, why wouldn't you be able to temporarily set

Johannes Pfau (21/36) Sep 30 2014 That's possible but insanely dangerous in case you forget to reset the

Ola Fosheim Grøstad (29/41) Sep 30 2014 Yes, I agree. One option would be to have thread-local region

Paulo Pinto (9/18) Sep 30 2014 It works when two big ifs come together.

Ola Fosheim Grøstad (4/10) Sep 30 2014 But Objective-C has thread safe ref-counting?!

Paulo Pinto (3/16) Sep 30 2014 Did you read my second bullet?

Ola Fosheim Grostad (4/25) Sep 30 2014 Yes? I dont want builtin rc default for single threaded use

Mike (3/5) Sep 30 2014 I agree.

Andrei Alexandrescu (3/12) Sep 30 2014 That's doable, but you don't get to place the string at a _specific_

Andrei Alexandrescu (5/14) Sep 30 2014 Correct. The output of toStringz would be either a GC string or an RC

Johannes Pfau (21/39) Sep 30 2014 The sarcasm is supposed to be here: '_all_ memory related problems' ;-)

Sean Kelly (5/10) Sep 30 2014 Yes, I'm hoping this is an adjunct to changes in Phobos to reduce
Andrei Alexandrescu (16/43) Oct 01 2014 Agreed.

Johannes Pfau (4/66) Oct 06 2014 OK then I got you wrong and I agree with everything you wrote above.

Chris Williams (23/30) Sep 29 2014 Forcing someone (or rather, a team of someones) to call into the
Shammah Chancellor (14/109) Sep 29 2014 I don't like the idea of having to pass in template parameters

Andrei Alexandrescu (5/11) Sep 29 2014 Don't confuse memory allocation with memory management. There's no such

Shammah Chancellor (7/24) Sep 29 2014 Sure, but combining the two could be very useful -- as we have noticed
Daniel N (7/11) Sep 29 2014 There was a solution earlier in this thread which avoids that

Andrei Alexandrescu (2/14) Oct 01 2014 I'm not sure whether we can do this within D's type system. -- Andrei

Uranuz (64/71) Sep 29 2014 I'll try to destroy ;) Before thinking out some answers to this

Mike (7/19) Sep 29 2014 This really hits the nail on the head, and I think your other
Andrei Alexandrescu (25/87) Oct 01 2014 Sadly this is the way things are going (not only in D, but other

Freddy (23/102) Sep 29 2014 Internally we should have something like:

Andrei Alexandrescu (2/23) Sep 29 2014 That's correct. -- Andrei
Andrei Alexandrescu (4/26) Oct 01 2014 Good idea, and it seems Sean's is even better because it groups

Foo (20/20) Sep 30 2014 I hate the fact that this will produce template bloat for each

Foo (3/23) Sep 30 2014 Of course each method/function in Phobos should use the global
Andrei Alexandrescu (3/23) Sep 30 2014 This won't work because the type of "string" is different for RC vs. GC....

Foo (3/31) Sep 30 2014 But it would work for phobos functions without template bloat.

Marc Schütz <schuetzm gmx.net> (3/9) Sep 30 2014 Only for internal allocations. If the functions want to return

Andrei Alexandrescu (2/12) Sep 30 2014 Ah, now I understand the point. Thanks. -- Andrei

Andrei Alexandrescu (5/36) Sep 30 2014 How is the fact there's less bloat relevant for code that doesn't work?

John Colvin (19/98) Sep 30 2014 Instead of adding a new template parameter to every function

Andrei Alexandrescu (4/7) Oct 01 2014 Nice idea, but let's try and explore possibilities within the existing

Sean Kelly (53/66) Sep 30 2014 Is this for exposition purposes or actually how you expect it to

H. S. Teoh via Digitalmars-d (42/75) Sep 30 2014 Yeah, this echoes my concern. This looks not that much different, from a

Andrei Alexandrescu (16/55) Oct 01 2014 The parallel with STL allocators is interesting, but I'm not worried

Andrei Alexandrescu (33/97) Oct 01 2014 That's pretty much what it would take. The key here is that RCString is

Sean Kelly (4/17) Oct 01 2014 I'm confused. Is this a general-purpose solution or just one

Andrei Alexandrescu (2/22) Oct 01 2014 General purpose since your suggested change. -- Andrei

Sean Kelly (13/47) Oct 01 2014 Both, I suppose? A static if block at the top of each function

Andrei Alexandrescu (2/5) Oct 01 2014 Correct. -- Andrei

Dmitry Olshansky (5/14) Sep 30 2014 Incredible code bloat? Boilerplate in each function for the win?

Andrei Alexandrescu (3/16) Oct 01 2014 Sean's idea to make string an alias of the policy takes care of this

H. S. Teoh via Digitalmars-d (11/29) Oct 01 2014 But Sean's idea only takes strings into account. Strings aren't the only

Kiith-Sa (9/45) Oct 01 2014 MMP.Ref!redBlackTreeNode ?
Sean Kelly (13/25) Oct 01 2014 Assuming you're willing to take the memoryModel type as a

Cliff (5/34) Oct 01 2014 If you were to forget D restrictions for a moment, and consider

Andrei Alexandrescu (3/10) Oct 01 2014 There's management for T[], pointers to structs, pointers to class

Marc Schütz <schuetzm gmx.net> (58/136) Sep 30 2014 Ok, here are my few cents:

Marc Schütz <schuetzm gmx.net> (42/52) Sep 30 2014 Ok. What we need for it:
Andrei Alexandrescu (3/9) Oct 01 2014 I'm not very sure. A GC might need to interoperate closely with the

Marc Schütz <schuetzm gmx.net> (8/21) Oct 01 2014 It needs to know what to scan (ideally with type info), and which

Oren Tirosh (22/54) Oct 01 2014 Bingo. Have some way to mark the function return type as a unique

bearophile (5/8) Oct 01 2014 Let's have full-fledged memory zones tracking in the D type
Andrei Alexandrescu (9/21) Oct 01 2014 I'm skeptical about this approach (though clearly we need to explore it

Oren T (10/40) Oct 01 2014 The idea is that the unique property is very short-lived: the

Andrei Alexandrescu (3/10) Oct 01 2014 This all... looks arcane. I'm not sure how it can even made to work if

Oren T (5/21) Oct 01 2014 At the moment, @nogc code can't call any function returning a
Oren T (9/25) Oct 01 2014 At the moment, @nogc code can't call any function returning a

Jacob Carlborg (8/15) Oct 01 2014 Can't we do something like this, or it might be what you're proposing:

Ola Fosheim Grøstad (42/46) Oct 02 2014 That would be better, but how do you deal with "bar(foo())" ?

Jacob Carlborg (8/14) Oct 02 2014 I haven't really thought how it could be implemented but I was hoping

Ola Fosheim Grøstad (4/8) Oct 02 2014 I haven't looked at Rust in detail, but doesn't the Rust compiler

Paulo Pinto (8/17) Oct 02 2014 Rust makes use of the type system and the borrow checker to

Ola Fosheim Grøstad (7/11) Oct 02 2014 They constrain usage so that you cannot share mutable objects. It

Ola Fosheim Grøstad (22/25) Oct 02 2014 Some Rust details. «sendable» means that a reference can be

Paulo Pinto (4/29) Oct 02 2014 The Gc type is gone as of this week.

Ola Fosheim Grøstad (4/6) Oct 02 2014 Thanks, apparently they do it because they want to make a proper

Marc Schütz <schuetzm gmx.net> (7/19) Oct 01 2014 Sure? I already showed in an example how it is possible to chain

Andrei Alexandrescu (2/20) Oct 01 2014 I'd think so. -- Andrei

Marc Schütz <schuetzm gmx.net> (36/83) Oct 01 2014 I don't have all answers to these questions. Still, I'm convinced
Marc Schütz <schuetzm gmx.net> (37/42) Oct 02 2014 Here's an example implementation of what I have in mind (totally

Manu via Digitalmars-d (24/27) Sep 30 2014 I generally like the idea, but my immediate concern is that it implies

Andrei Alexandrescu (10/32) Oct 01 2014 If a lib chooses one specific memory management policy, it can of course...

Nordlöw (4/6) Sep 30 2014 Slightly related :)

Andrei Alexandrescu (2/7) Oct 01 2014 Nice, thanks! -- Andrei

Dmitry Olshansky (14/16) Oct 03 2014 [snip]

Andrei Alexandrescu (9/23) Oct 03 2014 Awesome. I just started

Dmitry Olshansky (8/24) Oct 03 2014 Glad you liked it.

Andrei Alexandrescu (2/26) Oct 03 2014 D script that generates wikitable from that -> awesomeness. -- Andrei

Dmitry Olshansky (4/35) Oct 03 2014 I'm on it. With GitHub source links. D's regex rocks ;)

Dmitry Olshansky (12/27) Oct 03 2014 Forgot my wiki credentials. Anyhow I got passable Markdown page fairly

Dmitry Olshansky (5/27) Oct 03 2014 Ehm, rather (without '!' at the end):

Dmitry Olshansky (10/26) Oct 03 2014 Got it:

Andrei Alexandrescu (3/10) Oct 03 2014 Tried to insert it, looks weird. Probably it would be most effective if

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Back when I've first introduced RCString I hinted that we have a larger 
strategy in mind. Here it is.

The basic tenet of the approach is to reckon and act on the fact that 
memory allocation (the subject of allocators) is an entirely distinct 
topic from memory management, and more generally resource management. 
This clarifies that it would be wrong to approach alternatives to GC in 
Phobos by means of allocators. GC is not only an approach to memory 
allocation, but also an approach to memory management. Reducing it to 
either one is a mistake. In hindsight this looks rather obvious but it 
has caused me and many people better than myself a lot of headache.

That said allocators are nice to have and use, and I will definitely 
follow up with std.allocator. However, std.allocator is not the key to a 
@nogc Phobos.

Nor are ranges. There is an attitude that either output ranges, or input 
ranges in conjunction with lazy computation, would solve the issue of 
creating garbage. 
https://github.com/D-Programming-Language/phobos/pull/2423 is a good 
illustration of the latter approach: a range would be lazily created by 
chaining stuff together. A range-based approach would take us further 
than the allocators, but I see the following issues with it:

(a) the whole approach doesn't stand scrutiny for non-linear outputs, 
e.g. outputting some sort of associative array or really any composite 
type quickly becomes tenuous either with an output range (eager) or with 
exposing an input range (lazy);

(b) makes the style of programming without GC radically different, and 
much more cumbersome, than programming with GC; as a consequence, 
programmers who consider changing one approach to another, or 
implementing an algorithm neutral to it, are looking at a major rewrite;

(c) would make D/@nogc a poor cousin of C++. This is quite out of 
character; technically, I have long gotten used to seeing the most elaborate 
C++ code as a poor emulation of simple D idioms. But C++ has spent years 
and decades taking to perfection an approach without a tracing garbage 
collector. A departure from that would need to be superior, and that 
doesn't seem to be the case with range-based approaches.

===========

Now that we clarified that these existing attempts are not going to work 
well, the question remains what does. For Phobos I'm thinking of 
defining and using three policies:

enum MemoryManagementPolicy { gc, rc, mrc }
immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

The three policies are:

(a) gc is the classic garbage-collected style of management;

(b) rc is a reference-counted style still backed by the GC, i.e. the GC 
will still be able to pick up cycles and other kinds of leaks.

(c) mrc is a reference-counted style backed by malloc.

(It should be possible to collapse rc and mrc together and make the 
distinction dynamically, at runtime. I'm distinguishing them statically 
here for expository purposes.)

The policy is a template parameter to functions in Phobos (and 
elsewhere), and informs the functions e.g. what types to return. Consider:

auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
if (...)
{
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
}

On the caller side:

auto p1 = setExtension("hello", ".txt"); // fine, use gc
auto p2 = setExtension!gc("hello", ".txt"); // same
auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

So by default it's going to continue being business as usual, but 
certain functions will allow passing in a (defaulted) policy for memory 
management.
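
For readers who want to try the shape of this proposal, here is a minimal, self-contained sketch. It is an illustration under assumptions, not the Phobos design: FakeRCString is a hypothetical stand-in for RCString (it does no reference counting at all), and setExtensionWith and withExtension are made-up names chosen so the sketch does not clash with std.path.setExtension. The gc/rc/mrc shorthands are declared as manifest constants here.

```d
enum MemoryManagementPolicy { gc, rc, mrc }
enum gc = MemoryManagementPolicy.gc;
enum rc = MemoryManagementPolicy.rc;
enum mrc = MemoryManagementPolicy.mrc;

// Hypothetical stand-in for RCString: it only wraps a string so that the
// return type differs per policy. It performs no reference counting.
struct FakeRCString
{
    string payload;
}

// Hypothetical helper: drop an existing extension, then append the new one.
private string withExtension(string path, string ext)
{
    foreach_reverse (i, c; path)
    {
        if (c == '.') { path = path[0 .. i]; break; }
        if (c == '/') break; // stop at the last path separator
    }
    return path ~ ext;
}

// The policy is a defaulted template parameter that selects the return type.
auto setExtensionWith(MemoryManagementPolicy mmp = gc)(string path, string ext)
{
    static if (mmp == gc)
        return withExtension(path, ext);
    else
        return FakeRCString(withExtension(path, ext));
}

void main()
{
    auto p1 = setExtensionWith("hello", ".txt");    // default policy: gc
    auto p2 = setExtensionWith!gc("hello", ".txt"); // same
    auto p3 = setExtensionWith!rc("hello", ".txt"); // rc policy
    static assert(is(typeof(p1) == string));
    static assert(is(typeof(p2) == string));
    static assert(is(typeof(p3) == FakeRCString));
    assert(p1 == "hello.txt");
    assert(p3.payload == "hello.txt");
}
```

Note that each distinct policy produces its own template instance, which is the source of the size concerns raised later in the thread.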

Destroy!


Andrei

Sep 29 2014

Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Mon, 29 Sep 2014 03:49:52 -0700
Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 Back when I've first introduced RCString I hinted that we have a
 larger strategy in mind. Here it is.
 
 The basic tenet of the approach is to reckon and act on the fact that 
 memory allocation (the subject of allocators) is an entirely distinct 
 topic from memory management, and more generally resource management. 
 This clarifies that it would be wrong to approach alternatives to GC
 in Phobos by means of allocators. GC is not only an approach to
 memory allocation, but also an approach to memory management.
 Reducing it to either one is a mistake. In hindsight this looks
 rather obvious but it has caused me and many people better than
 myself a lot of headache.
 
 That said allocators are nice to have and use, and I will definitely 
 follow up with std.allocator. However, std.allocator is not the key
 to a @nogc Phobos.
 
 Nor are ranges. There is an attitude that either output ranges, or
 input ranges in conjunction with lazy computation, would solve the
 issue of creating garbage. 
 https://github.com/D-Programming-Language/phobos/pull/2423 is a good 
 illustration of the latter approach: a range would be lazily created
 by chaining stuff together. A range-based approach would take us
 further than the allocators, but I see the following issues with it:
 
 (a) the whole approach doesn't stand scrutiny for non-linear outputs, 
 e.g. outputting some sort of associative array or really any
 composite type quickly becomes tenuous either with an output range
 (eager) or with exposing an input range (lazy);
 
 (b) makes the style of programming without GC radically different,
 and much more cumbersome, than programming with GC; as a consequence, 
 programmers who consider changing one approach to another, or 
 implementing an algorithm neutral to it, are looking at a major
 rewrite;
 
 (c) would make D/@nogc a poor cousin of C++. This is quite out of 
 character; technically, I have long gotten used to seeing most
 elaborate C++ code like poor emulation of simple D idioms. But C++
 has spent years and decades taking to perfection an approach without
 a tracing garbage collector. A departure from that would need to be
 superior, and that doesn't seem to be the case with range-based
 approaches.
 
 ===========
 
 Now that we clarified that these existing attempts are not going to
 work well, the question remains what does. For Phobos I'm thinking of 
 defining and using three policies:
 
 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
      gc = MemoryManagementPolicy.gc,
      rc = MemoryManagementPolicy.rc,
      mrc = MemoryManagementPolicy.mrc;
 
 The three policies are:
 
 (a) gc is the classic garbage-collected style of management;
 
 (b) rc is a reference-counted style still backed by the GC, i.e. the
 GC will still be able to pick up cycles and other kinds of leaks.
 
 (c) mrc is a reference-counted style backed by malloc.
 
 (It should be possible to collapse rc and mrc together and make the 
 distinction dynamically, at runtime. I'm distinguishing them
 statically here for expository purposes.)
 
 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to return.
 Consider:
 
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path,
 R2 ext) if (...)
 {
      static if (mmp == gc) alias S = string;
      else alias S = RCString;
      S result;
      ...
      return result;
 }
 
 On the caller side:
 
 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc
 
 So by default it's going to continue being business as usual, but 
 certain functions will allow passing in a (defaulted) policy for
 memory management.
 
 Destroy!
 
 
 Andrei

I would add something like this:

@DefaultMemoryManagementPolicy(rc)
module A;

void main() {
    auto p1 = setExtension("hello", ".txt"); // use rc
}

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 4:03 AM, Daniel Kozak via Digitalmars-d wrote:
 I would add something like this:

 @DefaultMemoryManagementPolicy(rc)
 module A;

 void main() {
      auto p1 = setExtension("hello", ".txt"); // use rc
 }

(please don't overquote!)

Yah, I realized I forgot to mention this: if we play our cards right, a 
lot of code will build in both approaches to memory management by just 
flipping a switch. In particular, the switch can be defaulted to 
something else.

I was thinking of leaving it to the user:

module A;
immutable myMMP = rc;

void main() {
     auto p1 = setExtension!myMMP("hello", ".txt");
}
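
A self-contained sketch of that "one switch per module" idea follows. All names in it are assumptions made for illustration: FakeRCString stands in for RCString without doing any reference counting, StringOf maps a policy to a string type, and greet is a made-up policy-aware function. Changing the single myMMP line is then enough to move every call site in the module to another policy.

```d
enum MemoryManagementPolicy { gc, rc, mrc }

// Hypothetical stand-in for RCString (no actual reference counting).
struct FakeRCString
{
    string payload;
}

// Hypothetical mapping from a policy to the string type it implies.
template StringOf(MemoryManagementPolicy mmp)
{
    static if (mmp == MemoryManagementPolicy.gc)
        alias StringOf = string;
    else
        alias StringOf = FakeRCString;
}

// A made-up library function that accepts a defaulted policy.
StringOf!mmp greet(MemoryManagementPolicy mmp = MemoryManagementPolicy.gc)(string name)
{
    auto text = "hello, " ~ name;
    static if (mmp == MemoryManagementPolicy.gc)
        return text;
    else
        return FakeRCString(text);
}

// The single switch for this module: edit this line to change the policy
// used by every call below.
enum myMMP = MemoryManagementPolicy.rc;

void main()
{
    auto s = greet!myMMP("world");
    static assert(is(typeof(s) == FakeRCString));
    assert(s.payload == "hello, world");

    auto t = greet("world"); // library default stays gc
    static assert(is(typeof(t) == string));
}
```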


Andrei

Sep 29 2014

"Daniel N" <ufo orbiting.us> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 Back when I've first introduced RCString I hinted that we have 
 a larger strategy in mind. Here it is.

 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to 
 return. Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

How about having something like MemoryManagementPolicy.infer, 
which under the hood could work something like below... you could 
combine it with your original suggestion, with an overridable 
MemoryManagementPolicy (just removed it to make the example 
shorter)

auto setExtension(R1, R2)(R1 path, R2 ext)
if (...)
{
     // Pick the string type from the @nogc attribute of the enclosing scope.
     static if (functionAttributes!(__traits(parent, setExtension))
                & FunctionAttribute.nogc)
       alias S = RCString;
     else
       alias S = string;
     S result;
     ...
     return result;
}
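
One caveat when experimenting with this: __traits(parent, ...) names the scope that encloses the declaration, not the call site, so a callee cannot see its caller's attributes that way. The sketch below therefore passes the caller explicitly as an alias parameter, which is an assumption of this example and not part of the suggestion above. std.traits.functionAttributes and FunctionAttribute.nogc are existing Phobos facilities; isNoGC, StringFor and FakeRCString are hypothetical names.

```d
import std.traits : functionAttributes, FunctionAttribute;

// Hypothetical stand-in for RCString (no actual reference counting).
struct FakeRCString
{
    string payload;
}

void usesGC() { auto a = new int[](4); }   // allocates with the GC
void avoidsGC() @nogc { int[4] a; }        // stack storage only

// True when `fun` carries the @nogc attribute.
enum bool isNoGC(alias fun) =
    (functionAttributes!fun & FunctionAttribute.nogc) != 0;

// Choose a string type based on the attributes of an explicitly named caller.
template StringFor(alias caller)
{
    static if (isNoGC!caller)
        alias StringFor = FakeRCString;
    else
        alias StringFor = string;
}

void main()
{
    static assert(!isNoGC!usesGC);
    static assert(isNoGC!avoidsGC);
    static assert(is(StringFor!usesGC == string));
    static assert(is(StringFor!avoidsGC == FakeRCString));
}
```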

Daniel N

Sep 29 2014

"eles" <eles eles.com> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:


 entirely distinct topic

Finally!

Sep 29 2014

"eles" <eles eles.com> writes:

On Monday, 29 September 2014 at 11:37:00 UTC, eles wrote:
 On Monday, 29 September 2014 at 10:49:53 UTC, Andrei 
 Alexandrescu wrote:


 entirely distinct topic

 Finally!

Sorry, enthusiasm. I really think this is the key for doing the 
management of all resources in the right way. For me, the memory 
should be seen as a resource that simply happens to have the 
possibility of being manageable in a more flexible way and with 
specific constraints.

For example, with respect to other kinds of resources, you could 
use a lazy approach to deallocate memory, as unlike many other 
resources memory is like money: it is fungible [1]. Other resources 
are not. OTOH, memory comes with some quirks of its own, such 
as cycles (these are, in theory, possible for other kinds 
of resources, but are exceptions).

Memory management is not necessarily deterministic either. Other 
resources might require determinism, however.

[1] http://en.wikipedia.org/wiki/Fungibility

Sep 29 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

Is this practically feasible without blowing up Phobos several 
times in size and complexity?

And I'm not sure adding a template parameter to every function is 
going to work well, what with all the existing template 
parameters - especially the optional ones.

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 5:06 AM, Vladimir Panteleev wrote:
 On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

 Is this practically feasible without blowing up Phobos several times in
 size and complexity?

I believe so. For the most part implementations will be identical - just 
look at the RCString primitives, which are virtually the same as string's.

 And I'm not sure adding a template parameter to every function is going
 to work well, what with all the existing template parameters -
 especially the optional ones.

Not all functions, just those that allocate. I agree there will be a few 
decisions to be made there.


Andrei

Sep 29 2014

"Dicebot" <public dicebot.lv> writes:

Any assumption that library code can get away with some set of 
pre-defined allocation strategies is crap. This whole discussion 
was about how important it is to move allocation decisions to 
user code (ranges are just one tool to achieve that; Don presented 
examples of how we do that with plain arrays in his DConf 
2014 talk).

In that regard allocators + ranges are still the way to go in my 
opinion. Yes, sometimes those result in very hard to use APIs - 
providing GC-heavy but friendly alternatives for those shouldn't 
do any harm. But in general full decoupling of algorithms from 
allocations is necessary. If that makes D a poor cousin of C++ we 
may have to learn a few tricks from C++.

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 5:29 AM, Dicebot wrote:
 Any assumption that library code can get away with some set of
 pre-defined allocation strategies is crap. This whole discussion was
 about how important it is to move allocation decisions to user code
 (ranges are just one tool to achieve that, Don has been presenting
 examples of how we do that with plain arrays in DConf 2014 talk).

That's making exactly the confusion I was - that memory allocation 
strategy is the same as memory management strategy.

 In that regard allocators + ranges are still the way to go in my
 opinion. Yes, sometimes those result in very hard to use API - providing
 GC-heavy but friendly alternatives for those shouldn't do any harm. But
 in general full decoupling of algorithms from allocations is necessary.
 If that makes D a poor cousin of C++ we may have to learn a few tricks from C++.

As long as things are trivial they can be done with relative ease, 
albeit with more pain. But consider e.g. the recent JSON library by 
Sönke. It needs to create a lookup data structure and return things like 
strings from it. What primitives do you think could it define?


Andrei

Sep 29 2014

"Dicebot" <public dicebot.lv> writes:

On Monday, 29 September 2014 at 15:18:40 UTC, Andrei Alexandrescu 
wrote:
 On 9/29/14, 5:29 AM, Dicebot wrote:
 Any assumption that library code can get away with some set of
 pre-defined allocation strategies is crap. This whole 
 discussion was
 about how important it is to move allocation decisions to user 
 code
 (ranges are just one tool to achieve that, Don has been 
 presenting
 examples of how we do that with plain arrays in DConf 2014 
 talk).

 That's making exactly the confusion I was - that memory 
 allocation strategy is the same as memory management strategy.

Yes but neither decision belongs to library code except for very 
rare cases.

 In that regard allocators + ranges are still the way to go in 
 my
 opinion. Yes, sometimes those result in very hard to use API - 
 providing
 GC-heavy but friendly alternatives for those shouldn't do any 
 harm. But
 in general full decoupling of algorithms from allocations is 
 necessary.
 If that makes D a poor cousin of C++ we may have to learn a few 
 tricks from C++.

 As long as things are trivial they can be done with relative 
 ease, albeit with more pain. But consider e.g. the recent JSON 
 library by Sönke. It needs to create a lookup data structure 
 and return things like strings from it. What primitives do you 
 think could it define?

Sounds like it may have to define own kind of allocator with 
certain implementation restrictions (and implement it in terms of 
GC by default). I have not actually read the code for that 
proposal so hard to guess. Will need to do it if it really 
matters.

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 8:53 AM, Dicebot wrote:
 On Monday, 29 September 2014 at 15:18:40 UTC, Andrei Alexandrescu wrote:
 On 9/29/14, 5:29 AM, Dicebot wrote:
 Any assumption that library code can get away with some set of
 pre-defined allocation strategies is crap. This whole discussion was
 about how important it is to move allocation decisions to user code
 (ranges are just one tool to achieve that, Don has been presenting
 examples of how we do that with plain arrays in DConf 2014 talk).

 That's making exactly the confusion I was - that memory allocation
 strategy is the same as memory management strategy.

 Yes but neither decision belongs to library code except for very rare
 cases.

You just assert it, so all I can say is "I understand you believe this". 
I've motivated my argument. You may want to do the same for yours.

 In that regard allocators + ranges are still the way to go in my
 opinion. Yes, sometimes those result in very hard to use API - providing
 GC-heavy but friendly alternatives for those shouldn't do any harm. But
 in general full decoupling of algorithms from allocations is necessary.
 If that makes D a poor cousin of C++ we may have to learn a few tricks
 from C++.

 As long as things are trivial they can be done with relative ease,
 albeit with more pain. But consider e.g. the recent JSON library by
 Sönke. It needs to create a lookup data structure and return things
 like strings from it. What primitives do you think could it define?

 Sounds like it may have to define own kind of allocator with certain
 implementation restrictions (and implement it in terms of GC by
 default). I have not actually read the code for that proposal so hard to
 guess. Will need to do it if it really matters.

So you don't have an answer. And again you are confusing memory 
allocation with memory management.

I have sketched an approach that works and will take us to Phobos being 
most transparently usable with tracing collection or with reference 
counting. Part of that is RCString (and generally reference counted 
slices and hashtables), and another part is the @refcounted attribute 
for classes. I will push it through. If you have any objections, it 
would be great if you argued them properly.


Thanks,

Andrei

Sep 29 2014

"Dicebot" <public dicebot.lv> writes:

On Monday, 29 September 2014 at 17:04:54 UTC, Andrei Alexandrescu 
wrote:
 Yes but neither decision belongs to library code except for 
 very rare
 cases.

 You just assert it, so all I can say is "I understand you 
 believe this". I've motivated my argument. You may want to do 
 the same for yours.

I probably have missed the part with arguments :) Your reasoning 
is not fundamentally different from "GC should be enough" but 
extended to several options from single one.

My argument is simple - one can't foresee everything. I remember 
reading a book by a guy who has been advocating a thing called 
"policy-based design", you may know him ;) I was quite impressed 
with the simple but practical basic idea - decoupling parts of 
the implementation that are not inherently related.

 So you don't have an answer. And again you are confusing memory 
 allocation with memory management.

Yes, sorry, I don't have an answer. Or time to dive deeply into 
the code unless it is really important or my direct 
responsibility.

Unfortunately, I don't see how your proposal fits our 
code either. Most Sociomantic code relies on using arrays as 
ref arguments to avoid creating new GC roots (no, we don't 
need/want to switch to ARC). This has been cited several times as the 
reason why Phobos in its current shape is largely unusable for 
our needs even when the D2 switch is finished. I don't see how 
the proposal in the original post changes that.
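
For readers unfamiliar with the style, here is a minimal sketch of the general "caller owns the buffer" idiom with an array passed by ref. It is an illustration only, not Sociomantic's actual code, and joinInto is a hypothetical name. The function resets the caller's buffer and appends into it, so the caller can keep handing in the same array instead of receiving a freshly allocated result each time.

```d
// Hypothetical helper: writes a ~ b into the caller's buffer.
void joinInto(ref char[] buf, const(char)[] a, const(char)[] b)
{
    buf.length = 0;             // keep the storage, drop the old content
    // assumeSafeAppend tells the runtime that appending may overwrite the
    // old content in place. It calls into the runtime, so it is skipped
    // during compile-time evaluation.
    if (!__ctfe)
        buf.assumeSafeAppend();
    buf ~= a;
    buf ~= b;
}

void main()
{
    char[] buf;
    buf.reserve(64);            // one up-front allocation for the buffer

    joinInto(buf, "hello", ".txt");
    assert(buf == "hello.txt");

    // Second call passes the same array; the result replaces the old content.
    joinInto(buf, "world", ".d");
    assert(buf == "world.d");
}
```

The trade-off is that the previous result is invalidated by the next call, so any slice of buf that must outlive it has to be copied by the caller.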

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 10:19 AM, Dicebot wrote:
 On Monday, 29 September 2014 at 17:04:54 UTC, Andrei Alexandrescu wrote:
 Yes but neither decision belongs to library code except for very rare
 cases.

 You just assert it, so all I can say is "I understand you believe
 this". I've motivated my argument. You may want to do the same for yours.

 I probably have missed the part with arguments :)

No problem, let me paste it again:

 The basic tenet of the approach is to reckon and act on the fact that memory
allocation (the subject of allocators) is an entirely distinct topic from
memory management, and more generally resource management. This clarifies that
it would be wrong to approach alternatives to GC in Phobos by means of
allocators. GC is not only an approach to memory allocation, but also an
approach to memory management. Reducing it to either one is a mistake. In
hindsight this looks rather obvious but it has caused me and many people better
than myself a lot of headache.

 That said allocators are nice to have and use, and I will definitely follow up
with std.allocator. However, std.allocator is not the key to a @nogc Phobos.

 Nor are ranges. There is an attitude that either output ranges, or input
ranges in conjunction with lazy computation, would solve the issue of creating
garbage. https://github.com/D-Programming-Language/phobos/pull/2423 is a good
illustration of the latter approach: a range would be lazily created by
chaining stuff together. A range-based approach would take us further than the
allocators, but I see the following issues with it:

 (a) the whole approach doesn't stand scrutiny for non-linear outputs, e.g.
outputting some sort of associative array or really any composite type quickly
becomes tenuous either with an output range (eager) or with exposing an input
range (lazy);

 (b) makes the style of programming without GC radically different, and much
more cumbersome, than programming with GC; as a consequence, programmers who
consider changing one approach to another, or implementing an algorithm neutral
to it, are looking at a major rewrite;

 (c) would make D/@nogc a poor cousin of C++. This is quite out of character;
technically, I have long gotten used to seeing most elaborate C++ code like
poor emulation of simple D idioms. But C++ has spent years and decades taking
to perfection an approach without a tracing garbage collector. A departure from
that would need to be superior, and that doesn't seem to be the case with
range-based approaches.

=================

 Your reasoning is not
 fundamentally different from "GC should be enough" but extended to
 several options from single one.

Where's RC in the "GC should be enough"?

 My argument is simple - one can't foresee everything. I remember reading
 book of one guy who has been advocating thing called "policy-based
 design", you may know him ;) Was quite impressed with the simple but
 practical basic idea - decoupling parts of the implementation that are
 not inherently related.

Totally. Then it would be great if you trusted the guy when he makes a 
judgment call in which reasonable people may disagree.

There are many memory /allocation/ policies but precious few memory 
/management/ policies. I only know "manual", "scoped", "reference 
counted", and "tracing" based on... the last 50 years of software 
development.

 So you don't have an answer. And again you are confusing memory
 allocation with memory management.

 Yes, sorry, I don't have an answer. Or time to dive deeply into the code
 unless it is really important or my direct responsibility.

 Unfortunately, I don't see an answer how your proposal fits our code
 either. Most of Sociomantic code relies on using arrays as ref arguments
 to avoid creating of new GC roots (no, we don't need/want to switch to
 ARC). This was several times called as the reason why Phobos in its
 current shape is largely unusable for our needs even when D2 switch is
 finished. I don't see how proposal in original post changes that.

Passing arrays by reference is plenty adequate with all memory 
management strategies. You'll need to wait and see how the proposal 
changes that, but if you naysay, back it up.


Andrei

Sep 29 2014

"Dicebot" <public dicebot.lv> writes:

On Monday, 29 September 2014 at 22:18:38 UTC, Andrei Alexandrescu 
wrote:
 Passing arrays by reference is plenty adequate with all memory 
 management strategies. You'll need to wait and see how the 
 proposal changes that, but if you naysay, back it up.

Resisting the urge to keep arguing the other points: this 
pretty much says that the focus on things that are important to me 
is being abandoned in favor of something that mostly doesn't matter. Am 
I supposed to be happy? :) Am I supposed to be twice as happy 
when you propose to close pull requests that do help because of 
this proposal?

I am waiting for what comes next but right now "not impressed" is 
the most optimistic way to put this. Sorry :(

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 3:43 PM, Dicebot wrote:
 On Monday, 29 September 2014 at 22:18:38 UTC, Andrei Alexandrescu wrote:
 Passing arrays by reference is plenty adequate with all memory
 management strategies. You'll need to wait and see how the proposal
 changes that, but if you naysay, back it up.

 Resisting to go on meaningless argument on other points, this pretty
 much says that focus on things that are important for me is abandoned in
 favor of something that mostly doesn't matter. Am I supposed to be
 happy? :) Am I supposed to be twice as happy when you propose to close
 pull requests that do help because of this proposal?

 I am waiting for what comes next but right now "not impressed" is most
 optimistic way to put this. Sorry :(

I trust you'll be. -- Andrei

Sep 29 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Monday, 29 September 2014 at 12:29:33 UTC, Dicebot wrote:
 Any assumption that library code can go away with some set of 
 pre-defined allocation strategies is crap. This whole 
 discussion was about how important it is to move allocation 
 decisions to user code (ranges are just one tool to achieve 
 that, Don has been presenting examples of how we do that with 
 plain arrays in DConf 2014 talk).

I think the key to this sort of issue is to try and get as much 
functionality in Phobos marked @nogc as possible. After that, 
building new library-like functionality into a DUB package that 
assumes @nogc and only uses the @nogc code in Phobos would be the 
next step. Should that get to a state where it's popular and 
supported, pulling it in as std.nogc.* might make sense, but 
trying to redo Phobos as a manual memory collection library is 
infeasible.

Were I your company, I'd start working on leading such an effort.

Unlike Tango, I don't think a development like this would split 
the community nor the community's resources in a useless fashion.

Sep 29 2014

Paulo Pinto <pjmlp progtools.org> writes:

On 29.09.2014 12:49, Andrei Alexandrescu wrote:
 [...]

 The three policies are:

 (a) gc is the classic garbage-collected style of management;

 (b) rc is a reference-counted style still backed by the GC, i.e. the GC
 will still be able to pick up cycles and other kinds of leaks.

 (c) mrc is a reference-counted style backed by malloc.

 (It should be possible to collapse rc and mrc together and make the
 distinction dynamically, at runtime. I'm distinguishing them statically
 here for expository purposes.)

 ...

Personally, I would go just for (b) with compiler support for 
increment/decrement removal, as I think it will be too complex having to 
support everything and this will complicate all libraries.

Anyway, that was just my 0.02€. Stepping out of the thread as I just toy 
around with D and cannot add much more to the discussion.

--
Paulo

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 10:16 AM, Paulo Pinto wrote:
 Personally, I would go just for (b) with compiler support for
 increment/decrement removal, as I think it will be too complex having to
 support everything and this will complicate all libraries.

Compiler already knows (after inlining) that ++i and --i cancel each 
other, so we should be in good shape there. -- Andrei

Sep 29 2014

Marco Leise <Marco.Leise gmx.de> writes:

On Mon, 29 Sep 2014 15:04:03 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 9/29/14, 10:16 AM, Paulo Pinto wrote:
 Personally, I would go just for (b) with compiler support for
 increment/decrement removal, as I think it will be too complex having to
 support everything and this will complicate all libraries.

 Compiler already knows (after inlining) that ++i and --i cancel each
 other, so we should be in good shape there. -- Andrei

That helps with very small, inlined functions until Marc
Schütz's work on borrowed pointers makes it redundant by
unifying scoped copies of GC, RC and stack pointers.
In any case inc/dec elision is an optimization and not an
enabling feature. It sure is on the radar and can be improved
later on.

-- 
Marco

Sep 30 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 30 September 2014 08:04, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 9/29/14, 10:16 AM, Paulo Pinto wrote:
 Personally, I would go just for (b) with compiler support for
 increment/decrement removal, as I think it will be too complex having to
 support everything and this will complicate all libraries.


 Compiler already knows (after inlining) that ++i and --i cancel each other,
 so we should be in good shape there. -- Andrei

The compiler doesn't know that MyLibrary_AddRef(Thing *t); and
MyLibrary_DecRef(Thing *t); cancel each other out though...
rc needs primitives that the compiler understands implicitly, so that
rc logic can be more complex than ++i/--i.
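
To make the increment/decrement pairing concrete, here is a minimal sketch of a malloc-backed counted handle. The type `Counted` and its members are hypothetical names chosen for illustration; this is not the RCString from the proposal, and it ignores thread safety entirely. Each copy runs the postblit (the "++i" half) and each scope exit runs the destructor (the "--i" half), which is the pair an optimizer would have to recognize in order to elide it.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical malloc-backed handle, for illustration only.
struct Counted
{
    private int* count;

    static Counted create()
    {
        Counted c;
        c.count = cast(int*) malloc(int.sizeof);
        assert(c.count !is null, "out of memory");
        *c.count = 1;
        return c;           // returned by move, so no postblit runs here
    }

    this(this)              // copy: the "++i" half of the pair
    {
        if (count) ++*count;
    }

    ~this()                 // destruction: the "--i" half of the pair
    {
        if (count && --*count == 0)
            free(count);
        count = null;
    }

    int refs() const { return count ? *count : 0; }
}

void main()
{
    auto a = Counted.create();
    assert(a.refs == 1);
    {
        auto b = a;         // postblit increments the shared counter
        assert(a.refs == 2);
    }                       // b's destructor decrements it again
    assert(a.refs == 1);
}
```

Once the counter update is hidden behind an opaque library call instead of a visible `++`/`--`, the pairing is no longer apparent to the optimizer, which is the concern raised above.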

Sep 30 2014

"deadalnix" <deadalnix gmail.com> writes:

On Wednesday, 1 October 2014 at 01:26:45 UTC, Manu via
Digitalmars-d wrote:
 On 30 September 2014 08:04, Andrei Alexandrescu via 
 Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 9/29/14, 10:16 AM, Paulo Pinto wrote:
 Personally, I would go just for (b) with compiler support for
 increment/decrement removal, as I think it will be too 
 complex having to
 support everything and this will complicate all libraries.


 Compiler already knows (after inlining) that ++i and --i 
 cancel each other,
 so we should be in good shape there. -- Andrei

 The compiler doesn't know that MyLibrary_AddRef(Thing *t); and
 MyLibrary_DecRef(Thing *t); cancel each other out though...
 rc needs primitives that the compiler understands implicitly, 
 so that
 rc logic can be more complex than ++i/--i;

Even with simply i++ and i--, the information that they always come
in pairs is lost on the compiler in many cases.

Sep 30 2014

Jacob Carlborg <doob me.com> writes:

On 2014-09-29 12:49, Andrei Alexandrescu wrote:

 Now that we clarified that these existing attempts are not going to work
 well, the question remains what does. For Phobos I'm thinking of
 defining and using three policies:

 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
      gc = MemoryManagementPolicy.gc,
      rc = MemoryManagementPolicy.rc,
      mrc = MemoryManagementPolicy.mrc;

 The three policies are:

 (a) gc is the classic garbage-collected style of management;

 (b) rc is a reference-counted style still backed by the GC, i.e. the GC
 will still be able to pick up cycles and other kinds of leaks.

 (c) mrc is a reference-counted style backed by malloc.

 (It should be possible to collapse rc and mrc together and make the
 distinction dynamically, at runtime. I'm distinguishing them statically
 here for expository purposes.)

 The policy is a template parameter to functions in Phobos (and
 elsewhere), and informs the functions e.g. what types to return. Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
 if (...)
 {
      static if (mmp == gc) alias S = string;
      else alias S = RCString;
      S result;
      ...
      return result;
 }

 On the caller side:

 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc
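
For readers who want to try the idea, the following is a minimal, compilable sketch of a policy-parameterized function. It makes several assumptions: `FakeRCString` is a hypothetical stand-in that merely wraps a GC string (no RCString type is defined in this thread), and `appendExtension` is a simplified helper, not the real std.path.setExtension. It only demonstrates how a default template argument selects the return type at compile time.

```d
import std.stdio : writeln;

enum MemoryManagementPolicy { gc, rc, mrc }

// Hypothetical stand-in for the proposed RCString. It only wraps a GC
// string so that the sketch compiles; a real type would own a refcount.
struct FakeRCString
{
    string payload;
}

// Hypothetical helper (not the real std.path.setExtension): it simply
// appends `ext` and picks the return type from the policy at compile time.
auto appendExtension(MemoryManagementPolicy mmp = MemoryManagementPolicy.gc)
                    (string path, string ext)
{
    static if (mmp == MemoryManagementPolicy.gc)
        return path ~ ext;
    else
        return FakeRCString(path ~ ext);
}

void main()
{
    auto p1 = appendExtension("hello", ".txt");                              // default policy: gc
    auto p2 = appendExtension!(MemoryManagementPolicy.rc)("hello", ".txt"); // explicit policy: rc
    static assert(is(typeof(p1) == string));
    static assert(is(typeof(p2) == FakeRCString));
    writeln(p1);
    writeln(p2.payload);
}
```

The body is shared between policies; only the alias for the result type differs, which matches the claim that most implementations would be nearly identical.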

How do allocators fit into this? Will it be an additional argument to 
the function, or a separate stack that one can push and pop allocators to?

-- 
/Jacob Carlborg

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 10:25 AM, Jacob Carlborg wrote:
 How do allocators fit into this? Will it be an additional argument to
 the function, or a separate stack that one can push and pop allocators to?

There would be one allocator per thread (changeable) deferring to a 
global interlocked allocator. Most algorithms would just use whatever 
allocator is installed.

I know the notion of a thread-local and then global allocator is liable 
to give some people an apoplexy attack. But it's time to model things as they 
are - memory is a global resource and it ought to be treated as such. No 
need to pass allocators around except for special cases.


Andrei

Sep 29 2014

Johannes Pfau <nospam example.com> writes:

On Mon, 29 Sep 2014 15:11:26 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 9/29/14, 10:25 AM, Jacob Carlborg wrote:
 How do allocators fit into this? Will it be an additional argument
 to the function, or a separate stack that one can push and pop
 allocators to?

 
 There would be one allocator per thread (changeable) deferring to a 
 global interlocked allocator. Most algorithms would just use whatever 
 allocator is installed.
 
 I know the notion of a thread-local and then global allocator is
 liable to give some people an apoplexy attack. But it's time to model
 things as they are - memory is a global resource and it ought to be
 treated as such. No need to pass allocators around except for special
 cases.
 
 
 Andrei
 

 No need to pass allocators around except for special
 cases.

So you propose RC + global/thread local allocators as the solution for
all memory related problems as 'memory management is not allocation'.
And you claim that using output ranges / providing buffers / allocators
is not an option because it only works in some special cases?

What if I don't want automated memory _management_? What if I want a
function to use a stack buffer? Or if I want to free manually?

If I want std.string.toStringz to put the result into a temporary stack
buffer your solution doesn't help at all. Passing an output range,
allocator or buffer would all solve this.
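
As a minimal sketch of the buffer-passing variant: the helper name `toStringzInto` is hypothetical and not part of std.string. The caller supplies the storage, so the function performs no dynamic allocation and can be marked @nogc.

```d
import core.stdc.string : strlen;

// Hypothetical helper, not part of std.string: copies `s` into `buf`,
// appends the terminating zero and returns a pointer to the result,
// or null if `buf` is too small.
const(char)* toStringzInto(const(char)[] s, char[] buf) @nogc nothrow
{
    if (buf.length < s.length + 1)
        return null;
    buf[0 .. s.length] = s[];
    buf[s.length] = '\0';
    return buf.ptr;
}

void main()
{
    char[32] stackBuf;                       // lives on the caller's stack
    auto p = toStringzInto("hello", stackBuf[]);
    assert(p !is null);
    assert(p is stackBuf.ptr);               // result is placed in that exact buffer
    assert(strlen(p) == 5);

    char[3] tiny;
    assert(toStringzInto("hello", tiny[]) is null);  // too small: rejected, nothing allocated
}
```

The returned pointer is only valid while the buffer is alive, so the lifetime question moves to the caller, which is exactly the manual management being asked for.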

Sep 30 2014

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Tuesday, 30 September 2014 at 08:34:26 UTC, Johannes Pfau 
wrote:
 What if I don't want automated memory _management_? What if I 
 want a
 function to use a stack buffer? Or if I want to free manually?

Agreed. This is the common case we need to solve for, but this is 
memory allocation, not management. I'm not sure where manual 
management fits into Andrei's scheme. Andrei, could you give an 
example of, e.g. how toStringz would work with a stack buffer in 
your proposed scheme?

Another thought: if we use a template parameter, what's the story 
for virtual functions (e.g. Object.toString)? They can't be 
templated.

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 3:41 AM, Peter Alexander wrote:
 On Tuesday, 30 September 2014 at 08:34:26 UTC, Johannes Pfau wrote:
 What if I don't want automated memory _management_? What if I want a
 function to use a stack buffer? Or if I want to free manually?

 Agreed. This is the common case we need to solve for, but this is memory
 allocation, not management. I'm not sure where manual management fits
 into Andrei's scheme. Andrei, could you give an example of, e.g. how
 toStringz would work with a stack buffer in your proposed scheme?

There would be no possibility to do that. I mean it's not there but it 
can be added e.g. as a "manual" option of performing memory management. 
The "manual" overloads for functions would require an output range 
parameter. Not all functions might support a "manual" option - that'd be 
rejected statically.

 Another thought: if we use a template parameter, what's the story for
 virtual functions (e.g. Object.toString)? They can't be templated.

Good point. We need to think about that.


Andrei

Sep 30 2014

Jacob Carlborg <doob me.com> writes:

On 30/09/14 14:29, Andrei Alexandrescu wrote:

 Good point. We need to think about that.

Weren't all methods in Object supposed to be lifted out from Object anyway?

-- 
/Jacob Carlborg

Sep 30 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tuesday, September 30, 2014 15:18:17 Jacob Carlborg via Digitalmars-d 
wrote:
 On 30/09/14 14:29, Andrei Alexandrescu wrote:
 Good point. We need to think about that.

 Weren't all methods in Object supposed to be lifted out from Object anyway?

Yes, but not much work has been done on it, and the little work that has been
done is blocked by at least one compiler bug:

https://issues.dlang.org/show_bug.cgi?id=12537

- Jonathan M Davis

Oct 28 2014

Johannes Pfau <nospam example.com> writes:

On Tue, 30 Sep 2014 05:29:55 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 
 Another thought: if we use a template parameter, what's the story
 for virtual functions (e.g. Object.toString)? They can't be
 templated.

 
 Good point. We need to think about that.
 

Passing buffers or sink delegates (like we already do for toString) is
possible for some functions. For toString it works fine. Then implement
to!RCString(object) using the toString(sink delegate) overload.

For all other functions RC is indeed difficult, probably only possible
with different manually written overloads (and a dummy parameter as we
can't overload on return type)?
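
A minimal sketch of the sink-delegate idea follows. The struct `Tag` and the helper `renderInto` are hypothetical names used only for illustration. The type writes its text through a sink, and the caller decides where the characters end up, here a fixed stack buffer; collecting into a reference-counted string would follow the same pattern.

```d
// Hypothetical example type with a sink-style toString.
struct Tag
{
    string name;

    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("<");
        sink(name);
        sink(">");
    }
}

// Hypothetical helper: collects the sink output into caller-owned storage.
char[] renderInto(T)(auto ref const T value, char[] buf)
{
    size_t used = 0;
    value.toString((const(char)[] chunk) {
        assert(used + chunk.length <= buf.length, "buffer too small");
        buf[used .. used + chunk.length] = chunk[];
        used += chunk.length;
    });
    return buf[0 .. used];
}

void main()
{
    char[16] stackBuf;
    auto text = renderInto(Tag("b"), stackBuf[]);
    assert(text == "<b>");
    assert(text.ptr is stackBuf.ptr);   // the output lives in the caller's buffer
}
```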

Sep 30 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Tuesday, 30 September 2014 at 08:34:26 UTC, Johannes Pfau 
wrote:
 What if I don't want automated memory _management_? What if I 
 want a
 function to use a stack buffer? Or if I want to free manually?

 If I want std.string.toStringz to put the result into a 
 temporary stack
 buffer your solution doesn't help at all. Passing an output 
 range,
 allocator or buffer would all solve this.

I don't understand, why wouldn't you be able to temporarily set 
the thread-local allocator to use the stack buffer, and restore 
it once done?

Sep 30 2014

Johannes Pfau <nospam example.com> writes:

On Tue, 30 Sep 2014 10:47:54 +0000,
"Vladimir Panteleev" <vladimir thecybershadow.net> wrote:

 On Tuesday, 30 September 2014 at 08:34:26 UTC, Johannes Pfau 
 wrote:
 What if I don't want automated memory _management_? What if I 
 want a
 function to use a stack buffer? Or if I want to free manually?

 If I want std.string.toStringz to put the result into a 
 temporary stack
 buffer your solution doesn't help at all. Passing an output 
 range,
 allocator or buffer would all solve this.

 
 I don't understand, why wouldn't you be able to temporarily set 
 the thread-local allocator to use the stack buffer, and restore 
 it once done?

That's possible but insanely dangerous in case you forget to reset the
thread allocator. Also storing stack pointers in global state (even
thread-local) is dangerous, for example interaction with fibers could
lead to bugs, etc. (What if I set the allocator to a stack allocator
and call a function which yields from a Fiber?).

You also lose any possibility of using 'scope' or a similar mechanism
to prevent escaping a stack pointer. 

Also a stack buffer is not a complete allocator, but in some
cases like toStringz it works even better than allocators (less
overhead as you know the required buffer size before calling toStringz
and there's only one allocation)

And it is a hack. Of course you can provide a wrapper which does
oldAlloc = threadLocalAllocator;
threadLocalAllocator = stackbuf;
scope(exit)
    threadLocalAllocator = oldAlloc;
func();

But how could anybody think this is good API design? I think I'd rather
fork the required Phobos functions instead of using such a wrapper.
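
For reference, the wrapper described above can be written as a compilable sketch. Everything here is an assumption made for illustration: the `Allocator` interface, `GCAllocator`, `BufferAllocator`, the `threadLocalAllocator` variable and `withAllocator` are hypothetical names, and no such API was specified in this thread. The sketch registers scope(exit) before running the work so the previous allocator is restored even if the work throws; it does nothing about the fiber and escaping-pointer hazards listed above.

```d
// Hypothetical allocator interface; nothing like this is defined in the thread.
interface Allocator
{
    void[] allocate(size_t n);
}

final class GCAllocator : Allocator
{
    void[] allocate(size_t n) { return new ubyte[n]; }
}

// Hypothetical bump allocator over a caller-supplied buffer.
final class BufferAllocator : Allocator
{
    private ubyte[] buf;
    private size_t used;

    this(ubyte[] buf) { this.buf = buf; }

    void[] allocate(size_t n)
    {
        if (used + n > buf.length)
            return null;
        auto result = buf[used .. used + n];
        used += n;
        return result;
    }
}

// Module-level variables are thread-local by default in D.
Allocator threadLocalAllocator;

static this() { threadLocalAllocator = new GCAllocator; }

// Installs `temp` for the duration of `work`, then restores the old allocator.
void withAllocator(Allocator temp, scope void delegate() work)
{
    auto oldAlloc = threadLocalAllocator;
    threadLocalAllocator = temp;
    scope(exit) threadLocalAllocator = oldAlloc;   // runs even if work() throws
    work();
}

void main()
{
    ubyte[64] stackBuf;
    auto stackAlloc = new BufferAllocator(stackBuf[]);
    auto before = threadLocalAllocator;

    withAllocator(stackAlloc, {
        auto mem = threadLocalAllocator.allocate(16);
        assert(mem.ptr == cast(void*) stackBuf.ptr);  // served from the stack buffer
    });

    assert(threadLocalAllocator is before);           // previous allocator restored
}
```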

Sep 30 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 30 September 2014 at 12:02:10 UTC, Johannes Pfau 
wrote:
 That's possible but insanely dangerous in case you forget to 
 reset the
 thread allocator. Also storing stack pointers in global state 
 (even
 thread-local) is dangerous, for example interaction with fibers 
 could
 lead to bugs, etc. (What if I set the allocator to a stack 
 allocator
 and call a function which yields from a Fiber?).

 You also lose all possibilities to use 'scope' or a similar 
 mechanism
 to prevent escaping a stack pointer.

Yes, I agree. One option would be to have a thread-local region 
allocator that can only be used for "scoped" allocation. That is, 
only for allocations that are not assigned to globals or can get 
stuck in fibers and that are returned to the calling function. 
That way the context can free the region when done and you can 
get away with little allocation overhead if used prudently.

I also don't agree with the sentiment that allocation/management 
can be kept fully separate. If you have a region allocator that 
is refcounted it most certainly is interrelated with a fairly 
tight coupling.

Also the idea exposed in this thread that release()/retain() is 
purely arithmetic and can be optimized as such is quite wrong. 
retain() is conceptually a locking construct on a memory region 
that prevents reuse. I've made a case for TSX, but one can 
probably come up with other multi-threaded examples.

These hacks are not making D more attractive to people who find 
C++ lacking in elegance.

Actually, creating a Phobos light with nothrow, @nogc, a light 
runtime and basic building blocks such as intrinsics to build 
your own RC with compiler support sounds like a more interesting 
option.

I am really not interested in library provided allocators or RC. 
If I am not going to use malloc/GC then I want to write my own 
and have dedicated allocators for the most common objects.

I think it is quite reasonable that people who want to take the 
difficult road of not using GC at all also have to do some extra 
work, but provide a clean slate to work from!

Sep 30 2014

"Paulo Pinto" <pjmlp progtools.org> writes:

On Tuesday, 30 September 2014 at 12:32:08 UTC, Ola Fosheim 
Grøstad wrote:
 On Tuesday, 30 September 2014 at 12:02:10 UTC, Johannes Pfau 
 wrote:
 ...

 Also the idea exposed in this thread that release()/retain() 
 is

 purely arithmetic and can be optimized as such is quite wrong. 
 retain() is conceptually a locking construct on a memory region 
 that prevents reuse. I've made a case for TSX, but one can 
 probably come up with other multi-threaded examples.

It works when two big ifs come together.

- inside the same scope (e.g. function level)

- when the reference is not shared between threads.

While it is of limited applicability, Objective-C (and eventually 
Swift) codebases prove it helps in most real life use cases.

--
Paulo

Sep 30 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 30 September 2014 at 12:51:25 UTC, Paulo  Pinto wrote:
 It works when two big ifs come together.

 - inside the same scope (e.g. function level)

 - when the reference is not shared between threads.

 While it is of limited applicability, Objective-C (and 
 eventually Swift) codebases prove it helps in most real life 
 use cases.

But Objective-C has thread safe ref-counting?!

If it isn't thread safe it is of very limited utility, you can 
usually get away with unique_ptr in single threaded scenarios.

Sep 30 2014

Paulo Pinto <pjmlp progtools.org> writes:

On 30.09.2014 14:55, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Tuesday, 30 September 2014 at 12:51:25 UTC, Paulo  Pinto wrote:
 It works when two big ifs come together.

 - inside the same scope (e.g. function level)

 - when the reference is not shared between threads.

 While it is of limited applicability, Objective-C (and eventually
 Swift) codebases prove it helps in most real life use cases.

 But Objective-C has thread safe ref-counting?!

 If it isn't thread safe it is of very limited utility, you can usually
 get away with unique_ptr in single threaded scenarios.

Did you read my second bullet?

Sep 30 2014

"Ola Fosheim Grostad" <ola.fosheim.grostad+dlang gmail.com> writes:

On Tuesday, 30 September 2014 at 20:13:38 UTC, Paulo Pinto wrote:
 On 30.09.2014 14:55, "Ola Fosheim Grøstad" 
 <ola.fosheim.grostad+dlang gmail.com> wrote:
 On Tuesday, 30 September 2014 at 12:51:25 UTC, Paulo  Pinto 
 wrote:
 It works when two big ifs come together.

 - inside the same scope (e.g. function level)

 - when the reference is not shared between threads.

 While it is of limited applicability, Objective-C (and 
 eventually
 Swift) codebases prove it helps in most real life use cases.

 But Objective-C has thread safe ref-counting?!

 If it isn't thread safe it is of very limited utility, you can 
 usually
 get away with unique_ptr in single threaded scenarios.

 Did you read my second bullet?

Yes? I don't want built-in RC as the default for single-threaded use 
cases. I do want it when references are shared between threads, 
e.g. for cache objects.

Sep 30 2014

"Mike" <none none.com> writes:

On Tuesday, 30 September 2014 at 12:32:08 UTC, Ola Fosheim 
Grøstad wrote:
 ...basic building blocks such as intrinsics to build your own 
 RC with compiler support sounds like a more interesting option.

I agree.

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 3:47 AM, Vladimir Panteleev wrote:
 On Tuesday, 30 September 2014 at 08:34:26 UTC, Johannes Pfau wrote:
 What if I don't want automated memory _management_? What if I want a
 function to use a stack buffer? Or if I want to free manually?

 If I want std.string.toStringz to put the result into a temporary stack
 buffer your solution doesn't help at all. Passing an output range,
 allocator or buffer would all solve this.

 I don't understand, why wouldn't you be able to temporarily set the
 thread-local allocator to use the stack buffer, and restore it once done?

That's doable, but you don't get to place the string at a _specific_ 
buffer. -- Andrei

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 1:34 AM, Johannes Pfau wrote:
 So you propose RC + global/thread local allocators as the solution for
 all memory related problems as 'memory management is not allocation'.
 And you claim that using output ranges / providing buffers / allocators
 is not an option because it only works in some special cases?

Correct. I assume you meant an irony/sarcasm somewhere :o).

 What if I don't want automated memory _management_? What if I want a
 function to use a stack buffer? Or if I want to free manually?

 If I want std.string.toStringz to put the result into a temporary stack
 buffer your solution doesn't help at all. Passing an output range,
 allocator or buffer would all solve this.

Correct. The output of toStringz would be either a GC string or an RC 
string.


Andrei

Sep 30 2014

Johannes Pfau <nospam example.com> writes:

On Tue, 30 Sep 2014 05:23:29 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 9/30/14, 1:34 AM, Johannes Pfau wrote:
 So you propose RC + global/thread local allocators as the solution
 for all memory related problems as 'memory management is not
 allocation'. And you claim that using output ranges / providing
 buffers / allocators is not an option because it only works in some
 special cases?

 
 Correct. I assume you meant an irony/sarcasm somewhere :o).

The sarcasm is supposed to be here: '_all_ memory related problems' ;-)

I guess my point is that although RC is useful in some cases output
ranges / sink delegates / pre-allocated buffers are still necessary in
other cases and RC is not the solution for _everything_.

As Manu often pointed out sometimes you do not want any dynamic
allocation (toStringz in games is a good example) and here RC doesn't
help.

Another example is format which can already write to output ranges and
uses sink delegates internally. That's a much better abstraction than
simply returning a reference counted string (allocated with a thread
local allocator). Using sink delegates internally is also more
efficient than creating temporary RCStrings. And sometimes there's no
allocation at all this way (directly writing to a socket/file). 

 
 What if I don't want automated memory _management_? What if I want a
 function to use a stack buffer? Or if I want to free manually?

 If I want std.string.toStringz to put the result into a temporary
 stack buffer your solution doesn't help at all. Passing an output
 range, allocator or buffer would all solve this.

 
 Correct. The output of toStringz would be either a GC string or an RC 
 string.

But why not provide 3 overloads then?

toStringz(OutputRange)
string toStringz(Policy) //char*, actually
RCString toStringz(Policy)

The notion I got from some of your posts is that you're opposed to such
overloads, or did I misinterpret that?

Sep 30 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Tuesday, 30 September 2014 at 16:49:48 UTC, Johannes Pfau
wrote:
 I guess my point is that although RC is useful in some cases 
 output
 ranges / sink delegates / pre-allocated buffers are still 
 necessary in
 other cases and RC is not the solution for _everything_.

Yes, I'm hoping this is an adjunct to changes in Phobos to reduce
the frequency of implicit allocation in general.  The less
garbage that's generated, the less GC vs. RC actually matters.

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 9:49 AM, Johannes Pfau wrote:
 I guess my point is that although RC is useful in some cases output
 ranges / sink delegates / pre-allocated buffers are still necessary in
 other cases and RC is not the solution for _everything_.

Agreed.

 As Manu often pointed out sometimes you do not want any dynamic
 allocation (toStringz in games is a good example) and here RC doesn't
 help.

 Another example is format which can already write to output ranges and
 uses sink delegates internally. That's a much better abstraction than
 simply returning a reference counted string (allocated with a thread
 local allocator). Using sink delegates internally is also more
 efficient than creating temporary RCStrings. And sometimes there's no
 allocation at all this way (directly writing to a socket/file).

Agreed.

 What if I don't want automated memory _management_? What if I want a
 function to use a stack buffer? Or if I want to free manually?

 If I want std.string.toStringz to put the result into a temporary
 stack buffer your solution doesn't help at all. Passing an output
 range, allocator or buffer would all solve this.

 Correct. The output of toStringz would be either a GC string or an RC
 string.

 But why not provide 3 overloads then?

 toStringz(OutputRange)
 string toStringz(Policy) //char*, actually
 RCString toStringz(Policy)

 The notion I got from some of your posts is that you're opposed to such
 overloads, or did I misinterpret that?

I'm not opposed. Here's what I think.

As an approach to using Phobos without a GC, it's been suggested that we 
supplement garbage-creating functions with new functions that use output 
ranges everywhere, or lazy ranges everywhere.

I think a better approach is to make memory management a policy that 
makes convenient use of reference counting possible. So instead of 
garbage there'd be reference counted stuff.

Of course, to the extent using lazy computation and/or output ranges is 
a good thing to have for various reasons, they remain valid techniques 
that are and will continue being used in Phobos.

My point is that acknowledging and systematically using reference 
counted types is an essential part of the entire approach.


Andrei

Oct 01 2014

Johannes Pfau <nospam example.com> writes:

On Wed, 01 Oct 2014 02:21:44 -0700,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 On 9/30/14, 9:49 AM, Johannes Pfau wrote:
 I guess my point is that although RC is useful in some cases output
 ranges / sink delegates / pre-allocated buffers are still necessary
 in other cases and RC is not the solution for _everything_.

 
 Agreed.
 
 As Manu often pointed out sometimes you do not want any dynamic
 allocation (toStringz in games is a good example) and here RC
 doesn't help.

 Another example is format which can already write to output ranges
 and uses sink delegates internally. That's a much better
 abstraction than simply returning a reference counted string
 (allocated with a thread local allocator). Using sink delegates
 internally is also more efficient than creating temporary
 RCStrings. And sometimes there's no allocation at all this way
 (directly writing to a socket/file).

 
 Agreed.
 
 What if I don't want automated memory _management_? What if I
 want a function to use a stack buffer? Or if I want to free
 manually?

 If I want std.string.toStringz to put the result into a temporary
 stack buffer your solution doesn't help at all. Passing an output
 range, allocator or buffer would all solve this.

 Correct. The output of toStringz would be either a GC string or an
 RC string.

 But why not provide 3 overloads then?

 toStringz(OutputRange)
 string toStringz(Policy) //char*, actually
 RCString toStringz(Policy)

 The notion I got from some of your posts is that you're opposed to
 such overloads, or did I misinterpret that?

 
 I'm not opposed. Here's what I think.
 
 As an approach to using Phobos without a GC, it's been suggested that
 we supplement garbage-creating functions with new functions that use
 output ranges everywhere, or lazy ranges everywhere.
 
 I think a better approach is to make memory management a policy that 
 makes convenient use of reference counting possible. So instead of 
 garbage there'd be reference counted stuff.
 
 Of course, to the extent using lazy computation and/or output ranges
 is a good thing to have for various reasons, they remain valid
 techniques that are and will continue being used in Phobos.
 
 My point is that acknowledging and systematically using reference 
 counted types is an essential part of the entire approach.
 
 
 Andrei
 
 

OK then I got you wrong and I agree with everything you wrote above.
Thanks for clarifying.
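
The "stack buffer" case discussed above could be sketched roughly as follows. This is only an illustration under assumptions: `FixedSink` and `toStringzInto` are hypothetical names, not part of std.string, and the sink relies on ordinary array bounds checking if the buffer turns out to be too small.

```d
import core.stdc.stdio : printf;

// Hypothetical sink over a caller-provided buffer (e.g. on the stack).
struct FixedSink
{
    char[] buf;   // storage owned by the caller
    size_t len;   // number of chars written so far

    void put(char c) { buf[len++] = c; } // bounds-checked write
}

// Hypothetical sink-based variant of toStringz: no allocation here,
// the caller decides where the bytes go.
void toStringzInto(Sink)(ref Sink sink, const(char)[] s)
{
    foreach (char c; s)
        sink.put(c);
    sink.put('\0'); // zero terminator expected by C APIs
}

void main()
{
    char[64] storage;                 // lives on the stack
    auto sink = FixedSink(storage[]);
    toStringzInto(sink, "hello");
    printf("%s\n", sink.buf.ptr);     // hand the C string to a C function
}
```

The same function would work with a sink that writes to a socket or a malloc'd block, which is the flexibility an RC or GC return value alone does not give.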

Oct 06 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 On the caller side:

 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

 So by default it's going to continue being business as usual, 
 but certain functions will allow passing in a (defaulted) 
 policy for memory management.

Forcing someone (or rather, a team of someones) to call into the 
library in a consistent fashion like this seems like a rather 
risky venture. I suppose that you could add some special compiler 
checks to make sure that people are being consistent, but I'd 
probably rather see some way of templating modules so that the 
chances for human error are reduced.

--- foo.d ---
module std.foo(GC = gc);

void bar() {
    static if (gc) {
       ...
    }
}

--- usercode.d ---
import std.foo!rc;

void fooCaller() {
     bar();
}

Though truthfully, I'd rather it be a compiler flag. But I 
presume that there's an issue with that, which it is too early 
for my brain to think of.
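
D has no parameterized modules, so the `module std.foo(GC = gc)` syntax above is hypothetical. A similar effect might be approximated today with a template used as a namespace, so the policy is picked once rather than at every call. The names `foo`, `bar` and `myfoo` below are illustrative assumptions only.

```d
import std.stdio : writeln;

enum MemoryManagementPolicy { gc, rc, mrc }

// A template acting as a stand-in for a "templated module":
// every function inside shares the same policy parameter.
template foo(MemoryManagementPolicy mmp)
{
    string bar()
    {
        static if (mmp == MemoryManagementPolicy.gc)
            return "gc path";
        else
            return "rc path";
    }
}

// User code chooses the policy in one place...
alias myfoo = foo!(MemoryManagementPolicy.rc);

void main()
{
    // ...and call sites no longer mention it.
    writeln(myfoo.bar());
}
```

This reduces the chance of mixing policies by accident within one file, though nothing stops another module from instantiating `foo` with a different policy.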

Sep 29 2014

Shammah Chancellor <email domain.com> writes:

On 2014-09-29 10:49:52 +0000, Andrei Alexandrescu said:

 Back when I've first introduced RCString I hinted that we have a larger 
 strategy in mind. Here it is.
 
 The basic tenet of the approach is to reckon and act on the fact that 
 memory allocation (the subject of allocators) is an entirely distinct 
 topic from memory management, and more generally resource management. 
 This clarifies that it would be wrong to approach alternatives to GC in 
 Phobos by means of allocators. GC is not only an approach to memory 
 allocation, but also an approach to memory management. Reducing it to 
 either one is a mistake. In hindsight this looks rather obvious but it 
 has caused me and many people better than myself a lot of headache.
 
 That said allocators are nice to have and use, and I will definitely 
 follow up with std.allocator. However, std.allocator is not the key to 
 a @nogc Phobos.
 
 Nor are ranges. There is an attitude that either output ranges, or 
 input ranges in conjunction with lazy computation, would solve the 
 issue of creating garbage. 
 https://github.com/D-Programming-Language/phobos/pull/2423 is a good 
 illustration of the latter approach: a range would be lazily created by 
 chaining stuff together. A range-based approach would take us further 
 than the allocators, but I see the following issues with it:
 
 (a) the whole approach doesn't stand scrutiny for non-linear outputs, 
 e.g. outputting some sort of associative array or really any composite 
 type quickly becomes tenuous either with an output range (eager) or 
 with exposing an input range (lazy);
 
 (b) makes the style of programming without GC radically different, and 
 much more cumbersome, than programming with GC; as a consequence, 
 programmers who consider changing one approach to another, or 
 implementing an algorithm neutral to it, are looking at a major rewrite;
 
 (c) would make D/@nogc a poor cousin of C++. This is quite out of
 character; technically, I have long gotten used to seeing most 
 elaborate C++ code like poor emulation of simple D idioms. But C++ has 
 spent years and decades taking to perfection an approach without a 
 tracing garbage collector. A departure from that would need to be 
 superior, and that doesn't seem to be the case with range-based 
 approaches.
 
 ===========
 
 Now that we clarified that these existing attempts are not going to 
 work well, the question remains what does. For Phobos I'm thinking of 
 defining and using three policies:
 
 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
      gc = MemoryManagementPolicy.gc,
      rc = MemoryManagementPolicy.rc,
      mrc = MemoryManagementPolicy.mrc;
 
 The three policies are:
 
 (a) gc is the classic garbage-collected style of management;
 
 (b) rc is a reference-counted style still backed by the GC, i.e. the GC 
 will still be able to pick up cycles and other kinds of leaks.
 
 (c) mrc is a reference-counted style backed by malloc.
 
 (It should be possible to collapse rc and mrc together and make the 
 distinction dynamically, at runtime. I'm distinguishing them statically 
 here for expository purposes.)
 
 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to return. 
 Consider:
 
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
 if (...)
 {
      static if (mmp == gc) alias S = string;
      else alias S = RCString;
      S result;
      ...
      return result;
 }
 
 On the caller side:
 
 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc
 
 So by default it's going to continue being business as usual, but 
 certain functions will allow passing in a (defaulted) policy for memory 
 management.
 
 Destroy!
 
 
 Andrei

I don't like the idea of having to pass in template parameters 
everywhere -- even for allocators.  Is there some way we could have 
"allocator contexts"?

E.G.

with( auto allocator = ReferenceCounted() )
{
	auto foo = setExtension("hello", "txt");
}

ReferenceCounted() could replace a thread-local "new" delegate with 
something it has, and when it goes out of scope, it would reset it to 
whatever it was before.   This would create some runtime overhead -- 
but I'm not sure how much more than already exists.

-Shammah
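
A scoped "context" of this kind might look roughly like the sketch below. It is an assumption-laden illustration: `AllocFn`, `currentAlloc` and `AllocatorContext` are hypothetical names, and the sketch only swaps the allocation function. It does not change what type a function returns or who frees the memory.

```d
import core.stdc.stdlib : malloc, free;

alias AllocFn = void[] function(size_t);

void[] gcAlloc(size_t n) { return new ubyte[n]; }

void[] mallocAlloc(size_t n)
{
    auto p = malloc(n);
    return p is null ? null : p[0 .. n];
}

// Module-level variables are thread-local by default in D.
AllocFn currentAlloc = &gcAlloc;

// RAII guard: installs an allocation function, restores the old one
// when the guard goes out of scope.
struct AllocatorContext
{
    private AllocFn saved;

    @disable this();
    @disable this(this);

    this(AllocFn f)
    {
        saved = currentAlloc;
        currentAlloc = f;
    }

    ~this() { currentAlloc = saved; }
}

void main()
{
    assert(currentAlloc is &gcAlloc);
    {
        auto ctx = AllocatorContext(&mallocAlloc);
        auto buf = currentAlloc(16);   // served by malloc inside the scope
        scope (exit) free(buf.ptr);    // the caller still has to free it
        assert(buf.length == 16);
    }
    assert(currentAlloc is &gcAlloc);  // previous function restored
}
```

The runtime cost is one thread-local load and an indirect call per allocation, plus the save and restore at scope entry and exit.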

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 11:44 AM, Shammah Chancellor wrote:
 I don't like the idea of having to pass in template parameters
 everywhere -- even for allocators.

I agree.

 Is there some way we could have
 "allocator contexts"?

 E.G.

 with( auto allocator = ReferenceCounted() )

Don't confuse memory allocation with memory management. There's no such 
thing as a "reference counted allocator".


Andrei

Sep 29 2014

Shammah Chancellor <email domain.com> writes:

On 2014-09-29 22:15:33 +0000, Andrei Alexandrescu said:

 On 9/29/14, 11:44 AM, Shammah Chancellor wrote:
 I don't like the idea of having to pass in template parameters
 everywhere -- even for allocators.

 
 I agree.
 
 Is there some way we could have
 "allocator contexts"?
 
 E.G.
 
 with( auto allocator = ReferenceCounted() )

 
 Don't confuse memory allocation with memory management. There's no such 
 thing as a "reference counted allocator".
 
 Andrei

Sure, but combining the two could be very useful -- as we have noticed 
with allocators that work off of a garbage collector.  With regards 
to reference counting, you could implement one that automatically wraps 
the type in an RC struct and proxies them.   Being able to redefine 
aliases during different sections of compilation would be required 
though.

Sep 29 2014

"Daniel N" <ufo orbiting.us> writes:

On Monday, 29 September 2014 at 22:15:32 UTC, Andrei Alexandrescu 
wrote:
 On 9/29/14, 11:44 AM, Shammah Chancellor wrote:
 I don't like the idea of having to pass in template parameters
 everywhere -- even for allocators.

 I agree.

There was a solution earlier in this thread which avoids that 
problem. When a function is annotated with @nogc there's 
sufficient info to choose the correct implementation without any 
parameters; it's already known whether we are instantiated from a 
@nogc block or not.

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 11:44 AM, Shammah Chancellor wrote:
 I don't like the idea of having to pass in template parameters
 everywhere -- even for allocators.  Is there some way we could have
 "allocator contexts"?

 E.G.

 with( auto allocator = ReferenceCounted() )
 {
      auto foo = setExtension("hello", "txt");
 }

 ReferenceCounted() could replace a thread-local "new" delegate with
 something it has, and when it goes out of scope, it would reset it to
 whatever it was before.   This would create some runtime overhead -- but
 I'm not sure how much more than already exists.

I'm not sure whether we can do this within D's type system. -- Andrei

Oct 01 2014

"Uranuz" <neuranuz gmail.com> writes:

 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

 So by default it's going to continue being business as usual, 
 but certain functions will allow passing in a (defaulted) 
 policy for memory management.

 Destroy!

I'll try to destroy ;) Before thinking out some answers to this
problem let me ask a few more questions.

1. As far as I understand, allocation and memory management of
entities like classes (Object), dynamic arrays and associative
arrays are part of the language/runtime. What is proposed here is a
*fix* to the standard library. But the fact that allocation and MM
happen via the GC is not the *fault* of the standard library; it is
predefined behaviour of the D language itself and its runtime. The
standard library becomes a `hostage` of the runtime library in this
situation. Are you really sure that we should "fix" the standard
library in that way? To me it looks like adding struts to the
standard lib (which is not broken yet ;) ) in order to compensate
for the behaviour of the runtime lib.

2. The second question is slightly off-topic, but I still want to
put it here. What I dislike about ranges and the standard library is
that it's hard to understand what the returned value of a library
function is. I have some *pedals* (front, popFront) to push and do
some magic. Of course it was made for the purpose of making universal
algorithms. But the more I use ranges and *auto*, the less I believe
that I am using a statically typed language. What is needed to make
code clear is a distinct variable declaration that specifies its
type. With all of these autos the logic of the programme becomes
unclear, because the data structures are unclear. So I came to the
question: is the memory management or allocation policy
syntactically part of the declaration, or is it an inner
implementation detail that should not be shown in the declaration?

Should rc and gc strings look similar or not?

string str1 = makeGCString("test");
string str2 = makeRCString("test");

// --- vs ---

GCString str1 = "test";
RCString str2 = "test";

// --- or ---

String!GC str1 = "test";
String!RC str2 = "test";

// --- or even ---
@gc string str1 = "test";
@rc string str2 = "test";

As far as I understand currently we will have:
string str1 = "test";
RCString str2 = "test";

So another question is why the same object "string" is
implemented as different types: array and struct (class)?

3. Should algorithms based on the range interface care about
allocation? A range is about iteration and access to elements,
not about allocation and memory management.

I would like to have attributes @rc, @gc (or similar) to
switch the MM policy instead of *String!RC* or *RCString*, but we
cannot apply attributes to a literal. Passing something like this
to an algorithm:

find( @rc "test", @rc "t" )

is syntactically incorrect. But we can use this form:

find( RCString("test"), RCString("t") )

But the above form is more verbose. As a continuation of this
question I have the next question.

4. How to deal with literals? How to make them ref-counted?

I ask this because even when writing RCString("test"),
syntactically the expression "test" is still a GC-managed literal. I
pass a GC-managed literal into a struct to make it RC-managed. Why
not just make it RC from the start?

Adding an additional template parameter to an algorithm will not
fix this. It is a problem of D itself and its runtime library.


So I assume that the std lib is not broken in this way and we should
not try to fix it this way. Thanks for your attention.

Sep 29 2014

"Mike" <none none.com> writes:

On Monday, 29 September 2014 at 20:07:41 UTC, Uranuz wrote:

 1. As far as I understand, allocation and memory management of
 entities like classes (Object), dynamic arrays and associative
 arrays are part of the language/runtime. What is proposed here is a
 *fix* to the standard library. But the fact that allocation and MM
 happen via the GC is not the *fault* of the standard library; it is
 predefined behaviour of the D language itself and its runtime. The
 standard library becomes a `hostage` of the runtime library in this
 situation. Are you really sure that we should "fix" the standard
 library in that way? To me it looks like adding struts to the
 standard lib (which is not broken yet ;) ) in order to compensate
 for the behaviour of the runtime lib.

This really hits the nail on the head, and I think your other 
comments and questions are also quite insightful.

IMO the proposal that started this thread, @nogc, and -vgc are 
all beating around the bush rather than addressing the 
fundamental problem.

Mike

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 1:07 PM, Uranuz wrote:
 1. As far as I understand, allocation and memory management of
 entities like classes (Object), dynamic arrays and associative
 arrays are part of the language/runtime. What is proposed here is a
 *fix* to the standard library. But the fact that allocation and MM
 happen via the GC is not the *fault* of the standard library; it is
 predefined behaviour of the D language itself and its runtime. The
 standard library becomes a `hostage` of the runtime library in this
 situation. Are you really sure that we should "fix" the standard
 library in that way? To me it looks like adding struts to the
 standard lib (which is not broken yet ;) ) in order to compensate
 for the behaviour of the runtime lib.

The change will be to both the runtime and the standard library.

 2. The second question is slightly off-topic, but I still want to
 put it here. What I dislike about ranges and the standard library is
 that it's hard to understand what the returned value of a library
 function is. I have some *pedals* (front, popFront) to push and do
 some magic. Of course it was made for the purpose of making universal
 algorithms. But the more I use ranges and *auto*, the less I believe
 that I am using a statically typed language. What is needed to make
 code clear is a distinct variable declaration that specifies its
 type. With all of these autos the logic of the programme becomes
 unclear, because the data structures are unclear. So I came to the
 question: is the memory management or allocation policy
 syntactically part of the declaration, or is it an inner
 implementation detail that should not be shown in the declaration?

Sadly this is the way things are going (not only in D, but other 
languages such as C++, Haskell, Scala, etc). Type proliferation has 
costs, but also a ton of benefits.

Most often the memory management policy will be part of function 
signatures because it affects data type definitions.

 Should rc and gc strings look similar or not?

 string str1 = makeGCString("test");
 string str2 = makeRCString("test");

 // --- vs ---

 GCString str1 = "test";
 RCString str2 = "test";

 // --- or ---

 String!GC str1 = "test";
 String!RC str2 = "test";

 // --- or even ---
 @gc string str1 = "test";
 @rc string str2 = "test";

 As far as I understand currently we will have:
 string str1 = "test";
 RCString str2 = "test";

Per Sean's idea things would go GC.string vs. RC.string, where GC and RC 
are two memory management policies (simple structs defining aliases and 
probably a few primitives).

 So another question is why the same object "string" is
 implemented as different types: array and struct (class)?

A reference counted string has a different layout than immutable(char)[].

 3. Should algorithms based on the range interface care about
 allocation? A range is about iteration and access to elements,
 not about allocation and memory management.

Most don't.

 I would like to have attributes @rc, @gc (or similar) to
 switch the MM policy instead of *String!RC* or *RCString*, but we
 cannot apply attributes to a literal. Passing something like this
 to an algorithm:

 find( @rc "test", @rc "t" )

 is syntactically incorrect. But we can use this form:

 find( RCString("test"), RCString("t") )

 But the above form is more verbose. As a continuation of this
 question I have the next question.

If language changes are necessary, we will make language changes. I'm 
trying first to explore solutions within the language.

 4. How to deal with literals? How to make them ref-counted?

I don't know yet.

 I ask this because even when writing RCString("test"),
 syntactically the expression "test" is still a GC-managed literal. I
 pass a GC-managed literal into a struct to make it RC-managed. Why
 not just make it RC from the start?

 Adding an additional template parameter to an algorithm will not
 fix this. It is a problem of D itself and its runtime library.

I understand. The problem is actually worse with array literals, which 
are silently dynamically allocated on the garbage-collected heap:

auto s = "hello"; // at least there's no allocation
auto a = [1, 2, 3]; // dynamic allocation

A language-based solution would change array literal syntax. A 
library-based solution would leave array literals with today's syntax 
and semantics and offer a controlled alternative a la:

auto a = MyMemPolicy.array(1, 2, 3); // cool
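
As a rough illustration of the library-based route, a policy could be a small struct exposing an `array` primitive. The sketch below is an assumption, not an existing Phobos API: `GCPolicy` and `NoHeapPolicy` are hypothetical names, and a real reference-counted policy would return an owning type rather than a plain array.

```d
import std.stdio : writeln;
import std.traits : CommonType;

// Hypothetical policy: allocates on the GC heap, but explicitly.
struct GCPolicy
{
    alias String = string;

    static auto array(Args...)(Args args)
    {
        alias T = CommonType!Args;
        auto result = new T[args.length]; // visible GC allocation
        foreach (i, a; args)
            result[i] = a;
        return result;
    }
}

// Hypothetical policy: returns a fixed-size array by value, no heap use.
struct NoHeapPolicy
{
    static auto array(Args...)(Args args)
    {
        CommonType!Args[Args.length] result;
        foreach (i, a; args)
            result[i] = a;
        return result;
    }
}

void main()
{
    auto a = GCPolicy.array(1, 2, 3);     // int[]
    auto b = NoHeapPolicy.array(1, 2, 3); // int[3]
    static assert(is(typeof(a) == int[]));
    static assert(is(typeof(b) == int[3]));
    writeln(a, " ", b);
}
```

Generic code can then take the policy as a template parameter and call `Policy.array(...)` without knowing which strategy is in effect.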

 So I assume that the std lib is not broken in this way and we should
 not try to fix it this way. Thanks for your attention.

And thanks for your great points.


Andrei

Oct 01 2014

"Freddy" <Hexagonalstar64 gmail.com> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu
wrote:
 Back when I've first introduced RCString I hinted that we have 
 a larger strategy in mind. Here it is.

 The basic tenet of the approach is to reckon and act on the 
 fact that memory allocation (the subject of allocators) is an 
 entirely distinct topic from memory management, and more 
 generally resource management. This clarifies that it would be 
 wrong to approach alternatives to GC in Phobos by means of 
 allocators. GC is not only an approach to memory allocation, 
 but also an approach to memory management. Reducing it to 
 either one is a mistake. In hindsight this looks rather obvious 
 but it has caused me and many people better than myself a lot 
 of headache.

 That said allocators are nice to have and use, and I will 
 definitely follow up with std.allocator. However, std.allocator 
 is not the key to a @nogc Phobos.

 Nor are ranges. There is an attitude that either output ranges, 
 or input ranges in conjunction with lazy computation, would 
 solve the issue of creating garbage. 
 https://github.com/D-Programming-Language/phobos/pull/2423 is a 
 good illustration of the latter approach: a range would be 
 lazily created by chaining stuff together. A range-based 
 approach would take us further than the allocators, but I see 
 the following issues with it:

 (a) the whole approach doesn't stand scrutiny for non-linear 
 outputs, e.g. outputting some sort of associative array or 
 really any composite type quickly becomes tenuous either with 
 an output range (eager) or with exposing an input range (lazy);

 (b) makes the style of programming without GC radically 
 different, and much more cumbersome, than programming with GC; 
 as a consequence, programmers who consider changing one 
 approach to another, or implementing an algorithm neutral to 
 it, are looking at a major rewrite;

 (c) would make D/@nogc a poor cousin of C++. This is quite out 
 of character; technically, I have long gotten used to seeing 
 most elaborate C++ code like poor emulation of simple D idioms. 
 But C++ has spent years and decades taking to perfection an 
 approach without a tracing garbage collector. A departure from 
 that would need to be superior, and that doesn't seem to be the 
 case with range-based approaches.

 ===========

 Now that we clarified that these existing attempts are not 
 going to work well, the question remains what does. For Phobos 
 I'm thinking of defining and using three policies:

 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

 The three policies are:

 (a) gc is the classic garbage-collected style of management;

 (b) rc is a reference-counted style still backed by the GC, 
 i.e. the GC will still be able to pick up cycles and other 
 kinds of leaks.

 (c) mrc is a reference-counted style backed by malloc.

 (It should be possible to collapse rc and mrc together and make 
 the distinction dynamically, at runtime. I'm distinguishing 
 them statically here for expository purposes.)

 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to 
 return. Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

 On the caller side:

 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

 So by default it's going to continue being business as usual, 
 but certain functions will allow passing in a (defaulted) 
 policy for memory management.

 Destroy!


 Andrei

Internally we should have something like:

---
template String(MemoryManagementPolicy mmp=gc){
      /++ ... +/
}
auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1
path, R2 ext)
if (...)
{
      auto result=String!mmp();
      /++ +/
}
----

or maybe even allowing user types in the template argument(the
original purpose of templates)

---
auto setExtension(String = string, R1, R2)(R1
path, R2 ext){
      /++ +/
}
----
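
A compilable sketch of the first variant might look like this. `RCStringStub` is a hypothetical placeholder (the real RCString has its own layout and actually counts references), and `StringOf` and `greet` are illustrative names, not Phobos symbols.

```d
import std.stdio : writeln;

enum MemoryManagementPolicy { gc, rc, mrc }

// Hypothetical placeholder for a reference-counted string type.
struct RCStringStub { string payload; }

// Maps a policy to the string type used under that policy.
template StringOf(MemoryManagementPolicy mmp)
{
    static if (mmp == MemoryManagementPolicy.gc)
        alias StringOf = string;
    else
        alias StringOf = RCStringStub;
}

// The return type depends on the policy, so it must be known at
// compile time; a runtime switch could not select it.
StringOf!mmp greet(MemoryManagementPolicy mmp = MemoryManagementPolicy.gc)
                  (string name)
{
    auto text = "hello, " ~ name;
    static if (mmp == MemoryManagementPolicy.gc)
        return text;
    else
        return RCStringStub(text);
}

void main()
{
    auto a = greet("world");                              // string
    auto b = greet!(MemoryManagementPolicy.rc)("world");  // RCStringStub
    static assert(is(typeof(a) == string));
    static assert(is(typeof(b) == RCStringStub));
    writeln(a, " / ", b.payload);
}
```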

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 3:11 PM, Freddy wrote:

 Internally we should have something like:

 ---
 template String(MemoryManagementPolicy mmp=gc){
       /++ ... +/
 }
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1
 path, R2 ext)
 if (...)
 {
       auto result=String!mmp();
       /++ +/
 }
 ----

 or maybe even allowing user types in the template argument(the
 original purpose of templates)

 ---
 auto setExtension(String = string, R1, R2)(R1
 path, R2 ext){
       /++ +/
 }

That's correct. -- Andrei

Sep 29 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/29/14, 3:11 PM, Freddy wrote:

 Internally we should have something like:

 ---
 template String(MemoryManagementPolicy mmp=gc){
       /++ ... +/
 }
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1
 path, R2 ext)
 if (...)
 {
       auto result=String!mmp();
       /++ +/
 }
 ----

 or maybe even allowing user types in the template argument(the
 original purpose of templates)

 ---
 auto setExtension(String = string, R1, R2)(R1
 path, R2 ext){
       /++ +/
 }
 ----

Good idea, and it seems Sean's is even better because it groups 
everything related to memory management where it belongs - in the memory 
management policy. -- Andrei

Oct 01 2014

"Foo" <Foo test.de> writes:

I hate the fact that this will produce template bloat for each 
function/method.
I'm also in favor of "let the user pick", but I would use a 
global variable:

----
enum MemoryManagementPolicy { gc, rc, mrc }
immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

auto RMP = gc;
----

and in my code:

----
RMP = rc;
string str = "foo"; // compiler knows -> ref counted
// ...
RMP = gc;
string str2 = "bar"; // normal behaviour restored
----

Sep 30 2014

"Foo" <Foo test.de> writes:

On Tuesday, 30 September 2014 at 13:38:43 UTC, Foo wrote:
 I hate the fact that this will produce template bloat for each 
 function/method.
 I'm also in favor of "let the user pick", but I would use a 
 global variable:

 ----
 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

 auto RMP = gc;
 ----

 and in my code:

 ----
 RMP = rc;
 string str = "foo"; // compiler knows -> ref counted
 // ...
 RMP = gc;
 string str2 = "bar"; // normal behaviour restored
 ----

Of course each method/function in Phobos should use the global
RMP.

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 6:38 AM, Foo wrote:
 I hate the fact that this will produce template bloat for each
 function/method.
 I'm also in favor of "let the user pick", but I would use a global
 variable:

 ----
 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
      gc = MemoryManagementPolicy.gc,
      rc = MemoryManagementPolicy.rc,
      mrc = MemoryManagementPolicy.mrc;

 auto RMP = gc;
 ----

 and in my code:

 ----
 RMP = rc;
 string str = "foo"; // compiler knows -> ref counted
 // ...
 RMP = gc;
 string str2 = "bar"; // normal behaviour restored
 ----

This won't work because the type of "string" is different for RC vs. GC. 
-- Andrei

Sep 30 2014

"Foo" <Foo test.de> writes:

On Tuesday, 30 September 2014 at 13:59:23 UTC, Andrei
Alexandrescu wrote:
 On 9/30/14, 6:38 AM, Foo wrote:
 I hate the fact that this will produce template bloat for each
 function/method.
 I'm also in favor of "let the user pick", but I would use a 
 global
 variable:

 ----
 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

 auto RMP = gc;
 ----

 and in my code:

 ----
 RMP = rc;
 string str = "foo"; // compiler knows -> ref counted
 // ...
 RMP = gc;
 string str2 = "bar"; // normal behaviour restored
 ----

 This won't work because the type of "string" is different for 
 RC vs. GC. -- Andrei

But it would work for phobos functions without template bloat.

Sep 30 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 30 September 2014 at 14:05:43 UTC, Foo wrote:
 On Tuesday, 30 September 2014 at 13:59:23 UTC, Andrei
 Alexandrescu wrote:
 On 9/30/14, 6:38 AM, Foo wrote:
 This won't work because the type of "string" is different for 
 RC vs. GC. -- Andrei

 But it would work for phobos functions without template bloat.

Only for internal allocations. If the functions want to return 
something, the type must be known.

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 7:13 AM, "Marc Schütz" <schuetzm gmx.net> wrote:
 On Tuesday, 30 September 2014 at 14:05:43 UTC, Foo wrote:
 On Tuesday, 30 September 2014 at 13:59:23 UTC, Andrei
 Alexandrescu wrote:
 On 9/30/14, 6:38 AM, Foo wrote:
 This won't work because the type of "string" is different for RC vs.
 GC. -- Andrei

 But it would work for phobos functions without template bloat.

 Only for internal allocations. If the functions want to return
 something, the type must be known.

Ah, now I understand the point. Thanks. -- Andrei

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 7:05 AM, Foo wrote:
 On Tuesday, 30 September 2014 at 13:59:23 UTC, Andrei
 Alexandrescu wrote:
 On 9/30/14, 6:38 AM, Foo wrote:
 I hate the fact that this will produce template bloat for each
 function/method.
 I'm also in favor of "let the user pick", but I would use a global
 variable:

 ----
 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

 auto RMP = gc;
 ----

 and in my code:

 ----
 RMP = rc;
 string str = "foo"; // compiler knows -> ref counted
 // ...
 RMP = gc;
 string str2 = "bar"; // normal behaviour restored
 ----

 This won't work because the type of "string" is different for RC vs.
 GC. -- Andrei

 But it would work for phobos functions without template bloat.

How is the fact there's less bloat relevant for code that doesn't work? 
I.e. it doesn't compile. It needs to return string for GC and RCString 
for RC.

Andrei

Sep 30 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 Back when I've first introduced RCString I hinted that we have 
 a larger strategy in mind. Here it is.

 The basic tenet of the approach is to reckon and act on the 
 fact that memory allocation (the subject of allocators) is an 
 entirely distinct topic from memory management, and more 
 generally resource management. This clarifies that it would be 
 wrong to approach alternatives to GC in Phobos by means of 
 allocators. GC is not only an approach to memory allocation, 
 but also an approach to memory management. Reducing it to 
 either one is a mistake. In hindsight this looks rather obvious 
 but it has caused me and many people better than myself a lot 
 of headache.

 That said allocators are nice to have and use, and I will 
 definitely follow up with std.allocator. However, std.allocator 
 is not the key to a @nogc Phobos.

 Nor are ranges. There is an attitude that either output ranges, 
 or input ranges in conjunction with lazy computation, would 
 solve the issue of creating garbage. 
 https://github.com/D-Programming-Language/phobos/pull/2423 is a 
 good illustration of the latter approach: a range would be 
 lazily created by chaining stuff together. A range-based 
 approach would take us further than the allocators, but I see 
 the following issues with it:

 (a) the whole approach doesn't stand scrutiny for non-linear 
 outputs, e.g. outputting some sort of associative array or 
 really any composite type quickly becomes tenuous either with 
 an output range (eager) or with exposing an input range (lazy);

 (b) makes the style of programming without GC radically 
 different, and much more cumbersome, than programming with GC; 
 as a consequence, programmers who consider changing one 
 approach to another, or implementing an algorithm neutral to 
 it, are looking at a major rewrite;

 (c) would make D/@nogc a poor cousin of C++. This is quite out 
 of character; technically, I have long gotten used to seeing 
 most elaborate C++ code like poor emulation of simple D idioms. 
 But C++ has spent years and decades taking to perfection an 
 approach without a tracing garbage collector. A departure from 
 that would need to be superior, and that doesn't seem to be the 
 case with range-based approaches.

 ===========

 Now that we clarified that these existing attempts are not 
 going to work well, the question remains what does. For Phobos 
 I'm thinking of defining and using three policies:

 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

 The three policies are:

 (a) gc is the classic garbage-collected style of management;

 (b) rc is a reference-counted style still backed by the GC, 
 i.e. the GC will still be able to pick up cycles and other 
 kinds of leaks.

 (c) mrc is a reference-counted style backed by malloc.

 (It should be possible to collapse rc and mrc together and make 
 the distinction dynamically, at runtime. I'm distinguishing 
 them statically here for expository purposes.)

 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to 
 return. Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

 On the caller side:

 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

 So by default it's going to continue being business as usual, 
 but certain functions will allow passing in a (defaulted) 
 policy for memory management.

 Destroy!


 Andrei

Instead of adding a new template parameter to every function 
(which won't necessarily play nicely with existing IFTI and 
variadic templates), why not allow template modules?

import stringRC = std.string!rc;
import stringGC = std.string!gc;


// in std/string.d
module std.string(MemoryManagementPolicy mmp)

pure @trusted S capitalize(S)(S s)
     if (isSomeString!S)
{
     //...

     static if(mmp == MemoryManagementPolicy.gc)
     {
         //...
     }
     else static if .......
}

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 7:07 AM, John Colvin wrote:
 Instead of adding a new template parameter to every function (which
 won't necessarily play nicely with existing IFTI and variadic
 templates), why not allow template modules?

Nice idea, but let's try and explore possibilities within the existing 
rich language. If a need for new language features arises, I trust we'll 
see it. -- Andrei

Oct 01 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to 
 return. Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

Is this for exposition purposes or actually how you expect it to 
work?  Quite honestly, I can't imagine how I could write a 
template function in D that needs to work with this approach.

As much as I hate to say it, this is pretty much exactly what C++ 
allocators were designed for.  They handle allocation, sure, but 
they also hold aliases for all relevant types for the data being 
allocated.  If the MemoryManagementPolicy enum were replaced with 
an alias to a type that I could use to at least obtain relevant 
aliases, that would be something.  But even that approach 
dramatically complicates code that uses it.

Having written standards-compliant containers in C++, I honestly 
can't imagine the average user writing code that works this way.  
Once you assert that the reference type may be a pointer or it 
may be some complex proxy to data stored elsewhere, a lot of 
composability pretty much flies right out the window.

For example, I have an implementation of C++ 
unordered_map/set/etc designed to be a customizable cache, so one 
of its template arguments is a policy type that allows eviction 
behavior to be chosen at declaration time.  Maybe the cache is 
size-limited, maybe it's age-limited, maybe it's a combination of 
the two or something even more complicated.  The problem is that 
the container defines all the aliases relating to the underlying 
data, but the policy, which needs to be aware of these, is passed 
as a template argument to this container.

To make something that's fully aware of C++ allocators then, I'd 
have to define a small type that takes the container template 
arguments (the contained type and the allocator type) and 
generates the aliases and pass this to the policy, which in turn 
passes the type through to the underlying container so it can 
declare its public aliases and whatever else in true 
standards-compliant fashion (or let the container derive this 
itself, but then you run into the potential for disagreement).  
And while this is possible, doing so would complicate the 
creation of the cache policies to the point where it subverts 
their intent, which was to make it easy for the user to tune the 
behavior of the cache to their own particular needs by defining a 
simple type which implements a few functions.  Ultimately, I 
decided against this approach for the cache container and decided 
to restrict the allocators to those which defined a pointer to T 
as T* so the policies could be coded with basically no knowledge 
of the underlying storage.

So... while I support the goal you're aiming at, I want to see a 
much more comprehensive example of how this will work and how it 
will affect code written by D *users*.  Because it isn't enough 
for Phobos to be written this way.  Basically all D code will 
have to take this into account for the strategy to be truly 
viable.  Simply outlining one of the most basic functions in 
Phobos, which already looks like it will have a static 
conditional at the beginning and *need to be aware of the fact 
that an RCString type exists* makes me terrified of what a 
realistic example will look like.

Sep 30 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Sep 30, 2014 at 04:10:43PM +0000, Sean Kelly via Digitalmars-d wrote:
 On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
The policy is a template parameter to functions in Phobos (and
elsewhere), and informs the functions e.g. what types to return.
Consider:

auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
ext)
if (...)
{
    static if (mmp == gc) alias S = string;
    else alias S = RCString;
    S result;
    ...
    return result;
}

 
 Is this for exposition purposes or actually how you expect it to work?
 Quite honestly, I can't imagine how I could write a template function
 in D that needs to work with this approach.
 
 As much as I hate to say it, this is pretty much exactly what C++
 allocators were designed for.  They handle allocation, sure, but they
 also hold aliases for all relevant types for the data being allocated.

[...]
 So... while I support the goal you're aiming at, I want to see a much
 more comprehensive example of how this will work and how it will
 affect code written by D *users*.  Because it isn't enough for Phobos
 to be written this way.  Basically all D code will have to take this
 into account for the strategy to be truly viable.  Simply outlining
 one of the most basic functions in Phobos, which already looks like it
 will have a static conditional at the beginning and *need to be aware
 of the fact that an RCString type exists* makes me terrified of what a
 realistic example will look like.

Yeah, this echoes my concern. This looks not that much different, from a
user's POV, from C++ containers' allocator template parameters. Yes I
know we're not talking about *allocators* per se but about *memory
management*, but I'm talking about the need to explicitly pass mmp to
*every* *single* *function* if you desire anything but the default. How
many people actually *use* the allocator parameter in STL? Certainly,
many people do... but the code is anything but readable / maintainable.

Not only that, but every single function will have to handle this
parameter somehow, and if static if's at the top of the function is what
we're starting with, I fear seeing what we end up with.

Furthermore, in order for this to actually work, it has to be percolated
throughout the entire codebase -- any D library that even remotely uses
Phobos for anything will have to percolate this parameter throughout its
API -- at least, any part of the API that might potentially use a Phobos
function. Otherwise, you still have the situation where a given D
library doesn't allow the user to select a memory management scheme, and
internally calls Phobos functions with the default settings. So this
still doesn't solve the problem that today, people who need to use @nogc
can't use a lot of existing libraries because the library depends on the
GC, even if it doesn't assume anything about the MM scheme, but just
happens to call some obscure Phobos function with the default MM
parameter. The only way this could work was if *every* D library author
voluntarily rewrites a lot of code in order to percolate this MM
parameter through to the API, on the off-chance that some obscure user
somewhere might have need to use it. I don't see much likelihood of this
actually happening.

Then there's the matter of functions like parseJSON() that need to
allocate nodes and return a tree (or whatever) of these nodes. Note that
they need to *allocate*, not just know what kind of memory management
model is to be used. So how do you propose to address this? Via another
parameter (compile-time or otherwise) to specify which allocator to use?
So how does the memory management parameter solve anything then? And how
would such a thing be implemented? Using a 3-way static-if branch in
every single point in parseJSON where it needs to allocate nodes? We
could just as well write it in C++, if that's the case.

This proposal has many glaring holes that need to be fixed before it can
be viable.


T

-- 
EMACS = Extremely Massive And Cumbersome System

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 10:33 AM, H. S. Teoh via Digitalmars-d wrote:
 Yeah, this echoes my concern. This looks not that much different, from a
 user's POV, from C++ containers' allocator template parameters. Yes I
 know we're not talking about *allocators* per se but about *memory
 management*, but I'm talking about the need to explicitly pass mmp to
 *every* *single* *function* if you desire anything but the default. How
 many people actually *use* the allocator parameter in STL? Certainly,
 many people do... but the code is anything but readable / maintainable.

The parallel with STL allocators is interesting, but I'm not worried 
about it that much. I don't want to go off on a tangent but I'm fairly 
certain std::allocator is hard to use for entirely different reasons 
than the intended use patterns of MemoryManagementPolicy.

 Not only that, but every single function will have to handle this
 parameter somehow, and if static if's at the top of the function is what
 we're starting with, I fear seeing what we end up with.

Apparently Sean's idea would take care of that.

 Furthermore, in order for this to actually work, it has to be percolated
 throughout the entire codebase -- any D library that even remotely uses
 Phobos for anything will have to percolate this parameter throughout its
 API -- at least, any part of the API that might potentially use a Phobos
 function.

Yes, but that's entirely expected. We're adding genuinely new 
functionality to Phobos.

 Otherwise, you still have the situation where a given D
 library doesn't allow the user to select a memory management scheme, and
 internally calls Phobos functions with the default settings.

Correct.

 So this
 still doesn't solve the problem that today, people who need to use @nogc
 can't use a lot of existing libraries because the library depends on the
 GC, even if it doesn't assume anything about the MM scheme, but just
 happens to call some obscure Phobos function with the default MM
 parameter. The only way this could work was if *every* D library author
 voluntarily rewrites a lot of code in order to percolate this MM
 parameter through to the API, on the off-chance that some obscure user
 somewhere might have need to use it. I don't see much likelihood of this
 actually happening.

A simple way to put this is: libraries that use the GC will continue to 
use the GC. There's no way around that unless we choose to break them all.

 Then there's the matter of functions like parseJSON() that need to
 allocate nodes and return a tree (or whatever) of these nodes. Note that
 they need to *allocate*, not just know what kind of memory management
 model is to be used. So how do you propose to address this? Via another
 parameter (compile-time or otherwise) to specify which allocator to use?
 So how does the memory management parameter solve anything then? And how
 would such a thing be implemented? Using a 3-way static-if branch in
 every single point in parseJSON where it needs to allocate nodes? We
 could just as well write it in C++, if that's the case.

parseJSON() would get a memory management policy parameter, and will use 
the currently installed memory allocator for allocation.

 This proposal has many glaring holes that need to be fixed before it can
 be viable.

Affirmative. That's why it's an RFC, very far from a proposal. I'm glad 
I got a bunch of good ideas.


Andrei

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 9:10 AM, Sean Kelly wrote:
 On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
 The policy is a template parameter to functions in Phobos (and
 elsewhere), and informs the functions e.g. what types to return.
 Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

 Is this for exposition purposes or actually how you expect it to work?

That's pretty much what it would take. The key here is that RCString is 
almost a drop-in replacement for string, so the code using it is almost 
identical. There will be places where code needs to be replaced, e.g.

auto s = "literal";

would need to become

S s = "literal";

So creation of strings will change a bit, but overall there's not a lot 
of churn.

 Quite honestly, I can't imagine how I could write a template function
 in D that needs to work with this approach.

You mean write a function that accepts a memory management policy, or a 
function that uses one?

 As much as I hate to say it, this is pretty much exactly what C++
 allocators were designed for.  They handle allocation, sure, but they
 also hold aliases for all relevant types for the data being allocated.
 If the MemoryManagementPolicy enum were replaced with an alias to a type
 that I could use to at least obtain relevant aliases, that would be
 something.  But even that approach dramatically complicates code that
 uses it.

I think making MemoryManagementPolicy a meaningful type is a great idea. 
It would e.g. define the string type, so the code becomes:

auto setExtension(alias MemoryManagementPolicy = gc, R1, R2)(R1 path, R2 
ext)
if (...)
{
     MemoryManagementPolicy.string result;
     ...
     return result;
}

This is a lot more general and extensible. Thanks!
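
A minimal sketch of the policy-as-type idea above, under stated assumptions: `GCPolicy`, `RCPolicy`, `RCStringStub`, and `appendExt` are hypothetical names invented for illustration, and the alias is spelled `String` rather than `string` to avoid shadowing the built-in. `RCStringStub` does no reference counting; it only stands in for RCString so that the example compiles on its own. `appendExt` just concatenates and is not the real std.path.setExtension.

```d
import std.stdio : writeln;

// Hypothetical placeholder for RCString. A real implementation would manage
// a reference count; this one merely wraps a string.
struct RCStringStub
{
    private string payload;
    this(string s) { payload = s; }
    void opOpAssign(string op)(string rhs) if (op == "~") { payload ~= rhs; }
    string toString() const { return payload; }
}

// Each policy is a type that publishes the aliases generic code needs.
struct GCPolicy { alias String = string; }
struct RCPolicy { alias String = RCStringStub; }

// No static if in the body: the result type comes from the policy.
auto appendExt(Policy = GCPolicy)(string path, string ext)
{
    Policy.String result = path; // `auto result = path;` would always be string
    result ~= ext;
    return result;
}

void main()
{
    auto a = appendExt("hello", ".txt");          // default: GCPolicy
    auto b = appendExt!RCPolicy("hello", ".txt"); // opt in to the RC policy
    static assert(is(typeof(a) == string));
    static assert(is(typeof(b) == RCStringStub));
    writeln(a, " ", b.toString());
}
```

The function body is the same for both policies only because the stub supports the same initialization and `~=` syntax as `string`, which is the "almost a drop-in replacement" property the post relies on.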

Why do you think there'd be dramatic complication of code? (Granted, at 
some point we must acknowledge that some egg breaking is necessary for 
the proverbial omelette.)

 Having written standards-compliant containers in C++, I honestly can't
 imagine the average user writing code that works this way. Once you
 assert that the reference type may be a pointer or it may be some
 complex proxy to data stored elsewhere, a lot of composability pretty
 much flies right out the window.

The thing is, again, we must make some changes if we want D to be usable 
without a GC. One of them is e.g. to not allocate built-in slices all 
over the place.

 For example, I have an implementation of C++ unordered_map/set/etc
 designed to be a customizable cache, so one of its template arguments is
 a policy type that allows eviction behavior to be chosen at declaration
 time.  Maybe the cache is size-limited, maybe it's age-limited, maybe
 it's a combination of the two or something even more complicated.  The
 problem is that the container defines all the aliases relating to the
 underlying data, but the policy, which needs to be aware of these, is
 passed as a template argument to this container.

 To make something that's fully aware of C++ allocators then, I'd have to
 define a small type that takes the container template arguments (the
 contained type and the allocator type) and generates the aliases and
 pass this to the policy, which in turn passes the type through to the
 underlying container so it can declare its public aliases and whatever
 else in true standards-compliant fashion (or let the container derive
 this itself, but then you run into the potential for disagreement). And
 while this is possible, doing so would complicate the creation of the
 cache policies to the point where it subverts their intent, which was to
 make it easy for the user to tune the behavior of the cache to their own
 particular needs by defining a simple type which implements a few
 functions.  Ultimately, I decided against this approach for the cache
 container and decided to restrict the allocators to those which defined
 a pointer to T as T* so the policies could be coded with basically no
 knowledge of the underlying storage.

That sounds like a rather involved artifact. Hopefully we can leverage 
D's better expressiveness to make building such complex libraries easier.

 So... while I support the goal you're aiming at, I want to see a much
 more comprehensive example of how this will work and how it will affect
 code written by D *users*.

Agreed.

 Because it isn't enough for Phobos to be
 written this way.  Basically all D code will have to take this into
 account for the strategy to be truly viable.  Simply outlining one of
 the most basic functions in Phobos, which already looks like it will
 have a static conditional at the beginning and *need to be aware of the
 fact that an RCString type exists* makes me terrified of what a
 realistic example will look like.

That would be overreacting :o).


Andrei

Oct 01 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Wednesday, 1 October 2014 at 08:55:55 UTC, Andrei Alexandrescu 
wrote:
 On 9/30/14, 9:10 AM, Sean Kelly wrote:
 Is this for exposition purposes or actually how you expect it 
 to work?

 That's pretty much what it would take. The key here is that 
 RCString is almost a drop-in replacement for string, so the 
 code using it is almost identical. There will be places where 
 code needs to be replaced, e.g.

 auto s = "literal";

 would need to become

 S s = "literal";

 So creation of strings will change a bit, but overall there's 
 not a lot of churn.

I'm confused.  Is this a general-purpose solution or just one 
that switches between string and RCString?

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/1/14, 6:52 AM, Sean Kelly wrote:
 On Wednesday, 1 October 2014 at 08:55:55 UTC, Andrei Alexandrescu wrote:
 On 9/30/14, 9:10 AM, Sean Kelly wrote:
 Is this for exposition purposes or actually how you expect it to work?

 That's pretty much what it would take. The key here is that RCString
 is almost a drop-in replacement for string, so the code using it is
 almost identical. There will be places where code needs to be
 replaced, e.g.

 auto s = "literal";

 would need to become

 S s = "literal";

 So creation of strings will change a bit, but overall there's not a
 lot of churn.

 I'm confused.  Is this a general-purpose solution or just one that
 switches between string and RCString?

General purpose since your suggested change. -- Andrei

Oct 01 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Wednesday, 1 October 2014 at 08:55:55 UTC, Andrei Alexandrescu 
wrote:
 On 9/30/14, 9:10 AM, Sean Kelly wrote:

 Quite honestly, I can't imagine how I could write a template 
 function in D that needs to work with this approach.

 You mean write a function that accepts a memory management 
 policy, or a function that uses one?

Both, I suppose?  A static if block at the top of each function 
that must be aware of every RC type the user may expect?  What if 
it's a user-defined RC type and this function is in Phobos?


 As much as I hate to say it, this is pretty much exactly what 
 C++
 allocators were designed for.  They handle allocation, sure, 
 but they
 also hold aliases for all relevant types for the data being 
 allocated.
 If the MemoryManagementPolicy enum were replaced with an alias 
 to a type that I could use to at least obtain relevant 
 aliases, that would be something.  But even that approach 
 dramatically complicates code that uses it.

 I think making MemoryManagementPolicy a meaningful type is a 
 great idea. It would e.g. define the string type, so the code 
 becomes:

 auto setExtension(alias MemoryManagementPolicy = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     MemoryManagementPolicy.string result;
     ...
     return result;
 }

 This is a lot more general and extensible. Thanks!

 Why do you think there'd be dramatic complication of code? 
 (Granted, at some point we must acknowledge that some egg 
 breaking is necessary for the proverbial omelette.)

From my experience with C++ containers.  Having an alias for a 
type is okay, but a bank of aliases where one is a pointer to the 
type, one is a const pointer to the type, etc, makes writing the 
involved code feel really unnatural.


 The thing is, again, we must make some changes if we want D to 
 be usable without a GC. One of them is e.g. to not allocate 
 built-in slices all over the place.

So let the user supply a scratch buffer that will hold the 
result?  With the RC approach we're still allocating, they just 
aren't built-in slices, correct?


 That would be overreacting :o).

I hope it is :-)

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/1/14, 7:03 AM, Sean Kelly wrote:
 So let the user supply a scratch buffer that will hold the result?  With
 the RC approach we're still allocating, they just aren't built-in
 slices, correct?

Correct. -- Andrei
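
For contrast, the scratch-buffer alternative raised in the question can be sketched as follows. `appendExtInto` is a hypothetical name and, like the thread's examples, only concatenates. The caller owns the storage, so the function can be marked `@nogc` and the compiler will reject any GC allocation inside it.

```d
// Writes path ~ ext into caller-supplied storage and returns the used slice,
// or null if the buffer is too small. Nothing is allocated.
char[] appendExtInto(char[] buf, const(char)[] path, const(char)[] ext) @nogc nothrow
{
    immutable total = path.length + ext.length;
    if (buf.length < total)
        return null;
    buf[0 .. path.length] = path[];
    buf[path.length .. total] = ext[];
    return buf[0 .. total];
}

void main() @nogc
{
    char[32] storage; // stack buffer owned by the caller
    auto r = appendExtInto(storage[], "hello", ".txt");
    assert(r == "hello.txt");

    // A buffer that is too small is reported instead of reallocated.
    assert(appendExtInto(storage[0 .. 4], "hello", ".txt") is null);
}
```

The trade-off is visible in the signature: the caller must guess a capacity and handle failure, which is part of why this style differs so much from the GC style.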

Oct 01 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
 if (...)
 {
      static if (mmp == gc) alias S = string;
      else alias S = RCString;
      S result;
      ...
      return result;
 }

Incredible code bloat? Boilerplate in each function for the win?
I'm at a loss as to how it would make things better.


-- 
Dmitry Olshansky

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 11:06 AM, Dmitry Olshansky wrote:
 29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
 ext)
 if (...)
 {
      static if (mmp == gc) alias S = string;
      else alias S = RCString;
      S result;
      ...
      return result;
 }

 Incredible code bloat? Boilerplate in each function for the win?
 I'm at a loss as to how it would make things better.

Sean's idea to make string an alias of the policy takes care of this 
concern. -- Andrei

Oct 01 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Oct 01, 2014 at 02:51:08AM -0700, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 9/30/14, 11:06 AM, Dmitry Olshansky wrote:
29-Sep-2014 14:49, Andrei Alexandrescu wrote:
auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2
ext)
if (...)
{
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
}

Incredible code bloat? Boilerplate in each function for the win?
I'm at a loss as to how it would make things better.

 
 Sean's idea to make string an alias of the policy takes care of this
 concern. -- Andrei

But Sean's idea only takes strings into account. Strings aren't the only
allocated resource Phobos needs to deal with. So extrapolating from that
idea, each memory management struct (or whatever other aggregate we end
up using), say call it MMP, will have to define MMP.string, MMP.jsonNode
(since parseJSON() needs to allocate not only strings but JSON nodes),
MMP.redBlackTreeNode, MMP.listNode, MMP.userDefinedNode, ...

Nope, still don't see how this could work. Please clarify, kthx.


T

-- 
Sometimes the best solution to morale problems is just to fire all of the
unhappy people. -- despair.com

Oct 01 2014

"Kiith-Sa" <kiithsacmp gmail.com> writes:

On Wednesday, 1 October 2014 at 17:53:43 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 On Wed, Oct 01, 2014 at 02:51:08AM -0700, Andrei Alexandrescu 
 via Digitalmars-d wrote:
 On 9/30/14, 11:06 AM, Dmitry Olshansky wrote:
29-Sep-2014 14:49, Andrei Alexandrescu wrote:
auto setExtension(MemoryManagementPolicy mmp = gc, R1, 
R2)(R1 path, R2
ext)
if (...)
{
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
}

Incredible code bloat? Boilerplate in each function for the 
win?
I'm at a loss as to how it would make things better.

 
 Sean's idea to make string an alias of the policy takes care 
 of this
 concern. -- Andrei

 But Sean's idea only takes strings into account. Strings aren't 
 the only
 allocated resource Phobos needs to deal with. So extrapolating 
 from that
 idea, each memory management struct (or whatever other 
 aggregate we end
 up using), say call it MMP, will have to define MMP.string, 
 MMP.jsonNode
 (since parseJSON() needs to allocate not only strings but JSON 
 nodes),
 MMP.redBlackTreeNode, MMP.listNode, MMP.userDefinedNode, ...

 Nope, still don't see how this could work. Please clarify, kthx.


 T


MMP.Ref!redBlackTreeNode ?

(where Ref is e.g. a ref-counted pointer type (like RefCounted 
but with class support) for RC MMP but plain GC reference for GC 
MMP, etc.)
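
A minimal sketch of the `MMP.Ref!T` idea, under stated assumptions: `GCPolicy`, `RCPolicy`, the `make` helper, `Point`, and `sumCoords` are hypothetical names. The RC policy is built on `std.typecons.RefCounted`, which exists in Phobos but supports structs only, so the "class support" mentioned above is not covered here.

```d
import std.typecons : RefCounted;

struct Point { int x; int y; }

// GC policy: a reference is a plain pointer into GC-managed memory.
struct GCPolicy
{
    alias Ref(T) = T*;
    static Ref!T make(T)(T value)
    {
        auto p = new T;
        *p = value;
        return p;
    }
}

// RC policy: a reference is a reference-counted handle.
struct RCPolicy
{
    alias Ref(T) = RefCounted!T;
    static Ref!T make(T)(T value) { return RefCounted!T(value); }
}

// Generic code names only Policy.Ref!T and Policy.make; it never mentions
// a concrete handle type such as T* or RefCounted!T.
int sumCoords(Policy)()
{
    Policy.Ref!Point p = Policy.make(Point(3, 4));
    return p.x + p.y; // member access works through both kinds of handle
}

void main()
{
    assert(sumCoords!GCPolicy() == 7);
    assert(sumCoords!RCPolicy() == 7);
    static assert(is(GCPolicy.Ref!Point == Point*));
    static assert(is(RCPolicy.Ref!Point == RefCounted!Point));
}
```

Because `Ref` is a template rather than a fixed list of aliases, the policy does not need to know about `jsonNode`, `listNode`, or any user-defined type in advance.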

I kinda like this idea, since it might possibly allow 
user-defined memory management policies (which wouldn't get 
special compiler treatment that e.g. RC may need, though).

Oct 01 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Wednesday, 1 October 2014 at 17:53:43 UTC, H. S. Teoh via
Digitalmars-d wrote:
 But Sean's idea only takes strings into account. Strings aren't 
 the only
 allocated resource Phobos needs to deal with. So extrapolating 
 from that
 idea, each memory management struct (or whatever other 
 aggregate we end
 up using), say call it MMP, will have to define MMP.string, 
 MMP.jsonNode
 (since parseJSON() needs to allocate not only strings but JSON 
 nodes),
 MMP.redBlackTreeNode, MMP.listNode, MMP.userDefinedNode, ...

 Nope, still don't see how this could work. Please clarify, kthx.

Assuming you're willing to take the memoryModel type as a
template argument, I imagine we could do something where the user
can specialize the memoryModel for their own types, a bit like
how information is derived for iterators in C++.  The problem is
that this still means passing the memoryModel in as a template
argument.  What I'd really want is for it to be a global, except
that templated virtuals are logically impossible.  I guess
something could maybe be sorted out via a factory design, but
that's not terribly D-like.  I'm at a loss for how to make this
memoryModel thing work the way I'd actually want it to if I were
to use it.

Oct 01 2014

"Cliff" <cliff.s.hudson gmail.com> writes:

On Wednesday, 1 October 2014 at 18:37:50 UTC, Sean Kelly wrote:
 On Wednesday, 1 October 2014 at 17:53:43 UTC, H. S. Teoh via
 Digitalmars-d wrote:
 But Sean's idea only takes strings into account. Strings 
 aren't the only
 allocated resource Phobos needs to deal with. So extrapolating 
 from that
 idea, each memory management struct (or whatever other 
 aggregate we end
 up using), say call it MMP, will have to define MMP.string, 
 MMP.jsonNode
 (since parseJSON() needs to allocate not only strings but JSON 
 nodes),
 MMP.redBlackTreeNode, MMP.listNode, MMP.userDefinedNode, ...

 Nope, still don't see how this could work. Please clarify, 
 kthx.

 Assuming you're willing to take the memoryModel type as a
 template argument, I imagine we could do something where the 
 user
 can specialize the memoryModel for their own types, a bit like
 how information is derived for iterators in C++.  The problem is
 that this still means passing the memoryModel in as a template
 argument.  What I'd really want is for it to be a global, except
 that templated virtuals are logically impossible.  I guess
 something could maybe be sorted out via a factory design, but
 that's not terribly D-like.  I'm at a loss for how to make this
 memoryModel thing work the way I'd actually want it to if I were
 to use it.

If you were to forget D restrictions for a moment, and consider 
an idealized language, how would you express this?  Maybe 
providing that will trigger some ideas from people beyond what we 
have seen so far by removing implied restrictions.

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/1/14, 10:51 AM, H. S. Teoh via Digitalmars-d wrote:
 But Sean's idea only takes strings into account. Strings aren't the only
 allocated resource Phobos needs to deal with. So extrapolating from that
 idea, each memory management struct (or whatever other aggregate we end
 up using), say call it MMP, will have to define MMP.string, MMP.jsonNode
 (since parseJSON() needs to allocate not only strings but JSON nodes),
 MMP.redBlackTreeNode, MMP.listNode, MMP.userDefinedNode, ...

 Nope, still don't see how this could work. Please clarify, kthx.

There's management for T[], pointers to structs, pointers to class 
objects, associative arrays, and that covers everything. -- Andrei

Oct 01 2014

"Marc Schütz" <schuetzm gmx.net> writes:

Ok, here are my few cents:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 Back when I first introduced RCString I hinted that we have 
 a larger strategy in mind. Here it is.

 The basic tenet of the approach is to reckon and act on the 
 fact that memory allocation (the subject of allocators) is an 
 entirely distinct topic from memory management, and more 
 generally resource management. This clarifies that it would be 
 wrong to approach alternatives to GC in Phobos by means of 
 allocators. GC is not only an approach to memory allocation, 
 but also an approach to memory management. Reducing it to 
 either one is a mistake. In hindsight this looks rather obvious 
 but it has caused me and many people better than myself a lot 
 of headache.

I would argue that GC is at its core _only_ a memory management 
strategy. It just so happens that the one in D's runtime also 
comes with an allocator, with which it is tightly integrated. In 
theory, a GC can work with any (and multiple) allocators, and you 
could of course also call GC.free() manually, because, as you 
say, management and allocation are entirely distinct topics.

 That said allocators are nice to have and use, and I will 
 definitely follow up with std.allocator. However, std.allocator 
 is not the key to a @nogc Phobos.

Agreed.

 Nor are ranges. There is an attitude that either output ranges, 
 or input ranges in conjunction with lazy computation, would 
 solve the issue of creating garbage. 
 https://github.com/D-Programming-Language/phobos/pull/2423 is a 
 good illustration of the latter approach: a range would be 
 lazily created by chaining stuff together. A range-based 
 approach would take us further than the allocators, but I see 
 the following issues with it:

 (a) the whole approach doesn't stand scrutiny for non-linear 
 outputs, e.g. outputting some sort of associative array or 
 really any composite type quickly becomes tenuous either with 
 an output range (eager) or with exposing an input range (lazy);

 (b) makes the style of programming without GC radically 
 different, and much more cumbersome, than programming with GC; 
 as a consequence, programmers who consider changing one 
 approach to another, or implementing an algorithm neutral to 
 it, are looking at a major rewrite;

 (c) would make D/@nogc a poor cousin of C++. This is quite out 
 of character; technically, I have long gotten used to seeing 
 most elaborate C++ code like poor emulation of simple D idioms. 
 But C++ has spent years and decades taking to perfection an 
 approach without a tracing garbage collector. A departure from 
 that would need to be superior, and that doesn't seem to be the 
 case with range-based approaches.

I agree with this, too.

 ===========

 Now that we clarified that these existing attempts are not 
 going to work well, the question remains what does. For Phobos 
 I'm thinking of defining and using three policies:

 enum MemoryManagementPolicy { gc, rc, mrc }
 immutable
     gc = MemoryManagementPolicy.gc,
     rc = MemoryManagementPolicy.rc,
     mrc = MemoryManagementPolicy.mrc;

 The three policies are:

 (a) gc is the classic garbage-collected style of management;

 (b) rc is a reference-counted style still backed by the GC, 
 i.e. the GC will still be able to pick up cycles and other 
 kinds of leaks.

 (c) mrc is a reference-counted style backed by malloc.

 (It should be possible to collapse rc and mrc together and make 
 the distinction dynamically, at runtime. I'm distinguishing 
 them statically here for expository purposes.)

 The policy is a template parameter to functions in Phobos (and 
 elsewhere), and informs the functions e.g. what types to 
 return. Consider:

 auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 
 path, R2 ext)
 if (...)
 {
     static if (mmp == gc) alias S = string;
     else alias S = RCString;
     S result;
     ...
     return result;
 }

 On the caller side:

 auto p1 = setExtension("hello", ".txt"); // fine, use gc
 auto p2 = setExtension!gc("hello", ".txt"); // same
 auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

 So by default it's going to continue being business as usual, 
 but certain functions will allow passing in a (defaulted) 
 policy for memory management.

This, however, I disagree with strongly. For one thing - this has 
already been noted by others - it would make the functions' 
implementation extremely ugly (`static if` hell), it would make 
them harder to unit test, and from a user's point of view, it's 
very tedious and might interfere badly with UFCS.

But more importantly, IMO, it's the wrong thing to do. These 
functions shouldn't know anything about memory management policy 
at all. They allocate, which means they need to know about 
_allocation_ policy, but memory _management_ policy needs to be 
decided by the user.

Now, your suggestion in a way still leaves that decision to the 
user, but does so in a very intrusive way, by passing a template 
flag. This is clearly a violation of the separation of concerns. 
Contrary to the typical case, implementation details of the 
user's code leak into the library code, and not the other way 
round, but that's just as bad.

I'm convinced this isn't necessary. Let's take `setExtension()` 
as an example, standing in for any of a class of similar 
functions. This function allocates memory, returns it, and 
abandons it; it gives up ownership of the memory. The fact that 
the memory has been freshly allocated means that it is (head) 
unique, and therefore the caller (= library user) can take over 
the ownership. This, in turn, means that the caller can decide 
how she wants to manage it.

(I'll try to make a sketch on how this can be implemented in 
another post.)

As a conclusion, I would say that APIs should strive for the 
following principles, in this order:

1. Avoid allocation altogether, for example by laziness (ranges), 
or by accepting sinks.

2. If allocations are necessary (or desirable, to make the API 
more easily usable), try hard to return a unique value (this of 
course needs to be expressed in the return type).

3. If both of the above fail, only then return a GCed pointer, 
or alternatively provide several variants of the function (though 
this shouldn't be necessary often). An interesting alternative: 
Instead of passing a flag directly describing the policy, pass 
the function a type that it should wrap its return value in.

As for the _allocation_ strategy: It indeed needs to be 
configurable, but here, the same objections against a template 
parameter apply. As the allocator doesn't necessarily need to be 
part of the type, a (thread) global variable can be used to 
specify it. This lends itself well to idioms like

     with(MyAllocator alloc) {
         // ...
     }

 Destroy!

Done :-)
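
The `with(MyAllocator alloc)` idiom above is not valid D as written. A minimal sketch of the same idea in today's D, assuming a hypothetical `Allocator` interface, a thread-local `currentAllocator` variable and an RAII guard called `ScopedAllocator` (none of these names exist in Phobos), might look like this:

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical allocator interface; an assumption for this sketch, not a Phobos API.
interface Allocator
{
    void* allocate(size_t size);
    void deallocate(void* p);
}

// Hypothetical malloc-backed allocator that counts live blocks.
final class CountingMallocator : Allocator
{
    size_t live;
    void* allocate(size_t size) { ++live; return malloc(size); }
    void deallocate(void* p) { --live; free(p); }
}

// Module-level variables are thread-local by default in D,
// so each thread gets its own default allocator.
Allocator currentAllocator;

// RAII guard: installs an allocator for the enclosing scope
// and restores the previous one when the scope exits.
struct ScopedAllocator
{
    private Allocator previous;
    @disable this();
    @disable this(this);
    this(Allocator a) { previous = currentAllocator; currentAllocator = a; }
    ~this() { currentAllocator = previous; }
}

unittest
{
    auto counting = new CountingMallocator; // the allocator object itself is GC-allocated here
    assert(currentAllocator is null);
    {
        auto guard = ScopedAllocator(counting);
        void* p = currentAllocator.allocate(16); // what a library function would do
        assert(counting.live == 1);
        currentAllocator.deallocate(p);
    }
    assert(currentAllocator is null); // previous allocator restored
    assert(counting.live == 0);
}
```

The guard only scopes which allocator is used. Memory that outlives the scope still has to remember which allocator it came from, which this sketch does not address.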

Sep 30 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz wrote:
 I'm convinced this isn't necessary. Let's take `setExtension()` 
 as an example, standing in for any of a class of similar 
 functions. This function allocates memory, returns it, and 
 abandons it; it gives up ownership of the memory. The fact that 
 the memory has been freshly allocated means that it is (head) 
 unique, and therefore the caller (= library user) can take over 
 the ownership. This, in turn, means that the caller can decide 
 how she wants to manage it.

 (I'll try to make a sketch on how this can be implemented in 
 another post.)

Ok. What we need for it:

1) @unique, or a way to expressly specify uniqueness on a 
function's return type, as well as restrict function params by it 
(and preferably overloading on uniqueness). DMD already has this 
concept internally, it just needs to be formalized.

2) A few modifications to RefCounted to be constructable from 
unique values.

3) A wrapper type similar to std.typecons.Unique, that also 
supports moving. Let's call it Owned(T).

4) Borrowing.

setExtension() can then look like this:

     Owned!string setExtension(in char[] path, in char[] ext);

To be used:

     void saveFileAs(in char[] name) {
         import std.path: setExtension;
         import std.file: write;
         name.                    // scope const(char[])
             setExtension("txt"). // Owned!string
             write(data);
     }

The Owned(T) value implicitly converts to `scope!this(T)` via 
alias this; it can therefore be conveniently passed to 
std.file.write() (which already takes the filename as `in`) 
without copying or moving. The value then is released 
automatically at the end of the statement, because it is only a 
temporary and is not assigned to a variable.

For transferring ownership:

     RefCounted!string[] filenames;
     // ...
     filenames ~= name.setExtension("txt").release;

`Owned!T.release()` returns the payload as a unique value, and 
resets the payload to its init value (in this case `null`). 
RefCounted's constructor then accepts this unique value and takes 
ownership of it. When the Owned value's destructor is called, it 
finds the payload to be null and doesn't free the memory. 
Inlining and subsequent optimization can turn the destructor into 
a no-op in this case.

Optionally, Owned!T can provide an `alias this` to its release 
method; in this case, the method doesn't need to be called 
explicitly. It is however debatable whether being explicit with 
moving isn't the better choice.
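
For illustration, here is a rough approximation of `Owned!T` that compiles with current D. It is only a sketch under several assumptions: the payload is a malloc-backed array, borrowing is not checked by the compiler (no `scope!this`), and `withExtension` is a hypothetical, simplified stand-in for `setExtension` that merely appends `"." ~ ext`.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical Owned wrapper for malloc-backed arrays; not a Phobos type.
struct Owned(T)
{
    private T payload;

    @disable this(this); // no copies: exactly one owner at a time

    // The caller must pass freshly malloc'ed memory that nothing else references.
    this(T fresh) { payload = fresh; }

    ~this()
    {
        // After release() the payload is null and nothing is freed.
        if (payload.ptr !is null) free(cast(void*) payload.ptr);
    }

    // Unchecked borrow: the view is only valid while the owner is alive.
    T borrowed() { return payload; }

    // Gives up ownership: returns the payload and resets it to its init value.
    T release()
    {
        auto p = payload;
        payload = null;
        return p;
    }
}

// Hypothetical simplified setExtension: appends "." and ext to path.
Owned!string withExtension(in char[] path, in char[] ext)
{
    immutable n = path.length + 1 + ext.length;
    auto p = cast(char*) malloc(n);
    assert(p !is null, "out of memory");
    char[] buf = p[0 .. n];
    buf[0 .. path.length] = path[];
    buf[path.length] = '.';
    buf[path.length + 1 .. $] = ext[];
    return Owned!string(cast(string) buf); // the buffer is unique, so the cast is sound
}

unittest
{
    // Temporary: borrowed within the statement, freed by the destructor afterwards.
    bool same = withExtension("report", "txt").borrowed == "report.txt";
    assert(same);

    // Transfer of ownership: after release() the caller manages the memory.
    auto owned = withExtension("notes", "md");
    string raw = owned.release();
    assert(owned.borrowed is null);
    assert(raw == "notes.md");
    free(cast(void*) raw.ptr);
}
```

A `RefCounted` constructor accepting the released value would take the place of the manual `free` in the last step.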

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 12:10 PM, "Marc Schütz" <schuetzm gmx.net> wrote:
 I would argue that GC is at its core _only_ a memory management
 strategy. It just so happens that the one in D's runtime also comes with
 an allocator, with which it is tightly integrated. In theory, a GC can
 work with any (and multiple) allocators, and you could of course also
 call GC.free() manually, because, as you say, management and allocation
 are entirely distinct topics.

I'm not very sure. A GC might need to interoperate closely with the 
allocator. -- Andrei

Oct 01 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Wednesday, 1 October 2014 at 09:52:46 UTC, Andrei Alexandrescu 
wrote:
 On 9/30/14, 12:10 PM, "Marc Schütz" <schuetzm gmx.net>" wrote:
 I would argue that GC is at its core _only_ a memory management
 strategy. It just so happens that the one in D's runtime also 
 comes with
 an allocator, with which it is tightly integrated. In theory, 
 a GC can
 work with any (and multiple) allocators, and you could of 
 course also
 call GC.free() manually, because, as you say, management and 
 allocation
 are entirely distinct topics.

 I'm not very sure. A GC might need to interoperate closely with 
 the allocator. -- Andrei

It needs to know what to scan (ideally with type info), and which 
allocator to release memory with, but it doesn't need to be an 
allocator itself. It certainly helps with the implementation, but 
ideally there would be a well defined interface between 
allocators and GCs, so that both can be plugged in as desired, 
even with multiple GCs in parallel.

Oct 01 2014

"Oren Tirosh" <orent hishome.net> writes:

On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz wrote:
 [...]

 I'm convinced this isn't necessary. Let's take `setExtension()` 
 as an example, standing in for any of a class of similar 
 functions. This function allocates memory, returns it, and 
 abandons it; it gives up ownership of the memory. The fact that 
 the memory has been freshly allocated means that it is (head) 
 unique, and therefore the caller (= library user) can take over 
 the ownership. This, in turn, means that the caller can decide 
 how she wants to manage it.

Bingo. Have some way to mark the function return type as a unique 
pointer. This does not imply full-fledged unique pointer type 
support in the language - just enough to have the caller ensure 
continuity of memory management policy from there.

One problem with actually implementing this is that using 
reference counting as a memory management policy requires extra 
space for the reference counter in the object, just as garbage 
collection requires support for scanning and identification of 
interior object memory range. While allocation and memory 
management may be quite independent in theory, practical high 
performance implementations tend to be intimately related.

 (I'll try to make a sketch on how this can be implemented in 
 another post.)

Do elaborate!

 As a conclusion, I would say that APIs should strive for the 
 following principles, in this order:

 1. Avoid allocation altogether, for example by laziness 
 (ranges), or by accepting sinks.

 2. If allocations are necessary (or desirable, to make the API 
 more easily usable), try hard to return a unique value (this of 
 course needs to be expressed in the return type).

 3. If both of the above fail, only then return a GCed pointer, 
 or alternatively provide several variants of the function 
 (though this shouldn't be necessary often). An interesting 
 alternative: Instead of passing a flag directly describing the 
 policy, pass the function a type that it should wrap its 
 return value in.

 As for the _allocation_ strategy: It indeed needs to be 
 configurable, but here, the same objections against a template 
 parameter apply. As the allocator doesn't necessarily need to 
 be part of the type, a (thread) global variable can be used to 
 specify it. This lends itself well to idioms like

     with(MyAllocator alloc) {
         // ...
     }

Assuming there is some dependency between the allocator and the 
memory management policy I guess this would be initialized on 
thread start that cannot be modified later. All code running 
inside the thread would need to either match the configured 
policy, not handle any kind of pointers or use a limited subset 
of unique pointers. Another way to ensure that code can run on 
either RC or GC is to make certain objects (specifically, 
Exceptions) always allocate a reference counter, regardless of 
the currently configured policy.

Oct 01 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Oren Tirosh:

 Bingo. Have some way to mark the function return type as a 
 unique pointer. This does not imply full-fledged unique pointer 
 type support in the language

Let's have full-fledged memory zones tracking in the D type 
system :-)

Bye,
bearophile

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/1/14, 8:48 AM, Oren Tirosh wrote:
 On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz wrote:
 [...]

 I'm convinced this isn't necessary. Let's take `setExtension()` as an
 example, standing in for any of a class of similar functions. This
 function allocates memory, returns it, and abandons it; it gives up
 ownership of the memory. The fact that the memory has been freshly
 allocated means that it is (head) unique, and therefore the caller (=
 library user) can take over the ownership. This, in turn, means that
 the caller can decide how she wants to manage it.

 Bingo. Have some way to mark the function return type as a unique
 pointer.

I'm skeptical about this approach (though clearly we need to explore it 
for e.g. passing ownership of data across threads). For strings and 
other "casual" objects I think we should focus on GC/RC strategies. This 
is because people do things like:

auto s = setExtension(s1, s2);

and then attempt to use s as a regular variable (copy it etc). Making s 
unique would make usage quite surprising and cumbersome.


Andrei

Oct 01 2014

"Oren T" <orent hishome.net> writes:

On Wednesday, 1 October 2014 at 17:13:38 UTC, Andrei Alexandrescu 
wrote:
 On 10/1/14, 8:48 AM, Oren Tirosh wrote:
 On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz 
 wrote:
 [...]

 I'm convinced this isn't necessary. Let's take 
 `setExtension()` as an
 example, standing in for any of a class of similar functions. 
 This
 function allocates memory, returns it, and abandons it; it 
 gives up
 ownership of the memory. The fact that the memory has been 
 freshly
 allocated means that it is (head) unique, and therefore the 
 caller (=
 library user) can take over the ownership. This, in turn, 
 means that
 the caller can decide how she wants to manage it.

 Bingo. Have some way to mark the function return type as a 
 unique
 pointer.

 I'm skeptical about this approach (though clearly we need to 
 explore it for e.g. passing ownership of data across threads). 
 For strings and other "casual" objects I think we should focus 
 on GC/RC strategies. This is because people do things like:

 auto s = setExtension(s1, s2);

 and then attempt to use s as a regular variable (copy it etc). 
 Making s unique would make usage quite surprising and 
 cumbersome.

The idea is that the unique property is very short-lived: the 
caller immediately assigns it to a pointer of the appropriate 
policy: either RC or GC. This keeps the callee agnostic of the 
chosen policy and does not require templating multiple versions 
of the code. The allocator configured for the thread must match 
the generated code at the call site i.e. if the caller uses RC 
pointers the allocator must allocate space for the reference 
counter (at negative offset to keep compatibility).
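
To make the "negative offset" idea concrete, here is a minimal sketch, assuming a malloc-backed allocator that always reserves one `size_t` in front of the payload. All names (`allocateWithHeader`, `refCount`, `retain`, `release`) are hypothetical.

```d
import core.stdc.stdlib : free, malloc;

// Assumed layout: [ size_t count | payload bytes ... ]
//                                 ^ pointer handed to the caller
enum headerSize = size_t.sizeof;

// Allocates `size` payload bytes with a hidden counter in front, starting at 1.
// The payload is aligned only to size_t.sizeof relative to malloc's result.
void* allocateWithHeader(size_t size)
{
    auto base = cast(ubyte*) malloc(headerSize + size);
    assert(base !is null, "out of memory");
    *cast(size_t*) base = 1;
    return base + headerSize;
}

// The counter lives immediately before the payload.
ref size_t refCount(void* payload)
{
    return *cast(size_t*)(cast(ubyte*) payload - headerSize);
}

void retain(void* payload) { ++refCount(payload); }

// Returns true when the last reference was dropped and the block was freed.
bool release(void* payload)
{
    if (--refCount(payload) != 0) return false;
    free(cast(ubyte*) payload - headerSize);
    return true;
}

unittest
{
    void* p = allocateWithHeader(32);
    assert(refCount(p) == 1);
    retain(p);
    assert(refCount(p) == 2);
    assert(!release(p)); // one reference left
    assert(release(p));  // last reference: block freed
}
```

Code that treats the pointer as unique or GC-managed never looks at the header, which is what would keep the layout compatible with callers that do not use RC.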

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/1/14, 10:25 AM, Oren T wrote:
 The idea is that the unique property is very short-lived: the caller
 immediately assigns it to a pointer of the appropriate policy: either RC
 or GC. This keeps the callee agnostic of the chosen policy and does not
 require templating multiple versions of the code. The allocator
 configured for the thread must match the generated code at the call site
 i.e. if the caller uses RC pointers the allocator must allocate space
 for the reference counter (at negative offset to keep compatibility).

This all... looks arcane. I'm not sure how it can even be made to work if 
user code just uses "auto". -- Andrei

Oct 01 2014

"Oren T" <orent hishome.net> writes:

On Wednesday, 1 October 2014 at 17:33:34 UTC, Andrei Alexandrescu 
wrote:
 On 10/1/14, 10:25 AM, Oren T wrote:
 The idea is that the unique property is very short-lived: the 
 caller
 immediately assigns it to a pointer of the appropriate policy: 
 either RC
 or GC. This keeps the callee agnostic of the chosen policy and 
 does not
 require templating multiple versions of the code. The allocator
 configured for the thread must match the generated code at the 
 call site
 i.e. if the caller uses RC pointers the allocator must 
 allocate space
 for the reference counter (at negative offset to keep 
 compatibility).

 This all... looks arcane. I'm not sure how it can even made to 
 work if user code just uses "auto". -- Andrei

At the moment, @nogc code can't call any function returning a 
pointer. Under this scheme @nogc is allowed to call either code 
that returns an explicitly RC type (Exception, RCString) or code 
returning an "agnostic" unique pointer that may be used from 
either @gc or @nogc code.
I already see some holes and problems, but I wonder if something 
along these lines may be made to work.

Oct 01 2014

Jacob Carlborg <doob me.com> writes:

On 01/10/14 19:25, Oren T wrote:

 The idea is that the unique property is very short-lived: the caller
 immediately assigns it to a pointer of the appropriate policy: either RC
 or GC. This keeps the callee agnostic of the chosen policy and does not
 require templating multiple versions of the code. The allocator
 configured for the thread must match the generated code at the call site
 i.e. if the caller uses RC pointers the allocator must allocate space
 for the reference counter (at negative offset to keep compatibility).

Can't we do something like this, or it might be what you're proposing:

Foo foo () { return new Foo; }

@gc a = foo(); // a contains an instance of Foo allocated with the GC
@rc b = foo(); // b contains an instance of Foo allocated with the RC allocator

-- 
/Jacob Carlborg

Oct 01 2014

"Ola Fosheim Grøstad" writes:

On Thursday, 2 October 2014 at 06:29:24 UTC, Jacob Carlborg wrote:
 @gc a = foo(); // a contains an instance of Foo allocated with 
 the GC
 @rc b = foo(); // b contains an instance of Foo allocated with 
 the RC allocator

That would be better, but how do you deal with "bar(foo())" ? 
Context dependent instantiation is a semantic challenge when you 
also have overloading, but I guess you can get somewhere if you 
make whole program optimization mandatory and use a 
state-of-the-art constraint solver to handle the type system. 
Could lead you to NP-complete type resolution? But still doable 
(in most cases).

I think you basically have 2 realistic choices if you want 
easy-going syntax for the end user:

1. implement rc everywhere in standard libraries and make it 
possible to turn off rc in a call-chain by having compiler 
support (and whole program optimization). To support manual 
management you need some kind of protocol for traversing 
allocated data-structures to free them.

e.g.:

define memory strategy @malloc = some…manual…allocation…strategy…description;

auto a = bar(foo()); // use gc or rc based on compiler flag
auto a = @rc( bar(foo()) ); // use rc in a gc context
auto a = @malloc( bar(foo()) ); // manual management (requires a
                                // protocol for traversal of recursive data structures)


2. provide allocation strategy as a parameter

e.g.:

auto a = foo(); // alloc with gc
auto a = foo!rc(); // alloc with rc
auto a = foo!malloc(); // alloc with malloc
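
As a rough sketch of what option 2 could look like in current D, the function below takes the strategy as a defaulted template parameter. The `Policy` enum, the `MallocBuffer` wrapper and `squares` are hypothetical names chosen for this example, and only a GC variant and a malloc-backed unique-owner variant are shown (no RC).

```d
import core.stdc.stdlib : free, malloc;

enum Policy { gc, mallocated }

// Hypothetical owning wrapper returned under the malloc policy.
struct MallocBuffer
{
    int[] data;
    @disable this(this);                  // single owner
    ~this() { if (data.ptr !is null) free(data.ptr); }
}

// The algorithm is written once; only allocation and the return type vary.
auto squares(Policy policy = Policy.gc)(size_t n)
{
    static if (policy == Policy.gc)
        int[] storage = new int[n];
    else
        int[] storage = (cast(int*) malloc(n * int.sizeof))[0 .. n];
    assert(n == 0 || storage.ptr !is null, "out of memory");

    foreach (i, ref x; storage)
        x = cast(int)(i * i);

    static if (policy == Policy.gc)
        return storage;                   // plain slice, owned by the GC
    else
        return MallocBuffer(storage);     // freed when the wrapper goes out of scope
}

unittest
{
    auto a = squares(4);                      // default: GC
    auto b = squares!(Policy.mallocated)(4);  // explicit: malloc
    static assert(is(typeof(a) == int[]));
    static assert(is(typeof(b) == MallocBuffer));
    assert(a == [0, 1, 4, 9]);
    assert(b.data == a);
}
```

Note that `auto` at the call site yields different types per policy, so any code that copies or stores the result has to cope with both.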

But going the C++ way of having explicit allocators and 
non-embedded reference counters (double indirection) probably is 
the easier solution in terms of bringing D to completion.

How many years are you going to spend on making D ref count by 
default in a flawless and performant manner? Sure having RC being 
as easy to use as GC is a nice idea, but if it turns out to be 
either slower or more bug ridden than GC, then what is the point?

Note that:

1. A write to a ref-count means the 64-byte cache line is dirty 
and has to be written back to memory. So you don't write 4 bytes, 
you write 64 bytes. That's pretty expensive.

2. The memory bus is increasingly becoming the bottleneck of 
hardware architectures.

=> RC everywhere without heavy duty compiler/hardware support is 
a bad long term idea.

Oct 02 2014

Jacob Carlborg <doob me.com> writes:

On 02/10/14 11:41, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:

 That would be better, but how do you deal with "bar(foo())" ? Context
 dependent instantiation is a semantic challenge when you also have
 overloading, but I guess you can get somewhere if you make whole program
 optimization mandatory and use a state-of-the-art constraint solver to
 handle the type system. Could lead you to NP-complete type resolution?
 But still doable (in most cases).

I haven't really thought how it could be implemented but I was hoping 
that the caller could magically decide the allocation strategy instead 
of the callee. It looks like Rust is doing something like that but I 
haven't looked at it in detail.

-- 
/Jacob Carlborg

Oct 02 2014

"Ola Fosheim Grøstad" writes:

On Thursday, 2 October 2014 at 11:41:14 UTC, Jacob Carlborg wrote:
 I haven't really thought how it could be implemented but I was 
 hoping that the caller could magically decide the allocation 
 strategy instead of the callee. It looks like Rust is doing 
 something like that but I haven't looked at it in detail.

I haven't looked at Rust in detail, but doesn't the Rust compiler 
take full control over memory management? I think that is a good 
idea, but it is at odds with D's general direction.

Oct 02 2014

"Paulo Pinto" <pjmlp progtools.org> writes:

On Thursday, 2 October 2014 at 13:29:58 UTC, Ola Fosheim Grøstad
wrote:
 On Thursday, 2 October 2014 at 11:41:14 UTC, Jacob Carlborg 
 wrote:
 I haven't really thought how it could be implemented but I was 
 hoping that the caller could magically decide the allocation 
 strategy instead of the callee. It looks like Rust is doing 
 something like that but I haven't looked at it in detail.

 I haven't looked at Rust in detail, but doesn't the Rust 
 compiler take full control over memory management? I think that 
 is a good idea, but it is at odds with D's general direction.

Rust makes use of the type system and the borrow checker to
validate how the pointers are being used.

The usual errors when dealing with pointers are compile time
errors in Rust.

--
Paulo

Oct 02 2014

"Ola Fosheim Grøstad" writes:

On Thursday, 2 October 2014 at 18:52:18 UTC, Paulo Pinto wrote:
 Rust makes use of the type system and the borrow checker to
 validate how the pointers are being used.

 The usual errors when dealing with pointers are compile time
 errors in Rust.

They constrain usage so that you cannot share mutable objects. It 
is described at a reasonably high level here:

http://doc.rust-lang.org/0.11.0/rust.html#memory-and-concurrency-models

But it is sketchy on implementation details, the semantic restrictions 
that follow, and the consequences when interacting with foreign code 
etc.

Oct 02 2014

"Ola Fosheim Grøstad" writes:

On Thursday, 2 October 2014 at 19:45:17 UTC, Ola Fosheim Grøstad 
wrote:
 But is sketchy on implementation details, semantic restrictions 
 that follows and consequences when interacting with foreign 
 code etc.

Some Rust details. «sendable» means that a reference can be 
transferred to another thread (or task/fiber/whatever).

 From http://doc.rust-lang.org/std/gc/ :

«The Gc type provides shared ownership of an immutable value. 
Destruction is not deterministic, and will occur some time 
between every Gc handle being gone and the end of the task. The 
garbage collector is task-local so Gc<T> is not sendable.»

 From http://doc.rust-lang.org/std/rc/index.html :

«The Rc type provides shared ownership of an immutable value. 
Destruction is deterministic, and will occur as soon as the last 
owner is gone. It is marked as non-sendable because it avoids the 
overhead of atomic reference counting.

The downgrade method can be used to create a non-owning Weak 
pointer to the box. A Weak pointer can be upgraded to an Rc 
pointer, but will return None if the value has already been 
freed.»

So… they don't really solve the issues a @nogc version of D 
should be able to deal with beyond having built-in unique_ptr 
style semantics?

Or?

Oct 02 2014

"Paulo Pinto" <pjmlp progtools.org> writes:

On Thursday, 2 October 2014 at 20:10:42 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 2 October 2014 at 19:45:17 UTC, Ola Fosheim 
 Grøstad wrote:
 But is sketchy on implementation details, semantic 
 restrictions that follows and consequences when interacting 
 with foreign code etc.

 Some Rust details. «sendable» means that a reference can be 
 transferred to another thread (or task/fiber/whatever).

 From http://doc.rust-lang.org/std/gc/ :

 «The Gc type provides shared ownership of an immutable value. 
 Destruction is not deterministic, and will occur some time 
 between every Gc handle being gone and the end of the task. The 
 garbage collector is task-local so Gc<T> is not sendable.»

 From http://doc.rust-lang.org/std/rc/index.html :

 «The Rc type provides shared ownership of an immutable value. 
 Destruction is deterministic, and will occur as soon as the 
 last owner is gone. It is marked as non-sendable because it 
 avoids the overhead of atomic reference counting.

 The downgrade method can be used to create a non-owning Weak 
 pointer to the box. A Weak pointer can be upgraded to an Rc 
 pointer, but will return None if the value has already been 
 freed.»

 So… they don't really solve the issues a @nogc version of D 
 should be able to deal with beyond having built-in unique_ptr 
 style semantics?

 Or?

The Gc type is gone as of this week.

https://github.com/rust-lang/meeting-minutes/blob/master/weekly-meetings/2014-09-30.md

Oct 02 2014

"Ola Fosheim Grøstad" writes:

On Thursday, 2 October 2014 at 20:42:16 UTC, Paulo Pinto wrote:
 The Gc type is gone as of this week.

 https://github.com/rust-lang/meeting-minutes/blob/master/weekly-meetings/2014-09-30.md

Thanks, apparently they do it because they want to make a proper 
tracing gc available later:

https://github.com/pnkfelix/rfcs/blob/fsk-remove-refcounting-gc-of-t/active/0000-remove-refcounting-gc-of-t.md

Oct 02 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Wednesday, 1 October 2014 at 17:13:38 UTC, Andrei Alexandrescu 
wrote:
 On 10/1/14, 8:48 AM, Oren Tirosh wrote:
 Bingo. Have some way to mark the function return type as a 
 unique
 pointer.

 I'm skeptical about this approach (though clearly we need to 
 explore it for e.g. passing ownership of data across threads). 
 For strings and other "casual" objects I think we should focus 
 on GC/RC strategies. This is because people do things like:

 auto s = setExtension(s1, s2);

 and then attempt to use s as a regular variable (copy it etc). 
 Making s unique would make usage quite surprising and 
 cumbersome.

Sure? I already showed in an example how it is possible to chain 
calls seamlessly that return unique objects. The users would only 
notice it when they are trying to make a real copy (i.e. not 
borrowing). Do you think this happens frequently enough to be of 
concern?

Oct 01 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/1/14, 1:56 PM, "Marc Schütz" <schuetzm gmx.net> wrote:
 On Wednesday, 1 October 2014 at 17:13:38 UTC, Andrei Alexandrescu wrote:
 On 10/1/14, 8:48 AM, Oren Tirosh wrote:
 Bingo. Have some way to mark the function return type as a unique
 pointer.

 I'm skeptical about this approach (though clearly we need to explore
 it for e.g. passing ownership of data across threads). For strings and
 other "casual" objects I think we should focus on GC/RC strategies.
 This is because people do things like:

 auto s = setExtension(s1, s2);

 and then attempt to use s as a regular variable (copy it etc). Making
 s unique would make usage quite surprising and cumbersome.

 Sure? I already showed in an example how it is possible to chain calls
 seamlessly that return unique objects. The users would only notice it
 when they are trying to make a real copy (i.e. not borrowing). Do you
 think this happens frequently enough to be of concern?

I'd think so. -- Andrei

Oct 01 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Wednesday, 1 October 2014 at 15:48:39 UTC, Oren Tirosh wrote:
 On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz 
 wrote:
 One problem with actually implementing this is that using 
 reference counting as a memory management policy requires extra 
 space for the reference counter in the object, just as garbage 
 collection requires support for scanning and identification of 
 interior object memory range. While allocation and memory 
 management may be quite independent in theory, practical high 
 performance implementations tend to be intimately related.

 (I'll try to make a sketch on how this can be implemented in 
 another post.)

 Do elaborate!

 As a conclusion, I would say that APIs should strive for the 
 following principles, in this order:

 1. Avoid allocation altogether, for example by laziness 
 (ranges), or by accepting sinks.

 2. If allocations are necessary (or desirable, to make the API 
 more easily usable), try hard to return a unique value (this 
 of course needs to be expressed in the return type).

 3. If both of the above fail, only then return a GCed 
 pointer, or alternatively provide several variants of the 
 function (though this shouldn't be necessary often). An 
 interesting alternative: Instead of passing a flag directly 
 describing the policy, pass the function a type that it should 
 wrap its return value in.

 As for the _allocation_ strategy: It indeed needs to be 
 configurable, but here, the same objections against a template 
 parameter apply. As the allocator doesn't necessarily need to 
 be part of the type, a (thread) global variable can be used to 
 specify it. This lends itself well to idioms like

    with(MyAllocator alloc) {
        // ...
    }

 Assuming there is some dependency between the allocator and the 
 memory management policy I guess this would be initialized on 
 thread start that cannot be modified later. All code running 
 inside the thread would need to either match the configured 
 policy, not handle any kind of pointers or use a limited subset 
 of unique pointers. Another way to ensure that code can run on 
 either RC or GC is to make certain objects (specifically, 
 Exceptions) always allocate a reference counter, regardless of 
 the currently configured policy.

I don't have all answers to these questions. Still, I'm convinced 
this is doable.

A straightforward and general way to convert a unique object 
to a ref-counted one is to allocate new memory for it plus the 
reference count, move the original object into it, and release 
the original memory. This is safe, because there can be no 
external pointers to the object, as it is unique. Of course, this 
can be optimized if the allocator supports extending an 
allocation. It could then preallocate a few extra bytes at the 
end to make the extend operation always succeed, similar to your 
suggestion to always allocate a reference counter.
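
The conversion described above could look roughly like the following. This is only a minimal sketch, assuming a malloc-based allocator; `RCBox` and `promoteToRC` are hypothetical names made up for illustration and are not part of Phobos or of the linked wiki sketch.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

// Hypothetical layout: the reference count sits right in front of the payload.
struct RCBox(T)
{
    size_t count;
    T payload;
}

// Moves a uniquely owned, malloc'ed T into a fresh block that also holds a
// reference count, then releases the original block. This is only safe
// because no other pointer to the unique object can exist.
RCBox!T* promoteToRC(T)(ref T* unique)
{
    auto box = cast(RCBox!T*) malloc(RCBox!T.sizeof);
    assert(box !is null, "allocation failed");
    box.count = 1;
    memcpy(&box.payload, unique, T.sizeof); // bitwise move of the payload
    free(unique);
    unique = null; // the old owner must not be used again
    return box;
}
```

An allocator that can extend an allocation in place could skip the copy and the second allocation entirely.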

I think the most difficult part is to find an efficient and 
user-friendly way for the wrapper types to get at the allocator. 
Maybe the allocators should all implement an interface (a real 
one, not duck-typing). The wrappers (Owned, RC) can then include 
a pointer to the allocator (or for RC, embed it next to the 
reference count). This would make it possible to specify a 
(thread) global default allocator at runtime, which all library 
functions use by convention (for example let's call it `alloc`, 
then they would call `alloc.make!MyStruct()`). At the same time, 
it is safe to change the default allocator at any time, and to 
use different allocators in parallel in the same thread.
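
To make the idea a bit more concrete, here is a rough sketch of a real allocator interface, a thread-local default called `alloc`, and a two-word `Owned` wrapper that remembers where its memory came from. All names (`Allocator`, `Mallocator`, `make`, the fields of `Owned`) are assumptions for illustration only, not an existing API, and the sketch only handles plain value types.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical allocator interface (a real interface, not duck typing).
interface Allocator
{
    void* allocate(size_t size);
    void deallocate(void* p);
}

// Hypothetical malloc-backed implementation.
final class Mallocator : Allocator
{
    void* allocate(size_t size) { return malloc(size); }
    void deallocate(void* p) { free(p); }
}

// Module-level variables are thread-local by default in D, so every thread
// gets its own default allocator and may swap it at any time.
Allocator alloc;

// Two words: the payload pointer plus the allocator that produced it.
struct Owned(T)
{
    T* payload;
    Allocator origin;

    @disable this(this); // no copies, only moves

    ~this()
    {
        // Memory always goes back to the allocator it came from, even if
        // the thread's default has been changed in the meantime.
        if (payload !is null)
            origin.deallocate(payload);
    }
}

// Library functions would call this by convention: alloc.make!T(value).
Owned!T make(T)(Allocator a, T value)
{
    auto p = cast(T*) a.allocate(T.sizeof);
    *p = value; // fine for plain value types; real code would emplace
    return Owned!T(p, a);
}
```

Because each `Owned!T` carries its own allocator reference, changing `alloc` later does not affect objects that are already alive.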

The alternative is obviously a template parameter to the function 
that returns the unique object. But this unfortunately is then 
not restricted to just the function, but "infects" the return 
type, too. And from there, it needs to spread to the RC wrapper, 
or any containers. Thus we'd have incompatible RC types, which I 
would imagine would be very inconvenient and restrictive. 
Besides, it would probably be too tedious to specify the 
allocator everywhere.

Therefore, I think the additional cost of an allocator interface 
pointer is worth it. For Owned!T (with T being a pointer or 
reference), it would just be two words, which we can return 
efficiently. We already have slices doing that, and AFAIK there's 
no significantly worse performance because of them.

Oct 01 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Wednesday, 1 October 2014 at 15:48:39 UTC, Oren Tirosh wrote:
 On Tuesday, 30 September 2014 at 19:10:19 UTC, Marc Schütz 
 wrote:
 (I'll try to make a sketch on how this can be implemented in 
 another post.)

 Do elaborate!

Here's an example implementation of what I have in mind (totally 
untested and won't compile because of `scope`):

http://wiki.dlang.org/User:Schuetzm/RC,_Owned_and_allocators

This is just a sketch to explain the general idea. Some things 
probably won't work as implemented, especially the @disable'd 
postblit and opAssign() of Owned!T. I think it needs to implement 
implicit moving, otherwise one would have to call `release()` 
everywhere.

As in the other post, the function that produces the value 
returns Owned!T. The types don't require unique, however 
(although integration with DMD's idea of unique would still be 
useful).

Because of auto-borrowing via alias this, Owned!T and RC!T both 
can pass their payloads to functions that accept them by `scope`. 
The ref-count is not touched for borrowing.

Usage examples:

     Owned!string setExtension(in char[] path, in char[] ext);

     void saveFileAs(in char[] name) {
         import std.path: setExtension;
         import std.file: write;
         name.                    // scope const(char[])
             setExtension("txt"). // Owned!string
             write(data);
     }

     RC!string[] stringList;

     void addToGlobalList(scope RC!string s) {
         stringList ~= s;    // increments ref-count
     }

     RC!string foo;
     addToGlobalList(foo);   // borrowing doesn't change ref-count

     auto newFileName = "hello-world".setExtension("txt");
     auto tmp1 = newFileName;       // ERROR: cannot copy
     scope tmp2 = newFileName;      // OK, borrowing
     foo = newFileName;             // ERROR: cannot copy
     foo = newFileName.release();   // OK, move
     auto bar = newFileName.toRC(); // ditto

Oct 02 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 29 September 2014 20:49, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 [...]

 Destroy!

 Andrei

I generally like the idea, but my immediate concern is that it implies
that every function that may deal with allocation is a template.
This interferes with C/C++ compatibility in a pretty big way. Or more
generally, the idea of a lib. Does this mean that a lib will be
required to produce code for every permutation of functions according
to memory management strategy? Usually libs don't contain code for
uninstantiated templates.

With this in place, I worry that traditional use of libs, separate
compilation, external language linkage, etc, all become very
problematic.
Pervasive templates can only work well if all code is D code, and if
all code is compiled together.
Most non-OSS industry doesn't ship source, they ship libs. And if libs
are to become impractical, then dependencies become a problem; instead
of linking libphobos.so, you pretty much have to compile phobos
together with your app (already basically true for phobos, but it's
fairly unique).
What if that were a much larger library? What if you have 10s of
dependencies all distributed in this manner? Does it scale?

I guess this doesn't matter if this is only a proposal for phobos...
but I suspect the pattern will become pervasive if it works, and yeah,
I'm not sure where that leads.

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 6:53 PM, Manu via Digitalmars-d wrote:
 I generally like the idea, but my immediate concern is that it implies
 that every function that may deal with allocation is a template.
 This interferes with C/C++ compatibility in a pretty big way. Or more
 generally, the idea of a lib. Does this mean that a lib will be
 required to produce code for every permutation of functions according
 to memory management strategy? Usually libs don't contain code for
 uninstantiated templates.

If a lib chooses one specific memory management policy, it can of course 
be non-templated with regard to that. If it wants to offer its users the 
choice, it would probably have to offer some templates.

 With this in place, I worry that traditional use of libs, separate
 compilation, external language linkage, etc, all become very
 problematic.
 Pervasive templates can only work well if all code is D code, and if
 all code is compiled together.
 Most non-OSS industry doesn't ship source, they ship libs. And if libs
 are to become impractical, then dependencies become a problem; instead
 of linking libphobos.so, you pretty much have to compile phobos
 together with your app (already basically true for phobos, but it's
 fairly unique).
 What if that were a much larger library? What if you have 10s of
 dependencies all distributed in this manner? Does it scale?

 I guess this doesn't matter if this is only a proposal for phobos...
 but I suspect the pattern will become pervasive if it works, and yeah,
 I'm not sure where that leads.

Thanks for the point. I submit that Phobos has been and will be different 
from other D libraries; as the standard library, it has the role of 
supporting widely varying needs, and as such it makes a lot of sense to 
make it highly generic and configurable. Libraries that are for specific 
domains can avail themselves of a narrower design scope.


Andrei

Oct 01 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
 Back when I've first introduced RCString I hinted that we have 
 a larger strategy in mind. Here it is.

Slightly related :)

https://github.com/D-Programming-Language/phobos/pull/2573

Sep 30 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 9/30/14, 10:46 PM, "Nordlöw" wrote:
 On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
 Back when I've first introduced RCString I hinted that we have a
 larger strategy in mind. Here it is.

 Slightly related :)

 https://github.com/D-Programming-Language/phobos/pull/2573

Nice, thanks! -- Andrei

Oct 01 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 Back when I've first introduced RCString I hinted that we have a larger
 strategy in mind. Here it is.

[snip]

I think it would be well worth it to actually do a bit of research.
Before we get into the fray and spill blood (or LOCs) everywhere.

Can we:

1. Present a list of allocating functions.

2. What they (currently) allocate: string, T[], V[K] or something else.

3. See what alternatives they have (that do not allocate if any).

4. Plot a course for those that have none. (Just listing how the function 
signature would change is good enough).

Thanks!

P.S. If there are no takers I'd do it myself in a week or so.

-- 
Dmitry Olshansky

Oct 03 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/3/14, 11:27 AM, Dmitry Olshansky wrote:
 29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 Back when I've first introduced RCString I hinted that we have a larger
 strategy in mind. Here it is.

 [snip]

 I think it would be well worth it to actually do a bit of research.
 Before we get into the fray and spill blood (or LOCs) everywhere.

 Can we:

 1. Present a list of allocating functions.

Awesome. I just started 
http://wiki.dlang.org/Stuff_in_Phobos_That_Generates_Garbage and I 
encourage us all to add to it (sorted by module and then by artifact name).

 2. What they (currently) allocate: string, T[], V[K] or something else.

Mention that in the "Possible Fix(es)" column.

 3. See what alternatives they have (that do not allocate if any).

Yah.

 4. Plot a course for those that have none. (Just listing how the function
 signature would change is good enough).

Yah.

 Thanks!

 P.S. If there are no takers I'd do it myself in a week or so.

Let's all get this rolling!


Andrei

Oct 03 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

03-Oct-2014 23:50, Andrei Alexandrescu wrote:
 On 10/3/14, 11:27 AM, Dmitry Olshansky wrote:
 29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 Back when I've first introduced RCString I hinted that we have a larger
 strategy in mind. Here it is.

 [snip]

 I think it would be well worth it to actually do a bit of research.
 Before we get into the fray and spill blood (or LOCs) everywhere.

 Can we:

 1. Present a list of allocating functions.

 Awesome. I just started
 http://wiki.dlang.org/Stuff_in_Phobos_That_Generates_Garbage and I
 encourage us all to add to it (sorted by module and then by artifact name).

Glad you liked it.

Being in favor of automation, as a start I just toggled the -vgc flag in 
the Win64 makefile and built Phobos. Raw data (CSV) is here:

https://gist.github.com/anonymous/763adcd62ab60a66e9d8

Time to mine it...

-- 
Dmitry Olshansky

Oct 03 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/3/14, 1:18 PM, Dmitry Olshansky wrote:
 03-Oct-2014 23:50, Andrei Alexandrescu wrote:
 On 10/3/14, 11:27 AM, Dmitry Olshansky wrote:
 29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 Back when I've first introduced RCString I hinted that we have a larger
 strategy in mind. Here it is.

 [snip]

 I think it would be well worth it to actually do a bit of research.
 Before we get into the fray and spill blood (or LOCs) everywhere.

 Can we:

 1. Present a list of allocating functions.

 Awesome. I just started
 http://wiki.dlang.org/Stuff_in_Phobos_That_Generates_Garbage and I
 encourage us all to add to it (sorted by module and then by artifact
 name).

 Glad you liked it.

 Being in favor of automation as a start I just toggled -vgc flag in
 Win64 makefile and built phobos. Raw data (CSV) is here:

 https://gist.github.com/anonymous/763adcd62ab60a66e9d8

 Time to mine it...

D script that generates wikitable from that -> awesomeness. -- Andrei

Oct 03 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

04-Oct-2014 00:21, Andrei Alexandrescu wrote:
 On 10/3/14, 1:18 PM, Dmitry Olshansky wrote:
 03-Oct-2014 23:50, Andrei Alexandrescu wrote:
 On 10/3/14, 11:27 AM, Dmitry Olshansky wrote:
 29-Sep-2014 14:49, Andrei Alexandrescu wrote:
 Back when I've first introduced RCString I hinted that we have a
 larger
 strategy in mind. Here it is.

 [snip]

 I think it would be well worth it to actually do a bit of research.
 Before we get into the fray and spill blood (or LOCs) everywhere.

 Can we:

 1. Present a list of allocating functions.

 Awesome. I just started
 http://wiki.dlang.org/Stuff_in_Phobos_That_Generates_Garbage and I
 encourage us all to add to it (sorted by module and then by artifact
 name).

 Glad you liked it.

 Being in favor of automation as a start I just toggled -vgc flag in
 Win64 makefile and built phobos. Raw data (CSV) is here:

 https://gist.github.com/anonymous/763adcd62ab60a66e9d8

 Time to mine it...

 D script that generates wikitable from that -> awesomeness. -- Andrei

I'm on it. With GitHub source links. D's regex rocks ;)


-- 
Dmitry Olshansky

Oct 03 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

04-Oct-2014 00:21, Dmitry Olshansky wrote:
 04-Oct-2014 00:21, Andrei Alexandrescu wrote:
 On 10/3/14, 1:18 PM, Dmitry Olshansky wrote:
 03-Oct-2014 23:50, Andrei Alexandrescu wrote:



[snip]
 Glad you liked it.

 Being in favor of automation as a start I just toggled -vgc flag in
 Win64 makefile and built phobos. Raw data (CSV) is here:

 https://gist.github.com/anonymous/763adcd62ab60a66e9d8

 Time to mine it...

 D script that generates wikitable from that -> awesomeness. -- Andrei

 I'm on it. With GitHub source links. D's regex rocks ;)

Forgot my wiki credentials. Anyhow, I got a passable Markdown page fairly 
quickly. It looks like this:

https://github.com/DmitryOlshansky/phobos/wiki/Phobos-GC-happy-list!

Tool to get it, anybody feel free to take over from here:

https://gist.github.com/anonymous/dc0000d3b801a7bedff0

Takes DMD's output from stdin, so:

make -f posix.mak | ./this_script

(Needs -vgc flag obviously)
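
For anyone taking over, the core of such a tool can be quite small. The following is a minimal sketch and not the script from the gist above; it assumes that each -vgc diagnostic has the form `file(line): vgc: message`, and the names `GcSite`, `parseVgcLine` and `toWikiRow` are made up for illustration.

```d
import std.conv : to;
import std.format : format;
import std.regex : matchFirst, regex;

// One GC allocation site reported by the compiler.
struct GcSite
{
    string file;
    size_t line;
    string message;
}

// Returns true and fills `site` if `text` looks like a -vgc diagnostic.
// Assumed shape: path(line): vgc: message (an optional ",column" after
// the line number is tolerated).
bool parseVgcLine(string text, out GcSite site)
{
    auto m = matchFirst(text, regex(`^(.+?)\((\d+)(?:,\d+)?\): vgc: (.*)$`));
    if (m.empty)
        return false; // ordinary make output or some other diagnostic
    site = GcSite(m[1], m[2].to!size_t, m[3]);
    return true;
}

// Formats one site as a MediaWiki table row (file, line, message).
string toWikiRow(GcSite site)
{
    return format("|-\n| %s || %s || %s", site.file, site.line, site.message);
}
```

A driver would read stdin line by line, skip everything `parseVgcLine` rejects, and print the rows between `{|` and `|}`.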

-- 
Dmitry Olshansky

Oct 03 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

04-Oct-2014 00:42, Dmitry Olshansky wrote:
 04-Oct-2014 00:21, Dmitry Olshansky wrote:
 04-Oct-2014 00:21, Andrei Alexandrescu wrote:
 On 10/3/14, 1:18 PM, Dmitry Olshansky wrote:
 03-Oct-2014 23:50, Andrei Alexandrescu wrote:



 [snip]
 Glad you liked it.

 Being in favor of automation as a start I just toggled -vgc flag in
 Win64 makefile and built phobos. Raw data (CSV) is here:

 https://gist.github.com/anonymous/763adcd62ab60a66e9d8

 Time to mine it...

 D script that generates wikitable from that -> awesomeness. -- Andrei

 I'm on it. With GitHub source links. D's regex rocks ;)

 Forgot my wiki credentials. Anyhow I got passable Markdown page fairly
 quickly. Looks like this:

 https://github.com/DmitryOlshansky/phobos/wiki/Phobos-GC-happy-list!

Ehm, rather (without '!' at the end):
https://github.com/DmitryOlshansky/phobos/wiki/Phobos-GC-happy-list

-- 
Dmitry Olshansky

Oct 03 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

04-Oct-2014 00:56, Dmitry Olshansky wrote:
 04-Oct-2014 00:42, Dmitry Olshansky wrote:
 04-Oct-2014 00:21, Dmitry Olshansky wrote:
 04-Oct-2014 00:21, Andrei Alexandrescu wrote:
 On 10/3/14, 1:18 PM, Dmitry Olshansky wrote:
 03-Oct-2014 23:50, Andrei Alexandrescu wrote:



 [snip]
 Glad you liked it.

 Being in favor of automation as a start I just toggled -vgc flag in
 Win64 makefile and built phobos. Raw data (CSV) is here:

 https://gist.github.com/anonymous/763adcd62ab60a66e9d8

 Time to mine it...

 D script that generates wikitable from that -> awesomeness. -- Andrei




Got it:
https://gist.github.com/DmitryOlshansky/d718be4ec12158cf2f02

Tries hard to detect class & function name (it's all on heuristics + 
regex... e-hm) and generates mediawiki table.

DWiki won't let me edit it, but the output is here:
https://gist.github.com/DmitryOlshansky/341aa7f6d6f0d53ffc59

Anybody with a proper D parser may do a way better job ;)

-- 
Dmitry Olshansky

Oct 03 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/3/14, 3:59 PM, Dmitry Olshansky wrote:
 Got it:
 https://gist.github.com/DmitryOlshansky/d718be4ec12158cf2f02

 Tries hard to detect class & function name (it's all on heuristics +
 regex... e-hm) and generates mediawiki table.

 DWiki won't let me edit it, but the output is here:
 https://gist.github.com/DmitryOlshansky/341aa7f6d6f0d53ffc59

 Anybody with a proper D parser may do a way better job ;)

Tried to insert it, looks weird. Probably it would be most effective if 
you fixed your wiki account. Thanks! -- Andrei

Oct 03 2014
