digitalmars.D - On heap segregation, GC optimization and nogc relaxing
- deadalnix (183/183) Nov 11 2014 Hi all,
- Orvid King (33/221) Nov 11 2014 I think a combination of the C++'s standard library's approach
- Rikki Cattermole (60/229) Nov 11 2014 Humm.
- deadalnix (6/7) Nov 11 2014 yes and no. The ideas is similar, but it is not doable at library
- "Marc Schütz" <schuetzm gmx.net> (11/19) Nov 12 2014 I'm not sure. A library implementation may be feasible. For
- "Marc Schütz" <schuetzm gmx.net> (14/35) Nov 13 2014 I now remember that I played around with a related idea when
- deadalnix (7/16) Nov 13 2014 If you have a owned type, you don't need a builtin RC type. As
- Walter Bright (3/4) Nov 11 2014 Thanks for an excellent summary of the problem. I can't just read your s...
- deadalnix (5/9) Nov 11 2014 That is quite difficult to explain with drawing. Maybe I can
- "Ola Fosheim Grøstad" (71/101) Nov 12 2014 "ownership of data" is one possible solution, but not the problem.
- deadalnix (21/27) Nov 12 2014 I'm sorry to be blunt, but there is nothing actionable in your
- "Ola Fosheim Grøstad" (14/27) Nov 12 2014 My point is that you are making too many assumptions about both
- Paulo Pinto (9/13) Nov 12 2014 Given the support on Haskell, Clojure and C++ I am not sure if
- ponce (12/25) Nov 12 2014 I actually tested Haswell HLE and was underwhelmed (not the full
- "Ola Fosheim Grøstad" (7/9) Nov 12 2014 STM = software based transactional memory (without hardware
- ponce (8/14) Nov 12 2014 I was meaning HTM instead, good catch.
- "Ola Fosheim Grøstad" (27/36) Nov 12 2014 Yes, Intel style HTM is only an optimization for high contention
- "Marc Schütz" <schuetzm gmx.net> (11/66) Nov 12 2014 All this is unfortunately only true if there are no references
- deadalnix (4/7) Nov 12 2014 Yes, that is exactly why I'm listing the case where these can be
- "Marc Schütz" <schuetzm gmx.net> (9/18) Nov 13 2014 Hmm... I can't find that in what you wrote. To clarify: I'm
- "Ola Fosheim Grøstad" (7/10) Nov 13 2014 I sometimes wonder if the very pragmatic and brutal option of
- deadalnix (9/29) Nov 13 2014 You need a set of root from the TL heap. The trick being to
- "Marc Schütz" <schuetzm gmx.net> (12/20) Nov 14 2014 Consider this:
- deadalnix (6/31) Nov 14 2014 That is a well covered subject and told you what to google for as
- "Ola Fosheim Grøstad" (18/24) Nov 14 2014 It would help if you post links to articles.
- deadalnix (35/35) Nov 14 2014 ML is interesting because it emphasis immutability. For
- "Ola Fosheim Grøstad" (10/19) Nov 15 2014 Thanks for the link! I agree ML has some interesting work done
- deadalnix (5/14) Nov 15 2014 The small heap do not apply for us. We can't reproduce that part
- "Ola Fosheim Grøstad" (25/42) Nov 18 2014 Maybe, but immutable in ML means that two different immutable
- deadalnix (26/28) Nov 18 2014 What you need is for each thread to provide you a list of roots.
- Guillaume Chatelet (4/4) Nov 21 2014 What would be the next step to keep this effort going forward ?
- "Marc Schütz" <schuetzm gmx.net> (7/12) Nov 16 2014 Ok, you caught me there :-P
- Paulo Pinto (3/16) Nov 17 2014 If you mean OCaml, the runtime is being made multi-thread.
- Dmitry Olshansky (25/104) Nov 12 2014 Aye.
- deadalnix (24/41) Nov 12 2014 Yes, the unsafe road must always be open, we are a system
- Dmitry Olshansky (17/62) Nov 14 2014 Sorry, forgot to reply.
- "Marc Schütz" <schuetzm gmx.net> (10/22) Nov 14 2014 It needs to be `owned(Exception)`, otherwise, how could the
- deadalnix (3/11) Nov 14 2014 That is a very good question. I do think this should show the
- Walter Bright (3/11) Nov 13 2014 If that wrapper is automatically generated by the compiler, so the user ...
- deadalnix (5/21) Nov 13 2014 Yes, that is the intention. It means that my proposal does
- Walter Bright (8/8) Nov 17 2014 Nice article.
- deadalnix (60/71) Nov 17 2014 Ok, I'm gonna respond in reverse order, so I can use one answer
- Walter Bright (5/8) Nov 17 2014 There was a lot of skepticism initially when I proposed that ref be a st...
- deadalnix (3/13) Nov 17 2014 ref only make sense at interface boundary. That is not the same
- Dicebot (6/6) Dec 04 2014 Finally got a look at your proposal. While I do agree with many
Hi all, I want to get back on the subject of ownership and lifetime and propose some solutions. But before that, I propose to state the problem in a way that I haven't seen before (even if I have no doubt some have come to the same conclusions in the past).

The problem at hand is twofold: memory management and thread safety. Number one has been a hot topic for ages, and number two has become very hot over the past few years with the spread of multicore CPUs.

The problem at hand here is ownership of data. There are 3 roads you can go down:
 - Immutability and GC. Effectively, these 2 techniques allow you to get rid of ownership. There are advantages and drawbacks I'm going to discuss later.
 - Being unsafe and relying on convention. This is the C++ road (and a possible road in D). It allows implementing almost any scheme you want, but comes at a great cost for the developer.
 - Annotations. This is the Rust road. It also comes at a great cost for the developer, as some schemes may be non-trivial to express given the type system, but, contrary to the C++ road, it is safe.

These approaches all have some very nice things going for them, but also some killer scenarios.

Immutability+GC gives you safety while keeping interfaces simple. That is of great value. It also comes with some nice goodies: it is easy and safe to share data without bookkeeping, which allows one to fit more in cache and reduces the amount of garbage created. Most text processing apps fall into this category, and this is why D is that good at them. Another big goody is that many lock-free algorithms become possible: once you remove the need to bookkeep ownership, many operations can be implemented in an atomic manner. Additionally, it is possible to implement various GC optimizations on the immutable heap, which makes the GC generally more efficient.

But the cost is also real.
For some use cases, this means a large amount of garbage gets generated (Carmack wrote a piece on Haskell where he mentions the disastrous effect an immutable framebuffer would have: you'd have to clone it every time you draw into it, which is a no-go). GCs also tend to cause unpredictable runtime behavior, which programs with real-time constraints can have a hard time dealing with.

Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interfaces simple. The obvious drawback is that it is time consuming and error prone. It also makes a lot of things unclear, and devs choose the better-safe-than-sorry road. That means excessive copying to make sure you own the data, which is wasteful (in terms of the work for the copy itself, garbage generation and cache pressure). While this must remain an option locally for system code, it doesn't seem like the right option at program scale, and we do it in C++ simply because we have to.

Finally, annotations are a great way to combine safety and speed, but they generally come at a great cost when implementing uncommon ownership strategies, where you end up having to express complex lifetime and ownership relations.

Ideally, we want to map onto what the hardware does. So what does the hardware do? A multicore CPU has various cores, each of them with layers of cache. Cache is organized in cache lines, and each cache line can be in various modes. Actual systems are quite complex and deal with problems that are not very interesting here (like writeback), but the general idea is that every cache line is owned in one of several modes: either the line is owned by a single core, which can write to it, or the line is shared by several cores, each of them holding a local copy, but none of them able to write to it. There is an internal bus on which cores exchange cache lines with each other, along with messages to acquire a line in read or read/write mode.
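To make the cost of shared writable data concrete, here is a small D sketch (my own illustration, not from the post): two ways of incrementing a counter from many threads. The atomic version forces the cache line holding the counter to bounce between cores on every increment; accumulating thread-locally and merging once keeps the line owned by a single core for the whole loop.

```d
import core.atomic;

shared long total; // this cache line is fought over by every core

// slow: every increment acquires the line in exclusive mode
void hotLoop()
{
    foreach (i; 0 .. 1_000_000)
        atomicOp!"+="(total, 1);
}

// fast: the accumulator stays in one core's cache the whole time
void coolLoop()
{
    long local;
    foreach (i; 0 .. 1_000_000)
        ++local;
    atomicOp!"+="(total, local); // a single ownership transfer at the end
}
```

Both loops compute the same sum; the difference is purely in how often cache-line ownership has to move between cores.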
That means CPUs are good at thread-local read/write, shared immutable data, and transfer of ownership from one core to another. They are bad at shared writable data (effectively, the cache line has to bounce back and forth between cores, and all memory accesses have to be serialized instead of performed out of order).

In that world, D has a bizarro position where it uses a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution: use annotations for the common cases, and fall back on GC/unsafe code when these annotations fall short. Before going into why it is falling short, a digression on GC and the benefits of segregating the heap.

In D, the heap is almost segregated into 3 groups: thread-local, shared and immutable. These groups are very interesting for the GC:
 - The thread-local heap can be collected while disturbing only one thread. It should be possible to use different strategies in different threads.
 - The immutable heap can be collected 100% concurrently, without any synchronization with the program.
 - The shared heap is the only one that requires disturbing the whole program, but as a matter of good practice, this heap should be small anyway.

Various ML-family languages (like OCaml) have adopted a segregated heap strategy and get great benefits out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios.

We are sitting on a huge GC goldmine here, but 3 things prevent us from exploiting it:
 - Exceptions. They can bubble from one thread to another and create implicit sharing.
 - Uniqueness (as it is defined now), as it allows a unique object to be merged into any heap.
 - Message passing. Ownership transfer is not possible, and so unsafe casting ensues.*

* It has to be noted that delegates allow this kind of stunt as well, but this is recognized as a bug by now and hopefully it is gonna be fixed.

D has a type qualifier system for which we pay a big price. Getting everything const correct is difficult.
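The message-passing point can be shown with today's std.concurrency (a sketch I am adding for illustration). A mutable, thread-local array cannot be sent as-is, so the only way to hand it to another thread is an unsafe cast plus the convention that the sender never touches it again:

```d
import std.concurrency;

void worker()
{
    // receive the buffer the main thread built
    auto data = receiveOnly!(immutable(int)[])();
}

void main()
{
    auto tid = spawn(&worker);

    int[] buf = [1, 2, 3];            // built mutably, in the TL heap
    // send(tid, buf);                // rejected: aliases mutable TL data
    send(tid, cast(immutable) buf);   // unsafe: the compiler can't check this
    buf = null;                       // by convention only; nothing enforces it
}
```

The cast is exactly the hole an owned qualifier would close: transfer would become a typed, checked operation instead of a convention.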
We want to get the most bang for that buck, and one of the bangs we are not far from being able to get is segregating the heap. Right now, we instead get a shitty GC and unsafe code.

Let's present a concrete example involving ownership:

pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulations to come up with the object, including various allocations. These allocations are mutable inside foo, which makes it impossible to allocate them on the immutable heap (a GC relying on that immutability could mess things up pretty badly). They cannot be allocated on the TL heap either, as once promoted to immutable, the data becomes shared as well.

On the other hand, ownership means that the compiler can know when things go out of scope and free them explicitly, which is a plus, as generating less garbage is always a way to improve garbage collection. The most efficient work there is is the work that does not need to be done.

I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, that does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC. We already pay a cost at interfaces with type qualifiers; let's make the best of it!

I'm proposing to introduce a new type qualifier for owned data. It means that the throw statement now expects an owned(Throwable), that pure functions which currently return an implicitly unique object will return owned(Object), and that message passing will accept owned stuff.

The GC heap can be segregated into islands. We currently have 3 types of islands: thread-local, shared and immutable. These are built-in islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island. An owned island can only refer to other owned islands and to immutable data.
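To see why the intermediate allocations are the sticking point, here is a fleshed-out version of foo (my illustration; the body is hypothetical):

```d
pure Object foo()
{
    auto scratch = new int[1024]; // mutable scratch space, dies inside foo
    scratch[0] = 42;              // mutation rules out the immutable heap

    auto result = new Object();   // escapes, and the caller may freeze it
    return result;
}

immutable o = foo(); // result must not sit on the TL heap either:
                     // once immutable, it is implicitly shared
```

Under the proposal, everything foo allocates would land in a fresh owned island; when the result decays to immutable, the island is merged into the immutable heap, and scratch, being unreachable, is freed along with the rest of the island.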
Owned islands can be merged into any other island at any time (that is why they can't refer to TL or shared data). owned(T) can be passed around as a function parameter, returned, or stored in fields; when doing so, it is consumed. When an owned is not consumed and goes out of scope, the whole island is freed.

owned(T) can also implicitly decay into T, immutable(T) or shared(T) at any time. When that happens, a call to the runtime merges the owned island into the corresponding island. When it is passed around as owned, ownership is transferred, and all local references to the island are invalidated (using them is an error).

At the implementation level, a call to a pure function that returns an owned could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);

    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.

@nogc. Remember? It was in the title. What does a @nogc function look like? A @nogc function does not produce any garbage and does not trigger a collection cycle. But there is no reason per se to prevent @nogc code from allocating on the GC heap, as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned data into the TL, shared or immutable heaps.

This solves the problem of @nogc + Exception. As Exceptions are isolated, they can be allocated, thrown and caught in @nogc code without generating garbage. They can safely bubble out of the @nogc section of the code and still be safe.

In the same way, it opens the door for a LOT of code that is not @nogc to become so. If some code allocates memory in an owned island and returns it, it is now up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance).
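Under the proposed semantics (owned is not actual D, so this is a pseudocode sketch of the proposal, not something that compiles today), a @nogc function could allocate and throw, with the caller deciding the exception's fate:

```d
// Pseudocode: assumes the proposed owned qualifier.
void mayFail(int x) @nogc
{
    if (x < 0)
        throw new Exception("negative input"); // lands in a fresh owned island
}

void caller()
{
    try
    {
        mayFail(-1);
    }
    catch (owned(Exception) e)
    {
        // Option 1: let e decay into the TL heap -> the GC takes over.
        // Option 2: consume it here; the island is freed on scope exit,
        //           so no garbage was ever produced.
    }
}
```

mayFail stays @nogc because nothing it allocates ever merges into the TL, shared or immutable heaps; the merge, if any, happens at the caller's discretion.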
The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if Andrei's proposed approach is better, I don't think this is a good one. The approach proposed here allows a lot of code to be marked @nogc while letting the caller decide. That is ultimately what we want libraries to look like.
Nov 11 2014
On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:
The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if Andrei's proposed approach is better, I don't think this is a good one. The approach proposed here allows a lot of code to be marked @nogc while letting the caller decide. That is ultimately what we want libraries to look like.

I think a combination of the C++ standard library's approach and Rust's approach would actually be the best possible. If we were to follow C++'s strategy, I think it would be important to make sure that it wouldn't require specifically adding template parameters and constraints, and would instead allow the use of a concept-like system. I think that being able to default the allocator parameter to the GC, provided the current method is not @nogc, would also be a good idea. If C++'s approach were taken, it would also be very beneficial to allow a syntax such as `auto obj = new MyClass() with allocator` and `delete obj with allocator`. I do think that the definition of @nogc would have to be slightly expanded though, to mean that any values allocated with a given allocator are also freed with that allocator before returning.

To connect back with your proposal and allow even more code to be @nogc, an owned(MyClass, allocator) object would be allowed to be returned from a @nogc function. This would transfer ownership of the data, and responsibility for its deletion, to the caller, provided that the caller is @nogc. If the caller is @nogc and fails to free the memory, DMD should produce an error. If the caller is not @nogc, then DMD would say nothing and assume that the allocator used to allocate the object will do the cleanup. This would allow far more situations to be handled by the allocation system without needing a GC, while still allowing programs that want to use the GC to do so.
@nogc would simply mean that no garbage is produced; it would not guarantee which allocator was used to perform the allocation. @nogc would also mean that no allocators other than the ones passed in by the user would be used to perform allocations. This preserves the current meaning of @nogc while opening it up for use in a much larger variety of situations.
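The suggested surface syntax, sketched out (hypothetical; neither `with allocator` nor `owned(T, allocator)` is actual D):

```d
// Hypothetical syntax from this post:
auto obj = new MyClass() with allocator;   // allocate via a user allocator
scope(exit) delete obj with allocator;     // and free through the same one

// A @nogc function handing cleanup responsibility to its caller:
owned(MyClass, Allocator) make(Allocator a) @nogc
{
    return new MyClass() with a;
}
```

The key point of the sketch is that the allocator travels with the owned value, so whoever consumes it knows how it must be freed.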
Nov 11 2014
On 12/11/2014 3:34 p.m., deadalnix wrote:Hi all, I want to get back on the subject of ownership, lifetime and propose some solution, but before, propose to state the problem in a way that haven't seen before (even if I have no doubt some have came to the same conclusion in the past). The problem at hand is double: memory management and thread safety. Number one has been a hot topic for ages, and number 2 has become very over the past years, to the widespreading of multicores CPU. The problem at hand here is ownership of data. There are 3 roads you can go about it: - immutability and GC. Effectively, these 2 technique allow you to get rid of ownership. There are advantages and drawbacks i'm going to discuss later. - Being unsafe and rely on convention. This is the C++ road (and a possible road in D). It allow to implement almost any wanted scheme, but come at great cost for the developer. - Annotations. This is the Rust road. It also come a great cost for the developer, as some schemes may be non trivial to express granted the type system, but, contrary to the C++ road, is safe. These approach all have some very nice things going on for them, but also some killer scenarios. Immutability+GC allow to have safety while keeping interfaces simple. That is of great value. It also come with some nice goodies, in the sense that is it easy and safe to shared data without bookkeeping, allowing one to fit more in cache, and reduce the amount of garbage created. Most text processing apps fall into this category and this is why D is that good at them. Another big goodies is that many lock free algorithm become possible. Once you remove the need for bookkeeping of ownership many operations can be implemented in an atomic manner. Additionally, it is possible to implement various GC optimization on immutable heap, which make the GC generally more efficient. But the cost is also real. 
For some use case, this mean having a large amount of garbage generated (Carmack wrote a piece on haskell were he mention the disastrous effect that having a framebuffer immutable would have: you'd have to clone it everytime you draw in it, which is a no go). GC also tend to cause unpredictable runtime characteristics, which programs with real time constraint can have hard time to deal with. Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interface simple. The obvious drawback is that it is time consuming and error prone. It also make a lot of things unclear, and dev choose the better safe than sorry road. That mean excessive copying to make sure one own the data, which is wasteful (in term of work for the copy itself, garbage generation and cache pressure). If this must be an option locally for system code, it doesn't seems like this is the right option at program scale and we do it in C++ simply because we have to. Finally, annotations are a great way to combine safety and speed, but generally come at a great cost when implenting uncommon ownership strategies where you ends up having to express complex lifetime and ownership relations. Ideally, we want to map with what the hardware does. So what does the hardware do ? Multicore CPU have various cores, each of them having layers of cache. Cache is organized in cache line and each cache line can be in various modes. Actual system are quite complex and deal with problems we are not very interesting here (like writeback) but the general idea is that every cache line is owned with different modes. Either the cache line is owned by a single core and can be written to, or the cache line shared by several cores, each of them having a local copy of the line, but none of them can write to. There is an internal bus where cores can exchange cache line with each other and messages to acquire cache line in read or read/write mode. 
That mean CPU are good at thread local read/write, shared immutable and transfer of ownership from one core to the other. They are bad at shared writable data (as effectively, the cache line will have to bounce back and forth between cores, and all memory access will need to be serialized instead of performed out of order). In that world, D has a bizaro position were it use a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution. Using annotation for common cases, fallback on GC/unsafe code when these annotations fall short. Before going into why it is fallign short, a digression on GC and the benefits of segregating the heap. In D, the heap is almost segregated in 3 groups: thread local, shared and immutable. These group are very interesting for the GC: - Thread local heap can be collected while disturbing only one thread. It should be possible to use different strategy in different threads. - Immutable heap can be collected 100% concurrently without any synchronization with the program. - Shared heap is the only one that require disturbing the whole program, but as a matter of good practice, this heap should be small anyway. Various ML family languages (like OCaml) have adopted segregated heap strategy and get great benefice out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios. We are sitting on a huge GC goldmine here, but 3 things prevent us to exploit it: - Exceptions. They can bubble from one thread to the other and create implicit sharing. - Uniqueness (as it is defined now) as it allow for unique object to be merged with any heap. - message passing. Ownership transfert is not possible and so unsafe casting ensue. * It has to be noted that delegate allow as well for this kind of stunt, but this is recognized as a bug by now and hopefully it is gonna be fixed. D has a type qualifier system for which we pay a big price. Getting everything const correct is difficult. 
We'd want to get the most bang for the buck. One of the bangs we are not far from being able to get is segregating the heap. Right now we can't, and that means a shitty GC and unsafe code.

Let's present a concrete example using ownership:

pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulation to come up with the object, including various allocations. These allocations are mutable inside foo, which makes it impossible to allocate them on the immutable heap (a GC relying on this immutability could mess things up pretty badly). They also cannot be allocated on the TL heap, as once promoted to immutable, the data becomes shared as well.

On the other hand, ownership means that the compiler can know when things go out of scope and free them explicitly. Which is a plus, as generating less garbage is always a way to improve garbage collection. The most efficient work there is is the one that does not need to be done.

I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, that does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC. We already pay a cost at the interface with type qualifiers, let's make the best of it!

I'm proposing to introduce a new type qualifier for owned data. It means that throw statements expect an owned(Throwable), that pure functions which currently return an implicitly unique object will return owned(Object), and that message passing will accept to pass around owned stuff.

The GC heap can be segregated into islands. We currently have 3 types of islands: thread local, shared and immutable. These are builtin islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island. Owned islands can only refer to other owned islands and to immutable data. 
They can be merged into any other island at any time (that is why they can't refer to TL or shared data). owned(T) can be passed around as a function parameter, returned, or stored as a field. When doing so, it is consumed. When an owned is not consumed and goes out of scope, the whole island is freed.

That means that owned(T) can implicitly decay into T, immutable(T) or shared(T) at any time. When doing so, a call to the runtime is made to merge the owned island into the corresponding island. The ownership is transferred, and all local references to the island are invalidated (using them is an error).

On an implementation level, a call to a pure function that returns an owned could look like this:

{
  IslandID __saved = gc_switch_new_island();
  scope(exit) gc_restore_island(__saved);

  call_pure_function();
}

This allows us to rely much less on the GC, and allows for a better GC implementation.

@nogc. Remember? It was in the title. What does a @nogc function look like? A @nogc function does not produce any garbage or trigger a collection cycle. There is no reason per se to prevent @nogc code from allocating on the GC heap, as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned things into the TL, shared or immutable heaps.

This solves the problem of @nogc + Exception. As Exceptions are isolated, they can be allocated, thrown and caught in @nogc code without generating garbage. They can safely bubble out of the @nogc section of the code and still be safe. In the same way, it opens the door for a LOT of code that is not @nogc to become so. If the code allocates memory in an owned island and returns it, then it is up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance). 
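To make the intended semantics concrete, here is a usage sketch. This is hypothetical syntax: owned(T) does not exist in D today, and makeError/promote are made-up names used purely for illustration:

```d
// Hypothetical sketch of the proposed owned(T) qualifier.
// None of this compiles today; it illustrates the intended semantics.

owned(Exception) makeError() pure @nogc
{
    return new Exception("boom"); // allocated in a fresh owned island
}

void caller() @nogc
{
    try
    {
        throw makeError(); // throw expects owned(Throwable): no garbage
    }
    catch (Exception e)
    {
        // If e is not consumed, its whole island is freed here,
        // so the @nogc guarantee holds.
    }
}

void promote()
{
    owned(Object) o = makeSomething(); // hypothetical owned factory
    immutable Object i = o; // runtime call merges the island into
                            // the immutable heap
    // o and all local references into its island are now invalidated;
    // using them would be a compile-time error.
}
```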
The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if the approach proposed by Andrei is better, I don't think it is a good one. The proposed approach allows for a lot of code to be marked as @nogc and lets the caller decide. That is ultimately what we want libraries to look like.

Humm.

import std.stdio;

struct TypeOwnerShip(T) {
    T value;
    alias value this;

    this(T)(T value) {
        this.value = value;
    }

    // implicit casts to immutable, shared?
    // on cast to immutable, shared change islands
}

T owned(T)(T value) {
    return TypeOwnerShip!T(value);
}

class Bar {
    int x;

    this(int x) pure {
        this.x = x;
    }
}

Bar foo() pure {
    return owned(new Bar(5));
}

struct IslandAllocationStrategy {
    this(ubyte v = 0) {
    }

    void opWithIn() {
        writeln("opWithIn");
        // thread local overriding
    }

    void opWithOut() {
        import std.stdio;
        writeln("opWithOut");
        // reset thread local overriding
    }
}

@property IslandAllocationStrategy island() {
    return IslandAllocationStrategy();
}

void main() {
    writeln("start");

    with(island) {
        opWithIn;
        writeln("{");

        Bar myValue = foo();
        writeln(myValue.x);

        writeln("}");
        opWithOut;
    }

    writeln("end");
}

I feel like I've suggested this, just without the CS theory.
Nov 11 2014
On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:[...]Yes and no. The idea is similar, but it is not doable at library level if we want to get safety and the full benefit out of it, as it would require the compiler to introduce some calls to the runtime at strategic places, and it does interact with @nogc.
Nov 11 2014
On Wednesday, 12 November 2014 at 06:48:47 UTC, deadalnix wrote:On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:I'm not sure. A library implementation may be feasible. For invalidation, the objects can be returned to their init state. This is "safe", but maybe not ideal, as a compiler error might indeed be better. Even implicit conversion to shared & immutable will be possible with multiple alias this, though it's worth discussing whether an explicit conversion isn't preferable. As for nogc, I see it as a crutch that's needed while no "real" solution is available, and as a tool for aiding transition once we have one. That said, some compiler support will definitely be necessary.[...]yes and no. The ideas is similar, but it is not doable at library level if we want to get safety and the full benefit out of it, as it would require for the compiler to introduce some call to the runtime at strategic places and it does interact with nogc.
Nov 12 2014
On Wednesday, 12 November 2014 at 12:57:58 UTC, Marc Schütz wrote:On Wednesday, 12 November 2014 at 06:48:47 UTC, deadalnix wrote:I now remember that I played around with a related idea when Andrei suggested his template solution (but without the islands part): http://wiki.dlang.org/User:Schuetzm/RC,_Owned_and_allocators The important part is that owning, RC and GC types all need to know the allocators they belong to. In my example I need that to allow Owned types to be converted to RC types. In your proposal, something similar will ultimately be needed for merging the islands into specific heaps. The nice thing is that this scheme also allows for switching allocators on the fly at runtime, without template bloat, at the cost of virtual calls for using the allocators (which is probably an acceptable compromise for an inherently expensive thing like allocation).On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:I'm not sure. A library implementation may be feasible. For invalidation, the objects can be returned to their init state. This is "safe", but maybe not ideal, as a compiler error might indeed be better. Even implicit conversion to shared & immutable will be possible with multiple alias this, though it's worth discussing whether an explicit conversion isn't preferable. As for nogc, I see it as a clutch that's needed while no "real" solution is available, and as a tool for aiding transition once we have one. That said, some compiler support will definitely be necessary.[...]yes and no. The ideas is similar, but it is not doable at library level if we want to get safety and the full benefit out of it, as it would require for the compiler to introduce some call to the runtime at strategic places and it does interact with nogc.
Nov 13 2014
On Thursday, 13 November 2014 at 14:26:44 UTC, Marc Schütz wrote:The important part is that owning, RC and GC types all need to know the allocators they belong to. In my example I need that to allow Owned types to be converted to RC types. In your proposal, something similar will ultimately be needed for merging the islands into specific heaps. The nice thing is that this scheme also allows for switching allocators on the fly at runtime, without template bloat, at the cost of virtual calls for using the allocators (which is probably an acceptable compromise for an inherently expensive thing like allocation).If you have an owned type, you don't need a builtin RC type. As long as the wrapper that implements the RC owns the RCed object, everything is safe. Also, this construct has the nice side effect that a single reference count can own a whole island, instead of tracking each object's RC one by one.
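A sketch of such a wrapper, assuming the proposed owned qualifier existed (hypothetical code, it does not compile today; releaseIsland is a made-up runtime hook):

```d
// Hypothetical: a RefCounted wrapper owning a whole island.
// owned(T) is the proposed qualifier from this thread, not real D.
struct RefCounted(T)
{
    private owned(T) payload; // the RC cell owns the island
    private size_t* count;

    this(owned(T) value)
    {
        payload = value; // consumes the owned reference
        count = new size_t;
        *count = 1;
    }

    this(this) { if (count) ++*count; }

    ~this()
    {
        if (count && --*count == 0)
        {
            // Freeing the single owned reference releases the whole
            // island at once: no per-object counts needed.
            releaseIsland(payload); // made-up runtime hook
        }
    }

    // Accessors would have to be careful not to leak references
    // that outlive the count, e.g. by only handing out scoped views.
}
```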
Nov 13 2014
On 11/11/2014 6:34 PM, deadalnix wrote:[...]Thanks for an excellent summary of the problem. I can't just read your solution and know it works, it'll take some time.
Nov 11 2014
On Wednesday, 12 November 2014 at 06:16:34 UTC, Walter Bright wrote:On 11/11/2014 6:34 PM, deadalnix wrote:That is quite difficult to explain without drawings. Maybe I can discuss it with Andrei when he has some time and a whiteboard around.[...]Thanks for an excellent summary of the problem. I can't just read your solution and know it works, it'll take some time.
Nov 11 2014
On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:The problem at hand here is ownership of data."ownership of data" is one possible solution, but not the problem. We are facing 2 problems: 1. A performance problem: Concurrency in writes (multiple writers, one writer, periodical locking during clean up etc). 2. A structural problem: Releasing resources correctly. I suggest that the ownership focus is on the latter, to support solid non-GC implementations. Then rely on conventions for multi-threading.- Being unsafe and rely on convention. This is the C++ road (and a possible road in D). It allow to implement almost any wanted scheme, but come at great cost for the developer.All performant solutions are going to be "unsafe" in the sense that you need to select a duplication/locking level that is optimal for the characteristics of the actual application. Copying data when you have no writers is too inefficient in real applications. Hardware support for transactional memory is going to be the easy approach for speeding up locking.- Annotations. This is the Rust road. It also come a greatI think Rust's approach would favour an STM approach where you create thread local copies for processing then merge the result back into the "shared" memory.Immutability+GC allow to have safety while keeping interfaces simple. That is of great value. It also come with some nice goodies, in the sense that is it easy and safe to shared data without bookkeeping, allowing one to fit more in cache, and reduce the amount of garbage created.How does GC fit more data in the cache? A GC usually has overhead and would typically generate more cache-misses due to unreachable in-cache ("hot") memory not being available for reallocation.Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interface simple. The obvious drawback is that it is time consuming and error prone. 
It also make a lot of things unclear, and dev choose the better safe than sorry road. That mean excessive copying to make sure one own the data, which is wasteful (in term of work for the copy itself, garbage generation and cache pressure). If this must be an option locally for system code, it doesn't seems like this is the right option at program scale and we do it in C++ simply because we have to. Finally, annotations are a great way to combine safety and speed, but generally come at a great cost when implementing uncommon ownership strategies where you end up having to express complex lifetime and ownership relations.The core problem is that if you are unhappy with single-threaded applications then you are looking for high throughput using multi-threading. And in that case sacrificing performance by not using the optimal strategy becomes problematic. The optimal strategy is entirely dependent on the application and the dataset. Therefore you need to support multiple approaches: - per data structure GC - thread local GC - lock annotations of types or variables - speculative lock optimisations (transactional memory) And in the future you will also need to support the integration of GPU/co-processors into mainstream CPUs. Metal and OpenCL are only a beginning…Ideally, we want to map with what the hardware does. So what does the hardware do ?That changes over time. The current focus in upcoming hardware is on: 1. Heterogenous architecture with high performance co-processors 2. Hardware support for transactional memory Intel CPUs might have buffered transactional memory within 5 years.from one core to the other. They are bad at shared writable data (as effectively, the cache line will have to bounce back and forth between cores, and all memory access will need to be serialized instead of performed out of order).This will vary a lot. 
On x86 you can write to a whole cache line (buffered) without reading it first, and it uses a convenient cache coherency protocol (so that read/write ops are in order). This is not true for all CPUs. I agree with others that say that a heterogeneous approach, like C++, is the better alternative. If parity with C++ is important then D needs to look closer at OpenMP, but that probably goes beyond what D can achieve in terms of implementation.

Some observations:
1. If you are not to rely on conventions for sync'ing threads then you need a pretty extensive framework if you want good performance.
2. Safety will harm performance.
3. Safety with high performance levels requires a very complicated static analysis that will probably not work very well for larger programs.
4. For most applications performance will come through co-processors (GPGPU etc).
5. If hardware progresses faster than compiler development, then you will never reach the performance frontier…

I think D needs to cut down on implementation complexity and ensure that the implementation time can catch up with hardware developments. The way to do it is:
1. Accept that generally performant multi-threaded code is unsafe and application/hardware optimized.
2. Focus on making nogc single-threaded code robust and fast. And I agree that ownership is key.
3. Use semantic analysis to automatically generate a tailored runtime with application-optimized allocators.
Nov 12 2014
On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim Grøstad wrote:That changes over time. The current focus in upcoming hardware is on: 1. Heterogenous architecture with high performance co-processors 2. Hardware support for transactional memory Intel CPUs might have buffered transactional memory within 5 years.I'm sorry to be blunt, but there is nothing actionable in your comment. You are just throwing more and more into the pot until nobody knows what is in it. But ultimately, the crux of the problem is the thing quoted above. 1. No, that does not change that much over time. The implementation details are changing, and recent schemes become more complex to accommodate heterogeneous chips, but that is irrelevant here. What I've mentioned is true for all of them, and has been for at least 2 decades by now. There is no sign that this is going to change. 2. The transactional memory thing is completely orthogonal to the subject at hand, so, like the implementation details of modern chips, it doesn't belong here. In addition, the whole CPU industry is backpedaling on the transactional memory concept. It is awesome on paper, but it didn't work. There are only 2 ways to achieve good design. You remove useless things until there is obviously nothing wrong, or you add more and more until there is nothing obviously wrong. I won't follow you down the second road, so please stay on track.
Nov 12 2014
On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:I'm sorry to be blunt, but there is nothing actionable in your comment. You are just throwing more and more into the pot until nobody know what there is in. But ultimately, the crux of the problem is the thing quoted above.My point is that you are making too many assumptions about both applications and hardware.2. The transactional memory thing is completely orthogonal to the subject at hand so, as the details of implementation of modern chip, this doesn't belong here. In addition, the whole CPU industry is backpedaling on the transactional memory concept. That is awesome on the paper, but it didn't worked.STM is used quite a bit. Hardware backed TM is used by IBM. For many computationally intensive applications, high levels of parallelism are achieved using speculative computation. TM supports that.There is only 2 way to achieve good design. You remove useless things until there is obviously nothing wrong, or you add more and more until there is nothing obviously wrong. I won't follow you down the second road, so please stay on track.Good design is achieved by understanding different patterns of concurrency in applications and how they can reach peak performance in the environment (hardware). If D is locked to a narrow memory model then you can only reach high performance on a subset of applications. If D wants to support system level programming then it needs to take an open approach to the memory model.
Nov 12 2014
On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim In addition, the whole CPU industry is backpedaling on the transactional memory concept. That is awesome on the paper, but it didn't worked.Given the support on Haskell, Clojure and C++ I am not sure if they are really backpedaling on it. The Haskell bugs are supposed to have been fixed in the next generation. And there is PowerPC A2 as well. Not that I have any use for it, though. -- Paulo
Nov 12 2014
On Wednesday, 12 November 2014 at 09:56:57 UTC, Paulo Pinto wrote:On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:I actually tested Haswell HLE and was underwhelmed (not the full STM, was just trying to get more out of some locks). The trouble with STM is that to be applicable, you need to have huge contention (else it wouldn't be a bottleneck) and a small task to do. And this use case is already well served with spinlock-guarded locks which already allows to stay in user space most of the time. That, added with the usual restrictions and gotchas for lockfree, makes it something not very life-changing at least in my limited experience.On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim In addition, the whole CPU industry is backpedaling on the transactional memory concept. That is awesome on the paper, but it didn't worked.Given the support on Haskell, Clojure and C++ I am not sure if they are really backpedaling on it. The Haskell bugs are supposed to have been fixed in the next generation. And there is PowerPC A2 as well. Not that I have any use for it, though. -- Paulo
Nov 12 2014
On Wednesday, 12 November 2014 at 11:08:41 UTC, ponce wrote:I actually tested Haswell HLE and was underwhelmed (not the full STM, was just trying to get more out of some locks).STM = software based transactional memory (without hardware support) Haswell does not have buffered transactions so you wait for the commit, but there are presentations out where Intel has put buffered transactions at around 2017… (but I would expect a delay).
Nov 12 2014
On Wednesday, 12 November 2014 at 11:19:59 UTC, Ola Fosheim Grøstad wrote:STM = software based transactional memory (without hardware support)I was meaning HTM instead, good catch.Haswell does not have buffered transactions so you wait for the commit, but there are presentations out where Intel has put buffered transactions at around 2017… (but I would expect a delay).I wasn't arguing about the current performance (which is not impressive). My point is that transactional memory has limited applicability, since locks already fit the bill well. And I'd argue the same with most lockfree structures actually.
Nov 12 2014
On Wednesday, 12 November 2014 at 11:51:11 UTC, ponce wrote:Yes, Intel style HTM is only an optimization for high contention where you already have locking code in place, since you need to take a lock as a fallback anyway. But useful for database-like situations where you might have 7 readers traversing and 1 writer updating a leaf node. It is of course difficult to say how it will perform in 2./3. generation implementations or if the overall hardware architecture will change radically (as we see in Phi and Parallella). I can easily imagine that the on-die architecture will change radically, within a decade, with the current x86 architecture remaining at a coordination level. This is the direction Phi seems to be going. In that case, maybe the performance of the x86 will be less critical (if it spends most time waiting and buffering is done in hardware).Haswell does not have buffered transactions so you wait for the commit, but there are presentations out where Intel has put buffered transactions at around 2017… (but I would expect a delay).I wasn't arguing of the current performance (which is not impressive). My point is that transactional memory has limited applicability, since locks already fits the bill well.And I'd argue the same with most lockfree structures actually.I think in general that you need to create application specific data-structures to get performance and convenience. (I seldom reuse lists and graph-like data structures for this reason, it is just easier to create them from scratch.) I also agree that you usually can get away with regular locks and very simple lockfree structures where performance matters (such as a lockfree stack where only one thread removes nodes). Where performance truly matters you probably need to split up the dataset based on the actual computations and run over the data in a batch-like SIMD way anyway. (E.g. physics simulation).
Nov 12 2014
On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:Before going into why it is fallign short, a digression on GC and the benefits of segregating the heap. In D, the heap is almost segregated in 3 groups: thread local, shared and immutable. These group are very interesting for the GC: - Thread local heap can be collected while disturbing only one thread. It should be possible to use different strategy in different threads. - Immutable heap can be collected 100% concurrently without any synchronization with the program. - Shared heap is the only one that require disturbing the whole program, but as a matter of good practice, this heap should be small anyway.All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.I'd argue for the introduction of a basic ownership system. Something much simpler than rust's, that do not cover all uses cases. But the good thing is that we can fallback on GC or unsafe code when the system show its limits. That mean we rely less on the GC, while being able to provide a better GC. We already pay a cost at interface with type qualifier, let's make the best of it ! I'm proposing to introduce a new type qualifier for owned data. Now it means that throw statement expect a owned(Throwable), that pure function that currently return an implicitly unique object will return owned(Object) and that message passing will accept to pass around owned stuff. The GC heap can be segregated into island. We currently have 3 types of islands : Thread local, shared and immutable. These are builtin island with special characteristics in the language. The new qualifier introduce a new type of island, the owned island. owned island can only refers to other owned island and to immutable. they can be merged in any other island at any time (that is why they can't refers to TL or shared). 
owned(T) can be passed around as function parameter or returned, or stored as fields. When doing so they are consumed. When an owned is not consumed and goes out of scope, the whole island is freed. That means that owned(T) can implicitly decay into T, immutable(T), shared(T) at any time. When doing so, a call to the runtime is done to merge the owned island to the corresponding island. It is passed around as owned, then the ownership is transferred and all local references to the island are invalidated (using them is an error). On an implementation level, a call to a pure function that return an owned could look like this :

{
  IslandID __saved = gc_switch_new_island();
  scope(exit) gc_restore_island(__saved);

  call_pure_function();
}

This is nice. Instead of calling fixed helpers in Druntime, it can also make an indirect call to allow for pluggable (and runtime switchable) allocators.The solution of passing a policy at compile for allocation is close to what C++'s stdlib is doing, and even if the proposed approach by Andrei is better, I don't think this is a good one. The proposed approach allow for a lot of code to be marked as nogc and allow for the caller to decide. That is ultimately what we want libraries to look like.

+1 Andrei's approach mixes up memory allocation and memory management. Library functions shouldn't know about the latter. This proposal is clearly better and cleaner in this respect.
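The indirect-call idea could look roughly like this. All of the names below are hypothetical, sketched only to show where the virtual dispatch would sit; none of this is an actual druntime API:

```d
// Hypothetical sketch: the runtime calls through an interface
// instead of fixed helpers, so island allocators can be swapped
// at runtime without template bloat.
alias IslandID = size_t; // stand-in for whatever handle the runtime uses

interface IslandAllocator
{
    void* allocate(size_t size);     // allocate inside this island
    void mergeInto(IslandID target); // merge this island's memory
    void free();                     // drop the whole island at once
}

// The compiler-inserted gc_switch_new_island() from the quoted
// sketch would then consult a (runtime-switchable) factory:
IslandAllocator currentIslandFactory; // set by the application

// Each allocation in the owned island becomes an indirect call,
// the cost of which is small relative to allocation itself.
```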
Nov 12 2014
On Wednesday, 12 November 2014 at 12:49:41 UTC, Marc Schütz wrote:All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.Yes, that is exactly why I'm listing the case where these can be created in safe code and propose a solution to plug the hole (and which brings other benefits along the road).
Nov 12 2014
On Wednesday, 12 November 2014 at 21:15:05 UTC, deadalnix wrote:On Wednesday, 12 November 2014 at 12:49:41 UTC, Marc Schütz wrote:Hmm... I can't find that in what you wrote. To clarify: I'm talking about the fact that, for example, a thread-local heap can contain references into the immutable and shared heaps. Therefore, the immutable heap can _not_ be collected without disturbing any threads, because any TL heaps (and stacks!) can potentially have references to it. They either need to be stopped, or write barriers need to be utilized when references to immutable data are changed.All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.Yes, that is exactly why I'm listing the case where these can be created in safe code and propose a solution to plug the hole (and which brings other benefits along the road).
Nov 13 2014
On Thursday, 13 November 2014 at 10:10:34 UTC, Marc Schütz wrote:potentially have references to it. They either need to be stopped, or write barriers need to be utilized when references to immutable data are changed.I sometimes wonder if the very pragmatic and brutal option of just having GC on the main thread, and preventing all escapes of objects from the main thread without explicit ownership transfer, would solve D's GC-related performance problems. It would work for real-time applications and it would work for people who use D in a scripty fashion (but not for vibe.d).
Nov 13 2014
On Thursday, 13 November 2014 at 10:10:34 UTC, Marc Schütz wrote:On Wednesday, 12 November 2014 at 21:15:05 UTC, deadalnix wrote:You need a set of roots from the TL heap. The trick is to consider everything that is allocated AFTER you get the roots as automatically alive (you'll collect it in the next collection cycle). That way, newly allocated chunks that have references in the TL heap will be kept alive, even if you don't know about the roots. You'll find plenty of information about the details if you look into GCs for ML-family languages.On Wednesday, 12 November 2014 at 12:49:41 UTC, Marc Schütz wrote:Hmm... I can't find that in what you wrote. To clarify: I'm talking about the fact that, for example, a thread-local heap can contain references into the immutable and shared heaps. Therefore, the immutable heap can _not_ be collected without disturbing any threads, because any TL heaps (and stacks!) can potentially have references to it. They either need to be stopped, or write barriers need to be utilized when references to immutable data are changed.All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.Yes, that is exactly why I'm listing the case where these can be created in safe code and propose a solution to plug the hole (and which brings other benefits along the road).
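In pseudocode, the snapshot trick might look like this. All of the names are made up; this is a sketch of the scheme described above, not an actual GC implementation:

```d
// Sketch of the "allocated-after-the-roots is alive" trick.
void collectImmutableHeap()
{
    // 1. Remember the allocation frontier. Everything allocated past
    //    this point is treated as live for this cycle and only
    //    examined in the next one.
    auto frontier = immutableHeap.allocationPointer;

    // 2. Gather each thread's current roots into the immutable heap
    //    (collected as a by-product of that thread's TL collection).
    Root[] roots = gatherRootsFromThreads();

    // 3. Mark from the roots, concurrently with the program. Any new
    //    reference a thread acquires can only come from an allocation
    //    (past the frontier, hence live) or from reading through an
    //    already-rooted immutable chain, which the mark will reach.
    mark(roots);

    // 4. Sweep only the part of the heap below the frontier.
    sweepBelow(frontier);
}
```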
Nov 13 2014
On Thursday, 13 November 2014 at 22:12:22 UTC, deadalnix wrote:You need a set of root from the TL heap. The trick being to consider everything that is allocated AFTER you get the roots as automatically alive (you'll collect this in the next collection cycle). That way, new allocated chunk that have reference in the TL heap will be kept alive, even if you don't know about the roots. You'll find plenty of information about the details if look into GC for ML family languages.Consider this:

auto i = new immutable(int);
immutable(int)* a, b;
b = i;
// GC kicks in here, scans `a` (== null)
a = b;
b = null;
// GC goes on, scans `b` (== null)
// => whoops, no reference to *i

Here, a and b are on the stack or in registers. They could also be on the TL heap.
Nov 14 2014
On Friday, 14 November 2014 at 11:46:51 UTC, Marc Schütz wrote:On Thursday, 13 November 2014 at 22:12:22 UTC, deadalnix wrote:That is a well covered subject, and I told you what to google for as well as the basic approach. Your example here simply tells me you haven't done your homework before posting. Please go look into the scientific literature about GCs for ML languages.You need a set of root from the TL heap. The trick being to consider everything that is allocated AFTER you get the roots as automatically alive (you'll collect this in the next collection cycle). That way, new allocated chunk that have reference in the TL heap will be kept alive, even if you don't know about the roots. You'll find plenty of information about the details if look into GC for ML family languages.Consider this: auto i = new immutable(int); immutable(int)* a, b; b = i; // GC kicks in here, scans `a` (== null) a = b; b = null; // GC goes on, scans `b` (== null) // => whoops, no reference to *i Here, a and b are on the stack or in registers. They could also be on the TL heap.
Nov 14 2014
On Friday, 14 November 2014 at 21:59:47 UTC, deadalnix wrote:That is a well covered subject and told you what to google for as well as the basic approach. Your example here simply told me you haven't done your homework before posting. Please go look into scientific documentation about GC for ML languages.It would help if you posted links to articles. ML is geared towards functional programming, which has different behaviour from system-level imperative programming, so I am not sure if ML is the best starting point. From https://ocaml.org/learn/tutorials/garbage_collection.html : «OCaml's garbage collector has two heaps, the minor heap and the major heap. This recognises a general principle: Most objects are small and allocated frequently and then immediately freed. These objects go into the minor heap first, which is GCed frequently. Only some objects are long lasting. These objects get promoted from the minor heap to the major heap after some time, and the major heap is only collected infrequently. The OCaml GC is synchronous. It doesn't run in a separate thread, and it can only get called during an allocation request.» Nothing about segregation here. The older MLkit, which uses a regional allocator, is interesting, but probably not what you are talking about.
Nov 14 2014
ML is interesting because it emphasizes immutability. For performance reasons, a part of it is in effect mutable and thread local. Some ML implementations are very interesting for us. But let's get back to D. To make the example simpler, let's get rid of shared (in effect, the same thing can be achieved with write barriers on shared, and it does not fundamentally affect the way it works). So we have a TL GC that runs on a regular basis. When doing so, it also collects the pointers to the immutable heap and gives them to the immutable GC as roots. Now the immutable GC works from these roots, but will consider everything allocated after it got its roots as alive. This works because the new roots to the immutable heap that can appear in the TL heap can come from: - new allocations: everything allocated after getting the roots is considered alive. - reads in the immutable heap: these already require a reference to the immutable heap, and as the chain of references is immutable, the one you read from will ultimately be scanned. - other threads (and it will be a root from their scan): an exchange from a non-scanned thread to a scanned one will register a root in the runtime. The idea is based on Doligez-Leroy's GC, but using the TL heap as the small object heap and the immutable heap as the shared one. Note that this GC was designed for ML, where most things are immutable, and this is why it works well (it only requires write barriers). This strategy would be a disaster in Java for instance, as would using the small object strategy for our TL heap. http://gallium.inria.fr/~xleroy/publi/concurrent-gc.pdf http://www.cs.tau.ac.il/~msagiv/courses/mm/doligez.ppt
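To make the root handoff concrete, here is a rough D-style pseudocode sketch of what a TL collection could do. All the names here (`ThreadHeap`, `ImmutableGC`, `snapshotAllocations`, `addRoots`) are hypothetical illustrations, not druntime API:

```d
// Hypothetical runtime sketch of the TL -> immutable GC root handoff.
void collectThreadLocal(ThreadHeap tl, ImmutableGC immGC)
{
    // Everything allocated in the immutable heap after this point is
    // conservatively treated as alive until the next cycle.
    immGC.snapshotAllocations();

    void*[] immRoots;
    foreach (obj; tl.mark())            // regular TL mark phase
        foreach (ptr; obj.pointers)
            if (immGC.owns(ptr))        // pointer into the immutable heap
                immRoots ~= ptr;

    // The immutable GC can now trace concurrently from these roots,
    // without pausing this thread again.
    immGC.addRoots(immRoots);
}
```

The key property is that the TL collector only pauses its own thread; the immutable heap is traced concurrently, with floating garbage picked up on the next cycle.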
Nov 14 2014
On Saturday, 15 November 2014 at 00:16:22 UTC, deadalnix wrote:The idea is based on Doligez-Leroy's GC, but using TL heap as the small object and immutable heap as the shared. Note that this GC is done for ML, where most things are immutable and this is why it works well (only require write barriers). This strategy would be a disaster in java for instance, or for us to use the small object strategy for TL. http://gallium.inria.fr/~xleroy/publi/concurrent-gc.pdf http://www.cs.tau.ac.il/~msagiv/courses/mm/doligez.pptThanks for the link! I agree ML has some interesting work done for it, but they make some assumptions: 1. a low portion of mutable objects 2. the small heap fits in per-core cache (<256KiB on Haswell). So the small heap is basically a region allocator that caches allocations that are likely to die, cutting down on the more costly effort to update the big heap. But I am not sure that fits system level programming; it is more typical of functional programming…
Nov 15 2014
On Saturday, 15 November 2014 at 12:52:33 UTC, Ola Fosheim Grøstad wrote:Thanks for the link! I agree ML has some interesting work done for it, but they make some assumptions: 1. a low portion of mutable objects 2. the small heap fits in per-core cache (<256KiB on Haswell). So the small heap is basically a region allocator that caches allocations that are likely to die, cutting down on the more costly effort to update the big heap. But I am not sure that fits system level programming; it is more typical of functional programming…The small heap part does not apply to us. We can't reproduce that part of the GC in D. However, the big heap part of the strategy would fit D's immutable heap very nicely.
Nov 15 2014
On Saturday, 15 November 2014 at 20:13:47 UTC, deadalnix wrote:On Saturday, 15 November 2014 at 12:52:33 UTC, Ola Fosheim Grøstad wrote:Maybe, but immutable in ML means that two different immutable objects are 100% indistinguishable if they have the same value. Quite often you have a cluster of objects (that could be an isolate) that is periodically immutable. I assume what is most interesting is whether it is immutable between two collections, and not whether it is mutable throughout its lifespan? There are two points in the powerpoint that stick out though: * «trade collection “quality” for level of synchronization - allow large amounts of floating garbage» * «trade collection “simplicity” for level of synchronization - complicated algorithm (not to mention correctness proof)» And also this: «Root enumeration * Raise a flag to signal beginning of marking * shade globals * ask mutators to shade roots * wait until all mutators answered * meanwhile - start scanning and marking» Does this mean that you need all threads (which I presume are the mutators) to be in an eventloop in order to collect? Online version of the slides: http://s0.docspal.com/files/processed/64/6863864-yusxlrni/doligez.pdfThanks for the link! I agree ML has some interesting work done for it, but they make some assumptions: 1. a low portion of mutable objects 2. the small heap fits in per-core cache (<256KiB on Haswell). So the small heap is basically a region allocator that caches allocations that are likely to die, cutting down on the more costly effort to update the big heap. But I am not sure that fits system level programming; it is more typical of functional programming…The small heap part does not apply to us. We can't reproduce that part of the GC in D. However, the big heap part of the strategy would fit D's immutable heap very nicely.
Nov 18 2014
On Tuesday, 18 November 2014 at 20:34:01 UTC, Ola Fosheim Grøstad wrote:Does this mean that you need all threads (which I presume are the mutators) to be in an eventloop in order to collect?What you need is for each thread to provide you a list of roots. You can start tracing while having an incomplete list of roots, so the level of concurrency allowed is high. As for the immutability thing, here is how it goes. When you compile to native, you can either do write barriers all the time, writing pointers through a function call into the runtime, or not do it and use memory protection to trap writes. Option 1 is usually preferred by VMs that can switch the implementation of methods when collecting, so you pay a moderate cost, and only when you collect. But if you compile AOT, you always pay that price and it is not very interesting. Option 2 is WAY more expensive and will trap even writes to values, not only pointers. But while it is expensive, it is only expensive when it traps. That means it is a very interesting choice for code compiled ahead of time that manipulates immutable data. This is very interesting in the case of ML, where the compiler sometimes chooses to mutate, but only as an optimization, and very rarely on objects that reach the big heap: if an object is long lived enough to reach that heap, most likely the compiler can't prove it is unique and thus mutable. The same strategy seems like the obvious winner for D's immutable heap, as well as for the part of the shared heap that does not contain mutable pointers.
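As a concrete illustration of option 2, the collector can revoke write permission on the immutable pages while it marks; any stray write then traps and can be recorded by a signal handler. A minimal POSIX sketch (the fault handler and its bookkeeping are omitted, and this is not how druntime currently works):

```d
import core.sys.posix.sys.mman;

// Write-protect the immutable heap for the duration of concurrent marking.
// Any mutator write to these pages now faults; a SIGSEGV handler (not
// shown) would record the dirty page and restore PROT_WRITE for it.
void protectImmutableHeap(void* base, size_t len)
{
    mprotect(base, len, PROT_READ);
}

// Once marking is done, writes become cheap again.
void unprotectImmutableHeap(void* base, size_t len)
{
    mprotect(base, len, PROT_READ | PROT_WRITE);
}
```

Because the immutable heap is essentially never written after cooking, the trap almost never fires, which is exactly why this scheme suits AOT-compiled code manipulating immutable data.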
Nov 18 2014
What would be the next step to keep this effort moving forward? A POC implementation would be a lot of work, but it would also help people detect corner cases or unsuspected interactions with some features of the language.
Nov 21 2014
On Friday, 14 November 2014 at 21:59:47 UTC, deadalnix wrote:On Friday, 14 November 2014 at 11:46:51 UTC, Marc Schütz wrote: That is a well covered subject and I told you what to google for as well as the basic approach. Your example here simply told me you haven't done your homework before posting.Ok, you caught me there :-P In my defense, I did do some (too) superficial research, and even found statements that some ML languages don't support multithreading, but no details about the GC techniques used. But your explanation makes it clear, and this is indeed a very clever idea.
Nov 16 2014
On Sunday, 16 November 2014 at 10:21:53 UTC, Marc Schütz wrote:On Friday, 14 November 2014 at 21:59:47 UTC, deadalnix wrote:On Friday, 14 November 2014 at 11:46:51 UTC, Marc Schütz wrote: That is a well covered subject and I told you what to google for as well as the basic approach. Your example here simply told me you haven't done your homework before posting.Ok, you caught me there :-P In my defense, I did do some (too) superficial research, and even found statements that some ML languages don't support multithreading, but no details about the GC techniques used. But your explanation makes it clear, and this is indeed a very clever idea.If you mean OCaml, the runtime is being made multi-threaded. http://www.ocamlpro.com/blog/2013/07/01/monthly-06.html
Nov 17 2014
12-Nov-2014 05:34, deadalnix writes:Hi all, I want to get back on the subject of ownership and lifetime and propose some solutions, but before that, I propose to state the problem in a way that I haven't seen before (even if I have no doubt some have come to the same conclusion in the past).[snip nice summary]In that world, D has a bizarro position where it uses a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution: use annotations for the common cases, and fall back on GC/unsafe code when these annotations fall short.Aye.Before going into why it is falling short, a digression on GC and the benefits of segregating the heap. In D, the heap is almost segregated into 3 groups: thread local, shared and immutable. These groups are very interesting for the GC: - The thread local heap can be collected while disturbing only one thread. It should be possible to use different strategies in different threads. - The immutable heap can be collected 100% concurrently without any synchronization with the program. - The shared heap is the only one that requires disturbing the whole program, but as a matter of good practice, this heap should be small anyway. Various ML family languages (like OCaml) have adopted a segregated heap strategy and get great benefit out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios.+1000 We should take advantage of the segregated heap to make all the complexity of shared/immutable/TL finally pay off.I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, that does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC. We already pay a cost at interfaces with type qualifiers, let's make the best of it! I'm proposing to introduce a new type qualifier for owned data. 
Now it means that the throw statement expects an owned(Throwable), that pure functions that currently return an implicitly unique object will return owned(Object), and that message passing will accept to pass around owned stuff. The GC heap can be segregated into islands. We currently have 3 types of islands: thread local, shared and immutable. These are builtin islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island.Seems sane. owned(Exception) would be implicitly assumed, i.e.: catch(Exception e){ ... } would be seen by the compiler as: catch(owned(Exception) e){ ... } What happens if I throw an l-value exception? Do I need to cast or assumeOwned it? It's easy to see how it goes with r-values, such as new Exception(...), since they are "unique expressions" whatever that means ;)An owned island can only refer to other owned islands and to immutable. They can be merged into any other island at any time (that is why they can't refer to TL or shared). owned(T) can be passed around as a function parameter or returned, or stored as a field. When doing so, it is consumed. When an owned is not consumed and goes out of scope, the whole island is freed. That means that owned(T) can implicitly decay into T, immutable(T) or shared(T) at any time. When doing so, a call to the runtime is done to merge the owned island into the corresponding island. If it is passed around as owned, then the ownership is transferred and all local references to the island are invalidated (using them is an error). On an implementation level, a call to a pure function that returns an owned could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);
    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.I take it that owned(T) is implicitly deduced by the compiler in the case of pure functions? 
Also it seems templates should not take owned(T) into consideration and let it decay... How does owned compose with other qualifiers?nogc. Remember? It was in the title. What does a nogc function look like? A nogc function does not produce any garbage or trigger a collection cycle. There is no reason per se to prevent nogc code from allocating on the GC heap, as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned things into the TL, shared or immutable heap. This solves the problem of nogc + Exception. As Exceptions are isolated, they can be allocated, thrown and caught in nogc code without generating garbage. They can safely bubble out of the nogc section of the code and still be safe.Seems absolutely cool. But doesn't allocating an exception touch the heap anyway? I take it that if I don't save the exception explicitly anywhere, the owned island is destroyed at catch scope?The same way, it opens the door for a LOT of code that is not nogc to become so. If the code allocates memory in an owned island and returns it, then it is up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance). The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if the approach proposed by Andrei is better, I don't think this is a good one. The proposed approach allows a lot of code to be marked as nogc and lets the caller decide. That is ultimately what we want libraries to look like.I'm not sure I get all the details, but I like your proposal MUCH better than forcibly introducing ref-counting. -- Dmitry Olshansky
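Sketched in the proposed syntax, the nogc + Exception story would look roughly like this. Note that `owned` does not exist in today's D and current @nogc rejects such a throw; this is purely an illustration of the proposal:

```d
// Hypothetical: the allocation lands in an owned island, so no garbage
// escapes and the function could qualify as nogc under the proposal.
int parseDigit(string s) /* @nogc under the proposal */
{
    if (s.length == 0)
        throw new Exception("empty input"); // typed owned(Exception)
    return s[0] - '0';
}

void caller()
{
    try
    {
        parseDigit("");
    }
    catch (Exception e) // owned(Exception) under the proposal
    {
        // As long as e is not stored into a TL, shared or immutable
        // location, nothing is merged: the island is freed when e
        // goes out of scope, and no garbage is produced.
    }
}
```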
Nov 12 2014
On Wednesday, 12 November 2014 at 20:36:32 UTC, Dmitry Olshansky wrote:Seems sane. owned(Exception) would be implicitly assumed i.e.: catch(Exception e){ ... } would be seen by compiler as: catch(owned(Exception) e){ ... } What happens if I throw l-value exception? Do I need to cast or assumeOwned it? It's easy to see how it goes with r-values, such as new Exception(...), since they are "unique expressions" whatever that means ;)Yes, the unsafe road must always be open, we are a system programming language :)I take it that owned(T) is implicitly deduced by compiler in case of pure functions? Also it seem templates should not take owned(T) into consideration and let it decay... How does owned compose with other qualifiers?You mean what if I have an owned field in an object? In the case where you pass the owned where a TL, shared or immutable is expected, the island is merged, so the question does not make sense. An owned field in an object is interpreted as follows: - immutable => immutable - shared => owned (and can be touched only if the shared object is synchronized, which allows hiding a whole hierarchy behind a mutex. That is another selling point, but I didn't want to get into all the details as the post was already quite big). - const => const owned (essentially unusable - except via borrowing, if we ever want to go that road one day).Seems absolutely cool. But doesn't allocating exception touches heap anyway? I take it that if I don't save exception explicitly anywhere the owned island is destroyed at catch scope?Yes, it touches the heap. But as long as things are owned, they'll be freed automatically when going out of scope. That means, with that definition of things, what is forbidden in nogc code is to consume the owned in such a fashion that its island is merged into the TL, shared or immutable heap. If you don't do this, then your island will be freed when going out of scope and the GC won't need to kick in/no garbage will be produced. 
Doing so allows for relaxing the constraints on nogc and allows the same library code to be used with or without a GC.
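The shared => owned rule described above would let a whole mutable hierarchy hide behind a single mutex. A hypothetical sketch of that selling point (none of this is valid D today, and the types are made up for illustration):

```d
class Node { Node next; int value; }

// Hypothetical: a shared container whose internals live in one island.
shared class Stack
{
    private owned(Node) head; // whole node hierarchy hidden behind the mutex

    synchronized void push(owned(Node) n)
    {
        // Inside synchronized, the owned field may be touched safely:
        // no other thread can observe the island.
        n.next = head;
        head = n; // consumes n: its island is merged into head's island
    }
}
```

The point of the design is that the island boundary coincides with the lock boundary, so no per-object synchronization is needed inside.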
Nov 12 2014
13-Nov-2014 00:27, deadalnix writes:On Wednesday, 12 November 2014 at 20:36:32 UTC, Dmitry Olshansky wrote:Sorry, forgot to reply. Here is the case I wanted to check:

try{
    ...
} catch(owned(Exception) e){
    foo(e);
}

void foo(T)(T arg){
    // what would this print? Exception or owned(Exception)
    // do we bloat a bit more on qualifiers?
    pragma(msg, T);
}

Seems sane. owned(Exception) would be implicitly assumed, i.e.: catch(Exception e){ ... } would be seen by the compiler as: catch(owned(Exception) e){ ... } What happens if I throw an l-value exception? Do I need to cast or assumeOwned it? It's easy to see how it goes with r-values, such as new Exception(...), since they are "unique expressions" whatever that means ;)Yes, the unsafe road must always be open, we are a system programming language :)I take it that owned(T) is implicitly deduced by the compiler in the case of pure functions? Also it seems templates should not take owned(T) into consideration and let it decay... How does owned compose with other qualifiers?You mean what if I have an owned field in an object? In the case where you pass the owned where a TL, shared or immutable is expected, the island is merged, so the question does not make sense.An owned field in an object is interpreted as follows: - immutable => immutable - shared => owned (and can be touched only if the shared object is synchronized, which allows hiding a whole hierarchy behind a mutex. That is another selling point, but I didn't want to get into all the details as the post was already quite big). - const => const owned (essentially unusable - except via borrowing, if we ever want to go that road one day).Thanks, looks sane.-- Dmitry OlshanskySeems absolutely cool. But doesn't allocating exception touches heap anyway? I take it that if I don't save exception explicitly anywhere the owned island is destroyed at catch scope?Yes, it touches the heap. But as long as things are owned, they'll be freed automatically when going out of scope. 
That means, with that definition of things, what is forbidden in nogc code is to consume the owned in such a fashion that its island is merged into the TL, shared or immutable heap. If you don't do this, then your island will be freed when going out of scope and the GC won't need to kick in/no garbage will be produced. Doing so allows for relaxing the constraints on nogc and allows the same library code to be used with or without a GC.
Nov 14 2014
On Friday, 14 November 2014 at 19:02:52 UTC, Dmitry Olshansky wrote:Here is the case I wanted to check: try{ ... } catch(owned(Exception) e){ foo(e); } void foo(T)(T arg){ // what would this print? Exception or owned(Exception) // do we bloat a bit more on qualifiers? pragma(msg, T); }It needs to be `owned(Exception)`, otherwise, how could the compiler know how to treat it correctly? But declaring foo() in that way would be unhelpful, because it would move the exception on calling the function, which is usually not desired. What you want here, instead, is borrowing: void foo(T)(scope(T) arg) { pragma(msg, T); }
Nov 14 2014
On Friday, 14 November 2014 at 20:22:13 UTC, Marc Schütz wrote:It needs to be `owned(Exception)`, otherwise, how could the compiler know how to treat it correctly? But declaring foo() in that way would be unhelpful, because it would move the exception on calling the function, which is usually not desired. What you want here, instead, is borrowing: void foo(T)(scope(T) arg) { pragma(msg, T); }That is a very good question. I do think this should show the type of T, without the owned qualifier.
Nov 14 2014
On 11/11/2014 6:34 PM, deadalnix wrote:On an implementation level, a call to a pure function that return an owned could look like this : { IslandID __saved = gc_switch_new_island(); scope(exit) gc_restore_island(__saved); call_pure_function(); } This allow us to rely much less on the GC and allow for a better GC implementation.If that wrapper is automatically generated by the compiler, so the user doesn't have to mess with it, it could be workable.
Nov 13 2014
On Friday, 14 November 2014 at 01:05:13 UTC, Walter Bright wrote:On 11/11/2014 6:34 PM, deadalnix wrote:Yes, that is the intention. It means that my proposal increases the complexity of the language only moderately, but increases the complexity of the runtime (most specifically the GC) significantly.On an implementation level, a call to a pure function that returns an owned could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);
    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.If that wrapper is automatically generated by the compiler, so the user doesn't have to mess with it, it could be workable.
Nov 13 2014
Nice article. Some observations: 1. I don't understand the difference between 'unique', 'owned' and 'isolated'. (Some other systems use 'isolated'.) Is there a difference? 2. I suspect that 'owned' does not need to be a type qualifier - it can be a storage class much like 'ref' is. This makes for a much simpler implementation. 3. Taking it a step further, I suspect that 'owned' can be implemented as a library type, ala owned!T, with only a smidgen of compiler help.
Nov 17 2014
On Monday, 17 November 2014 at 08:04:44 UTC, Walter Bright wrote:Nice article. Some observations: 1. I don't understand the difference between 'unique', 'owned' and 'isolated'. (Some other systems use 'isolated'.) Is there a difference? 2. I suspect that 'owned' does not need to be a type qualifier - it can be a storage class much like 'ref' is. This makes for a much simpler implementation. 3. Taking it a step further, I suspect that 'owned' can be implemented as a library type, ala owned!T, with only a smidgen of compiler help.Ok, I'm gonna respond in reverse order, so I can use one answer as a starting point for the next one. 3. This is not doable as a library type. The proposal interacts with nogc, pure functions and exception handling. There is also no way to make it safe as a library type; consider:

static A statica;

class A {
    void foo() {
        statica = this; // the owned object escapes its island
    }
}

owned!A owneda = ...
owneda.foo();

We want the compiler to be able to determine what is owned as much as possible and insert the right runtime magic calls to make that work. 2. I'm not sure that is enough to make it work. It is worth investigating, but it is gonna be much more limited than what I have in mind. How do you see this working with fields in structs/classes, for instance? 1. owned is simply a refined version of isolated. The research team coming up with isolated did it for concurrency reasons, and it is useful for that, but my proposal includes a memory management aspect that isolated lacks. So I figured I'd come up with a better name. It is different from unique in the sense that there can be multiple references to the owned object. For instance:

class A {
    A next;
}

pure auto foo() {
    auto a1 = new A();
    auto a2 = new A();
    a1.next = a2;
    a2.next = a1;
    return a1;
}

Here the returned reference is owned. But it is not unique (a2 also has a reference to that object). Owned indicates that you enter into a new island. 
Everything reached transitively from there is assumed to be in the same island unless you cross a new owned or immutable reference. That means code like:

owned(A) bar() {
    auto a = foo();
    auto next = a.next;
    // At this point, we have 2 local references to the island.
    // Returning is an operation that consumes. It would invalidate all
    // local references to that island. That is ok here, as we are
    // returning and not using them anyway.
    return a.next;
}

is valid. The type system here is not checking that a reference to an object is unique, but that the owner of the island is known (and the island only has one owner at a time). It is a more generic and more powerful construct, and I don't think it comes at extra complexity for the user compared to unique. However, it definitely means more work on our side.
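The consume-on-decay rules can be illustrated with a short hypothetical snippet (proposed syntax, not current D; `A` and `foo` are the types from the example above):

```d
void use()
{
    owned(A) a = foo();   // this scope owns the freshly built island
    immutable(A) i = a;   // decay: the island is merged into the
                          // immutable heap by a runtime call
    // a (and any other local reference into that island) is now
    // invalidated; using it would be a compile-time error.
}

void drop()
{
    owned(A) a = foo();
    // Not consumed: the whole island is freed when a goes out of
    // scope, with no GC involvement and no garbage produced.
}
```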
Nov 17 2014
On 11/17/2014 1:57 PM, deadalnix wrote:2. I'm not sure if that is enough to make it work. It is worth investigating. It is gonna be much more limited than what I have in mind.There was a lot of skepticism initially when I proposed that ref be a storage class rather than a type constructor, but it turned out that the problems were resolvable and the end design was an unqualified win. So I think we can do this - at a minimum we need to exhaust the possibility.
Nov 17 2014
On Tuesday, 18 November 2014 at 02:39:48 UTC, Walter Bright wrote:On 11/17/2014 1:57 PM, deadalnix wrote:ref only makes sense at interface boundaries. That is not the same thing.2. I'm not sure if that is enough to make it work. It is worth investigating. It is gonna be much more limited than what I have in mind.There was a lot of skepticism initially when I proposed that ref be a storage class rather than a type constructor, but it turned out that the problems were resolvable and the end design was an unqualified win. So I think we can do this - at a minimum we need to exhaust the possibility.
Nov 17 2014
Finally got a look at your proposal. While I do agree with many of the initial statements and, specifically, the proposal for heap segregation, the proposed semantics of `owned` do leave me skeptical. It is effectively a more refined/mature approach to the "cooking of immutables" concept, and Marc's proposal, while being more complicated semantics-wise, allows for much more than that.
Dec 04 2014
On Thursday, 4 December 2014 at 14:23:13 UTC, Dicebot wrote:Finally got a look at your proposal. While I do agree with many of the initial statements and, specifically, the proposal for heap segregation, the proposed semantics of `owned` do leave me skeptical. It is effectively a more refined/mature approach to the "cooking of immutables" concept, and Marc's proposal, while being more complicated semantics-wise, allows for much more than that.I don't think this is solving the same problem as Marc's proposal, so I'm not sure how comparing them makes sense. Marc's proposal is about manipulating data without having ownership. This defines ownership. This proposal adds some complexity, but I do think it is a winner. Right now we have various type qualifiers (mutable, const, immutable, shared, const shared, inout, inout shared). This comes at a cost, and ultimately, as the proposal interacts with them, you want to compare the total complexity with the total benefit. In one case, you have 7 qualifiers. You get a mostly safe language out of them. With the proposal, you have 8 qualifiers. You get a safe language out of it + many opportunities to optimize + added expressiveness (i.e. interaction of GC and nogc code, ownership transfer, safe reference counting, extension of the possibilities in nogc code, in a similar fashion as weak purity allows for extra expressiveness in pure code). If you allow me a non politically correct metaphor, would you buy a $1 condom with a hole in it, or a $1.14 one that does not?
Dec 04 2014
On Thursday, 4 December 2014 at 23:22:04 UTC, deadalnix wrote:I don't think this is solving the same problem as Marc's proposal, so I'm not sure how comparing them makes sense. Marc's proposal is about manipulating data without having ownership. This defines ownership.Indeed. But both combined add too much complexity to be feasible, and thus we need to decide which problems are more important to solve. I think the one from Marc has wider application, while elements of yours may be more suitable as a hidden implementation detail. Though now there is also DIP69, and I need to seriously shake them all through my head before having any true opinion :PThis proposal adds some complexity, but I do think it is a winner. Right now we have various type qualifiers (mutable, const, immutable, shared, const shared, inout, inout shared). This comes at a cost, and ultimately, as the proposal interacts with them, you want to compare the total complexity with the total benefit.I respectfully can't agree that the shared qualifier truly exists in the current language implementation, so my perspective is a bit different. Also, const/inout don't actually tell anything about how the underlying data is managed and serve as secondary utility qualifiers.
Dec 05 2014