digitalmars.D - The Big Picture
- Marc Schütz <schuetzm gmx.net>, Feb 24 2015
Several of us (deadalnix, Zach the Mystic, myself, probably others) have been putting forward some ideas related to memory management that involve ownership. A recent discussion made me realize that, while there were more or less concrete proposals for specific parts, there was no explanation of how they're all supposed to work together. I believe this may have led to significant misunderstandings.

In the following, I want to summarize my understanding of "The Big Picture" (which is probably not far from deadalnix's and Zach's ideas). Please note that this is not a proposal; even the exact semantics of the described concepts are not really important for this post.

Problem Statement
=================

There are many different strategies for managing resources like memory, file handles, OpenGL objects, etc. The most important ones are:

a) manual management (e.g. new/delete, malloc()/free(), open()/close())
b) unique/owned wrappers: there is only one reference to the resource; when that reference goes out of scope, the resource can be released
c) reference counting: there can be more than one reference; when the last one is dropped, the resource is released
d) (tracing) garbage collection: this is mostly used for memory resources

D already provides several mechanisms to implement these strategies, but relies a lot on garbage collection. To improve that situation (which is part of the "Vision" for the near future [1]), we'd like to provide ways to use other strategies in a SAFE and EFFICIENT manner.

Requirements
============

It's clear that we have to work with the existing language (although still allowing some changes to it, ideally non-breaking ones), and that we as a project have only limited resources. Therefore, any demands we want to make against a possible solution must be weighed against the costs of satisfying them. However, it is a good idea to start with requirements from an ideal solution, and see what can realistically be implemented.

Here's my "wishlist":

1) Compatibility
There's a huge amount of existing code. An ideal solution should not only avoid breaking existing code, but also allow existing code to take advantage of the new features with as little change as possible.

2) Safety/Correctness
The compiler must statically disallow uses of a resource that are unsafe or incorrect for the chosen management strategy. Ideally, this applies not only to @safe-ty, but also to other kinds of correctness, like preventing access to a closed file handle.

3) Efficiency
Lack of performance is probably the most frequent reason for avoiding the GC. Therefore, our "dream solution" should not introduce unnecessary performance penalties itself. Just like template functions are expected to perform as well as hand-written specialized code, an RC wrapper, for example, should perform as well as hand-written (but tedious and potentially unsafe) manual reference counting.

4) Implementable in a Library
The language should provide the tools necessary to implement as much as possible in the standard library or in application code.

5) Composability
Most code, especially in libraries, shouldn't have to care about the underlying resource management strategy of the data it processes, nor should it impose a particular strategy on the user. Resource management strategy should be the responsibility of the client code. This principle should be followed to the greatest possible extent. Especially in light of point 4), this will make it possible to use user-defined RC implementations together with the standard library and other libraries.

6) Additional uses
A good feature is applicable outside of the use case it was introduced for. The more fundamental a change to the language is, the more important this becomes, so that the change can pull its own weight.

Proposed Solution
=================

Most resource management problems are best described in terms of ownership.
Therefore, it is natural to take the solution from the vast amount of research and practical experimentation that has been done in this field. Two things are proposed:

A) A way to limit the lifetime of resource handles (mostly references/pointers, but could be other things like file handles) to a particular lexical scope (the `scope` keyword is already designated for that purpose), together with a compiler-checkable escape hatch (`scope!identifier` in my suggestion [2], `return ref` in DIP25 [3]).

B) A way to bind ownership of a resource to a variable and ensure that this variable is the only (non-ephemeral) handle/reference to that resource. The uniqueness property can be exploited to provide many interesting guarantees.

The details of implementation and exact semantics of these two features are not important for the big picture. Let's call them SCOPE and UNIQUE from now on.

Evaluation
==========

Let's see how we fare:

1) Compatibility
Both new features are add-ons to the language, and are therefore opt-in. They won't affect existing programs at all. On the other hand, existing code cannot directly profit from them, but it needs only small modifications to enable that, see 5).

2) Safety/Correctness
The features will behave in such a way that guarantees that no references/handles to a resource are left over when the resource is destroyed (SCOPE). Moreover, UNIQUE will have features that allow for safe use in the situations exemplified below. (The details of this are out of scope here, but it has indeed been proven possible.)

3) Efficiency
SCOPE and UNIQUE objects by themselves have a memory layout identical to their "normal" counterparts. There are no inherent runtime penalties for them. In fact, they allow for certain performance optimizations.
Let's take reference counting as an example: an RC!T wrapper can decay to SCOPE(T), which enables it to elide refcount manipulation entirely, and - depending on the proposal - to SCOPE(RC!T), which allows it to stay copyable, but defer adjusting the refcount to the point where the actual copying is done. UNIQUE allows conversion to immutable and shared without calling `idup` in many cases.

4) Implementable in a Library
The language will provide a minimal, but sufficient implementation of SCOPE and UNIQUE. This can then be used as a building block to implement other things in user code.

5) Composability
As written above, only client code should decide about management strategy. Library code comes in two flavors: it can be a consumer of data (probably most cases), or it can be a producer.

Most consumers will only temporarily "look at" data they receive from client code, maybe make changes to it, but never keep a reference to it around after they return. This is exactly what SCOPE guarantees. Such consumers need to take all their data by SCOPE. UNIQUE(T) is implicitly convertible to SCOPE(T), and user-defined types can make themselves decay to SCOPE(T) using `alias this` or accessor methods. All kinds of data can then be passed to them, no matter whether it is refcounted, GC managed, a stack variable, a global variable, or manually managed.

Then there are the producers. These are things like `toString()` and `std.stdio.File`, which allocate memory or other resources and return them to client code. They shouldn't need to care what the client code wants to do with them. UNIQUE enables that: A UNIQUE(T) can be consumed by moving it into an RC!T, a T (which means the GC will manage it from then on), any other user-defined type, or another UNIQUE(T). It can also be left to go out of scope, in which case it will be released automatically. Before that, it can also be passed as a SCOPE parameter or stored in a SCOPE variable.

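To make the consumer/producer split a bit more concrete, here is a rough sketch in today's D. It is only an approximation under stated assumptions: SCOPE and UNIQUE as described above do not exist as such, so the existing `scope` parameter storage class stands in for SCOPE (how strictly it is checked depends on compiler version and flags), and the existing rule that the result of a pure factory function may convert implicitly to immutable stands in for a limited form of UNIQUE. The names `sum` and `makeSquares` are made up for illustration.

```d
import core.stdc.stdlib : free, malloc;

// Consumer: `scope` states that `data` is not kept after the call returns.
// (Stand-in for SCOPE; enforcement depends on compiler version and flags.)
int sum(scope const(int)[] data) @safe pure nothrow @nogc
{
    int total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// Producer: a pure function returning freshly allocated data. The caller
// holds the only reference to the result, which is a limited form of the
// UNIQUE idea that the language already recognizes.
int[] makeSquares(size_t n) @safe pure nothrow
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

void main()
{
    // The consumer accepts data regardless of how it is managed.
    int[] gcManaged = [1, 2, 3];                  // GC heap
    int[3] onStack = [4, 5, 6];                   // stack
    auto raw = cast(int*) malloc(3 * int.sizeof); // manual management
    assert(raw !is null);
    scope (exit) free(raw);
    raw[0 .. 3] = 7;

    assert(sum(gcManaged) == 6);
    assert(sum(onStack[]) == 15);
    assert(sum(raw[0 .. 3]) == 21);

    // The client, not the producer, decides what becomes of the result.
    immutable(int)[] frozen = makeSquares(4); // no .idup needed
    int[] mutableResult = makeSquares(4);     // stays mutable, GC-managed
    mutableResult[0] = 42;
    assert(frozen == [0, 1, 4, 9]);
    assert(sum(frozen) == 14);
}
```

What this sketch cannot show is the part that needs new language support: moving a producer's result into an RC!T or another owner without going through the GC, and having the compiler reject consumers that let a SCOPE reference escape.
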
(Allocators will play an important role here in practice, but they are actually a different topic: memory allocation != memory management.)

Without these two features, consumers would need to be specialized for the various types (=> template bloat or manual work), and producers would either need to decide on one return type or make it configurable (=> template bloat).

6) Additional uses
UNIQUE in particular has interesting additional uses. @nogc exceptions are one example, safe message passing (transfer of entire graphs of objects to other threads) is another. Some variants of ownership also provide ways to prevent iterator invalidation.

[1] http://wiki.dlang.org/Vision/2015H1
[2] http://wiki.dlang.org/User:Schuetzm/scope
[3] http://wiki.dlang.org/DIP25