digitalmars.D - Memory/Allocation attributes for variables
- Elmar (283/283) May 29 2021 Hello dear D community,
- Ola Fosheim Grostad (14/21) May 29 2021 I agree that D has jumped down the rabbithole in terms of
- Elmar (74/96) May 31 2021 Thank you for your reply. Also sorry for the wordyness, I'm just
- Ola Fosheim Grøstad (12/23) May 31 2021 All high level programming-languages do. Only the low level
- Elmar (110/133) May 31 2021 I suppose you mean the "higher" level languages (because C is by
- Ola Fosheim Grostad (34/79) May 31 2021 Yes, I mean system level language vs proper high level languages
- Ola Fosheim Grostad (4/6) May 31 2021 What I meant here is that depth would be too restrictive as it
- Elmar (100/138) Jun 03 2021 Thank you for answering.
- sighoya (47/72) Jun 03 2021 What if dup is creating things on the heap (I don't know by the
- Elmar (173/216) Jun 05 2021 There is no combinatorial explosion, that would be a bad idea ;-).
- sighoya (27/32) Jun 06 2021 That seems to very much resemble the idea odin strives for:
- Ola Fosheim Grøstad (33/61) Jun 04 2021 I don't think separate compilation is a good point. I think a
- Elmar (38/54) Jun 05 2021 I know what you mean. Avoiding separate compilation where
- Ola Fosheim Grøstad (8/14) Jun 05 2021 But introducing all these special cases just to avoid explicit
Hello dear D community,

personally I'm interested in security, memory safety and such, although I'm not an expert. I would like to know whether D has memory/allocation attributes (I will use both terms interchangeably), or whether someone knows a library for this which statically asserts that only values allocated in compatible allocation regions are contained in variables of attributed type.

Attributes for memory safety can be seen as an extension to the type system, because these memory attributes constrain the address domain of values instead of the value domain itself, and so they become part of the value domain of a pointer or reference.

Now there are A LOT of different allocation regions and scopes for variables for whatever purpose:

- static allocation
- stack frame
- dynamic allocation w/o GC
- dynamic allocation with GC
- fast allocators optimized for specific objects or even function-specific allocators
- peripheral addresses
- yes, even registers as allocation region (which allows a value to not be stored in RAM and thus not be easily overwritten, which is useful for security reasons like storing pointer encryption keys or stack canaries, or to assign common registers to variables)
- or memory-only allocation (which requires a value to not be stored/held in registers)

Memory safety problems often boil down to the program accidentally storing a pointer value into a variable that is *semantically* out of bounds or points to a memory area that is too small w.r.t. the variable's purpose or scope. (In languages with better type safety this is probably impossible through aliasing of variables; the typical case is unbounded data structures like C's variadic arguments or attacker-controlled variable-length arrays.)

I don't know any other language yet which has allocation attributes for pointer/reference variables, and allocation attributes for value-typed variables which restrict the allocation region for the data of the variable (some kind of contract on variable level). However, value-type variables are another story, because they are allocated at the same time they are defined and would only serve as an expressive generalization of such attributes, generalized to value types and even control structures.

Looking at what attributes D provides, I definitely see that memory-safety-related concerns are addressed by existing attributes. But I personally find them rather unintuitive to use, difficult to learn for beginners and not flexible enough (like two different versions of `return ref scope`). As currently defined, those attributes don't annotate destinations of data flow but sources (like function arguments) of data flow.

What I imagine: more specific scopes (more specifically attributed types) or allocation regions correspond to more generalized types. Some scopes/allocation regions are contained within others (these smaller contained regions are virtually base types of bigger regions) and some regions are disjoint (but they should not intersect each other incompletely, which would be against the structured-programming paradigm and the inheritance analogy in OOP). This results in scope polymorphy which statically/dynamically checks the address type of RHS expressions during assignments, and memory safety becomes a special case of type safety.

I could annotate return values with attributes to make clear that a function returns GC-allocated memory, e.g. using a `gc` attribute.
```d
gc string[] stringifyArray(T : U[], U)(T arr)
{
    import std.algorithm.iteration : map;
    import std.array : array;
    import std.conv : to;
    return arr.map!(value => value.to!string).array;
}
nogc string[] stringtable = stringifyArray([1, 2, 3]); // error!

// a useless factory example
new auto makeBoat(double length, double height, Color c)
{
    theAllocator = allocatorObject(Mallocator.instance);
    auto b = new Boat(length, height, c);
    theAllocator = processAllocator;
    return b;
}

// combining multiple attributes gives a union of both, which is valid for reference variables
new newcpp gc Boat boat = makeBoat( ... );
// technically, a union of attributes for value types is possible but would
// require inferring the most appropriate attribute from context, which is difficult
```

Variables with no attributes allow any pointer for assignment and will infer the proper attribute from the assignment.

Some of these use cases are already covered by existing attributes:

- `scope` makes sure that a reference variable is not written to an allocation region outside the current function block (which corresponds to using `scope(function)` with the argument, see below), and it would be type-unsafe to assign it to a variable type with a larger scope. "Scope" basically means the argument belongs to a stack frame in the caller chain. (It corresponds to arguments annotated with `caller`, see below.) It's used to tell the function that the referenced value has a limited lifetime in a caller stack frame despite being a reference variable, and the reference could become invalid after the function returns, so it must not write the value to variables outside the function. For arguments this is very useful, and I would rather prefer the complementary case to be explicit. That's where `in` is really useful as a short form.
- `ref` specifies that the actual allocation region of a variable's value is outside of the function scope in which the variable is visible (or used). (`out` is similar.)
- `return ref` specifies that the value (referenced by the returned reference) is in the same allocation scope as the argument annotated with `return ref` (corresponds to annotating the return type with `scope(argName)`, see below).
- `return ref scope`, a combination of the two above. The return type is seen to have the same allocation region as the one used by this annotated argument.
- `__gshared`, `shared`. Variables with these attributes are placed in a scope accessible across threads. This is the default in C, so `__gshared` corresponds to C's volatile values, which are accepted by `memory` references.

Here is a (really long) collection of many possible memory attributes I am looking for. They define which addresses of values are accepted for the pointer/reference:

- auto: allocation in any stack frame, which includes fixed-size value-type variables passed as arguments or return value
- stack: dynamic allocation in any stack frame (alloca)
- loop: allocation in the region which lives as long as the current loop lives
- scope(recursion): an allocation scope not yet available in D, I believe; a scope which lives as long as the entire recursion lives, equivalent to `loop` in the functional sense. Locals in this scope are accessible to all recursive calls of the same function.
- scope(function): allocation in the current stack frame (`scope`d arguments are a special case of this)
- scope(label): allocation in the scope of the labeled control structure
- scope(identifier): allocation in the same scope as the specified variable name; `return ref` can be seen as a special case of this for return types
- static: allocation/scope in the static memory segment (lifetime over the entire program runtime); `static` variables and control structures are a special case of this attribute
- caller: allocation in the caller's stack frame (usable for convenient optimizations like the one shown below), an "implicit argument" when used for value types; corresponds to `ref scope` for reference-type variables. Something in between "static" and "auto".
- gc: allocation region managed by D's garbage collector
- nogc: disallows pointers/references to GC-allocated data
- new: allocation region managed by Mallocator
- newcpp: allocation region managed by the stdcpp allocator, eases C++ compatibility
- peripheral: target- or even linker-script-specific memory region for peripherals
- register: only stored in a register (with a compile-time error if not possible)
- shared: allocation region for values which are synchronized between threads
- memory: never stored in a register (the use case can overlap with `peripheral`; it's used for variables whose content can change non-deterministically and must be reloaded from memory each time, for example interrupt-handler-modified variables; it also prevents optimization when unwanted)
- make(allocator): allocated by the given allocator (a dynamic type check is required if "allocator" is a dynamic object)

In the basic version for reference variables, these attributes statically/dynamically assert that a given pointer value is in bounds of that allocation region. Of course, this is a long list of personal ideas and some of them could be unpopular in the community. But I think all of them would be a tribute to systems programming.

Why are such attributes useful? First, because type-safe design means restricting value domains as much as possible, so that each is only as large as required. They restrict the address (pointer value) at which a value bound by a variable can be located and provide additional static type checks as well as *allocation transparency* (something which I miss in every language I have used so far). The good thing is, if no attribute is provided, it can be inferred from the location where the value-typed variable is defined, or it is inferred from the assigned pointer value for reference types. Maybe also useful: with additional memory safety attributes, it could become legitimate to assign to `scope`d reference variables.

For reference-type variables, these attributes are simple value-domain checks of the pointer variable. A disadvantage of memory attributes is (like with polymorphy) that runtime checks might be needed in some cases when static analysis isn't sufficient (if attributes are cast).

An interesting extension is a generalization to value-type variables. It can generalize the `scope` and `return` attributes to value types. While probably not uncontroversial, it could allow fine control over variable allocation and force where a value-typed variable is allocated exactly (allocation guarantees). You could indirectly define a variable in a nested code block which is allocated for another scope. The main disadvantage I can think of is only that it cannot simply be created as a library add-on.

```d
outer: {
    // ...
    scope(inner) uint key = generateKey(seed); // precomputes the RHS
    // and initializes the LHS with the memorized value when entering the "inner" block

    seed.destroy(); // do something with seed, modify/destroy it, whatever
    // key is not available/accessible here

    // Message cipher; // <-- implicit but uninitialized
    inner: if (useSecurity) { // if not entered, the init value of the variable is used
        scope(outer) Message cipher = encrypt(key);
        // Implicitly defines "cipher" uninitialized in the "outer" scope.
        // Generates default init in all other control flow paths without a scope(outer) definition
    }
    // else cipher = Message.init; // <-- implicit, actual initialization

    decrypt(cipher, key); // error, key is only available in the "inner" scope
}
```

Some would criticize the unconventional visibility of `cipher`, which doesn't follow common visibility rules. For example, if `static` variables are defined in functions, they are still only visible in the function itself and not in the entire scope in which they live. So a likely improvement would be that the visibility is not impacted by the attribute, only the point of actual creation/destruction.

Just looking at the previous example it would seem useless at first, but it's not if loops are considered (and variables which have `loop` scope, that is, are created on loop entry and only destroyed on loop exit). Interesting cases can also emerge for additional user optimization, in order to avoid costly recomputation by using a larger scope as allocation region:

```d
double permeability(Vec2f direction)
{
    caller Vec2f grad = calculateTextureDerivative();
    // "grad" is a common setup shared by all calls to "permeability" from the same caller instance.
    // It is hidden from the caller because it's an implementation detail of this function.
    // All calls from the same caller will use the same variable.
    // It would be implemented as an invisible argument whose initialization
    // happens in the caller. The variable is stored on the caller's side as
    // an invisible variable and is passed with every call.
    return scalprod(direction, grad);
}
```

A main benefit of this feature is readability, and in some cases optimization, because the computation is not repeated for every call, only when the repetition is needed, which can be decided in the callee instead. For closures the `caller` scope is clear, but it also works for non-closure functions as an invisible argument. Modifications to a `caller ref` variable are remembered for consecutive calls from the same caller stack frame, whereas `caller` without `ref` maybe only modifies a local copy.

Or being able to create arrays easily on the stack, which is yet a further extension:

```d
auto arr1 = [0, 1, 2, 3]; // asserts fixed size, okay, but variable size would fail
stack arr2 = dynamicArr.dup; // create a copy on the stack, the stack is "scope"d
```

An easy but probably limited implementation would set `theAllocator` before the initialization of such an attributed value-type variable and reset `theAllocator` afterwards to the allocator from before.

Finally, one could even more generally annotate control structures with attributes to define in whose scope's entry the control structure's arguments are evaluated (e.g. `static if` is a special case, an `if` with the `static` attribute), but this is yet another story and unrelated to allocation.

This is it, I'm sorry for the long post. It took me a while to write it down and reread. Regards!
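Since the post mentions the "set `theAllocator`, run the initializer, reset it" trick, here is a minimal sketch of how that part could be written today with `std.experimental.allocator`. The `withAllocator` helper is invented for this example, and the exact type behind `theAllocator` differs between Phobos versions, so treat it as an illustration rather than a finished design:

```d
import std.experimental.allocator : allocatorObject, dispose, make, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

// Runs `initializer` while `theAllocator` temporarily points at `Alloc`,
// then restores the previous allocator.
auto withAllocator(alias Alloc, alias initializer)()
{
    auto saved = theAllocator;                       // remember the current allocator
    theAllocator = allocatorObject(Alloc.instance);  // redirect allocations
    scope (exit) theAllocator = saved;               // restore it afterwards
    return initializer();
}

unittest
{
    // Whatever the initializer allocates through theAllocator now comes from Mallocator.
    auto p = withAllocator!(Mallocator, () => theAllocator.make!int(42))();
    assert(*p == 42);
    Mallocator.instance.dispose(p);  // free with the allocator that actually made it
}
```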
May 29 2021
On Sunday, 30 May 2021 at 02:18:38 UTC, Elmar wrote:
> Looking at what attributes D provides, I definitely see that memory-safety-related concerns are addressed by existing attributes. But I personally find them rather unintuitive to use, difficult to learn for beginners and not flexible enough (like two different versions of `return ref scope`). As currently defined, those attributes don't annotate destinations of data flow but sources (like function arguments) of data flow.

I agree that D has jumped down the rabbit hole in terms of usability, and function signatures are becoming weirder. The reuse of the term "return" is particularly bad. To a large extent this is the aftermath that comes from changing course when it went from D1 to D2, where simplicity was sacrificed and the door was opened for more and more complexity. Once a language becomes complex, it seems difficult to prevent people from adding just-one-more-feature that adds to the complexity. Also, since experienced users influence the process most... there is nobody to stop it.

The main issue is however not specifying where it allocates, but keeping track of it when pointers are put into complex data structures.
May 29 2021
On Sunday, 30 May 2021 at 05:13:45 UTC, Ola Fosheim Grostad wrote:
> [...] The main issue is however not specifying where it allocates, but keeping track of it when pointers are put into complex data structures.

Thank you for your reply. Also sorry for the wordiness, I'm just awkwardly detailed sometimes.

In your case it's not what I was thinking. I would count myself among the sophisticated programmers (but not the productive ones, unfortunately). I can cope with all those reused keywords, even though I think their design at this place is unintuitive to use. Intuitive would be annotation of the return type, because the aliasing is a property of the return type, not the argument. At least I feel like I understood the sparsely explained intention behind the current scope-related attributes, but my main point is, I find they can be improved with more expressiveness. It would give programmers a hint of what kind of allocated argument is acceptable for a parameter.

And no, this is not trivial. It's the reason for my decision to start this thread: *functions in Phobos accept range iterators of fixed-size arrays as range arguments, but even when it fails miserably, it compiles happily and accesses illegal memory without any warning, creating fully non-deterministic results with different compilers. I noticed this when I tried to use "map" with fixed-size arrays. There is simply no tool to check and signal that fixed-size arrays are illegal as a range argument for "map". And sometimes mapping onto fixed-size arrays even works.*

Without better memory safety tools, I'd discourage more memory-efficient programming techniques in D, although I'd really like to see D replace C for embedded and resource-constrained systems.

---

I wonder how programming languages don't see the obvious: to consider memory safety as a part of type safety (address/allocation properties to be type properties) and that memory-unsafe code only means an incomplete type system. I also don't know whether conventional "type safety" in programming languages suffices to eliminate the possibility of deadly programming bugs (aliased reference variables, e.g.). But of course, security and safety are complex and there is no way around complexity to make safe code flexible.

The important part is the first one (without the generalization to allocation and control structures, which I only mentioned as an interesting thought), because I think it's an easy but effective addition. D already has features in that direction, which is good, the awareness exists, but it's still weak at some points. My post should be seen as a collection of ideas and a request for comment (because maybe my ideas are totally bad or don't fit D) rather than a request to implement all this. The main point is to consider references/pointers as values with critical type safety, which means a way to specify stricter constraints. Memory safety is violated by storing a pointer value in a reference which is out of the intended/reasonable value domain of the pointer (not matching its lifetime).

If someone already thought the same as me, there could be a safe-pointer-like user library which supports additional attributes (which represent restricted pointer domains) by implementing a custom pointer/reference type. (It's not a smart pointer, because smart pointers try to fix the problem at the other end and require dynamic allocation, which is not that nice.) Due to D's nature, it would support safe pointers and safe references (reference variables) and provide static and dynamic type checks with overloaded operators and memory attributes.
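A minimal sketch of what such a library type might look like; everything here (`Region`, `RegionRef`, `gcRef`) is invented for illustration, and a real library would also need casts, runtime checks and a notion of one region fitting inside another:

```d
enum Region { gc, manual, stack }

struct RegionRef(T, Region region)
{
    private T* ptr;

    // Assignment only compiles when the source carries a compatible region tag.
    void opAssign(Region rhsRegion)(RegionRef!(T, rhsRegion) rhs)
        if (rhsRegion == region)
    {
        ptr = rhs.ptr;
    }

    ref T deref() { return *ptr; }
}

// Convenience wrapper for GC-allocated values.
RegionRef!(T, Region.gc) gcRef(T)(T* p) { return RegionRef!(T, Region.gc)(p); }

unittest
{
    auto p = new int;
    *p = 1;
    auto a = gcRef(p);
    RegionRef!(int, Region.gc) b;
    b = a;                          // fine: both are tagged Region.gc
    // RegionRef!(int, Region.manual) c;
    // c = a;                       // would not compile: region mismatch
}
```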
Attributes couldn't be inferred automatically, I guess, but annotation of variables could entirely allow static memory safety checks (which don't need to explicitly test whether a pointer value is contained in a set of allowed values) and maybe prevent bugs or unwanted side effects.

---

One important aspect which I forgot: aliasing of variables. I know, D allows aliased references as arguments by default. Many memory safety problems derive from aliased variables which were not assumed to be aliased. Aliased variables complicate formal verification of code and confuse people. I would add an `alias(symbol)` attribute to my collection, which indicates that a reference explicitly aliases (overlaps) another reference in memory, or a `noalias(symbol)`.

---

If someone thinks I heavily missed something, please let me know.
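As an illustration of what a runtime `noalias` check between two slice arguments could assert (D has no such attribute today; `overlaps` and `copyNoAlias` are invented names):

```d
// True if the memory of the two slices overlaps anywhere.
bool overlaps(T)(const(T)[] a, const(T)[] b) @trusted
{
    if (a.length == 0 || b.length == 0) return false;
    const aLo = cast(size_t) a.ptr, aHi = aLo + a.length * T.sizeof;
    const bLo = cast(size_t) b.ptr, bHi = bLo + b.length * T.sizeof;
    return aLo < bHi && bLo < aHi;
}

void copyNoAlias(int[] dst, const(int)[] src)
in (dst.length == src.length)
in (!overlaps(dst, src), "arguments must not alias")
{
    dst[] = src[];   // the array copy itself also rejects overlapping operands at runtime
}
```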
May 31 2021
On Monday, 31 May 2021 at 18:21:26 UTC, Elmar wrote:
> I wonder how programming languages don't see the obvious: to consider memory safety as a part of type safety (address/allocation properties to be type properties) and that memory-unsafe code only means an incomplete type system.

All high-level programming languages do. Only the low-level ones don't, and that is one of the things that makes their type systems unsound.

> [...] which means a way to specify stricter constraints. Memory safety is violated by storing a pointer value in a reference which is out of the intended/reasonable value domain of the pointer (not matching its lifetime).

But how do you keep track of it without requiring that all graphs are acyclic? No back pointers is too constraining. And no, Rust does not solve this. Reference counting does not solve this. How do you prove that a graph remains fully connected when you change one pointer?

> One important aspect which I forgot: aliasing of variables. I know, D allows aliased references as arguments by default. Many memory safety problems derive from aliased variables which were not assumed to be aliased.

So, how do you know that you don't have aliasing when you provide pointers to two graphs? How do you prove that none of the nodes in the graph are shared?
May 31 2021
Good questions :-) .

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
> All high-level programming languages do. Only the low-level ones don't, and that is one of the things that makes their type systems unsound.

I suppose you mean the "higher" level languages (because C is, by its original definition, also a high-level language). Nor do I know any "higher" level language which provides the flexibility of constraining the value domain of a pointer/reference, except for restricting `null` (non-nullable pointers are probably the simplest domain constraint for pointers/references). I think not even Ada or VHDL have it.

The thing I'd like to gain with those attributes is a guarantee that the referenced value wasn't allocated in a certain address region/scope and lives in a lifetime-compatible scope, which can be detected by checking the pointer value against an interval or a range of intervals. For example, a returned reference to an integer could have been created with "malloc" or even a C++ allocator, or interfacing functions could annotate parameters with such attributes. With guarantees about the scope of arguments, function implementations can avoid buggy reference assignments to outside variables. The function could expect compatible references allocated with the GC, but the caller doesn't know it. Whether any reference variable assignment is legitimate can be checked by comparing the source attributes (the reference value, which says where the value is allocated) with the destination attributes (where the reference is stored in memory). Even better are runtime checks of pointer values for a better degree of memory safety, but only if the programmers want to use them. A reference assignment is legitimate if the destination scope is compatible with the source's scope, not in any other case.

I would suggest a lifetime rating for value addresses as follows:

*peripheral > system/kernel > global shared > private global (TLS) > extern global (TLS) > shared GC allocated > shared dynamically allocated > GC allocated (TLS) > dynamically allocated (TLS) <=> RAII/scoped/stack <=> RAII/scoped/stack > register*

Heap regions are not always comparable to stack or RAII. So the current practice of not allowing assignment to RAII references (using the `scope` attribute) is probably best to continue. Everything other than stack addresses is seen as one single lifetime region with equal lifetime. The comparison between stack addresses assumes that an address deeper in the stack has a higher or equal lifetime. The caller could also provide its stack frame bounds, which allows considering this interval as one single lifetime. It should constrain the possible value domain of pointers absolutely, so that no attack with counterfeited pointers to certain memory addresses is possible. If I used custom allocators for different types, I could expect or delimit what the pointer value can be.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
> But how do you keep track of it without requiring that all graphs are acyclic? No back pointers is too constraining. And no, Rust does not solve this. Reference counting does not solve this. How do you prove that a graph remains fully connected when you change one pointer?

I think this is GC-related memory management, not type checking. The memory attributes don't solve memory management problems. The problem with reference counting is usually solved by inserting weak pointers into cycles (which also solves the apparent contradiction of a cycle of references). Weak references are used by those objects which are deeper in the graph of data links. Otherwise it's a code smell, and one could refactor the links into a joint object, and deleted objects will deregister in this joint object. I already thought about other allocation schemes for detecting cycles that could be combined with reference counting, for example tagging structs/classes with the ID of the connected graph in which they are linked if they aren't leaves. But this ID is difficult to change. One can also analyze at compile time which pointers can only be part of a cycle, but more explanation leads too far here.

Instead, the problem my idea is intended to solve is

1. giving hints to programmers (to know which kind of allocated memory works with the implementation; stack addresses apparently won't generally work with `map`, for example)
2. having static or dynamic (simple) value domain checks (which check whether a pointer value is in the allowed interval(s) of the allocation address spaces belonging to the attributes), which ensure that only allowed types of allocation are used. These checks can be used to statically or dynamically dispatch functions.

Of course such a check could also be performed manually, but it's tedious and requires me to put all the different function bodies in one `static if else`. It's more of a lightweight solution and works like an ordinary type check (value-in-range check). Where the feature shines most is function signatures, because they separate code and create intransparency which can be countered by memory attributes for the return type and argument types.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
> So, how do you know that you don't have aliasing when you provide pointers to two graphs? How do you prove that none of the nodes in the graph are shared?

Okay, I didn't define aliasing. With "aliasing" I mean that "aliasing references" (or pointers) either point to the exact same address or that the immediately pointed class/struct (pointed to by the reference/pointer) does not overlap. I would consider anything else more complicated than necessary. The definition doesn't care about further indirections. I often only consider the directly pointed struct or class, the contiguous chunk of memory, as "the type". If I code a function, I'm usually only interested in the top level of the type (the "root node" of the type), and further indirections are handled by nested function calls. For example, it suffices if two argument slices are not overlapping. For that I only need to check aliasing as just defined.

If you really would like two arguments (graphs) to not share any single pointer value, I would suggest using a more appropriate type than a memory attribute: a type which is recursively "unique" (in terms of only using "unique pointers"). Do you think it sounds like a nice idea to have a data structure attribute `unique` next to `abstract` and `final` which recursively guarantees that any reference or pointer is a unique pointer?

If you are interested in an algorithmic answer to your questions, then the best approach (I can quickly think of) is creating an appropriate hash table from all pointers in one graph and testing all pointers in the other graph against it (if I cannot use any properties of the pointers' values, e.g. that certain types and all indirections are allocated in specific pools). But that only works with exactly equal pointer values.
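A rough D sketch of that hash-table overlap test; the `Node` type and `sharesNodes` are made up for illustration, and only exact pointer equality is detected, as noted:

```d
struct Node { int value; Node*[] next; }

// True if the two graphs reachable from `a` and `b` share any node.
bool sharesNodes(Node* a, Node* b)
{
    bool[Node*] inA;                      // hash table of every node reachable from `a`
    void collect(Node* n)
    {
        if (n is null || n in inA) return;
        inA[n] = true;
        foreach (c; n.next) collect(c);
    }
    collect(a);

    bool[Node*] visited;                  // probe the second graph against the table
    bool probe(Node* n)
    {
        if (n is null || n in visited) return false;
        visited[n] = true;
        if (n in inA) return true;
        foreach (c; n.next)
            if (probe(c)) return true;
        return false;
    }
    return probe(b);
}
```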
May 31 2021
On Tuesday, 1 June 2021 at 00:36:17 UTC, Elmar wrote:
> I suppose you mean the "higher" level languages (because C is, by its original definition, also a high-level language).

Yes, I mean system-level languages vs proper high-level languages that abstract away the hardware. "Low level" is not correct, but the usage of the term "system level" tends to lead to debates in these fora, as there is a rift between low-level and high-level programmers...

> The thing I'd like to gain with those attributes is a guarantee that the referenced value wasn't allocated in a certain address region/scope and lives in a lifetime-compatible scope, which can be detected by checking the pointer value against an interval or a range of intervals. For example, a returned reference to an integer could have been created with "malloc" or even a C++ allocator, or interfacing functions could annotate parameters with such attributes.

Well, I guess you are new, but Walter will refuse having many pointer types. Even the simple distinction between gc and raw pointers will be refused. The reason being that it would lead to a combinatorial explosion of function instances and prevent separate compilation. So for D, this is not a probable solution. That means you cannot do it through the regular type system, so you will have to do shape analysis of data structures.

I personally have in the past argued that it would be an interesting experiment to make all functions templates and template the pointer parameter types. That you can do with library pointer types, as a proof of concept, yourself. Then you will see what the effect is.

> Everything other than stack addresses is seen as one single lifetime region with equal lifetime. The comparison between stack addresses assumes that an address deeper in the stack has a higher or equal lifetime. The caller could also provide its stack frame bounds, which allows considering this interval as one single lifetime.

How about coroutines? Now you have multiple stacks.

> I think this is GC-related memory management, not type checking. The memory attributes don't solve memory management problems. The problem with reference counting is usually solved by inserting weak pointers into cycles (which also solves the apparent contradiction of a cycle of references). Weak references are used by those objects which are deeper in the graph of data links.

No, depth does not work, you could define an acyclic graph of owning pointers and then use weak pointers elsewhere. This restricts modelling and algorithms. So compiler-verified restriction of non-weak references might be too restrictive? So basically, whenever changing a non-weak reference, the compiler has to prove that the graph is still non-weak acyclic. Maybe possible, but does not sound trivial.

> 2. having static or dynamic (simple) value domain checks (which check whether a pointer value is in the allowed interval(s) of the allocation address spaces belonging to the attributes), which ensure that only allowed types of allocation are used. [...] Of course such a check could also be performed manually, but it's tedious and requires me to put all the different function bodies in one `static if else`.

Dynamic checks are unlikely to be accepted, I suggest you do this as a library.

> Where the feature shines most is function signatures, because they separate code and create intransparency which can be countered by memory attributes for the return type and argument types.

Unfortunately, this is also why it will be rejected.

> Okay, I didn't define aliasing. With "aliasing" I mean that "aliasing references" (or pointers) either point to the exact same address or that the immediately pointed class/struct (pointed to by the reference/pointer) does not overlap. I would consider anything else more complicated than necessary.

Insufficient for D with library container types and library smart pointers.

> Do you think it sounds like a nice idea to have a data structure attribute `unique` next to `abstract` and `final` which recursively guarantees that any reference or pointer is a unique pointer?

Yes, some want isolated pointers, but you have to do all this stuff as library smart pointers in D.
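A tiny illustration of why "many pointer types" pushes towards templating everything (using the hypothetical `RegionRef` from the sketch earlier in the thread): each distinct tag produces its own instance of the templated function, which is the combinatorial-explosion and separate-compilation concern:

```d
void useRaw(int* p) { }   // a non-template function fixes one parameter type; any region tag is lost
void useAny(P)(P p) { }   // accepts RegionRef!(int, r) for every region r, but as one instance per tag
```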
May 31 2021
On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:
> No, depth does not work, you could define an acyclic graph of owning pointers and then use weak pointers elsewhere.

What I meant here is that depth would be too restrictive, as it would prevent reasonable insertion of nodes.
May 31 2021
Thank you for answering.

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:
> [...]

The separate compilation is a good point. Binary compatibility is a common property considered for security safeguards. But at least static checking with attributes would need no memory addresses at all (also, the compiler can infer the attribute for every value-typed variable automatically from where it is defined). Dynamic checks of pointers across binary interfaces are difficult. They would work flawlessly with library-internal memory regions, but for outside pointer values they could only rely on runtime information (memory regions used by allocators) or could not perform checks at all (because they don't know the address ranges to check against). Or it would work better if binaries supported relocations for application-related memory addresses which are filled in at link time. Static checks strike the balance here.

> Well, I guess you are new, but Walter will refuse having many pointer types. Even the simple distinction between gc and raw pointers will be refused. The reason being that it would lead to a combinatorial explosion of function instances and prevent separate compilation.
> I personally have in the past argued that it would be an interesting experiment to make all functions templates and template the pointer parameter types. That you can do with library pointer types, as a proof of concept, yourself. Then you will see what the effect is.

Okay, that's fine. Pointers in D are not debatable, I would not try. I think any new language should remove the concept of pointers entirely rather than introducing new pointers. Pointers from C should be treated as reference variables, pointers to C as either an unbounded slice (if bounded, there should be another `size_t` argument to the function) or as addresses obtained from variables. As a C programmer I'd say that C's pointer concept was never needed as it stands; it was just created to be an unsafe reference variable + a reference + an iterator, an all-in-one solution as the simplest generic thing which beats it all (without knowing the use case by looking at the pointer type). Attributes would only check the properness of pointer value assignments, without duplicating the function's code the way `auto ref` does. (One can still interpret them as part of the type.)

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:
> How about coroutines? Now you have multiple stacks.

Thanks, I missed that; at least true coroutines have them. Other things can also dissect stack-frame memory (function-specific allocators in the stack region). But in our case, it's already a question whether such special stack frames should still be allocated in the stack region, statically (as I implemented it once for C), or in a heap region (like stack frames of continuations). You could at least place coroutine stack frames in some allocator region in static memory. A probably less fragile but more costly solution (when checking stack addresses) for stack address scope would be storing the stack depth of an address in the upper k bits of a wide pointer value (for a simple check), but this is only a further unrelated idea.

> Dynamic checks are unlikely to be accepted, I suggest you do this as a library.

Right, if nobody has tried it so far, I'd like to do it myself. Then I can firm up my D experience with further practice. I'd compare the nature of static and dynamic attribute checks to the nature of C++ `static_cast` and `dynamic_cast` of class pointers. I was thinking such a user library could use `__traits` with templated operator overloads.

> Unfortunately, this is also why it will be rejected.

So, is it D's tenor that function signatures are thought to create *in*transparency and should continue to do so? Does the community think allocation and memory transparency is a bad thing or just not needed? IMO, allocation and memory transparency is relevant to being a serious systems programming language (even Systems Programming :-D ). Isn't the missing memory transparency from outside of functions the reason why global variables are frowned upon by many? Related to referential transparency (side effects), less transparency makes programs harder to debug and decouple, and APIs harder to use right. (Just the single `map` issue with fixed-size arrays...)

> Insufficient for D with library container types and library smart pointers.

Yeah. It makes no sense if we consider the pointer layers between the exposed pointer and the actual data (I assume smart pointers in D are implemented with such a middle layer in between). But if it only means the first payload data layer that represents the actual root node of any graph-like data structure, is it still flawed? At least, if I can annotate all pointer variables in my data structures, and if checks are done for every single reference/pointer assignment and any access, so that no pointer value range in the entire structure ever becomes violated, isn't it closer to memory safety than without? Of course, I could still pass references to those pointers to a binary which writes into them without knowing any type information, but that's a deliberate risk which static type checking cannot mitigate, only dynamic value checking of the pointed data after the function returns. (Probably another useful safety feature for my idea.) Of course attributes are optional; nobody has to annotate anything, with the risk of obtaining falsely scoped pointer values. But would you agree it would be better than not having it? Of course it doesn't make everything safe, particularly if one can omit it, but annotating variables with attributes could help with ownership (I think in a better design than Walter's proposal of yet another function attribute @live instead of a variable attribute).

With ownership I mean preventing leakage of (sensitive) data out of a function (not just reference values as with `scope`); it could provide some sanity checks and even more transparency for API use (because then I can see what kind of allocated memory I can expect for parameters and return value). I think it could improve interfacing with C++ as well. In the end, I only want certainty about the references and pointers when I look at a function signature.

I probably should (try to) implement it myself as a proof of concept.

Regards, Elmar
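Regarding the "stack depth in the upper bits of a wide pointer" aside above, a toy sketch follows. It assumes a 48-bit virtual address space and that the program itself assigns and compares the depth tags; nothing like this exists in D today:

```d
struct DepthTagged(T)
{
    private ulong raw;
    private enum depthShift = 48;   // assumes addresses fit in the low 48 bits

    static DepthTagged fromPointer(T* p, ushort depth)
    {
        return DepthTagged((cast(ulong) p) | (cast(ulong) depth << depthShift));
    }

    T*     ptr()   const { return cast(T*) (raw & ((1UL << depthShift) - 1)); }
    ushort depth() const { return cast(ushort) (raw >> depthShift); }

    // A pointee from a deeper (shorter-lived) frame must not be stored in a
    // destination that lives in a shallower (longer-lived) frame.
    bool assignableTo(ushort destinationDepth) const
    {
        return depth <= destinationDepth;
    }
}
```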
Jun 03 2021
On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
> ```d
> stack arr2 = dynamicArr.dup; // create a copy on the stack, the stack is "scope"d
> ```
> An easy but probably limited implementation would set `theAllocator` before the initialization of such an attributed value-type variable and reset `theAllocator` afterwards to the allocator from before.

What if dup is creating things on the heap (I don't know, by the way)? You need to make the allocator dynamically scoped.

> register: only stored in a register (with a compile-time error if not possible)

Only if you have complete control over the backend.

Besides the combinatorial explosion in the required logic to check for: what happens if we copy/move data between differently memory-annotated variables, e.g. nogc to gc, newcpp to gc? Do we auto-copy, cast or throw an error? If we do not throw an error, an annotation might not only restrict access but also change semantics by introducing new references. So annotations become implied actions; that can be OK but is eventually hard to accept for the current uses of annotations.

> Does the community think allocation and memory transparency is a bad thing or just not needed? IMO, allocation and memory transparency is relevant to being a serious systems programming language (even though C doesn't have it, C++ doesn't have it and [...]

There is no such thing as memory transparency, strictly speaking. Even if you want to allocate things on the stack, what if your backend doesn't have a stack at all? Or we just rename the heap to stack? In the end, we aren't doing that much better than what C or high-level languages do; we do have some heuristics, though, so that our structures map better to the underlying hardware.

> As a C programmer I'd say that C's pointer concept was never needed as it stands; it was just created to be an unsafe reference variable + a reference + an iterator, an all-in-one solution as the simplest generic thing which beats it all (without knowing the use case by looking at the pointer type).

Well, I think having both is problematic/complex. But C has only one of those and C++ has both. Where arrays belong is not quite correct, so that's a mistake.

> I think any new language should remove the concept of pointers entirely rather than introducing new pointers.

Why not remove the distinction between values and references/pointers altogether? But I think that drifts too hard towards a logically high-level language and isn't the right way to go in a system-level language, although it is very interesting.

Annotations seem to be neat, but they parametrize your code:

```d
@allocator("X"), @lifetime("param1", "greater", "param2")
void f(Type1 param1, Type2 param2)
```

becomes

```d
void f(Allocator X, Lifetime lifetime(param1), Lifetime lifetime(param2))(Type1 param1, Type2 param2)
    if (currentAllocator == X && lifetime(param1) >= lifetime(param2))
{...}
```

which literally turns every function allocating something into a template, increasing "templatism", unless we get runtime generics as in Swift.

To summarize, I find these improvements interesting, but
- do they feel system-level?
- are they at all possible in a 20-year-old language?

Some info: Rust puts me off in this respect by being a mix of high level with low level. It has values and references for its ownership/borrowing system, but then also custom pointer types which don't interact well with the former.
Jun 03 2021
Thank you for your input.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Besides the combinatorial explosion in the required logic to check for: what happens if we copy/move data between differently memory-annotated variables, e.g. nogc to gc, newcpp to gc? Do we auto-copy, cast or throw an error? If we do not throw an error, an annotation might not only restrict access but also change semantics by introducing new references. So annotations become implied actions; that can be OK but is eventually hard to accept for the current uses of annotations.

There is no combinatorial explosion, that would be a bad idea ;-). Annotated references behave like a super class of non-annotated references or, say, a subset of attributes is a super class of a superset of attributes. The best description of the effect (in the dynamic case) would be viewing memory attributes like a precondition which requires the address value to be in certain interval(s). Currently attributes only have compile-time semantics, you said, so a static check would fit, right?

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> There is no such thing as memory transparency, strictly speaking. Even if you want to allocate things on the stack, what if your backend doesn't have a stack at all? Or we just rename the heap to stack?

Okay, "memory transparency" is a bad name. It could seem that it reveals actual memory addresses. I mean "allocation" or "scope" transparency.

Concerning the call stack: languages which don't provide the abstraction of scoped variables (which is implemented by a (call) stack) basically only have global variables + registers. Currently I'm not aware of any high-level language or processor which doesn't support that abstraction, because it's the most basic abstraction of any high-level language. If you have "functions" then you also have a call stack, or let's call it "automatic scope". It doesn't matter whether automatic scope is allocated in the heap area (which can happen with closures and continuations), the static memory area (for non-recursive functions) or in its own area at the end of the memory layout; it only matters that it's automatically managed by the function. I also think that CPUs which don't support a call stack cannot be programmed with D at all.

If attributes are used with static checks, they will not care about the actual memory address value, only about the location in source code where a value was allocated or about the attributes which it gets from the user. The automatic lifetime is the criterion which distinguishes it from heap, GC or static memory. For dynamic checks, I indeed made an assumption: that in real programs the actual lifetime/scope can be inferred from memory addresses, because allocation regions of related scope usually put variables in common memory areas (at least in common memory segments). This would result in pointer types being value ranges instead of unconstrained 32-bit integers. Ultimately, information from a linker script could be needed for authentic dynamic checks (using relocated addresses for checking). I could imagine this to be difficult on top. Data from stack frames in the heap would be treated as dynamically allocated and data from static stack frames would be treated as stack. This could lead to unexpected results and false errors unless more information is passed with the pointer. Dynamic checks would require a separate implementation (a separate type) which memorizes in some bits the allocation scope in which a value was created. Eventually, the dynamic solution is less lightweight in memory, but it makes the value check easier.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Well, I think having both is problematic/complex. But C has only one of those and C++ has both. Where arrays belong is not quite correct, so that's a mistake.

You mean references and pointers, right? References (from C++) are immutable pointers (in theory). C++ has pointers for backwards compatibility (and probably because the designer originally didn't understand the problem), but they are now discouraged from being used as "raw pointers" (when I write "pointer" I mean "raw pointer"). (Raw) pointers instead are modifiable "reference variables" (like the variables in Java) which additionally provide access to the pointer address and allow modifying it. Reference variables however don't allow casting to non-pointer types. Arrays in C and C++ are actually more like C++ references, i.e. (locally) immutable pointers.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Annotations seem to be neat, but they parametrize your code: [...] which literally turns every function allocating something into a template, increasing "templatism", unless we get runtime generics as in Swift.

I agree that templatism is bad. Are attributes really lowered to template arguments by the compiler? I also didn't mean to introduce new syntax with a comma between attributes. With memory attributes I really mean attributes like `scope`, `ref`, `private`, `pure`, `@nogc` ... which are used with reference/pointer types, not functions. You would be right that any assignment operation to an annotated reference needs a templated overload. I can't think of another way to implement it. In the worst case, it would become something like

```d
Ref!(nogc, Flower) tulip;      // anything, but not allocated by the garbage collector
Ref!(static, new, Bird) raven; // no automatic allocation
```

I would already be happy with the most important attributes.

- "Oh, I see, it returns me GC-allocated memory."
- "Oh, the passed argument is allocated automatically, so I can't put the address into a static reference."
- "Oh, a slice over a fixed-size array will not work with that function."

---

Of course, the amount of safety to be gained from these attributes depends on the programmer. For example, they don't prevent use-after-free with `newc` and `newcpp` in every case, because a referenced value could suddenly be deleted by code which interrupts the function's execution. The true scope depends not only on the location of allocation but also on the location of the associated deallocation. If the deallocation can happen in a code block which interrupts normal function execution, then I would treat it either like `shared` or `memory`. The compiler can't know all of this by itself. The memory safety thus will only work if the proper attributes are used by programmers. But the fact that D already implements a very small, weaker subset of memory (or reference) attributes like `scope` and `return ref` shows that this idea does fit D's design.

---

PS:

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Why not remove the distinction between values and references/pointers altogether? But I think that drifts too hard towards a logically high-level language and isn't the right way to go in a system-level language, although it is very interesting.

I'm getting off topic, but I totally agree with you! I have a systems programming language idea which treats every variable as a reference variable, to get rid of the annoying value categories and the value concept, by using a unified variable access interface which allows for using different reference implementations for different optimization scenarios (like using registers to store and modify the referenced value).

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
>> ```d
>> stack arr2 = dynamicArr.dup; // create a copy on the stack, the stack is "scope"d
>> ```
>> [...]
> What if dup is creating things on the heap (I don't know, by the way)? You need to make the allocator dynamically scoped.

This was supposed to be a side idea unrelated to the main idea. `dup` does allocate memory with the GC, so you'd be right when we talk about annotating references, but the snippet here is supposed to inject the annotated allocation of a *value* definition into the RHS, i.e. into `dup`'s internal implementation. If you like, I'll elaborate more on that idea.

Idea: when annotating a value declaration, the annotation specifies where the return value or expression value of the RHS is allocated and thus where the variable will be located in memory. This theoretical idea would give more control over the variable's allocation. The idea is not odd, because a small subset of such attributes for value variables IS already implemented in D, like static/global variables, automatic local variables (of course) and member-scope variables in structs and classes. C also features `memory` (`volatile`) and to some extent `register` (C99's `restrict` only comes close to it, keeping pointer-dereferenced values in registers for further dereferencing, or there are language extensions to map variables to specific registers). I thought this would not be popular, because it seems like D doesn't want to be too much of an alternative for C++ systems programming, and generalizing this concept seems like a bigger change. That's why this idea was only a side note.

```d
gc short opal = 3;    // eqv. to: ref short opal = cast(ref short) GC.make!short(); opal = 3;
newc int emerald = 5; // eqv. to: ref int emerald = cast(ref int) malloc(int.sizeof); emerald = 5;
new float ruby = 8.;  // "new" uses the "new" operator, which is not always dynamic allocation
rc int amethyst = 13; // reference counted, basically an abstraction over an underlying shared pointer
...
free(&emerald);       // needed because newc is not automatically managed
```

A benefit is that these variables are still used like values, i.e. they are passed by value or by reference depending on the function parameter type, although physically they are a reference of course (because everything which is not stored in a register is actually a reference; variables on the call stack are referenced via the stack pointer, for example).

Goal: the responsibility for allocation is shifted from the service, the callee (which doesn't know about any concrete client's allocation needs), to the client, the caller (which knows about its own allocation needs and actually should know what it gets). GC has been introduced to remove the symptoms of this problem (memory management problems) without solving it (consequence: it gets used way more often than needed and is inefficient). The only way it would be solved reasonably is letting the caller side (LHS of an assignment) decide what it needs, not the callee side (RHS of the assignment), because the caller side has to handle it afterwards.

A generic solution would be to use some kind of dependency injector which handles the allocation, uses the callee to initialize the value and passes it to the caller. It would turn those attributes into a powerful abstraction. A very easy implementation of the dependency injector is overriding `theAllocator` while the RHS is computed.
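For the `stack arr2 = dynamicArr.dup` case specifically, a hedged sketch of what such an injection could lower to today using `alloca` (illustrative only; the copy must not escape the function and large arrays would overflow the stack):

```d
import core.stdc.stdlib : alloca;

void consume(scope int[] onStack) { /* must not keep a reference to it */ }

void example(int[] dynamicArr)
{
    auto n = dynamicArr.length;
    auto buf = (cast(int*) alloca(n * int.sizeof))[0 .. n]; // freed when example() returns
    buf[] = dynamicArr[];                                   // copy the contents
    consume(buf);
}
```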
Jun 05 2021
On Sunday, 6 June 2021 at 01:10:45 UTC, Elmar wrote:

Goal: the responsibility for allocation is shifted from the service, the callee (which doesn't know about any concrete client's allocation needs), to the client, the caller (which knows its own allocation needs and actually should know what it gets).

That very much resembles the idea Odin strives for: https://odin-lang.org/docs/overview/#implicit-context-system

It sounds very interesting and indeed feels systems-programming-like, but I see some issues with it:

- the `new` call may not be the same for unique ptr, rc and gc, as gc needs more context
- rc/arc may need to insert incs and decs at the beginning and end of the scope respectively, but that means either templatizing the callee or boxing behind a runtime option, both of which have drawbacks
- overriding allocators is an interesting concept, but how useful is it given that the code can't be re-adapted? Not only the allocator makes execution performant but also the code around it, and both are intertwined with each other
- there is a small performance hit from passing function pointers to the callee, or from exchanging them when global variables are used; it's akin to nondeterministic vs. deterministic exception handling
- even more, how many custom allocators end up being passed to the callee? The callee, as a caller, has many callees inside it which allocate, and all the custom allocators propagate up (see the sketch below).

I recognize that some minority is interested in this, me too to some degree. I'm still skeptical about the gain. However, what if you try to extend the compiler to see whether it is possible? Maybe you find other people to realize your idea. As far as I recall, russhy and IGotD are similarly interested in custom allocators.
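To illustrate the propagation concern in the last point, here is a small sketch of the explicit alternative (all names made up): every intermediate function has to forward the allocator parameter, which is exactly what an implicit context or a global `theAllocator` tries to avoid:

```D
import std.experimental.allocator : make, RCIAllocator;

struct Node { int value; Node* next; }

// the allocator parameter has to be threaded through every level of the call chain
Node* makeNode(RCIAllocator alloc, int value)
{
    auto n = alloc.make!Node();
    n.value = value;
    return n;
}

Node* buildList(RCIAllocator alloc, const int[] values)
{
    Node* head = null;
    foreach_reverse (v; values)
    {
        auto n = makeNode(alloc, v); // explicit propagation at each call site
        n.next = head;
        head = n;
    }
    return head;
}
```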
Jun 06 2021
On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad wrote:

The separate compilation is a good point. Binary compatibility is a common property considered for security safeguards. But at least static checking with attributes would need no memory addresses at all (also if the compiler can infer the attribute for every value-typed variable automatically from where it is defined).

I don't think separate compilation is a good point. I think a modern language should mix object-file and IR-file linking with caching for speed improvements. Nevertheless, it is being used as an argument for not making D a better language. So, that is what you are up against. D isn't really "modern". It is very much in the C mold, like C++. It has taken on too many of C++'s flaws. For instance, it kept underperforming exceptions instead of making them fast.

[...] passes addresses obtained from variables. As a C programmer I'd say that C's pointer concept was never needed as it stands; it was just created to be an unsafe reference variable + a reference + an iterator, an all-in-one solution as the simplest generic thing which beats it all (without knowing the use case by looking at the pointer type).

C is mostly an abstraction over common machine language instructions, which makes a non-optimizing C backend perform reasonably well for handcrafted C code. C pointers do have a counterpart in C++ STL iterators, though. So, one could argue that C pointers are memory iterators.

Right, if nobody has tried it so far, I'd like to try it myself. Then I can firm up my D experience with further practice. I'd compare the nature of static and dynamic attribute checks to the nature of C++ `static_cast` and `dynamic_cast` of class pointers. I was thinking such a user library could use `__traits` with templated operator overloads.

Sounds like a fun project. (D, as the language stands, encourages the equivalent of reinterpret_cast, so there is that.)

So, is it D's tenor that function signatures are thought to create *in*transparency and should continue to do so? Does the community think allocation and memory transparency is a bad thing, or just not needed? IMO, allocation and memory transparency is relevant to being a serious systems programming language.

Let us not confuse community with creators. :) Also, let us not assume that there is a homogeneous community. So, you have the scripty camp who are not bothered by the current GC and don't really deal with memory allocations much. Then there is the other camp. As one of those in the other camp, I think that the compiler should do the memory management and be free to optimize. So I am not fond of things like "scope". I think they are crutches. I think the language is becoming arcane by patching it up here and there instead of providing a generic solution.

I probably should (try to) implement it myself as a proof of concept.

The best option is to just introduce a custom pointer library, like in C++, that tracks what you want it to track. Don't bother with separate compilation issues. Just template all functions. I think LDC will remove duplicates if the bodies of two functions turn into the same machine code? Then you get a feeling for what it would be like.
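A rough sketch of what such a custom pointer library could look like (all names made up, just one way to encode the region in the type): the allocation region becomes a template parameter, so mixing regions is already a type error without any new language attributes.

```D
import core.memory : GC;

enum Region { gcHeap, cHeap, stack }

// the region is part of the pointer's type, so assignments across regions
// fail at compile time instead of needing a runtime check
struct RegionPtr(T, Region region)
{
    private T* raw;
    ref T deref() { return *raw; }
}

RegionPtr!(T, Region.gcHeap) gcMake(T)(T init)
{
    auto p = cast(T*) GC.malloc(T.sizeof);
    *p = init;
    return RegionPtr!(T, Region.gcHeap)(p);
}

unittest
{
    auto a = gcMake!int(42);
    RegionPtr!(int, Region.gcHeap) b = a;    // OK: same region
    // RegionPtr!(int, Region.cHeap) c = a;  // would not compile: different region
    assert(b.deref == 42);
}
```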
Jun 04 2021
On Friday, 4 June 2021 at 08:29:47 UTC, Ola Fosheim Grøstad wrote:

I don't think separate compilation is a good point. I think a modern language should mix object-file and IR-file linking with caching for speed improvements.

I know what you mean. Avoiding separate compilation where possible allows for more optimization potential, but in general you can't avoid it, particularly when source code is not available. D tries to be linkable with C and C++, which don't know D. That's why working with separate binaries needs to be considered (and compatibility with C/C++ really is important because both languages have large code bases). So with "good point" I meant "an important aspect" in research, not that it's nice to need it. :-)

C pointers do have a counterpart in C++ STL iterators, though. So, one could argue that C pointers are memory iterators.

Exactly.

So, you have the scripty camp who are not bothered by the current GC and don't really deal with memory allocations much. Then there is the other camp. As one of those in the other camp, I think that the compiler should do the memory management and be free to optimize. So I am not fond of things like "scope". I think they are crutches. I think the language is becoming arcane by patching it up here and there instead of providing a generic solution.

`scope` exists for a good reason, because guaranteeing compiler optimization is very hard (apparently). The `scope` attribute has two benefits: first, it makes RAII explicit (which helps prevent bugs from misuse); second, it makes the compiler faster and simpler because it doesn't need to understand the code in order to search for optimization opportunities. Performing an optimization is easier than finding the places that can be optimized and then assuring the optimization's correctness. Finding all optimizations automatically (the ideal case) would probably explode compile time, so attributes are given to programmers as hints; that solves the compile-time issue and reduces compiler complexity a lot.

The problem with genericity is that it's always a tradeoff with efficiency. It makes coding more complicated and the solution heavier (because it takes care of more cases), at some point too complicated to be practical. There are cases where the compiler cannot possibly know whether an optimization is correct, because that depends on the programmer's intent (or would change behaviour). Then explicit optimization hints are required.

Don't bother with separate compilation issues. Just template all functions. I think LDC will remove duplicates if the bodies of two functions turn into the same machine code?

I'm almost certain that duplicate functions will not be removed in all cases, because that's a very difficult problem to solve. It reminds me of the undecidability of the Liskov substitution principle according to Wikipedia. This principle requires that contracts (and termination) still hold when subtype arguments are passed to a function.

Have a nice day.
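For illustration, the two `scope` roles I mean, in a minimal sketch (the escape check assumes compilation with -preview=dip1000; the commented-out line would be rejected):

```D
@safe:

// with -preview=dip1000, a `scope` parameter may not escape the function
int* tryEscape(scope int* p)
{
    // return p;   // rejected: a scope parameter may not be returned
    return null;
}

void deterministicLifetime()
{
    // `scope` on a local class reference may place the object on the stack
    // and destroys it deterministically when the scope ends (explicit RAII)
    scope obj = new Object();
}
```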
Jun 05 2021
On Saturday, 5 June 2021 at 17:55:26 UTC, Elmar wrote:

`scope` exists for a good reason, because guaranteeing compiler optimization is very hard (apparently). The `scope` attribute has two benefits: first, it makes RAII explicit (which helps prevent bugs from misuse); second, it makes the compiler faster and simpler because it doesn't need to understand the code in order to search for optimization opportunities.

But introducing all these special cases just to avoid explicit lifetimes like Rust's is making the language more complicated, not less. The intention is to make it less complicated, but that will not be the end result, I think. I don't think one can evolve a solution. It has to be designed as a whole, not one piece at a time like D is doing now.
Jun 05 2021