www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Reimplementing software building blocks like malloc and free in D

reply Aruna Maurya <aruna.maurya12 gmail.com> writes:
Hello all! I am a Computer Science undergraduate student. 
Currently a junior, I find the idea of 're-implementing software 
building blocks like malloc and free in D' listed in the wiki 
page of SAOC(Symmetry Autumn of Code) quite interesting.

I don't have experience of coding in D however, I have 4 years of 
coding in C++ and have a good hold on pointers and the standard 
template library. I have also successfully completed a course of 
Operating Systems and Data Structures in my 4th semester. I 
believe I can pick up D in due course of time.

I had few queries regarding the same and would be glad if anyone 
could pitch in to help me out.

1. If it would have been C++, the above can be implemented by 
using the mmap() syscall to                  allocate memory from 
the heap. However, something that I am not able to understand in 
the idea description is 'runtime hooks'.

2. Also as far as I can understand, this idea is about 
implementing the above from scratch, and not tweaking the 
existing codebase. Do let me know if I am wrong.

Thank you for your time!
Aug 11 2018
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 12/08/2018 6:59 AM, Aruna Maurya wrote:
 1. If it would have been C++, the above can be implemented by using the 
 mmap() syscall to                  allocate memory from the
heap. 
 However, something that I am not able to understand in the idea 
 description is 'runtime hooks'.
A runtime hook is a function that the compiler injects a call to, to perform some function. Like allocate a class. By reimplementing a lot of these primitives, the compiler will be able to optimize them better e.g. inlining and removing dependencies on other runtime information.
 2. Also as far as I can understand, this idea is about implementing the 
 above from scratch, and not tweaking the existing codebase. Do let me 
 know if I am wrong.
It is modifying existing druntime and with it dmd (frontend) after developing these building blocks. This project is just an idea, you can set your own set of goals and scope in the proposal. For example, you may choose to rewrite the memory management stuff in libc in D and support it across Windows, Linux and OSX. Instead of dealing with integration into druntime/dmd.
Aug 11 2018
prev sibling next sibling parent Eugene Wissner <belka caraus.de> writes:
On Saturday, 11 August 2018 at 18:59:00 UTC, Aruna Maurya wrote:
 Hello all! I am a Computer Science undergraduate student. 
 Currently a junior, I find the idea of 're-implementing 
 software building blocks like malloc and free in D' listed in 
 the wiki page of SAOC(Symmetry Autumn of Code) quite 
 interesting.

 I don't have experience of coding in D however, I have 4 years 
 of coding in C++ and have a good hold on pointers and the 
 standard template library. I have also successfully completed a 
 course of Operating Systems and Data Structures in my 4th 
 semester. I believe I can pick up D in due course of time.

 I had few queries regarding the same and would be glad if 
 anyone could pitch in to help me out.

 1. If it would have been C++, the above can be implemented by 
 using the mmap() syscall to                  allocate memory 
 from the heap. However, something that I am not able to 
 understand in the idea description is 'runtime hooks'.

 2. Also as far as I can understand, this idea is about 
 implementing the above from scratch, and not tweaking the 
 existing codebase. Do let me know if I am wrong.

 Thank you for your time!
I missed this project; To let everyone know, I've implemented malloc and free: https://github.com/caraus-ecms/tanya/blob/master/source/tanya/memory/mmappool.d (see allocate() and deallocate() for a starting point). My implementation is just not thread-safe.
Aug 11 2018
prev sibling parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Saturday, 11 August 2018 at 18:59:00 UTC, Aruna Maurya wrote:

 1. If it would have been C++, the above can be implemented by 
 using the mmap() syscall to                  allocate memory 
 from the heap. However, something that I am not able to 
 understand in the idea description is 'runtime hooks'.
The compiler recognizes certain expressions and lowers them to functions in the D runtime that we call runtime hooks. See https://wiki.dlang.org/Runtime_Hooks for an unmaintained, but still relevant, list of some of them. For example, the expression `cast(int[])a`, where `a` is an array of another type will get lowered to `_d_arraycast`. We're trying to rewrite many of these hooks as templates. D has changed quite a bit over the past few years, and at the time the existing runtime hooks were written D didn't have templates and many other features. See https://github.com/dlang/druntime/pull/2264 for an example converting `_d_arraycast` to a template. Many of the existing runtime hooks leverage software building blocks like `memcpy`, `memcmp`, `malloc`, `free`, etc. in their implementation. If the runtime hooks are implemented as templates, we will have access to the type information and other information at compile-time rather than run-time. This will give us opportunities to specialize implementations at compile-time. See https://github.com/JinShil/memcpyD/blob/master/memcpyd.d for a proof of concept exploring such opportunities for `memcpy`.
 2. Also as far as I can understand, this idea is about 
 implementing the above from scratch, and not tweaking the 
 existing codebase. Do let me know if I am wrong.
I'm the one who proposed the idea, and, yes, I envisioned implementing these building blocks from scratch, translating an existing implementation from another language (watch the license!), or implementing an existing published algorithm in D. There might even be a way to leverage implementation code in https://dlang.org/phobos/std_experimental_allocator.html, but if that code were used in the D runtime, it couldn't have dependencies on Phobos (D's standard library), so it'd probably have to be modified to make it free-standing. I don't know, though, I haven't looked into it in any detail. I didn't envision the participant modifying any existing code. In the case of `malloc`, `realloc`, and `free`, I'd just like to have an idiomatic D implementation that we might be able to leverage when re-implementing the runtime hooks as templates so we no longer have to depend on the C standard library. I think implementing them in D will make D more portable to other platforms, including freestanding bare-metal platforms like operating systems or microcontrollers, and D's unique feature set may provide opportunities to improve upon existing implementations in terms of performance and compile-time guarantees. It's difficult for me to articulate all this, as it's a complex vision in my head, so if the above just raised more questions, keep asking. Also see https://wiki.dlang.org/Memory_Management for some perspective, especially if you're new to D. The ideas on the wiki page are just to plant the seeds of creativity. Feel free to deviate from the idea as you wish, or submit a proposal formulated from your own ideas. Mike
Aug 11 2018
parent reply Aruna Maurya <aruna.maurya12 gmail.com> writes:
On Saturday, 11 August 2018 at 20:15:06 UTC, Mike Franklin wrote:
 On Saturday, 11 August 2018 at 18:59:00 UTC, Aruna Maurya wrote:

 [...]
The compiler recognizes certain expressions and lowers them to functions in the D runtime that we call runtime hooks. See https://wiki.dlang.org/Runtime_Hooks for an unmaintained, but still relevant, list of some of them. [...]
So there are essentially two requirements here. Building the memory management functionalities from scratch in idiomatic D and implementing the existing runtime hooks as templates. The latter isn't quite a part of the idea as mentioned on the page. Also I see in one of the above replies that malloc and free have been already implemented. So is that off the shelf or still available for implementation?
 [...]
Thanks for the resource! Will go through it and get back in case of any doubts.
 [...]
Aug 11 2018
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Sunday, 12 August 2018 at 04:16:24 UTC, Aruna Maurya wrote:

 So there are essentially two requirements here. Building the 
 memory management functionalities from scratch in idiomatic D 
 and implementing the existing runtime hooks as templates. The 
 latter isn't quite a part of the idea as mentioned on the page.
I'd say there is only 1 requirement: implementing `malloc`, `realloc`, and `free` in idiomatic D without any dependencies on other libraries from other languages. The "idiomatic D" part means you don't even have to name them "malloc", "realloc", and "free", nor use the same type-unsafe signatures found in the C standard library. Do what makes sense for D, for example, you may want to implement an overload set like `T[] heapAllocate(T)(size_t length)` and `T heapAllocate(T)()`, or something totally different. You may even want to provide some other features for gathering runtime statistics on the heap. The world's your oyster here. The only reason I mentioned the runtime hooks is to help make the case for why these D implementations are needed and provide some perspective about how they may be used. I don't envision the Autumn of Code participant learning about the compiler/runtime interface and refactoring the D runtime with the D implementations (unless they really want to). Regardless of whether we decide to incorporate these D implementations into the official D runtime or not, they will still be worth gold to those who desire using D in freestanding systems programming, so don't concern yourself too much with the d runtime; just make a kick-ass dynamic heap allocator in D, and people will use it.
 Also I see in one of the above replies that malloc and free 
 have been already implemented. So is that off the shelf or 
 still available for implementation?
I wasn't aware of Eugene's implementation. At first glance it looks real nice. It appears to have some dependencies on other features of his Tanya library, so at a minimum, those will have to be removed or replaced. But, how does it perform? In order to make the case for using these D implementations in druntime, we'll have to gather some metrics on the implementations in terms of performance, memory fragmentation, etc. and ensure they are on par with the C standard library implementations or alternative implementations like jemalloc and tcmalloc? Also, can Eugene's implementation be used in ` safe` code and is it compatible with the DIP1000? (See https://github.com/dlang/DIPs/blob/master/DIPs/DIP1000.md and the -dip1000 compiler switch at https://github.com/dlang/DIPs/blob/master/DIPs/DIP1000.md) Perhaps the D implementations can be enhanced to provide compile-time or runtime tuning parameters to achieve better performance in different use cases. Perhaps, with Eugene's permission, you can use his implementation as a base, take some measurements, and see if it can be improved upon with either more features or better performance. I'm sorry if I'm just muddying the waters with all this talk. All I want is a high-quality, idiomatic D implementation of a heap allocator that I can use in my own freestanding software and as a substitute for the C standard library implementations in the D runtime. Mike
Aug 11 2018
next sibling parent reply Eugene Wissner <belka caraus.de> writes:
On Sunday, 12 August 2018 at 05:24:56 UTC, Mike Franklin wrote:
 I wasn't aware of Eugene's implementation.  At first glance it 
 looks real nice.  It appears to have some dependencies on other 
 features of his Tanya library, so at a minimum, those will have 
 to be removed or replaced.
It depends only on internal libc-free syscalls that can be replaced with mmap/munmap and on copy()/copyBackward() that are just memcpy and memmove respectively.
 But, how does it perform?  In order to make the case for using 
 these D implementations in druntime, we'll have to gather some 
 metrics on the implementations in terms of performance, memory 
 fragmentation, etc. and ensure they are on par with the C 
 standard library implementations or alternative implementations 
 like jemalloc and tcmalloc?
Some years ago I had a BigInt implementation that allocated and deallocated memory like crazy. I compared how it works with malloc and my own allocator and my allocator was slightly faster than glibc (probably because of thread-safety). But yes, a good implementation needs benchmarks which test it systematically under various conditions. Tanya's allocator is also tested only on x86-64 Linux, but is trivially to port to Windows and other architectures. I actually had a Windows version before but removed it because I was too lazy to read msdn.
 Perhaps, with Eugene's permission, you can use his 
 implementation as a base, take some measurements, and see if it 
 can be improved upon with either more features or better 
 performance.
You can use it as reference when implementing something else or as a starting point for the further work. Anyway feel free to use it as you like. P.S. In the last weeks I had thoughts to split low-level stuff from tanya and put it into a separate library, kind of native, gc-free x86-64 (and maybe aarch64) Linux runtime for D. Probably I should go for it :)
Aug 11 2018
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Sunday, 12 August 2018 at 06:35:17 UTC, Eugene Wissner wrote:

 P.S. In the last weeks I had thoughts to split low-level stuff 
 from tanya and put it into a separate library, kind of native, 
 gc-free x86-64 (and maybe aarch64) Linux runtime for D. 
 Probably I should go for it :)
In recent months, I've been thinking about something like that as well. As I work to improve implementations in DMD and DRuntime, I'm always running into situations where I need utility functions or other convenience implementations that currently exists in Phobos, but I can't use Phobos in DMD or DRuntime. I'd like to have a small dependency-less library that has implementations that don't require much of a runtime implementation (just depending on the language itself), so it can actually be used to create the DRuntime implementations. These include things type conversion, formatted IO, compile-time type information like that found in std.traits and std.meta, and generally useful algorithms like those in std.algorithm. It might even be quite nice to have some features from `std.experimental.allocator` available for implementing memory allocation in Druntime. I started explore this idea with https://github.com/JinShil/utiliD, but found that the Phobos implementations are so interconnected, it's quite a burden to decouple them. That, and I began encountering chicken-and-egg issues with DRuntime implementations. There was a discussion recently about refactoring out the compiler dependencies from the platform dependencies in DRuntime (https://forum.dlang.org/post/pjqepc$2sfv$1 digitalmars.com), and I think that will have to be done first before continuing with the idea. Mike
Aug 12 2018
parent bpr <brogoff gmail.com> writes:
On Monday, 13 August 2018 at 01:49:35 UTC, Mike Franklin wrote:
 On Sunday, 12 August 2018 at 06:35:17 UTC, Eugene Wissner wrote:

 P.S. In the last weeks I had thoughts to split low-level stuff 
 from tanya and put it into a separate library, kind of native, 
 gc-free x86-64 (and maybe aarch64) Linux runtime for D. 
 Probably I should go for it :)
In recent months, I've been thinking about something like that as well.
I think it's a very promising idea. Why not start with DasBetterC and build a Phobos like library from there?
Aug 13 2018
prev sibling parent reply Aruna Maurya <aruna.maurya12 gmail.com> writes:
On Sunday, 12 August 2018 at 05:24:56 UTC, Mike Franklin wrote:
 On Sunday, 12 August 2018 at 04:16:24 UTC, Aruna Maurya wrote:

 [...]
I'd say there is only 1 requirement: implementing `malloc`, `realloc`, and `free` in idiomatic D without any dependencies on other libraries from other languages. [...]
Also what about other implementations like memset or memcmp. Have they been implemented or require work from scratch?
 But, how does it perform?  In order to make the case for using 
 these D implementations in druntime, we'll have to gather some 
 metrics on the implementations in terms of performance, memory 
 fragmentation, etc. and ensure they are on par with the C 
 standard library implementations or alternative implementations 
 like jemalloc and tcmalloc?

 [...]
So the only option to implement the above is to go through Eugene's code, take some measurements and see if it can be improved.
 Perhaps the D implementations can be enhanced to provide 
 compile-time or runtime tuning parameters to achieve better 
 performance in different use cases.

 [...]
That's totally okay. Your every comment is making the idea more clearer!
 [...]
Aug 11 2018
next sibling parent reply Eugene Wissner <belka caraus.de> writes:
On Sunday, 12 August 2018 at 06:44:37 UTC, Aruna Maurya wrote:
 On Sunday, 12 August 2018 at 05:24:56 UTC, Mike Franklin wrote:
 On Sunday, 12 August 2018 at 04:16:24 UTC, Aruna Maurya wrote:

 [...]
I'd say there is only 1 requirement: implementing `malloc`, `realloc`, and `free` in idiomatic D without any dependencies on other libraries from other languages. [...]
Also what about other implementations like memset or memcmp. Have they been implemented or require work from scratch?
These functions are still mostly implemented in asm, so I'm not sure there is an "idiomatic D" way. I would only wrap them into a function working with slices and checking the length. Mike?
 So the only option to implement the above is to go through 
 Eugene's code, take some measurements and see if it can be 
 improved.
As said my allocator is nothing official but I would be overhappy if someone finds it useful. I can also advice to look at Doug Lea malloc: http://g.oswego.edu/dl/html/malloc.html It is like a book. It's one of the best documented programs I've ever seen.
Aug 12 2018
next sibling parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Sunday, 12 August 2018 at 07:00:30 UTC, Eugene Wissner wrote:

 Also what about other implementations like memset or memcmp. 
 Have they been implemented or require work from scratch?
These functions are still mostly implemented in asm, so I'm not sure there is an "idiomatic D" way. I would only wrap them into a function working with slices and checking the length. Mike?
Inline ASM is a feature of D, so "idiomatic D" includes assembly implementations. Where D shines here is with it's metaprogramming capabilities and the ability to select an implementation at compile-time or compose implementations. See https://github.com/JinShil/memcpyD/blob/master/memcpyd.d There you will that the compiler selects an implementation based on size in powers of 2. Sizes that aren't powers of 2 can be composed of the powers of 2 implementations. For example a 6 byte implementation would be comprised of a 4-byte implementation plus a the 2 byte implementation. I have more code than what's currently check into that repository, but I stalled because I need AVX512 to do an optimized implementation, and that doesn't seem to be a feature of DMD at the moment. That was one reason I posted https://wiki.dlang.org/SAOC_2018_ideas#Implement_AVX2_and_AVX-5 2_in_DMD.2FDRuntime on the wiki. Agner Fog had this to say in https://agner.org/optimize/optimizing_assembly.pdf --- There are several ways of moving large blocks of data. The most common methods are: 1. REP MOVS instruction. 2. If data are aligned: Read and write in a loop with the largest available register size. 3. If size is constant: inline move instructions. 165 4. If data are misaligned: First move as many bytes as required to make the destination aligned. Then read unaligned and write aligned in a loop with the largest available register size. 5. If data are misaligned: Read aligned, shift to compensate for misalignment and write aligned. 6. If the data size is too big for caching, use non-temporal writes to bypass the cache. Shift to compensate for misalignment, if necessary. As you can see, it can be very difficult to choose the optimal method in a given situation. The best advice I can give for a universal memcpy function, based on my testing, is as follows: * On Intel Wolfdale and earlier, Intel Atom, AMD K8 and earlier, and VIA Nano, use the aligned read - shift - aligned write method (5). * On Intel Nehalem and later, method (4) is up to 33% faster than method (5). 166 * On AMD K10 and later and Bobcat, use the unaligned read - aligned write method (4). * The non-temporal write method (6) can be used for data blocks bigger than half the size of the largest-level cache. --- I think D is well suited for an implementation that takes all these things into consideration, selecting an appropriate implementation, and composing complex implementations of the simpler implementations. Mike
Aug 12 2018
parent reply Aruna Maurya <aruna.maurya12 gmail.com> writes:
On Sunday, 12 August 2018 at 08:34:46 UTC, Mike Franklin wrote:
 On Sunday, 12 August 2018 at 07:00:30 UTC, Eugene Wissner wrote:

 [...]
Inline ASM is a feature of D, so "idiomatic D" includes assembly implementations. Where D shines here is with it's metaprogramming capabilities and the ability to select an implementation at compile-time or compose implementations. See https://github.com/JinShil/memcpyD/blob/master/memcpyd.d There you will that the compiler selects an implementation based on size in powers of 2. Sizes that aren't powers of 2 can be composed of the powers of 2 implementations. For example a 6 byte implementation would be comprised of a 4-byte implementation plus a the 2 byte implementation. [...]
So I'll be taking Eugene's code as reference to try and implement malloc free and realloc in dlang.
Aug 12 2018
parent reply Mike Parker <aldacron gmail.com> writes:
On Sunday, 12 August 2018 at 11:23:55 UTC, Aruna Maurya wrote:

 So I'll be taking Eugene's code as reference to try and 
 implement malloc free and realloc in dlang.
Be sure to send your proposal in by August 15!
Aug 12 2018
parent Aruna Maurya <aruna.maurya12 gmail.com> writes:
On Sunday, 12 August 2018 at 11:37:54 UTC, Mike Parker wrote:
 On Sunday, 12 August 2018 at 11:23:55 UTC, Aruna Maurya wrote:

 So I'll be taking Eugene's code as reference to try and 
 implement malloc free and realloc in dlang.
Be sure to send your proposal in by August 15!
Yes. Definitely!I'll do my best!
Aug 12 2018
prev sibling parent Mike Franklin <slavo5150 yahoo.com> writes:
On Sunday, 12 August 2018 at 07:00:30 UTC, Eugene Wissner wrote:

 Also what about other implementations like memset or memcmp. 
 Have they been implemented or require work from scratch?
These functions are still mostly implemented in asm, so I'm not sure there is an "idiomatic D" way. I would only wrap them into a function working with slices and checking the length. Mike?
It is also not only the implementation that's "idiomatic D", but also the interface. For example, I find `void copy(T)(const ref T from, ref T to)` to be much more "idiomatic D" than `void* memcpy(void* to, const void* from, size_t size)`. Mike
Aug 12 2018
prev sibling parent Mike Franklin <slavo5150 yahoo.com> writes:
On Sunday, 12 August 2018 at 06:44:37 UTC, Aruna Maurya wrote:

 Also what about other implementations like memset or memcmp. 
 Have they been implemented or require work from scratch?
I don't know of any D implementations of these other than the trivial and naive. I think it's a lot of work for one person to re-implement all of the software building blocks in D, so I envisioned the Autumn of Code participant choosing one or two to focus on. Mike
Aug 12 2018