digitalmars.D.announce - TempAlloc
- dsimcha (14/14) Dec 05 2008 As per a discusson on digitalmars.D about a month ago, I've created a te...
- Fawzi Mohamed (10/26) Dec 06 2008 I think that it could be useful to add an argument to to
- dsimca (20/28) Dec 06 2008 Can you elaborate on this? I'm not exactly sure what such an argument w...
- Fawzi Mohamed (23/62) Dec 07 2008 it could simply be an int storing the place in the stack...
- dsimcha (18/31) Dec 07 2008 I really don't see how this can work. AFAIK, there are only 4 ways to d...
- Fawzi Mohamed (10/34) Dec 07 2008 mmh sorry this comes from not having really understood how TmpAlloc was
As per a discusson on digitalmars.D about a month ago, I've created a temp space allocator for quickly allocating memory in a last in, first out manner. Its main use is for allocating scratch space to be used within a function without introducing threading bottlenecks, etc. I've released it as an alpha on scrapple. I'm working on a major update to my statistics library (also on scrapple) and will dogfood this there when I'm done with the update. I think that it's pretty useful, at least in number crunching-type code, in its current form as a library. However, it might be even better if integrated into druntime and the core language to make it safer (it's pretty dangerous if used incorrectly) or cleaner to use (I've done the best I can here, but if I had some compiler support, I could automate some stuff that I make the caller do.) For the code, see http://dsource.org/projects/scrapple/browser/trunk/tempAlloc. The code is for D2 only, but could probably be ported to D1 using Tango's thread-local storage.
Dec 05 2008
On 2008-12-06 00:20:49 +0100, dsimcha <dsimcha yahoo.com> said:As per a discusson on digitalmars.D about a month ago, I've created a temp space allocator for quickly allocating memory in a last in, first out manner. Its main use is for allocating scratch space to be used within a function without introducing threading bottlenecks, etc. I've released it as an alpha on scrapple. I'm working on a major update to my statistics library (also on scrapple) and will dogfood this there when I'm done with the update. I think that it's pretty useful, at least in number crunching-type code, in its current form as a library. However, it might be even better if integrated into druntime and the core language to make it safer (it's pretty dangerous if used incorrectly) or cleaner to use (I've done the best I can here, but if I had some compiler support, I could automate some stuff that I make the caller do.) For the code, see http://dsource.org/projects/scrapple/browser/trunk/tempAlloc. The code is for D2 only, but could probably be ported to D1 using Tango's thread-local storage.I think that it could be useful to add an argument to to frameInit/free, in any case not just to speed things up, but to quickly catch mismatched init/free calls. using GC.malloc in malloc for large requested sizes kind of defeats the stated purpose (and what makes superstack more difficult to use) of not being GC scanned, even if I understand the problem of either making bookkeeping more complicated or create potentially big holes in the stack... Fawzi
Dec 06 2008
== Quote from Fawzi Mohamed (fmohamed mac.com)'s articleI think that it could be useful to add an argument to to frameInit/free, in any case not just to speed things up, but to quickly catch mismatched init/free calls.Can you elaborate on this? I'm not exactly sure what such an argument would look like. Also, I wanted to make TempAlloc fairly simple even if it costs some performance so that I would actually want to use it. If doing something like this adds a lot of complexity for the caller, I probably won't implement it. Right now, I like the simple mixin API.using GC.malloc in malloc for large requested sizes kind of defeats the stated purpose (and what makes superstack more difficult to use) of not being GC scanned, even if I understand the problem of either making bookkeeping more complicated or create potentially big holes in the stack...I see over 4MB in a single allocation to be kind of a corner case anyhow. If you need to allocate this much memory for a temp variable that doesn't escape the current scope, in a single allocation (probably pretty unusual already), since the cost of an allocation is sublinear in bytes allocated, the allocation time will likely be dwarfed by the time it takes to use the memory for whatever you need it for. Furthermore, whether by frameInit/frameFree or by TempAlloc.malloc/TempAlloc.free, the memory gets freed deterministically rather than waiting for the GC to run to free it, so it's still faster than using new, etc. In fact, I'm thinking I should have made this limit ~1MB (about 1/4 the size of a TempAlloc block) since allocating 4MB will always require the creation of a new block anyhow. I'm open to suggestions about how to deal with these large block cases, but I will prioritize handling the common case of much smaller allocations efficiently and cleanly over what I feel is somewhat of a corner case.
Dec 06 2008
On 2008-12-06 15:55:44 +0100, dsimca <dsimcha yahoo.com> said:== Quote from Fawzi Mohamed (fmohamed mac.com)'s articleit could simply be an int storing the place in the stack... So if someone forgets to remove the frame the next call will see it immediately. By the way couldn't you avoid the thread local State by simply keeping a stack of State structures? Then instead of the thread local State you would simply look at the top of the State Stack, it should be faster and more self contained. Then the element to check could really be the size of the stack, and if it is simply an int you can make it optional, if present you check for mismatches... and you get rid of passing the State out... int initFrame()// returns the number of frames of the stack void freeFrame(int handler=-1)// frees the frame, if handler is >=0 then checks the number of framesI think that it could be useful to add an argument to to frameInit/free, in any case not just to speed things up, but to quickly catch mismatched init/free calls.Can you elaborate on this? I'm not exactly sure what such an argument would look like. Also, I wanted to make TempAlloc fairly simple even if it costs some performance so that I would actually want to use it. If doing something like this adds a lot of complexity for the caller, I probably won't implement it. Right now, I like the simple mixin API.no problem I understand, and I am not a user yet. OT: I just read (when looking for your previous messages) about your statistics library. Nice! You might be interested in giving a look to tango's random package, it has fast generation of accurate random bumbers with gauss exp and gamma distributions... D1.0, but bringing it to D2.0 shouldn't be difficult... Fawziusing GC.malloc in malloc for large requested sizes kind of defeats the stated purpose (and what makes superstack more difficult to use) of not being GC scanned, even if I understand the problem of either making bookkeeping more complicated or create potentially big holes in the stack...I see over 4MB in a single allocation to be kind of a corner case anyhow. If you need to allocate this much memory for a temp variable that doesn't escape the current scope, in a single allocation (probably pretty unusual already), since the cost of an allocation is sublinear in bytes allocated, the allocation time will likely be dwarfed by the time it takes to use the memory for whatever you need it for. Furthermore, whether by frameInit/frameFree or by TempAlloc.malloc/TempAlloc.free, the memory gets freed deterministically rather than waiting for the GC to run to free it, so it's still faster than using new, etc. In fact, I'm thinking I should have made this limit ~1MB (about 1/4 the size of a TempAlloc block) since allocating 4MB will always require the creation of a new block anyhow. I'm open to suggestions about how to deal with these large block cases, but I will prioritize handling the common case of much smaller allocations efficiently and cleanly over what I feel is somewhat of a corner case.
Dec 07 2008
== Quote from Fawzi Mohamed (fmohamed mac.com)'s articleit could simply be an int storing the place in the stack... So if someone forgets to remove the frame the next call will see it immediately. By the way couldn't you avoid the thread local State by simply keeping a stack of State structures? Then instead of the thread local State you would simply look at the top of the State Stack, it should be faster and more self contained.I really don't see how this can work. AFAIK, there are only 4 ways to deal with global mutable state in a multithreaded program: 1. Locking. 2. Somehow do everything atomically, if you can. 3. Make the "global" state thread-local. 4. Eliminate it altogether. The biggest reason for me writing TempAlloc in the first place was to avoid lock contention for GC malloc access. TempAlloc is somewhat faster even in single-threaded tests, but the really huge, orders of magnitude speedups come when it's used to avoid this contention.Then the element to check could really be the size of the stack, and if it is simply an int you can make it optional, if present you check for mismatches... and you get rid of passing the State out... int initFrame()// returns the number of frames of the stack void freeFrame(int handler=-1)// frees the frame, if handler is >=0 then checks the number of framesI thought about doing something like this. The biggest problem is large blocks. Since frequent very large allocations are where GC performance really suffers and false pointer issues turn up, I wanted to make sure that these get deleted deterministically, too. Also, just to note: I realized that I had not taken into account alignment when I first wrote TempAlloc. I've updated it to use 16-byte alignment (good for x86), and to fix tempdup to work right with const.
Dec 07 2008
On 2008-12-07 18:44:30 +0100, dsimcha <dsimcha yahoo.com> said:== Quote from Fawzi Mohamed (fmohamed mac.com)'s articlemmh sorry this comes from not having really understood how TmpAlloc was supposed to work. Indeed if you want to hide completely the presence of a State, and not pass it explicitly to the allocation routines you cannot avoid thread local. I was mislead by the name, State is actually the superstack object, what I meant was to keep an extra index in it that simply counts the number of frames, and return/ask it in initFrame freeFrame so that you can check that you did not have any mismatched frames.it could simply be an int storing the place in the stack... So if someone forgets to remove the frame the next call will see it immediately. By the way couldn't you avoid the thread local State by simply keeping a stack of State structures? Then instead of the thread local State you would simply look at the top of the State Stack, it should be faster and more self contained.I really don't see how this can work. AFAIK, there are only 4 ways to deal with global mutable state in a multithreaded program: 1. Locking. 2. Somehow do everything atomically, if you can. 3. Make the "global" state thread-local. 4. Eliminate it altogether. The biggest reason for me writing TempAlloc in the first place was to avoid lock contention for GC malloc access. TempAlloc is somewhat faster even in single-threaded tests, but the really huge, orders of magnitude speedups come when it's used to avoid this contention.
Dec 07 2008