digitalmars.D - potential speed improvement for repeated string concatenation
- downs (18/18) Jul 24 2007 Here's something interesting I noticed while writing serialization code.
- BCS (23/32) Jul 24 2007 For cases like this, it would be nice to have a VirtualArray type
- 0ffh (4/8) Jul 24 2007 I suppose what you're effectively doing is avoid memory reallocation,
- janderson (8/33) Jul 27 2007 You can do this as well:
- Serg Kovrov (4/13) Aug 02 2007 Is it *guaranteed* that array.length=0 memory will not cause memory to
- Witold Baryluk (46/70) Jul 27 2007 Yes, i using this pattern commonly, if this is possible.
- kenny (6/94) Jul 27 2007 wow, that looks amazing. I personally could benefit a lot from a library...
- Tristam MacDonald (2/93) Jul 27 2007
- Witold Baryluk (3/5) Jul 27 2007 I really don't know. But it can reduce amount of copying in COW, because
- BCS (4/9) Jul 27 2007 If I had to guess... It looks like it never modifies data, just makes re...
- BCS (25/54) Jul 27 2007 Because you actually build the strings then it might be even faster to b...
Here's something interesting I noticed while writing serialization code.

If you're in a situation where you have to repeatedly append small to medium sized chunks of string to a buffer, and the computation of those chunks is relatively cheap, it might be faster (and use less memory) to do it in two passes: first determining the size of the result string, then allocating it completely and filling it in.

I noticed this while trying to figure out why my serialization code for YAWR was so atrociously slow. I'm passing the ser method a void delegate(char[]) that it's supposed to call with the strings it serializes, in order. So normally, serialization would look like this:

char[] buffer;
ser(data, (char[] f) { buffer~=f; });

When I figured out that this was the primary bottleneck causing the delays while saving, I replaced it with the following code:

size_t len=0;
ser(data, (char[] f) { len+=f.length; });
auto buffer=new char[len];
len=0;
ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; });

To my surprise, this more than doubled the speed of that code. So if you have some block of code that does lots of repeated string concats, you might want to give this a try.

--downs
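The two-pass pattern above can be wrapped in a small reusable helper. This is a sketch, not code from the post: the name buildString and the delegate-taking-a-delegate signature are my own; it assumes the producer (here standing in for ser) emits the same chunks on both passes.

```d
// Sketch: run the producer twice -- once to count, once to fill.
// "producer" plays the role of ser(data, ...): it calls the sink
// once per serialized chunk.
char[] buildString(void delegate(void delegate(char[])) producer)
{
    size_t len = 0;
    producer((char[] f) { len += f.length; });   // pass 1: measure

    auto buffer = new char[len];                 // single allocation
    len = 0;
    producer((char[] f) {                        // pass 2: copy
        buffer[len .. len + f.length] = f;
        len += f.length;
    });
    return buffer;
}
```

A call site would then look like: auto buffer = buildString((void delegate(char[]) sink) { ser(data, sink); });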
Jul 24 2007
Reply to Downs,

> If you're in a situation where you have to repeatedly append small to
> medium sized chunks of string to a buffer, and the computation of those
> chunks is relatively cheap, it might be faster (and use less memory) to
> do it in two passes: first, determining the size of the result string,
> then allocating it completely and filling it in.

For cases like this, it would be nice to have a VirtualArray type:

class VArray(T)
{
    void opSliceAssign(T[], size_t, size_t);
    void opSliceAssign(T[]);
    T[] opSlice(size_t, size_t);
    T[] opSlice();
    void offset(int);
    void offsetByLast();
}

This would act as an array that is as big as it needs to be and where the beginning can be hidden. Then this would work:

auto a = new VArray!(char);
foreach(char[] s; arrayOfStrings)
{
    a[] = s;  // copy in starting at "0"
    a.offsetByLast();
}
return a[];

Inside, this would do copies and reallocates while trying to keep the number of the latter down.
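A minimal sketch of how the VirtualArray idea might be fleshed out. The field names, the doubling growth policy, and the exact semantics of offsetByLast are my own guesses at BCS's intent, not part of his proposal:

```d
// Sketch: an append buffer that grows geometrically and hides
// everything before "start", so each write lands at virtual index 0.
class VArray(T)
{
    private T[] data;        // backing store
    private size_t start;    // length of the hidden prefix
    private size_t used;     // total elements written so far

    // a[] = src: copy src in at the current virtual origin
    void opSliceAssign(T[] src)
    {
        auto end = start + src.length;
        if (end > data.length)
        {
            auto cap = data.length ? data.length * 2 : 16;
            while (cap < end) cap *= 2;
            data.length = cap;   // reallocates only O(log n) times
        }
        data[start .. end] = src;
        if (end > used) used = end;
    }

    // hide the chunk just written; the next write lands after it
    void offsetByLast() { start = used; }

    // a[]: the whole accumulated contents
    T[] opSlice() { return data[0 .. used]; }
}
```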
Jul 24 2007
downs wrote:

> [...] To my surprise, this more than doubled the speed of that code. So
> if you have some block of code that does lots of repeated string
> concats, you might want to give this a try.

I suppose what you're effectively doing is avoiding memory reallocation, which is rather expensive, and therefore a good thing to avoid... :)

Regards, Frank
Jul 24 2007
downs wrote:

> [...] To my surprise, this more than doubled the speed of that code. So
> if you have some block of code that does lots of repeated string
> concats, you might want to give this a try.

You can do this as well:

void reserve(T)(ref T[] array, int reserve)  // ref, so the caller keeps the pre-sized block
{
    array.length += reserve;
    array.length = 0;
}

That way you can use char[] like normal.
Jul 27 2007
janderson wrote:

> You can do this as well:
>
> void reserve(T)(ref T[] array, int reserve)
> {
>     array.length += reserve;
>     array.length = 0;
> }
>
> That way you can use char[] like normal.

Is it *guaranteed* that after array.length=0 the memory will not be collected by the GC?

--serg.
Aug 02 2007
downs wrote:

> [...] To my surprise, this more than doubled the speed of that code. So
> if you have some block of code that does lots of repeated string
> concats, you might want to give this a try.

Yes, I use this pattern commonly, when possible.

I'm going to write data structures similar to cords [1] or ropes [2] in D. Concatenation in them is very fast, because only a pointer is copied, not the content. Other important features:

* easy undo (fast and cheap) - you can have multiple cords which will share most of their data, even if you change something in one of them
* easy (and fast, O(1)) append and prepend
* easy (and fast, O(1)) substring slicing
* strings are computed only once
* reduced number of memory allocations
* easy sharing of similar strings

It will be simple to use:

auto r = new cord!(char)();  // or new cord!(char)("aaa");
r ~= "xxx";
r ~= "yyy";
r ~= "zzz";
auto r2 = r.toString();  // will first precompute the length, and then
                         // copy the content of the buffers
foreach (c; r) { write(c); }
writefln("r[3]=%s", r[3]);
foreach (block; r.fastiterator) { write(block); }
auto r3 = r[1..3];         // O(1)
r3.rebalance();            // for faster access
r3[2] = 'c';               // will not change r! O(log n) in 2-3 trees,
                           // O(c*n) in lists, but with very small c (~ 0.01)
auto r4 = r ~ r3;          // O(1)
auto r5 = r.remove(4, 9);  // O(1)

Common usages:
* text editors
* servers: appending on receiving, and appending when creating a response
* creating big arrays step by step without knowing their size

I will first implement simple list-based buffers (with small overheads), and then a second version, probably based on 2-3 trees (for very large arrays, > 10MB). I really need these data structures, so it will probably not take long to do all of this. :)

[1] http://www.sgi.com/tech/stl/Rope.html
[2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt

--
Witold Baryluk
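The list-based variant described above might be sketched as follows. This is my own illustration, not Witold's library: the class name and fields are invented, and a real cord would add slicing, prepend, and tree rebalancing on top of this.

```d
// Sketch: a cord as a flat list of immutable chunks. Append stores
// only the reference (no copying); toString does one allocation and
// one copy pass, like downs' two-pass trick with the counting pass
// replaced by an incrementally maintained length.
class Cord
{
    private char[][] chunks;  // shared, never-modified fragments
    private size_t len;       // total length, kept up to date on append

    void opCatAssign(char[] s)  // r ~= s
    {
        chunks ~= s;            // O(1) amortized, copies no characters
        len += s.length;
    }

    char[] toString()
    {
        auto buf = new char[len];  // single allocation of the exact size
        size_t at = 0;
        foreach (c; chunks)
        {
            buf[at .. at + c.length] = c;
            at += c.length;
        }
        return buf;
    }
}
```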
Jul 27 2007
Witold Baryluk wrote:

> [...] I will first implement simple list-based buffers (with small
> overheads), and then a second version, probably based on 2-3 trees (for
> very large arrays, > 10MB). I really need these data structures, so it
> will probably not take long to do all of this. :)

wow, that looks amazing. I personally could benefit a lot from a library such as this! I do tons and tons of string concatenation at the moment, and hand optimizing every single time is annoying.

What can I do to help?

kenny
Jul 27 2007
How do these perform in a multi-threaded scenario? At a glance, I would guess these suffer from much the same problems as COW.

Witold Baryluk Wrote:

> [...] I'm going to write data structures similar to cords [1] or ropes
> [2] in D. Concatenation in them is very fast, because only a pointer is
> copied, not the content. [...]
Jul 27 2007
Tristam MacDonald wrote:

> How do these perform in a multi-threaded scenario? At a glance, I would
> guess these suffer from much the same problems as COW.

I really don't know. But it can reduce the amount of copying in COW, because the chunks are smaller than the whole string.
Jul 27 2007
Reply to Tristam,

> How do these perform in a multi-threaded scenario? At a glance, I would
> guess these suffer from much the same problems as COW.

If I had to guess... It looks like it never modifies data, just makes references to it: copy the reference, replace the reference on write. As long as you can get the memory back, that should be fast (and lock free to boot).
Jul 27 2007
Reply to Downs,

> [...] To my surprise, this more than doubled the speed of that code. So
> if you have some block of code that does lots of repeated string
> concats, you might want to give this a try.

Because you actually build the strings anyway, it might be even faster to build up a char[][] of them and then stitch it all together later:

|char[] buffer;
|
|char[][] dmp;
|int at = 0;
|int i = 0;
|ser(data, (char[] f)
|   {
|   if(at >= dmp.length) dmp.length = at+10;
|   dmp[at] = f;
|   at++;
|   i += f.length;
|   });
|
|  // stitch it all together
|buffer.length = i;
|char[] tmp = buffer;
|foreach(char[] str; dmp[0..at])
|{
|   tmp[0..str.length] = str[];
|   tmp = tmp[str.length..$];
|}

This only works if ser allocates new memory for each string.
Jul 27 2007