
digitalmars.D - potential speed improvement for repeated string concatenation

downs <default_357-line yahoo.de> writes:
Here's something interesting I noticed while writing serialization code.
If you're in a situation where you have to repeatedly append small to
medium sized chunks of string to a buffer, and the computation of those
chunks is relatively cheap, it might be faster (and use less memory) to do
it in two passes: first determining the size of the result string, then
allocating it completely and filling it in.
I noticed this while trying to figure out why my serialization code for
YAWR was so atrociously slow. I'm passing the ser method a void
delegate(char[]) that it's supposed to call with the strings it
serializes, in order. So normally, serialization would look like this:

char[] buffer; ser(data, (char[] f) { buffer~=f; });

When I figured out that it was the primary bottleneck causing the delays
while saving, I replaced it with the following code:

size_t len=0;
ser(data, (char[] f) { len+=f.length; });
auto buffer=new char[len]; len=0;
ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; });

To my surprise, this more than doubled the speed of that code. So if you
have some block of code that does lots of repeated string concats, you
might want to give this a try.
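
For reference, here is the same pattern as a self-contained program, with a
dummy ser standing in for YAWR's serializer (the real one obviously does more
work per chunk):

import std.stdio;

// dummy stand-in for the real serializer: emits two chunks per element
void ser(int[] data, void delegate(char[]) sink)
{
    foreach (x; data)
    {
        sink("item ");
        sink("12345; ");
    }
}

void main()
{
    auto data = new int[1000];

    // pass 1: only measure the total size
    size_t len = 0;
    ser(data, (char[] f) { len += f.length; });

    // pass 2: allocate once, then copy each chunk into place
    auto buffer = new char[len];
    len = 0;
    ser(data, (char[] f) { buffer[len..len+f.length] = f; len += f.length; });

    writefln("built %d chars with a single allocation", buffer.length);
}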
  --downs
Jul 24 2007
BCS <ao pathlink.com> writes:
Reply to Downs,

 Here's something interesting I noticed while writing serialization
 code.
 If you're in a situation where you have to repeatedly append small to
 medium sized chunks of string
 to a buffer, and the computation of those chunks is relatively cheap,
 it might be faster (and use
 less memory) to do it in two passes: first, determining the size of
 the result string, then
 allocating it completely and filling it in.
For cases like this, it would be nice to have a VirtualArray type:

class VArray(T)
{
    void opSliceAssign(T[], size_t, size_t);
    void opSliceAssign(T[]);
    T[] opSlice(size_t, size_t);
    T[] opSlice();
    void offset(int);
    void offsetByLast();
}

This would act as an array that is as big as it needs to be, and where the
beginning can be hidden. Then this would work:

auto a = new VArray!(char);
foreach (char[] s; arrayOfStrings)
{
    a[] = s;         // copy in starting at "0"
    a.offsetByLast();
}
return a[];

Internally this would do copies and reallocations while trying to keep the
number of the latter down.
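
Something like this might serve as a first cut - the growth policy and the
internals are invented for illustration, and only the parts the loop above
needs are filled in:

// Sketch of the VArray idea: a self-growing array whose logical start can
// be advanced, so each "a[] = s" lands after everything written so far.
class VArray(T)
{
    private T[] data;      // overallocated backing store
    private size_t start;  // hidden prefix, advanced by offsetByLast
    private size_t last;   // length of the most recent assignment

    // a[] = s: copy s in at the current logical position, growing as needed
    void opSliceAssign(T[] src)
    {
        if (start + src.length > data.length)
            data.length = (start + src.length) * 2;  // amortize reallocations
        data[start .. start + src.length] = src;
        last = src.length;
    }

    // hide everything up to and including the last assignment
    void offsetByLast() { start += last; last = 0; }

    // a[]: everything accumulated so far
    T[] opSlice() { return data[0 .. start + last]; }
}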
Jul 24 2007
0ffh <spam frankhirsch.net> writes:
downs wrote:
 [...]
 To my surprise, this more than doubled the speed of that code. So if you 
 have some block of code that does lots of repeated string concats, you 
 might want to give this a try.
I suppose what you're effectively doing is avoiding memory reallocation,
which is rather expensive, and therefore a good thing to avoid... :)

Regards, Frank
Jul 24 2007
janderson <askme me.com> writes:
downs wrote:
 Here's something interesting I noticed while writing serialization code.
 [...]
 To my surprise, this more than doubled the speed of that code. So if you 
 have some block of code that does lots of repeated string concats, you 
 might want to give this a try.
  --downs
You can do this as well:

void reserve(T)(ref T[] array, int amount)
{
    array.length += amount;  // grow once up front
    array.length = 0;        // shrink back; the block should stay allocated
}

That way you can use char[] like normal.
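
Presumably it would be used like this (a sketch - whether appends really do
stay inside the reserved block is exactly the question raised below):

char[] buffer;
reserve(buffer, 4096);       // one up-front allocation
for (int i = 0; i < 1000; i++)
    buffer ~= "chunk ";      // appends should now extend in place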
Jul 27 2007
Serg Kovrov <sergk mailinator.com> writes:
janderson wrote:
 You can do this as well:
 
 void reserve(T)(ref T[] array, int amount)
 {
     array.length += amount;  // grow once up front
     array.length = 0;        // shrink back; the block should stay allocated
 }
 
 That way you can use char[] like normal.
Is it *guaranteed* that setting array.length = 0 will not cause the memory to
be collected by the GC?

--serg.
Aug 02 2007
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
downs wrote:

 Here's something interesting I noticed while writing serialization code.
 [...]
 To my surprise, this more than doubled the speed of that code. So if you
 have some block of code that does lots of repeated string concats, you
 might want to give this a try.
Yes, I use this pattern commonly when possible.

I'm going to write data structures similar to cords [1] or ropes [2] in D.
Concatenation on them is very fast, because only a pointer is copied, not
the content. Other important features:

* easy undo (fast and cheap) - you can have multiple cords and they will
  share most of their data, even if you change something in one of them
* easy (and fast, O(1)) append and prepend
* easy (and fast, O(1)) substring slicing
* strings are computed only once
* reduced number of memory allocations
* easy sharing of similar strings

It will be simple to use:

auto r = new cord!(char)();  // or new cord!(char)("aaa");
r ~= "xxx";
r ~= "yyy";
r ~= "zzz";
auto r2 = r.toString();      // will first precompute len, and then copy
                             // the content of the buffers
foreach (c; r) { write(c); }
writefln("r[3]=%s", r[3]);
foreach (block; r.fastiterator) { write(block); }
auto r3 = r[1..3];           // O(1)
r3.rebalance();              // for faster access
r3[2] = 'c';                 // will not change r! O(log n) in 2-3 trees,
                             // O(c*n) in lists, but with very small c (~ 0.01)
auto r4 = r ~ r3;            // O(1)
auto r5 = r.remove(4, 9);    // O(1)

Common usages:

* text editors
* servers: appending on receiving, and appending when creating a response
* creating big arrays step by step without knowing their size

I will first implement simple list-based buffers (with small overheads), and
then a second version, probably based on 2-3 trees (for very large arrays,
> 10MB). I really need these data structures, so it will probably not take
long to do all this. :)

[1] http://www.sgi.com/tech/stl/Rope.html
[2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt

-- 
Witold Baryluk
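
A minimal sketch of the core trick - a concatenation tree where appending just
allocates a node, and toString does the same two-pass job as above (one length
pass, one allocation, one copy). All names here are illustrative; the real
cord would add slicing, rebalancing, and the 2-3 tree variant:

// Minimal cord: leaves hold text, inner nodes concatenate lazily.
// Appending is O(1); the content is copied once, inside toString.
class Cord
{
    private char[] leaf;       // set for leaves only
    private Cord left, right;  // set for concatenation nodes only
    size_t len;                // total length of this subtree

    this(char[] s) { leaf = s; len = s.length; }
    this(Cord l, Cord r) { left = l; right = r; len = l.len + r.len; }

    // r ~ "xxx" builds a single node; nothing is copied yet
    Cord opCat(char[] s) { return new Cord(this, new Cord(s)); }

    char[] toString()
    {
        auto buf = new char[len];  // one allocation for the whole result
        size_t pos = 0;
        void walk(Cord c)
        {
            if (c.left is null)
            {
                buf[pos .. pos + c.leaf.length] = c.leaf;
                pos += c.leaf.length;
            }
            else
            {
                walk(c.left);
                walk(c.right);
            }
        }
        walk(this);
        return buf;
    }
}

Used as:

auto r = new Cord("aaa");
r = r ~ "xxx" ~ "yyy" ~ "zzz";  // three O(1) node allocations
auto r2 = r.toString();         // one length pass, one copy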
Jul 27 2007
kenny <funisher gmail.com> writes:
Witold Baryluk wrote:
 downs wrote:
 [...]
 Yes, I use this pattern commonly when possible. I'm going to write data
 structures similar to cords [1] or ropes [2] in D. Concatenation on them is
 very fast, because only a pointer is copied, not the content. [...]
Wow, that looks amazing. I personally could benefit a lot from a library such
as this! I do tons and tons of string concatenation at the moment, and
hand-optimizing every single time is annoying. What can I do to help?

kenny
Jul 27 2007
Tristam MacDonald <swiftcoder gmail.com> writes:
How do these perform in a multi-threaded scenario? At a glance, I would guess
these suffer from much the same problems as COW.

Witold Baryluk Wrote:
 downs wrote:
 [...]
 Yes, I use this pattern commonly when possible. I'm going to write data
 structures similar to cords [1] or ropes [2] in D. Concatenation on them is
 very fast, because only a pointer is copied, not the content. [...]
Jul 27 2007
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
Tristam MacDonald wrote:

 How do these perform in a multi-threaded scenario? At a glance, I would
 guess these suffer from much the same problems as COW.
I really don't know. But it can reduce the amount of copying in COW, because
the chunks are probably smaller than the whole string.
Jul 27 2007
BCS <ao pathlink.com> writes:
Reply to Tristam,

 How do these perform in a multi-threaded scenario? At a glance, I
 would guess these suffer from much the same problems as COW.
 
 Witold Baryluk Wrote:
 
If I had to guess... It looks like it never modifies data, just makes
references to it: copy the reference, replace the reference on write. As long
as you can get the memory back, that should be fast (and lock-free to boot).
Jul 27 2007
BCS <ao pathlink.com> writes:
Reply to Downs,

 Here's something interesting I noticed while writing serialization
 code.
 [...]
 To my surprise, this more than doubled the speed of that code. So if
 you have some block of code that does lots of repeated string concats,
 you might want to give this a try.
 --downs
Because you actually build the strings anyway, it might be even faster to
build up a char[][] of them and then string it all together later.

|char[] buffer;
|
|char[][] dmp;
|int at = 0;
|int i = 0;
|ser(data, (char[] f)
| {
|  if(at >= dmp.length) dmp.length = at+10;
|  dmp[at] = f;
|  at++;
|  i += f.length;
| });
|
| // stitch it all together
|buffer.length = i;
|char[] tmp = buffer;
|foreach(char[] str; dmp[0..at])
|{
|  tmp[0..str.length] = str[];
|  tmp = tmp[str.length..$];
|}

This only works if ser allocates new memory for each string.
Jul 27 2007