
digitalmars.D - potential speed improvement for repeated string concatenation

downs <default_357-line yahoo.de> writes:
Here's something interesting I noticed while writing serialization code.
If you're in a situation where you have to repeatedly append small to
medium sized chunks of string to a buffer, and the computation of those
chunks is relatively cheap, it might be faster (and use less memory) to do
it in two passes: first determining the size of the result string, then
allocating it completely and filling it in.
I noticed this while trying to figure out why my serialization code for
YAWR was so atrociously slow. I'm passing the ser method a void
delegate(char[]) that it's supposed to call with the strings it
serializes, in order. So normally, serialization would look like this:

char[] buffer; ser(data, (char[] f) { buffer~=f; });

When I figured out that it was the primary bottleneck causing the delays
while saving, I replaced it with the following code:

size_t len=0;
ser(data, (char[] f) { len+=f.length; });
auto buffer=new char[len]; len=0;
ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; });

To my surprise, this more than doubled the speed of that code. So if you
have some block of code that does lots of repeated string concats, you
might want to give this a try.
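
For reference, here is the same pattern as a self-contained program, with a
dummy ser standing in for YAWR's serializer (the real one obviously does more
work per chunk):

import std.stdio;

// dummy stand-in for the real serializer: emits two chunks per element
void ser(int[] data, void delegate(char[]) sink)
{
    foreach (x; data)
    {
        sink("item ");
        sink("12345; ");
    }
}

void main()
{
    auto data = new int[1000];

    // pass 1: only measure the total size
    size_t len = 0;
    ser(data, (char[] f) { len += f.length; });

    // pass 2: allocate once, then copy each chunk into place
    auto buffer = new char[len];
    len = 0;
    ser(data, (char[] f) { buffer[len..len+f.length] = f; len += f.length; });

    writefln("built %d chars with a single allocation", buffer.length);
}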
  --downs
Jul 24 2007
BCS <ao pathlink.com> writes:
Reply to Downs,

 Here's something interesting I noticed while writing serialization
 code.
 If you're in a situation where you have to repeatedly append small to
 medium sized chunks of string
 to a buffer, and the computation of those chunks is relatively cheap,
 it might be faster (and use
 less memory) to do it in two passes: first, determining the size of
 the result string, then
 allocating it completely and filling it in.
For cases like this, it would be nice to have a VirtualArray type:

class VArray(T)
{
    void opSliceAssign(T[], size_t, size_t);
    void opSliceAssign(T[]);
    T[] opSlice(size_t, size_t);
    T[] opSlice();
    void offset(int);
    void offsetByLast();
}

This would act as an array that is as big as it needs to be, and where the
beginning can be hidden. Then this would work:

auto a = new VArray!(char);
foreach (char[] s; arrayOfStrings)
{
    a[] = s;         // copy in starting at "0"
    a.offsetByLast();
}
return a[];

Internally this would do copies and reallocations while trying to keep the
number of the latter down.
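
Something like this might serve as a first cut - the growth policy and the
internals are invented for illustration, and only the parts the loop above
needs are filled in:

// Sketch of the VArray idea: a self-growing array whose logical start can
// be advanced, so each "a[] = s" lands after everything written so far.
class VArray(T)
{
    private T[] data;      // overallocated backing store
    private size_t start;  // hidden prefix, advanced by offsetByLast
    private size_t last;   // length of the most recent assignment

    // a[] = s: copy s in at the current logical position, growing as needed
    void opSliceAssign(T[] src)
    {
        if (start + src.length > data.length)
            data.length = (start + src.length) * 2;  // amortize reallocations
        data[start .. start + src.length] = src;
        last = src.length;
    }

    // hide everything up to and including the last assignment
    void offsetByLast() { start += last; last = 0; }

    // a[]: everything accumulated so far
    T[] opSlice() { return data[0 .. start + last]; }
}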
Jul 24 2007
0ffh <spam frankhirsch.net> writes:
downs wrote:
 [...]
 To my surprise, this more than doubled the speed of that code. So if you 
 have some block of code that does lots of repeated string concats, you 
 might want to give this a try.
I suppose what you're effectively doing is avoiding memory reallocation,
which is rather expensive, and therefore a good thing to avoid... :)

Regards, Frank
Jul 24 2007
janderson <askme me.com> writes:
downs wrote:
 Here's something interesting I noticed while writing serialization code.
 [...]
 To my surprise, this more than doubled the speed of that code. So if you 
 have some block of code that does lots of repeated string concats, you 
 might want to give this a try.
  --downs
You can do this as well:

void reserve(T)(ref T[] array, int amount)
{
    array.length += amount;  // grow once up front
    array.length = 0;        // shrink back; the block should stay allocated
}

That way you can use char[] like normal.
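
Presumably it would be used like this (a sketch - whether appends really do
stay inside the reserved block is exactly the question raised below):

char[] buffer;
reserve(buffer, 4096);       // one up-front allocation
for (int i = 0; i < 1000; i++)
    buffer ~= "chunk ";      // appends should now extend in place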
Jul 27 2007
Serg Kovrov <sergk mailinator.com> writes:
janderson wrote:
 You can do this as well:
 
 void reserve(T)(ref T[] array, int amount)
 {
     array.length += amount;  // grow once up front
     array.length = 0;        // shrink back; the block should stay allocated
 }
 
 That way you can use char[] like normal.
Is it *guaranteed* that setting array.length = 0 will not cause the memory to
be collected by the GC?

--serg.
Aug 02 2007
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
downs wrote:

 Here's something interesting I noticed while writing serialization code.
 [...]
 To my surprise, this more than doubled the speed of that code. So if you
 have some block of code that does lots of repeated string concats, you
 might want to give this a try.
Yes, I use this pattern commonly when possible.

I'm going to write data structures similar to cords [1] or ropes [2] in D.
Concatenation on them is very fast, because only a pointer is copied, not
the content. Other important features:

* easy undo (fast and cheap) - you can have multiple cords and they will
  share most of their data, even if you change something in one of them
* easy (and fast, O(1)) append and prepend
* easy (and fast, O(1)) substring slicing
* strings are computed only once
* reduced number of memory allocations
* easy sharing of similar strings

It will be simple to use:

auto r = new cord!(char)();  // or new cord!(char)("aaa");
r ~= "xxx";
r ~= "yyy";
r ~= "zzz";
auto r2 = r.toString();      // will first precompute len, and then copy
                             // the content of the buffers
foreach (c; r) { write(c); }
writefln("r[3]=%s", r[3]);
foreach (block; r.fastiterator) { write(block); }
auto r3 = r[1..3];           // O(1)
r3.rebalance();              // for faster access
r3[2] = 'c';                 // will not change r! O(log n) in 2-3 trees,
                             // O(c*n) in lists, but with very small c (~ 0.01)
auto r4 = r ~ r3;            // O(1)
auto r5 = r.remove(4, 9);    // O(1)

Common usages:

* text editors
* servers: appending on receiving, and appending when creating a response
* creating big arrays step by step without knowing their size

I will first implement simple list-based buffers (with small overheads), and
then a second version, probably based on 2-3 trees (for very large arrays,
> 10MB). I really need these data structures, so it will probably not take
long to do all this. :)

[1] http://www.sgi.com/tech/stl/Rope.html
[2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt

-- 
Witold Baryluk
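
A minimal sketch of the core trick - a concatenation tree where appending just
allocates a node, and toString does the same two-pass job as above (one length
pass, one allocation, one copy). All names here are illustrative; the real
cord would add slicing, rebalancing, and the 2-3 tree variant:

// Minimal cord: leaves hold text, inner nodes concatenate lazily.
// Appending is O(1); the content is copied once, inside toString.
class Cord
{
    private char[] leaf;       // set for leaves only
    private Cord left, right;  // set for concatenation nodes only
    size_t len;                // total length of this subtree

    this(char[] s) { leaf = s; len = s.length; }
    this(Cord l, Cord r) { left = l; right = r; len = l.len + r.len; }

    // r ~ "xxx" builds a single node; nothing is copied yet
    Cord opCat(char[] s) { return new Cord(this, new Cord(s)); }

    char[] toString()
    {
        auto buf = new char[len];  // one allocation for the whole result
        size_t pos = 0;
        void walk(Cord c)
        {
            if (c.left is null)
            {
                buf[pos .. pos + c.leaf.length] = c.leaf;
                pos += c.leaf.length;
            }
            else
            {
                walk(c.left);
                walk(c.right);
            }
        }
        walk(this);
        return buf;
    }
}

Used as:

auto r = new Cord("aaa");
r = r ~ "xxx" ~ "yyy" ~ "zzz";  // three O(1) node allocations
auto r2 = r.toString();         // one length pass, one copy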
Jul 27 2007
kenny <funisher gmail.com> writes:
Witold Baryluk wrote:
 downs wrote:
 [...]
 Yes, I use this pattern commonly when possible. I'm going to write data
 structures similar to cords [1] or ropes [2] in D. Concatenation on them is
 very fast, because only a pointer is copied, not the content. [...]
Wow, that looks amazing. I personally could benefit a lot from a library such
as this! I do tons and tons of string concatenation at the moment, and
hand-optimizing every single time is annoying. What can I do to help?

kenny
Jul 27 2007
Tristam MacDonald <swiftcoder gmail.com> writes:
How do these perform in a multi-threaded scenario? At a glance, I would guess
these suffer from much the same problems as COW.

Witold Baryluk Wrote:
 downs wrote:
 [...]
 Yes, I use this pattern commonly when possible. I'm going to write data
 structures similar to cords [1] or ropes [2] in D. Concatenation on them is
 very fast, because only a pointer is copied, not the content. [...]
Jul 27 2007
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
Tristam MacDonald wrote:

 How do these perform in a multi-threaded scenario? At a glance, I would
 guess these suffer from much the same problems as COW.
I really don't know. But it can reduce the amount of copying in COW, because
the chunks are probably smaller than the whole string.
Jul 27 2007
BCS <ao pathlink.com> writes:
Reply to Tristam,

 How do these perform in a multi-threaded scenario? At a glance, I
 would guess these suffer from much the same problems as COW.
 
 Witold Baryluk Wrote:
 
If I had to guess... It looks like it never modifies data, just makes
references to it: copy the reference, replace the reference on write. As long
as you can get the memory back, that should be fast (and lock-free to boot).
Jul 27 2007
BCS <ao pathlink.com> writes:
Reply to Downs,

 Here's something interesting I noticed while writing serialization
 code.
 [...]
 To my surprise, this more than doubled the speed of that code. So if
 you have some block of code that does lots of repeated string concats,
 you might want to give this a try.
 --downs
Because you actually build the strings anyway, it might be even faster to
build up a char[][] of them and then string it all together later.

|char[] buffer;
|
|char[][] dmp;
|int at = 0;
|int i = 0;
|ser(data, (char[] f)
| {
|  if(at >= dmp.length) dmp.length = at+10;
|  dmp[at] = f;
|  at++;
|  i += f.length;
| });
|
| // stitch it all together
|buffer.length = i;
|char[] tmp = buffer;
|foreach(char[] str; dmp[0..at])
|{
|  tmp[0..str.length] = str[];
|  tmp = tmp[str.length..$];
|}

This only works if ser allocates new memory for each string.
Jul 27 2007