digitalmars.D.learn - D memory consumption/runtime speed problem
- sybrandy (90/90) Jan 13 2010 Hello,
- BCS (3/25) Jan 13 2010 My guess is that you are getting long GC pauses. D's GC is rather unadva...
- bearophile (8/13) Jan 13 2010 My suggestions:
- Daniel Keep (24/35) Jan 13 2010 I haven't verified this, but I'd be *deeply* suspicious of encodeNumber.
- Steven Schveighoffer (7/43) Jan 14 2010 He's not using threads, you can simply do this in encode number:
-
sybrandy
(107/107)
Jan 14 2010
- bearophile (5/7) Jan 14 2010 Use "private const" in D1 and "private enum" in D2, there's no need for ...
Hello, I've been writing a bit of compression code and I noticed some strange behavior that's driving me a bit batty. I don't know if it's a bug with D or something I did. All I know is I can't figure it out. Below is the simplified version of the code as a single file. It takes two parameters. The first is a file to "compress" and the second is the number of times to run the benchmark. E.g. bugtest foo.txt 2 Now, if I set the second parameter to 1, it runs decently fast. 26 seconds on my work laptop for a half-sized enwiki8 from the Hutter challenge. If I set it to 2, then it takes about 142 seconds. In both cases a lot of memory is used and I'm not really sure why. Also, after it prints out the results, it takes several seconds for the program to exit. Am I doing something wrong? I've tried every trick that I could find by reading the documentation. Btw: The last time I tried this was with the latest version of D released at the beginning of the month. __CODE__ import std.conv; import std.stdio; import std.stream; import std.date; import std.mmfile; import std.array; string filename = "enwik8_small"; private immutable uint ONE_BYTE_VAL = (1 << 6) - 1; private immutable uint TWO_BYTE_VAL = (1 << 14) - 1; private immutable uint THREE_BYTE_VAL = (1 << 22) - 1; private immutable uint FOUR_BYTE_VAL = (1 << 30) - 1; private immutable uint ONE_BYTE_MASK = (0 << 6); private immutable uint TWO_BYTE_MASK = (1 << 6); private immutable uint THREE_BYTE_MASK = (2 << 6); private immutable uint FOUR_BYTE_MASK = (3 << 6); ubyte[] encodeNumber(in uint count) { if (count <= ONE_BYTE_VAL) { return [cast(ubyte)(ONE_BYTE_MASK | count)]; } else if (count <= TWO_BYTE_VAL) { return [cast(ubyte)(TWO_BYTE_MASK | (count >>> 8))] ~ [cast(ubyte)(count & 0x000000ff)]; } else if (count <= THREE_BYTE_VAL) { return [cast(ubyte)(THREE_BYTE_MASK | (count >>> 16))] ~ [cast(ubyte)((count >>> 8) & 0x000000ff)] ~ [cast(ubyte)(count & 0x000000ff)]; } else if (count <= FOUR_BYTE_VAL) { return [cast(ubyte)(FOUR_BYTE_MASK | (count >>> 24))] ~ [cast(ubyte)((count >>> 16) & 0x000000ff)] ~ [cast(ubyte)((count >>> 8) & 0x000000ff)] ~ [cast(ubyte)(count & 0x000000ff)]; } else { throw new Exception("Invalid count provided!"); } } void encode(in ubyte[] buff, out ubyte[] output) { ubyte currByte = buff[0]; uint count = 0; auto appOutput = appender(&output); foreach (byteVal; buff) { if (byteVal != currByte && count > 0) { appOutput.put(encodeNumber(count)); appOutput.put(currByte); currByte = byteVal; count = 0; } count++; } appOutput.put(encodeNumber(count)); appOutput.put(currByte); } void benchCode() { MmFile buff = new MmFile(filename); ubyte[] encodedBytes; encode(cast(ubyte[])buff[], encodedBytes); } void main(string[] args) { filename = args[1]; writeln("Benchmark time: ", benchmark!(benchCode)(to!(uint)(args[2]))); }
Jan 13 2010
Hello sybrandy,Hello, I've been writing a bit of compression code and I noticed some strange behavior that's driving me a bit batty. I don't know if it's a bug with D or something I did. All I know is I can't figure it out. Below is the simplified version of the code as a single file. It takes two parameters. The first is a file to "compress" and the second is the number of times to run the benchmark. E.g. bugtest foo.txt 2 Now, if I set the second parameter to 1, it runs decently fast. 26 seconds on my work laptop for a half-sized enwiki8 from the Hutter challenge. If I set it to 2, then it takes about 142 seconds. In both cases a lot of memory is used and I'm not really sure why. Also, after it prints out the results, it takes several seconds for the program to exit. Am I doing something wrong? I've tried every trick that I could find by reading the documentation. Btw: The last time I tried this was with the latest version of D released at the beginning of the month.My guess is that you are getting long GC pauses. D's GC is rather unadvanced. IIRC there are ways to can make it do better but I don't know what they are.
Jan 13 2010
sybrandy:Now, if I set the second parameter to 1, it runs decently fast. 26 seconds on my work laptop for a half-sized enwiki8 from the Hutter challenge. If I set it to 2, then it takes about 142 seconds. In both cases a lot of memory is used and I'm not really sure why. Also, after it prints out the results, it takes several seconds for the program to exit.My suggestions: - Stop using array joining, those create many small arrays that the GC has to manage, and the current D GC is not efficient at all. There are many ways to avoid array joinings and avoid most runtime allocations. - If you can, at the end of the program you can add std.c.stdlib.exit(0);, this kills the GC mid-way in a faster way (but probably destructors are not called, so be careful). - Use the latest daily build of the LDC compiler with very good command line arguments, and if needed link-time optimizaiton activated too :-) And then please tell us how much time it takes to run :-) Bye, bearophile
Jan 13 2010
sybrandy wrote:Hello, I've been writing a bit of compression code and I noticed some strange behavior that's driving me a bit batty. I don't know if it's a bug with D or something I did. All I know is I can't figure it out. ... Am I doing something wrong? I've tried every trick that I could find by reading the documentation. Btw: The last time I tried this was with the latest version of D released at the beginning of the month.I haven't verified this, but I'd be *deeply* suspicious of encodeNumber. I don't usually use array literals but, if I remember correctly, every time it is called you're performing a heap allocation. Even worse, those concatentations might be performing separate allocations, too. You could eliminate the overhead by using a passed-in buffer design like so: ubyte[] encodeNumber(in uint count, ref ubyte[4] buffer) { if (count <= ONE_BYTE_VAL) { buffer[0] = cast(ubyte)(ONE_BYTE_MASK | count); return buffer[0..1]; } // ... } Then, in the calling function: { ubyte[4] temp; foreach( ... ) { appOutput.put(encodeNumber(count, temp)); } } See if that helps.
Jan 13 2010
On Thu, 14 Jan 2010 01:00:31 -0500, Daniel Keep <daniel.keep.lists gmail.com> wrote:sybrandy wrote:He's not using threads, you can simply do this in encode number: static ubyte[4] buffer; In fact, that might even work in D2 *with* threads since buffer would be thread local :) -SteveHello, I've been writing a bit of compression code and I noticed some strange behavior that's driving me a bit batty. I don't know if it's a bug with D or something I did. All I know is I can't figure it out. ... Am I doing something wrong? I've tried every trick that I could find by reading the documentation. Btw: The last time I tried this was with the latest version of D released at the beginning of the month.I haven't verified this, but I'd be *deeply* suspicious of encodeNumber. I don't usually use array literals but, if I remember correctly, every time it is called you're performing a heap allocation. Even worse, those concatentations might be performing separate allocations, too. You could eliminate the overhead by using a passed-in buffer design like so: ubyte[] encodeNumber(in uint count, ref ubyte[4] buffer) { if (count <= ONE_BYTE_VAL) { buffer[0] = cast(ubyte)(ONE_BYTE_MASK | count); return buffer[0..1]; } // ... } Then, in the calling function: { ubyte[4] temp; foreach( ... ) { appOutput.put(encodeNumber(count, temp)); } } See if that helps.
Jan 14 2010
<snip> Using a small buffer as suggested by Daniel Keep and Steven Schveighoffer significantly improved performance. Now down to about 5 seconds. I ended up using the static array buffer since the encodeNumber function will be in its own file in my resulting program, so I can keep it private. Doing something similar to the output buffer had a similar effect and it's now processing everything in less than 2 seconds! I didn't realize that all those little arrays were created. Perhaps this is something that should be detailed in the arrays documentation or, perhaps even better, an optimization guide? I honestly thought the GC would help keep memory in check as I didn't want to assume a worst-case scenario, which with my RLE implementation is 2 * input buffer size, but I guess I have to. Well, perhaps my original code will be of use if Walter and the gang decide to try to revamp the GC and want some "bad" code to test it with. Thanks all! Btw: below is the updated code import std.conv; import std.stdio; import std.stream; import std.date; import std.mmfile; import std.array; string filename = "enwik8_small"; private immutable uint ONE_BYTE_VAL = (1 << 6) - 1; private immutable uint TWO_BYTE_VAL = (1 << 14) - 1; private immutable uint THREE_BYTE_VAL = (1 << 22) - 1; private immutable uint FOUR_BYTE_VAL = (1 << 30) - 1; private immutable uint ONE_BYTE_MASK = (0 << 6); private immutable uint TWO_BYTE_MASK = (1 << 6); private immutable uint THREE_BYTE_MASK = (2 << 6); private immutable uint FOUR_BYTE_MASK = (3 << 6); private static ubyte[4] encodeBuff; ubyte[] encodeNumber(in uint count) { if (count <= ONE_BYTE_VAL) { encodeBuff[0] = cast(ubyte)(ONE_BYTE_MASK | count); return encodeBuff[0..1]; } else if (count <= TWO_BYTE_VAL) { encodeBuff[0] = cast(ubyte)(TWO_BYTE_MASK | (count >>> 8)); encodeBuff[1] = cast(ubyte)(count & 0x000000ff); return encodeBuff[0..2]; } else if (count <= THREE_BYTE_VAL) { encodeBuff[0] = cast(ubyte)(THREE_BYTE_MASK | (count >>> 16)); encodeBuff[1] = cast(ubyte)((count >>> 8) & 0x000000ff); encodeBuff[2] = cast(ubyte)(count & 0x000000ff); return encodeBuff[0..3]; } else if (count <= FOUR_BYTE_VAL) { encodeBuff[0] = cast(ubyte)(FOUR_BYTE_MASK | (count >>> 24)); encodeBuff[1] = cast(ubyte)((count >>> 16) & 0x000000ff); encodeBuff[2] = cast(ubyte)((count >>> 8) & 0x000000ff); encodeBuff[3] = cast(ubyte)(count & 0x000000ff); return encodeBuff[0..4]; } else { throw new Exception("Invalid count provided!"); } } void encode(in ubyte[] buff, ref ubyte[] output) { ubyte currByte = buff[0]; uint count = 0; uint outIdx = 0; ubyte[] temp; foreach (byteVal; buff) { if (byteVal != currByte && count > 0) { temp = encodeNumber(count); foreach (t; temp) { output[outIdx++] = t; } output[outIdx++] = currByte; currByte = byteVal; count = 0; } count++; } temp = encodeNumber(count); foreach (t; temp) { output[outIdx++] = t; } output[outIdx++] = currByte; } void benchCode() { MmFile buff = new MmFile(filename); ubyte[] encodedBytes; encodedBytes.length = cast(size_t)buff.length * 2; encode(cast(ubyte[])buff[], encodedBytes); } void main(string[] args) { filename = args[1]; writeln("Benchmark time: ", benchmark!(benchCode)(to!(uint)(args[2]))); }
Jan 14 2010
sybrandy:private immutable uint ONE_BYTE_VAL = (1 << 6) - 1; private immutable uint TWO_BYTE_VAL = (1 << 14) - 1;Use "private const" in D1 and "private enum" in D2, there's no need for an immutable here. In your code there are now no useless memory allocations, so the exit(0) trick is not needed. Bye, bearophile
Jan 14 2010