
digitalmars.D.learn - zlib performance

reply "yawniek" <dlang srtnwz.com> writes:
hi,

unpacking files is kinda slow, probably i'm doing something wrong.

below code is about half the speed of gnu zcat on my os x machine.
why?

why do i need to .dup the buffer?
can i get rid of the casts?


the chunk size has only a marginal influence.
https://github.com/yannick/zcatd

import
   std.zlib,
   std.file,
   std.stdio;

void main(string[] args)
{
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (ubyte[] buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(immutable(string)) uncompressor.uncompress(buffer.dup);
           write(uncompressed);
   }
}
Aug 07 2015
next sibling parent reply Daniel Kozák via Digitalmars-d-learn writes:
On Fri, 07 Aug 2015 07:19:43 +0000
yawniek via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 hi,
 
 unpacking files is kinda slow, probably i'm doing something wrong.
 
 below code is about half the speed of gnu zcat on my os x machine.
 why?
 
 why do i need to .dup the buffer?
 can i get rid of the casts?
 
 
 the chunk size has only a marginal influence.
 https://github.com/yannick/zcatd
 
 import
    std.zlib,
    std.file,
    std.stdio;
 
 void main(string[] args)
 {
    auto f = File(args[1], "r");
    auto uncompressor = new UnCompress(HeaderFormat.gzip);
 
    foreach (ubyte[] buffer; f.byChunk(4096))
    {
            auto uncompressed = cast(immutable(string)) uncompressor.uncompress(buffer.dup);
            write(uncompressed);
    }
 }
Which compiler and version? There has been some performance problem with IO on OSX; it should be fixed in the 2.068 release
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 07:29:15 UTC, Daniel Kozák wrote:
 Which compiler and version. There has been some performance 
 problem with IO on OSX, it should be fixed in 2.068 release
i'm on master. v2.068-devel-8f81ffc

also changed file read mode to "rb".

i don't understand why the program crashes when i do not do the .dup
Aug 07 2015
parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 07:36:39 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 07:29:15 UTC, Daniel Kozák wrote:
 Which compiler and version. There has been some performance 
 problem with IO on OSX, it should be fixed in 2.068 release
i'm on master. v2.068-devel-8f81ffc

also changed file read mode to "rb".

i don't understand why the program crashes when i do not do the .dup
This is weird. I would say it should not crash
Aug 07 2015
next sibling parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 07:43:25 UTC, Daniel Kozák wrote:

 i don't understand why the program crashes when i do not do 
 the .dup
This is weird. I would say it should not crash
exactly. but try it yourself.

the fastest version i could come up with so far is below.
std.conv slows it down.
going from a 4kb to a 4mb buffer helped. now i'm within 30% of gzcat's performance.

import
   std.zlib,
   std.file,
   std.stdio;

void main(string[] args)
{
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (ubyte[] buffer; f.byChunk(1024*1024*4))
   {
           auto uncompressed = cast(immutable(string)) uncompressor.uncompress(buffer.dup);
           write(uncompressed);
   }
}
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 07:48:25 UTC, yawniek wrote:
 On Friday, 7 August 2015 at 07:43:25 UTC, Daniel Kozák wrote:
 the fastest version i could come up so far is below.
 std.conv slows it down.
 going from a 4kb to a 4mb buffer helped. now i'm within 30% of 
 gzcat's performance.
ok maybe not, there is another problem, not everything seems to get flushed, i'm missing output
Aug 07 2015
parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:01:27 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 07:48:25 UTC, yawniek wrote:
 On Friday, 7 August 2015 at 07:43:25 UTC, Daniel Kozák wrote:
 the fastest version i could come up so far is below.
 std.conv slows it down.
 going from a 4kb to a 4mb buffer helped. now i'm within 30% of 
 gzcat's performance.
ok maybe not, there is another problem, not everything seems to get flushed, i'm missing output

import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

void main(string[] args)
{
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
}

this is faster for me than zcat
Aug 07 2015
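The missing output above is the data UnCompress holds back until flush() is called. A self-contained round-trip (a sketch, not code from the thread) makes the effect visible: feed compressed data in small chunks, then compare the result with and without the final flush.

```d
import std.zlib;

void main()
{
    // build some compressible sample data
    auto original = cast(ubyte[]) "hello zlib hello zlib hello zlib".dup;

    // gzip-compress it with std.zlib's streaming Compress class
    auto comp = new Compress(HeaderFormat.gzip);
    ubyte[] gz;
    gz ~= cast(ubyte[]) comp.compress(original);
    gz ~= cast(ubyte[]) comp.flush();

    // decompress in small chunks, copying each slice we feed in
    auto un = new UnCompress(HeaderFormat.gzip);
    ubyte[] plain;
    for (size_t lo = 0; lo < gz.length; lo += 4)
    {
        auto hi = lo + 4 < gz.length ? lo + 4 : gz.length;
        plain ~= cast(ubyte[]) un.uncompress(gz[lo .. hi].dup);
    }
    plain ~= cast(ubyte[]) un.flush();  // without this, trailing bytes are lost

    assert(plain == original);
}
```

Dropping the `un.flush()` line typically leaves `plain` shorter than `original`, which matches the "missing output" symptom above.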
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 08:05:01 UTC, Daniel Kozák wrote:
 import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

 void main(string[] args)
 {
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
 }

 this is faster for me than zcat
not here on os x:

d version:
3.06s user 1.17s system 82% cpu 5.156 total

gzcat
1.79s user 0.11s system 99% cpu 1.899 total
Aug 07 2015
next sibling parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:13:01 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 08:05:01 UTC, Daniel Kozák wrote:
 import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

 void main(string[] args)
 {
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
 }

 this is faster for me than zcat
not here on os x:

d version:
3.06s user 1.17s system 82% cpu 5.156 total

gzcat
1.79s user 0.11s system 99% cpu 1.899 total
Maybe still some IO issues. On Linux it is OK. I remember a few days ago there has been some discussion about IO improvements for osx.

http://forum.dlang.org/post/mailman.184.1437841312.16005.digitalmars-d puremagic.com
Aug 07 2015
prev sibling parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:13:01 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 08:05:01 UTC, Daniel Kozák wrote:
 import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

 void main(string[] args)
 {
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
 }

 this is faster for me than zcat
not here on os x:

d version:
3.06s user 1.17s system 82% cpu 5.156 total

gzcat
1.79s user 0.11s system 99% cpu 1.899 total
can you try it with ldc?

ldc[2] -O -release -boundscheck=off -singleobj app.d
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 08:24:11 UTC, Daniel Kozák wrote:

 can you try it with ldc?

 ldc[2] -O -release -boundscheck=off -singleobj  app.d
ldc 0.15.2 beta2
2.86s user 0.55s system 77% cpu 4.392 total

v2.068-devel-8f81ffc
2.86s user 0.67s system 78% cpu 4.476 total

v2.067
2.88s user 0.67s system 78% cpu 4.529 total

(different file, half the size of the one above:)
archlinux, virtualbox vm, DMD64 D Compiler v2.067
real 0m2.079s
user 0m1.193s
sys 0m0.637s

zcat:
real 0m3.023s
user 0m0.320s
sys 0m2.440s

is there a way to get rid of the flush in the end so everything happens in one loop? it's a bit inconvenient when i have another subloop that does work
Aug 07 2015
parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:42:45 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 08:24:11 UTC, Daniel Kozák wrote:
 can you try it with ldc?

 ldc[2] -O -release -boundscheck=off -singleobj  app.d
ldc 0.15.2 beta2
2.86s user 0.55s system 77% cpu 4.392 total

v2.068-devel-8f81ffc
2.86s user 0.67s system 78% cpu 4.476 total

v2.067
2.88s user 0.67s system 78% cpu 4.529 total

(different file, half the size of the one above:)
archlinux, virtualbox vm, DMD64 D Compiler v2.067
real 0m2.079s
user 0m1.193s
sys 0m0.637s

zcat:
real 0m3.023s
user 0m0.320s
sys 0m2.440s

is there a way to get rid of the flush in the end so everything happens in one loop? it's a bit inconvenient when i have another subloop that does work
I am not sure, but I do not think it is currently possible.

Btw. if you want to remove the [i]dup, you can use this code: http://dpaste.dzfl.pl/f52c82935bb5
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 08:50:11 UTC, Daniel Kozák wrote:
 ldc[2] -O -release -boundscheck=off -singleobj  app.d
 ldc 0.15.2 beta2
 2.86s user 0.55s system 77% cpu 4.392 total

 v2.068-devel-8f81ffc
 2.86s user 0.67s system 78% cpu 4.476 total

 v2.067
 2.88s user 0.67s system 78% cpu 4.529 total
i can now reproduce the results and indeed, it's faster than zcat:

on a c4.xlarge aws instance running archlinux and dmd v2.067
same file as above on my macbook.

best run:
2.72s user 0.39s system 99% cpu 3.134 total
worst run:
3.47s user 0.46s system 99% cpu 3.970 total

zcat:
best:
4.45s user 0.28s system 99% cpu 4.764 total
worst:
4.99s user 0.57s system 99% cpu 5.568 total

so i guess on os x there is still something to be optimized
Aug 07 2015
parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Friday, 7 August 2015 at 09:12:32 UTC, yawniek wrote:
 On Friday, 7 August 2015 at 08:50:11 UTC, Daniel Kozák wrote:
 ldc[2] -O -release -boundscheck=off -singleobj  app.d
ldc 0.15.2 beta2
2.86s user 0.55s system 77% cpu 4.392 total

v2.068-devel-8f81ffc
2.86s user 0.67s system 78% cpu 4.476 total

v2.067
2.88s user 0.67s system 78% cpu 4.529 total
i can now reproduce the results and indeed, it's faster than zcat:

on a c4.xlarge aws instance running archlinux and dmd v2.067
same file as above on my macbook.

best run:
2.72s user 0.39s system 99% cpu 3.134 total
worst run:
3.47s user 0.46s system 99% cpu 3.970 total

zcat:
best:
4.45s user 0.28s system 99% cpu 4.764 total
worst:
4.99s user 0.57s system 99% cpu 5.568 total

so i guess on os x there is still something to be optimized
Can you try it without the write operation (comment out all writes)? And then try it without uncompression?

// without uncompression:

void main(string[] args)
{
   auto f = File(args[1], "r");
   foreach (buffer; f.byChunk(4096))
   {
           write(cast(char[])buffer);
   }
}

// without write:

void main(string[] args)
{
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer));
   }
   uncompressor.flush;
}
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 11:45:00 UTC, Daniel Kozak wrote:
 On Friday, 7 August 2015 at 09:12:32 UTC, yawniek wrote:
 [...]
 Can you try it without the write operation (comment out all writes)? And then try it without uncompression?

 // without uncompression:

 void main(string[] args)
 {
   auto f = File(args[1], "r");
   foreach (buffer; f.byChunk(4096))
   {
           write(cast(char[])buffer);
   }
 }
0.03s user 0.09s system 11% cpu 1.046 total
 // without write:

 void main(string[] args)
 {
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer));
   }
   uncompressor.flush;
 }
2.82s user 0.05s system 99% cpu 2.873 total
Aug 07 2015
parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 12:29:26 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 11:45:00 UTC, Daniel Kozak wrote:
 On Friday, 7 August 2015 at 09:12:32 UTC, yawniek wrote:
 [...]
Can you try it without the write operation (comment out all writes)? And then try it without uncompression?

// without uncompression:

void main(string[] args)
{
   auto f = File(args[1], "r");
   foreach (buffer; f.byChunk(4096))
   {
           write(cast(char[])buffer);
   }
}
0.03s user 0.09s system 11% cpu 1.046 total
So I/O seems to be OK
 
 // without write:

 void main(string[] args)
 {
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer));
   }
   uncompressor.flush;
 }
2.82s user 0.05s system 99% cpu 2.873 total
So maybe it is a zlib problem on osx?
Aug 07 2015
prev sibling parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 7 Aug 2015 09:43:25 +0200
Daniel Kozák <kozzi dlang.cz> wrote:

 On Fri, 07 Aug 2015 07:36:39 +0000
 "yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 07:29:15 UTC, Daniel Kozák wrote:
 Which compiler and version. There has been some performance 
 problem with IO on OSX, it should be fixed in 2.068 release
 i'm on master. v2.068-devel-8f81ffc

 also changed file read mode to "rb".

 i don't understand why the program crashes when i do not do the .dup
 This is weird. I would say it should not crash
Ok, I see, it is not weird after all: the UnCompress class probably keeps a slice of the buffer
Aug 07 2015
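That slice retention can be sketched in isolation (an illustration, not code from the thread): a slice into a buffer that is later overwritten changes with it, while a .dup copy does not.

```d
void main()
{
    ubyte[4] chunkBuf;            // stands in for a reused I/O buffer
    chunkBuf[] = 1;               // first "chunk" arrives

    ubyte[] kept = chunkBuf[];    // a slice: still points into chunkBuf
    ubyte[] copy = chunkBuf.dup;  // .dup: owns its own memory

    chunkBuf[] = 2;               // next "chunk" overwrites the buffer

    assert(kept[0] == 2);         // the retained slice changed underneath us
    assert(copy[0] == 1);         // the copy is unaffected
}
```

If UnCompress keeps such a slice of its input internally, feeding it a buffer that byChunk will overwrite on the next iteration corrupts its state, which would explain the crash without .dup.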
prev sibling parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 07:19:43 +0000
"yawniek" <dlang srtnwz.com> wrote:

 hi,
 
 unpacking files is kinda slow, probably i'm doing something wrong.
 
 below code is about half the speed of gnu zcat on my os x machine.
 why?
 
 why do i need to .dup the buffer?
It depends. In your case you don't need to. byChunk() reuses its buffer, which means that after each call the same buffer is used again, so all previous data are gone.
 can i get rid of the casts?
 
Yes, you can use std.conv.to:

import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

void main(string[] args)
{
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = to!(char[])(uncompressor.uncompress(buffer));
           write(uncompressed);
   }
}
Aug 07 2015
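The buffer reuse described in the last post can be observed directly with byChunk. A small sketch (the temp file name is hypothetical, not from the thread): a slice kept across iterations ends up holding the last chunk's bytes, while a .dup copy keeps the first chunk's.

```d
import std.file : remove, write;
import std.stdio : File;

void main()
{
    // temp file with two known 4-byte chunks
    auto name = "bychunk_demo.tmp";
    write(name, "AAAABBBB");
    scope(exit) remove(name);

    ubyte[] kept;    // will alias byChunk's internal buffer
    ubyte[] copied;  // will own a private copy

    foreach (ubyte[] chunk; File(name, "rb").byChunk(4))
    {
        if (kept is null)
        {
            kept = chunk;        // slice of the reused buffer
            copied = chunk.dup;  // independent copy
        }
    }

    // the second chunk ("BBBB") overwrote the shared buffer in place
    assert(kept == cast(const(ubyte)[]) "BBBB");
    assert(copied == cast(const(ubyte)[]) "AAAA");
}
```

This is why the original program needed .dup: UnCompress was handed slices of a buffer that the next loop iteration overwrote.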