
digitalmars.D.learn - zlib performance

reply "yawniek" <dlang srtnwz.com> writes:
hi,

unpacking files is kinda slow, probably i'm doing something wrong.

below code is about half the speed of gnu zcat on my os x machine.
why?

why do i need to .dup the buffer?
can i get rid of the casts?


the chunk size has only a marginal influence.
https://github.com/yannick/zcatd

import
   std.zlib,
   std.file,
   std.stdio;

void main(string[] args)
{
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (ubyte[] buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(immutable(string)) uncompressor.uncompress(buffer.dup);
           write(uncompressed);
   }
}
Aug 07 2015
next sibling parent reply Daniel Kozák via Digitalmars-d-learn writes:
On Fri, 07 Aug 2015 07:19:43 +0000
yawniek via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 hi,
 
 unpacking files is kinda slow, probably i'm doing something wrong.
 
 below code is about half the speed of gnu zcat on my os x machine.
 why?
 
 why do i need to .dup the buffer?
 can i get rid of the casts?
 
 
 the chunk size has only a marginal influence.
 https://github.com/yannick/zcatd
 
 import
    std.zlib,
    std.file,
    std.stdio;
 
 void main(string[] args)
 {
    auto f = File(args[1], "r");
    auto uncompressor = new UnCompress(HeaderFormat.gzip);
 
    foreach (ubyte[] buffer; f.byChunk(4096))
    {
            auto uncompressed = cast(immutable(string)) uncompressor.uncompress(buffer.dup);
            write(uncompressed);
    }
 }
Which compiler and version? There has been some performance problem with IO on OSX; it should be fixed in the 2.068 release
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 07:29:15 UTC, Daniel Kozák wrote:
 Which compiler and version. There has been some performance 
 problem with IO on OSX, it should be fixed in 2.068 release
i'm on master. v2.068-devel-8f81ffc

also changed file read mode to "rb".

i don't understand why the program crashes when i do not do the .dup
Aug 07 2015
parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 07:36:39 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 07:29:15 UTC, Daniel Kozák wrote:
 Which compiler and version. There has been some performance 
 problem with IO on OSX, it should be fixed in 2.068 release
i'm on master. v2.068-devel-8f81ffc

also changed file read mode to "rb".

i don't understand why the program crashes when i do not do the .dup
This is weird. I would say it should not crash
Aug 07 2015
next sibling parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 07:43:25 UTC, Daniel Kozák wrote:

 i don't understand why the program crashes when i do not do 
 the .dup
This is weird. I would say it should not crash
exactly. but try it yourself.

the fastest version i could come up with so far is below.
std.conv slows it down.
going from a 4kb to a 4mb buffer helped. now i'm within 30% of gzcat's performance.

import
   std.zlib,
   std.file,
   std.stdio;

void main(string[] args)
{
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (ubyte[] buffer; f.byChunk(1024*1024*4))
   {
           auto uncompressed = cast(immutable(string)) uncompressor.uncompress(buffer.dup);
           write(uncompressed);
   }
}
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 07:48:25 UTC, yawniek wrote:
 On Friday, 7 August 2015 at 07:43:25 UTC, Daniel Kozák wrote:
 the fastest version i could come up so far is below.
 std.conv slows it down.
 going from a 4kb to a 4mb buffer helped. now i'm within 30% of 
 gzcat's performance.
ok maybe not, there is another problem, not everything seems to get flushed, i'm missing output
Aug 07 2015
parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:01:27 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 07:48:25 UTC, yawniek wrote:
 On Friday, 7 August 2015 at 07:43:25 UTC, Daniel Kozák wrote:
 the fastest version i could come up so far is below.
 std.conv slows it down.
 going from a 4kb to a 4mb buffer helped. now i'm within 30% of 
 gzcat's performance.
ok maybe not, there is another problem, not everything seems to get flushed, i'm missing output

import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

void main(string[] args)
{
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
}

this is faster for me than zcat
Aug 07 2015
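The missing output above is the data UnCompress holds back until flush() is called. A self-contained round-trip (a sketch, not code from the thread) makes the effect visible: feed compressed data in small chunks, then compare the result with and without the final flush.

```d
import std.zlib;

void main()
{
    // build some compressible sample data
    auto original = cast(ubyte[]) "hello zlib hello zlib hello zlib".dup;

    // gzip-compress it with std.zlib's streaming Compress class
    auto comp = new Compress(HeaderFormat.gzip);
    ubyte[] gz;
    gz ~= cast(ubyte[]) comp.compress(original);
    gz ~= cast(ubyte[]) comp.flush();

    // decompress in small chunks, copying each slice we feed in
    auto un = new UnCompress(HeaderFormat.gzip);
    ubyte[] plain;
    for (size_t lo = 0; lo < gz.length; lo += 4)
    {
        auto hi = lo + 4 < gz.length ? lo + 4 : gz.length;
        plain ~= cast(ubyte[]) un.uncompress(gz[lo .. hi].dup);
    }
    plain ~= cast(ubyte[]) un.flush();  // without this, trailing bytes are lost

    assert(plain == original);
}
```

Dropping the `un.flush()` line typically leaves `plain` shorter than `original`, which matches the "missing output" symptom above.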
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 08:05:01 UTC, Daniel Kozák wrote:
 import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

 void main(string[] args)
 {
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
 }

 this is faster for me than zcat
not here on os x:

d version:
3.06s user 1.17s system 82% cpu 5.156 total

gzcat
1.79s user 0.11s system 99% cpu 1.899 total
Aug 07 2015
next sibling parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:13:01 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 08:05:01 UTC, Daniel Kozák wrote:
 import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

 void main(string[] args)
 {
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
 }

 this is faster for me than zcat
not here on os x:

d version:
3.06s user 1.17s system 82% cpu 5.156 total

gzcat
1.79s user 0.11s system 99% cpu 1.899 total
Maybe still some IO issues. On Linux it is OK. I remember a few days ago there has been some discussion about IO improvements for osx.

http://forum.dlang.org/post/mailman.184.1437841312.16005.digitalmars-d puremagic.com
Aug 07 2015
prev sibling parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:13:01 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 08:05:01 UTC, Daniel Kozák wrote:
 import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

 void main(string[] args)
 {
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer.idup));
           write(uncompressed);
   }
   write(cast(char[])uncompressor.flush);
 }

 this is faster for me than zcat
not here on os x:

d version:
3.06s user 1.17s system 82% cpu 5.156 total

gzcat
1.79s user 0.11s system 99% cpu 1.899 total
can you try it with ldc?

ldc[2] -O -release -boundscheck=off -singleobj app.d
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 08:24:11 UTC, Daniel Kozák wrote:

 can you try it with ldc?

 ldc[2] -O -release -boundscheck=off -singleobj  app.d
ldc 0.15.2 beta2
2.86s user 0.55s system 77% cpu 4.392 total

v2.068-devel-8f81ffc
2.86s user 0.67s system 78% cpu 4.476 total

v2.067
2.88s user 0.67s system 78% cpu 4.529 total

(different file, half the size of the one above:)
archlinux, virtualbox vm, DMD64 D Compiler v2.067
real 0m2.079s
user 0m1.193s
sys 0m0.637s

zcat:
real 0m3.023s
user 0m0.320s
sys 0m2.440s

is there a way to get rid of the flush in the end so everything happens in one loop? it's a bit inconvenient when i have another subloop that does work
Aug 07 2015
parent reply Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 08:42:45 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 08:24:11 UTC, Daniel Kozák wrote:
 can you try it with ldc?

 ldc[2] -O -release -boundscheck=off -singleobj  app.d
ldc 0.15.2 beta2
2.86s user 0.55s system 77% cpu 4.392 total

v2.068-devel-8f81ffc
2.86s user 0.67s system 78% cpu 4.476 total

v2.067
2.88s user 0.67s system 78% cpu 4.529 total

(different file, half the size of the one above:)
archlinux, virtualbox vm, DMD64 D Compiler v2.067
real 0m2.079s
user 0m1.193s
sys 0m0.637s

zcat:
real 0m3.023s
user 0m0.320s
sys 0m2.440s

is there a way to get rid of the flush in the end so everything happens in one loop? it's a bit inconvenient when i have another subloop that does work
I am not sure, but I do not think it is currently possible.

Btw. if you want to remove the [i]dup, you can use this code: http://dpaste.dzfl.pl/f52c82935bb5
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 08:50:11 UTC, Daniel Kozák wrote:
 ldc[2] -O -release -boundscheck=off -singleobj  app.d
 ldc 0.15.2 beta2
 2.86s user 0.55s system 77% cpu 4.392 total

 v2.068-devel-8f81ffc
 2.86s user 0.67s system 78% cpu 4.476 total

 v2.067
 2.88s user 0.67s system 78% cpu 4.529 total
i can now reproduce the results and indeed, it's faster than zcat:

on a c4.xlarge aws instance running archlinux and dmd v2.067
same file as above on my macbook.

best run:
2.72s user 0.39s system 99% cpu 3.134 total
worst run:
3.47s user 0.46s system 99% cpu 3.970 total

zcat:
best:
4.45s user 0.28s system 99% cpu 4.764 total
worst:
4.99s user 0.57s system 99% cpu 5.568 total

so i guess on os x there is still something to be optimized
Aug 07 2015
parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Friday, 7 August 2015 at 09:12:32 UTC, yawniek wrote:
 On Friday, 7 August 2015 at 08:50:11 UTC, Daniel Kozák wrote:
 ldc[2] -O -release -boundscheck=off -singleobj  app.d
ldc 0.15.2 beta2
2.86s user 0.55s system 77% cpu 4.392 total

v2.068-devel-8f81ffc
2.86s user 0.67s system 78% cpu 4.476 total

v2.067
2.88s user 0.67s system 78% cpu 4.529 total
i can now reproduce the results and indeed, it's faster than zcat:

on a c4.xlarge aws instance running archlinux and dmd v2.067
same file as above on my macbook.

best run:
2.72s user 0.39s system 99% cpu 3.134 total
worst run:
3.47s user 0.46s system 99% cpu 3.970 total

zcat:
best:
4.45s user 0.28s system 99% cpu 4.764 total
worst:
4.99s user 0.57s system 99% cpu 5.568 total

so i guess on os x there is still something to be optimized
Can you try it without the write operation (comment out all writes)? And then try it without uncompression?

// without uncompression:

void main(string[] args)
{
   auto f = File(args[1], "r");
   foreach (buffer; f.byChunk(4096))
   {
           write(cast(char[])buffer);
   }
}

// without write:

void main(string[] args)
{
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer));
   }
   uncompressor.flush;
}
Aug 07 2015
parent reply "yawniek" <dlang srtnwz.com> writes:
On Friday, 7 August 2015 at 11:45:00 UTC, Daniel Kozak wrote:
 On Friday, 7 August 2015 at 09:12:32 UTC, yawniek wrote:
 [...]
 Can you try it without the write operation (comment out all writes)? And then try it without uncompression?

 // without uncompression:

 void main(string[] args)
 {
   auto f = File(args[1], "r");
   foreach (buffer; f.byChunk(4096))
   {
           write(cast(char[])buffer);
   }
 }
0.03s user 0.09s system 11% cpu 1.046 total
 // without write:

 void main(string[] args)
 {
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer));
   }
   uncompressor.flush;
 }
2.82s user 0.05s system 99% cpu 2.873 total
Aug 07 2015
parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 12:29:26 +0000
"yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 11:45:00 UTC, Daniel Kozak wrote:
 On Friday, 7 August 2015 at 09:12:32 UTC, yawniek wrote:
 [...]
Can you try it without the write operation (comment out all writes)? And then try it without uncompression?

// without uncompression:

void main(string[] args)
{
   auto f = File(args[1], "r");
   foreach (buffer; f.byChunk(4096))
   {
           write(cast(char[])buffer);
   }
}
0.03s user 0.09s system 11% cpu 1.046 total
So I/O seems to be OK
 
 // without write:

 void main(string[] args)
 {
   auto f = File(args[1], "r");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = cast(char[])(uncompressor.uncompress(buffer));
   }
   uncompressor.flush;
 }
2.82s user 0.05s system 99% cpu 2.873 total
So maybe it is a zlib problem on osx?
Aug 07 2015
prev sibling parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 7 Aug 2015 09:43:25 +0200
Daniel Kozák <kozzi dlang.cz> wrote:

 On Fri, 07 Aug 2015 07:36:39 +0000
 "yawniek" <dlang srtnwz.com> wrote:

 On Friday, 7 August 2015 at 07:29:15 UTC, Daniel Kozák wrote:
 Which compiler and version. There has been some performance 
 problem with IO on OSX, it should be fixed in 2.068 release
 i'm on master. v2.068-devel-8f81ffc

 also changed file read mode to "rb".

 i don't understand why the program crashes when i do not do the .dup
 This is weird. I would say it should not crash
Ok, I see, it is not weird after all: the UnCompress class probably keeps a slice of the buffer
Aug 07 2015
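That slice retention can be sketched in isolation (an illustration, not code from the thread): a slice into a buffer that is later overwritten changes with it, while a .dup copy does not.

```d
void main()
{
    ubyte[4] chunkBuf;            // stands in for a reused I/O buffer
    chunkBuf[] = 1;               // first "chunk" arrives

    ubyte[] kept = chunkBuf[];    // a slice: still points into chunkBuf
    ubyte[] copy = chunkBuf.dup;  // .dup: owns its own memory

    chunkBuf[] = 2;               // next "chunk" overwrites the buffer

    assert(kept[0] == 2);         // the retained slice changed underneath us
    assert(copy[0] == 1);         // the copy is unaffected
}
```

If UnCompress keeps such a slice of its input internally, feeding it a buffer that byChunk will overwrite on the next iteration corrupts its state, which would explain the crash without .dup.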
prev sibling parent Daniel Kozák <kozzi dlang.cz> writes:
On Fri, 07 Aug 2015 07:19:43 +0000
"yawniek" <dlang srtnwz.com> wrote:

 hi,
 
 unpacking files is kinda slow, probably i'm doing something wrong.
 
 below code is about half the speed of gnu zcat on my os x machine.
 why?
 
 why do i need to .dup the buffer?
It depends. In your case you don't need to. byChunk() reuses its buffer, which means that after each call the same buffer is used again, so all previous data are gone.
 can i get rid of the casts?
 
Yes, you can use std.conv.to:

import
   std.zlib,
   std.file,
   std.stdio,
   std.conv;

void main(string[] args)
{
   auto f = File(args[1], "rb");
   auto uncompressor = new UnCompress(HeaderFormat.gzip);

   foreach (buffer; f.byChunk(4096))
   {
           auto uncompressed = to!(char[])(uncompressor.uncompress(buffer));
           write(uncompressed);
   }
}
Aug 07 2015
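The buffer reuse described in the last post can be observed directly with byChunk. A small sketch (the temp file name is hypothetical, not from the thread): a slice kept across iterations ends up holding the last chunk's bytes, while a .dup copy keeps the first chunk's.

```d
import std.file : remove, write;
import std.stdio : File;

void main()
{
    // temp file with two known 4-byte chunks
    auto name = "bychunk_demo.tmp";
    write(name, "AAAABBBB");
    scope(exit) remove(name);

    ubyte[] kept;    // will alias byChunk's internal buffer
    ubyte[] copied;  // will own a private copy

    foreach (ubyte[] chunk; File(name, "rb").byChunk(4))
    {
        if (kept is null)
        {
            kept = chunk;        // slice of the reused buffer
            copied = chunk.dup;  // independent copy
        }
    }

    // the second chunk ("BBBB") overwrote the shared buffer in place
    assert(kept == cast(const(ubyte)[]) "BBBB");
    assert(copied == cast(const(ubyte)[]) "AAAA");
}
```

This is why the original program needed .dup: UnCompress was handed slices of a buffer that the next loop iteration overwrote.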