
digitalmars.D.announce - LZ4 decompression at CTFE

reply Stefan Koch <uplink.coder googlemail.com> writes:
Hello,

Originally I wanted to wait with this announcement until DConf,
but since I am working on another toy, I can release this info early.

So, as per the title: you can decompress .lz4 files created by the
standard lz4hc command-line tool at compile time.

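To make the idea concrete, usage could look roughly like the sketch below. The function name `lz4Decompress` and the file name are placeholders for illustration; the actual API had not been published at this point.

```d
// Hypothetical sketch, not the announced API: embed a compressed
// file and decompress it during compilation via CTFE.
enum compressed = cast(immutable ubyte[]) import("payload.lz4"); // needs -J<dir>
enum payload = lz4Decompress(compressed); // runs entirely at compile time
// `payload` is then an ordinary compile-time constant; no
// decompression work is left for program startup.
```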
No github link yet as there is a little bit of cleanup to do :)

Please comment.
Apr 26 2016
next sibling parent reply MrSmith <mrsmith33 yandex.ru> writes:
On Tuesday, 26 April 2016 at 22:05:39 UTC, Stefan Koch wrote:
 Hello,

 Originally I wanted to wait with this announcement until DConf,
 but since I am working on another toy, I can release this info
 early.

 So, as per the title: you can decompress .lz4 files created by
 the standard lz4hc command-line tool at compile time.

 No github link yet as there is a little bit of cleanup to do :)

 Please comment.
I would like to use this instead of a C++ static lib. Thanks! (I hope it works at runtime too.)
Apr 26 2016
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 26 April 2016 at 22:07:47 UTC, MrSmith wrote:
 I would like to use this instead of a C++ static lib. Thanks! (I
 hope it works at runtime too.)
Sure it does, but keep in mind the C++ version is heavily optimized. I would have to make a special runtime version to achieve comparable performance, I think. That said, I already plan to write another optimized version. Concerning compression: I am fairly certain I can beat the compression ratio of lz4hc in a few cases, but it is going to be slower.
Apr 26 2016
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 26 April 2016 at 22:07:47 UTC, MrSmith wrote:

 I would like to use this instead of a C++ static lib. Thanks! (I
 hope it works at runtime too.)
Oh, and if you could please send me a sample of a file you are trying to uncompress, that would be most helpful.
Apr 26 2016
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2016 3:05 PM, Stefan Koch wrote:
 Hello,

 Originally I wanted to wait with this announcement until DConf,
 but since I am working on another toy, I can release this info early.

 So, as per the title: you can decompress .lz4 files created by the standard
 lz4hc command-line tool at compile time.

 No github link yet as there is a little bit of cleanup to do :)

 Please comment.
Sounds nice. I'm curious how it would compare to:

https://www.digitalmars.com/sargon/lz77.html
https://github.com/DigitalMars/sargon/blob/master/src/sargon/lz77.d
Apr 26 2016
next sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 26 Apr 2016 23:55:46 -0700,
Walter Bright <newshound2 digitalmars.com> wrote:

 On 4/26/2016 3:05 PM, Stefan Koch wrote:
 Hello,

 Originally I wanted to wait with this announcement until DConf,
 but since I am working on another toy, I can release this info early.

 So, as per the title: you can decompress .lz4 files created by the standard
 lz4hc command-line tool at compile time.

 No github link yet as there is a little bit of cleanup to do :)

 Please comment.  
Sounds nice. I'm curious how it would compare to:

https://www.digitalmars.com/sargon/lz77.html
https://github.com/DigitalMars/sargon/blob/master/src/sargon/lz77.d
There exist some comparisons for the C++ implementations (zlib's
DEFLATE being a variation of LZ77):

http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
https://pdfs.semanticscholar.org/9b69/86f2fff8db7e080ef8b02aa19f3941a61a91.pdf (pg. 9)

The high-compression variant of LZ4 is basically like gzip with 9x
faster decompression. That makes it well suited for use cases where
you compress once and decompress often, and where sequential I/O reads
are fast (e.g. 200 MB/s), or where the program does other computations
meanwhile and one doesn't want decompression to use a lot of CPU time.

-- Marco
Apr 27 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 28 April 2016 at 06:03:46 UTC, Marco Leise wrote:
 There exist some comparisons for the C++ implementations
 (zlib's DEFLATE being a variation of lz77):
 http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
 https://pdfs.semanticscholar.org/9b69/86f2fff8db7e080ef8b02aa19f3941a61a91.pdf (pg. 9)

 The high-compression variant of LZ4 is basically like gzip with
 9x faster decompression. That makes it well suited for use cases
 where you compress once and decompress often, and where sequential
 I/O reads are fast (e.g. 200 MB/s), or where the program does
 other computations meanwhile and one doesn't want decompression
 to use a lot of CPU time.
Thanks for the second link you posted. It made me aware of a few things I was not aware of before.
Apr 28 2016
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 27 April 2016 at 06:55:46 UTC, Walter Bright wrote:
 Sounds nice. I'm curious how it would compare to:

 https://www.digitalmars.com/sargon/lz77.html

 https://github.com/DigitalMars/sargon/blob/master/src/sargon/lz77.d
lz77 took 176 hnsecs uncompressing
lz4 took 92 hnsecs uncompressing

And another test in reversed order, using the same data:

lz4 took 162 hnsecs uncompressing
lz77 took 245 hnsecs uncompressing
Apr 28 2016
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 28 April 2016 at 20:12:58 UTC, Stefan Koch wrote:
 On Wednesday, 27 April 2016 at 06:55:46 UTC, Walter Bright 
 wrote:
 Sounds nice. I'm curious how it would compare to:

 https://www.digitalmars.com/sargon/lz77.html

 https://github.com/DigitalMars/sargon/blob/master/src/sargon/lz77.d
 lz77 took 176 hnsecs uncompressing
 lz4 took 92 hnsecs uncompressing

 And another test in reversed order, using the same data:

 lz4 took 162 hnsecs uncompressing
 lz77 took 245 hnsecs uncompressing
The compression ratio is worse, though. But that is partially fixable.
Apr 28 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 28 April 2016 at 20:58:25 UTC, Stefan Koch wrote:
 The compression ratio is worse, though.
 But that is partially fixable.
I have to go back on that: due to restrictions in the lz4 spec, many _very_ small files will have significant overhead. Work on improving the compression ratio is ongoing, but there is not more than a 0.5-1.5% improvement to expect.
Apr 30 2016
prev sibling next sibling parent reply Dejan Lekic <dejan.lekic gmail.com> writes:
On Tuesday, 26 April 2016 at 22:05:39 UTC, Stefan Koch wrote:
 Hello,

 Originally I wanted to wait with this announcement until DConf,
 but since I am working on another toy, I can release this info
 early.

 So, as per the title: you can decompress .lz4 files created by
 the standard lz4hc command-line tool at compile time.

 No github link yet as there is a little bit of cleanup to do :)

 Please comment.
That is brilliant! I need LZ4 compression for a small project I work on...
Apr 27 2016
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 27 April 2016 at 07:51:30 UTC, Dejan Lekic wrote:
 That is brilliant! I need LZ4 compression for a small project I 
 work on...
The decompressor is ready to be released. It should work for all files compressed with vanilla lz4c -9. Please regard this release as alpha quality.

https://github.com/UplinkCoder/lz4-ctfe

P.S. I did not tweak the source; the compressed file size just happens to be 1911. I take this as a sign of correctness.

P.P.S. Actually, LZ4 is a much more interesting topic than SQLite. If you don't mind, I am going to talk about that :)
Apr 27 2016
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 27-Apr-2016 01:05, Stefan Koch wrote:
 Hello,

 Originally I wanted to wait with this announcement until DConf,
 but since I am working on another toy, I can release this info early.

 So, as per the title: you can decompress .lz4 files created by the standard
 lz4hc command-line tool at compile time.
What's the benefit? I mean, after CTFE decompression they are going to add as much weight to the binary as the decompressed files would.

Compression, on the other hand, might be helpful to avoid precompressing everything beforehand.
 No github link yet as there is a little bit of cleanup to do :)

 Please comment.
-- Dmitry Olshansky
Apr 28 2016
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 28 April 2016 at 17:29:05 UTC, Dmitry Olshansky 
wrote:
 What's the benefit? I mean, after CTFE decompression they are
 going to add as much weight to the binary as the decompressed
 files would.

 Compression, on the other hand, might be helpful to avoid
 precompressing everything beforehand.
The compiler can load files faster that are used by CTFE only, and which would be stripped out by the linker later. And keep in mind that it also works at runtime. Memory is scarce at compile time, and this can help reduce the memory requirements once a bit of structure is added on top.
Apr 28 2016
parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 28 April 2016 at 17:58:50 UTC, Stefan Koch wrote:
 On Thursday, 28 April 2016 at 17:29:05 UTC, Dmitry Olshansky 
 wrote:
 What's the benefit? I mean, after CTFE decompression they are
 going to add as much weight to the binary as the decompressed
 files would.

 Compression, on the other hand, might be helpful to avoid
 precompressing everything beforehand.
The compiler can load files faster that are used by CTFE only, and which would be stripped out by the linker later. And keep in mind that it also works at runtime. Memory is scarce at compile time, and this can help reduce the memory requirements once a bit of structure is added on top.
Considering the speed and memory consumption of CTFE, I'd bet on the exact reverse. Also, the damn thing is allocating in a loop.
Apr 28 2016
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 28-Apr-2016 21:31, deadalnix wrote:
 On Thursday, 28 April 2016 at 17:58:50 UTC, Stefan Koch wrote:
 On Thursday, 28 April 2016 at 17:29:05 UTC, Dmitry Olshansky wrote:
 What's the benefit? I mean, after CTFE decompression they are going to
 add as much weight to the binary as the decompressed files would.

 Compression, on the other hand, might be helpful to avoid
 precompressing everything beforehand.
The compiler can load files faster that are used by CTFE only, and which would be stripped out by the linker later. And keep in mind that it also works at runtime. Memory is scarce at compile time, and this can help reduce the memory requirements once a bit of structure is added on top.
Considering the speed and memory consumption of CTFE, I'd bet on the exact reverse.
Yeah, using CTFE to save compile-time memory sounds like a bad joke to me ;)
 Also, the damn thing is allocating in a loop.
-- Dmitry Olshansky
Apr 28 2016
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 28 April 2016 at 18:31:25 UTC, deadalnix wrote:
 Also, the damn thing is allocation in a loop.
I would like to have an allocation primitive for CTFE use, but that would not help too much, as I don't know the size I need in advance. Storing that in the header is optional, and unfortunately lz4c does not store it by default. Decompressing the LZ family never takes more space than the uncompressed size of the data, and the working set is often bounded: in the case of LZ4 it's restricted to 4k in the frame format, and to 64k by design.
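The bounded working set follows from the LZ4 block format itself: each sequence is a token byte (literal length in the high nibble, match length minus 4 in the low nibble), the literals, and a 2-byte little-endian match offset, so a decoder never looks back more than 64k. A minimal decoder sketch in D (my own illustration of the format, not the announced implementation; end-of-block rules and bounds checks on malformed input are omitted for brevity):

```d
// Decode a single raw LZ4 block. CTFE-compatible: uses only
// array slicing and appending.
ubyte[] lz4DecodeBlock(const(ubyte)[] src)
{
    ubyte[] dst;
    size_t i;
    while (i < src.length)
    {
        immutable token = src[i++];
        size_t litLen = token >> 4;
        if (litLen == 15) // lengths >= 15 use 255-byte continuations
        {
            ubyte b;
            do { b = src[i++]; litLen += b; } while (b == 255);
        }
        dst ~= src[i .. i + litLen];
        i += litLen;
        if (i >= src.length)
            break; // last sequence carries literals only
        immutable offset = src[i] | (src[i + 1] << 8); // <= 64k by design
        i += 2;
        size_t matchLen = (token & 0xF) + 4; // minimum match is 4 bytes
        if ((token & 0xF) == 15)
        {
            ubyte b;
            do { b = src[i++]; matchLen += b; } while (b == 255);
        }
        // byte-by-byte copy so overlapping (RLE-like) matches work
        foreach (_; 0 .. matchLen)
            dst ~= dst[$ - offset];
    }
    return dst;
}

// Three literals "abc", then a 9-byte overlapping match at offset 3:
enum decoded = lz4DecodeBlock([0x35, 'a', 'b', 'c', 0x03, 0x00]);
static assert(decoded == cast(const(ubyte)[]) "abcabcabcabc");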
Apr 28 2016
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 28 April 2016 at 17:29:05 UTC, Dmitry Olshansky 
wrote:
 Compression on the other hand might be helpful to avoid 
 precompressing everything beforehand.
I fear that is going to be pretty slow and will eat at least 1.5x the memory of the file you are trying to store, if you want a good compression ratio. Then again... it might be fast enough to still be useful.
Apr 28 2016