digitalmars.D.learn - [Rosettacode] Growable slices
- bearophile (38/38) May 15 2014 This task asks for an basic implementation of the the
- monarch_dodra (18/57) May 15 2014 I don't think it shows a limit of "slices" in and out of itself,
- bearophile (13/16) May 16 2014 Third version added:
- Artur Skawina via Digitalmars-d-learn (5/11) May 16 2014 Ugh. So how does it perform wrt the D version that I wrote for
- bearophile (11/17) May 16 2014 write readable and efficient code. So lets do a saner
- bearophile (14/16) May 16 2014 I have done a benchmark with the various version (the first 3 are
- monarch_dodra (11/27) May 16 2014 Arguably, your code allocates a lot.
- bearophile (6/8) May 16 2014 I think the LZW entries are OK :) So if you want to work on
- monarch_dodra (3/6) May 16 2014 Any of the versions where you can see "[c]" or "[b]" where c/b is
- Artur Skawina via Digitalmars-d-learn (39/53) May 16 2014 While I don't remember the numbers, I did test w/ gdc back then and
This task asks for an basic implementation of the the Lempel-Ziv-Welch (LZW) compression/decompression algorithm. I am keeping two D versions, the first one is minimal, and the second is a little more optimized: http://rosettacode.org/wiki/LZW_compression#More_Refined_Version There are of course ways to implement a really efficient ZLW compressor in D, and such implementation could be added as third D entry if it manages to be not too much long, but for the first two versions keeping the code very short is more important. Despite the compressor in the second D entry is still not efficient at all, I have tried to make it not terrible regarding its efficiency, so I have defined a new kind of slice that can be extended, unlike the D slices, to avoid the string appends visible in the first D entry: struct Slice { size_t start, end; property opSlice() const pure nothrow safe nogc { return original[start .. end]; } alias opSlice this; } Slice w; Tcomp[] result; foreach (immutable i; 0 .. original.length) { auto wc = Slice(w.start, w.end + 1); // Extend slice. if (wc in dict) { w = wc; } else { result ~= dict[w]; assert(dict.length < Tcomp.max); // Overflow guard. dict[wc] = cast(Tcomp)dict.length; w = Slice(i, i + 1); } } Is this showing a limit (problem) of D slices, or is this design better written in other ways? Bye, bearophile
May 15 2014
On Thursday, 15 May 2014 at 09:00:13 UTC, bearophile wrote:This task asks for an basic implementation of the the Lempel-Ziv-Welch (LZW) compression/decompression algorithm. I am keeping two D versions, the first one is minimal, and the second is a little more optimized: http://rosettacode.org/wiki/LZW_compression#More_Refined_Version There are of course ways to implement a really efficient ZLW compressor in D, and such implementation could be added as third D entry if it manages to be not too much long, but for the first two versions keeping the code very short is more important. Despite the compressor in the second D entry is still not efficient at all, I have tried to make it not terrible regarding its efficiency, so I have defined a new kind of slice that can be extended, unlike the D slices, to avoid the string appends visible in the first D entry: struct Slice { size_t start, end; property opSlice() const pure nothrow safe nogc { return original[start .. end]; } alias opSlice this; } Slice w; Tcomp[] result; foreach (immutable i; 0 .. original.length) { auto wc = Slice(w.start, w.end + 1); // Extend slice. if (wc in dict) { w = wc; } else { result ~= dict[w]; assert(dict.length < Tcomp.max); // Overflow guard. dict[wc] = cast(Tcomp)dict.length; w = Slice(i, i + 1); } } Is this showing a limit (problem) of D slices, or is this design better written in other ways? Bye, bearophileI don't think it shows a limit of "slices" in and out of itself, but rather the whole "range" concept, that conveniently encapsulates a start/end iteration scheme. The problem though is that, unlike iterators, these can only shrink, and never grow. Furthermore, it is usally hard to "split" a range into two parts, given a center iterator (especially for bidirectional-only ranges: EG: linked list). I think ranges/slices still provide more functionality and are worth it, but they do sacrifice *some* power: Growth and splitting. To be honest, what you are doing is essentially working with iterator pairs, disguised as indexes inside a "Slice" struct. Not that there's anything wrong with that, but that's my impression when looking at the code. Related: http://forum.dlang.org/thread/bhssgxfcscfbionhqcmn forum.dlang.org#post-umxlhsfqmfjwydemfdeb:40forum.dlang.org http://forum.dlang.org/thread/gpsiwnslxtsyfolymesc forum.dlang.org#post-mailman.2149.1353648522.5162.digitalmars-d:40puremagic.com
May 15 2014
There are of course ways to implement a really efficient ZLW compressor in D, and such implementation could be added as third D entry if it manages to be not too much long,Third version added: http://rosettacode.org/wiki/LZW_compression#More_Efficient_Version I have tried to make it idiomatic D, but some of the C style is probably still present. The D code currently still requires five casts, and I have guarded them with asserts, like: assert(tmp >> oBits <= ubyte.max); result[outLen] = cast(ubyte)(tmp >> oBits); Perhaps with some more semantic knowledge of the program it's possible to avoid one or two of those casts. The D code contains a little less "type insanity" than the original C version :-) Bye, bearophile
May 16 2014
On 05/16/14 11:47, bearophile via Digitalmars-d-learn wrote:Ugh. So how does it perform wrt the D version that I wrote for you last time? [1] [1] http://forum.dlang.org/post/mailman.2132.1353592965.5162.digitalmars-d puremagic.com) arturThere are of course ways to implement a really efficient ZLW compressor in D, and such implementation could be added as third D entry if it manages to be not too much long,Third version added: http://rosettacode.org/wiki/LZW_compression#More_Efficient_Version I have tried to make it idiomatic D, but some of the C style is probably still present.
May 16 2014
Artur Skawina:So how does it perform wrt the D version that I wrote for you last time? [1] [1] http://forum.dlang.org/post/mailman.2132.1353592965.5162.digitalmars-d puremagic.com)From your answer:First, the code above is practically a textbook example on how /not/ towrite readable and efficient code. So lets do a saner implementation:< I think the code I wrote is sufficiently readable, and surely efficiency was not its purpose. And November 2012 is lot of time ago, I have forgotten that answer of yours, sorry :-) I'll take a better look at it later today and I'll see if it's useful. Thank you. Bye, bearophile
May 16 2014
Artur Skawina:Ugh. So how does it perform wrt the D version that I wrote for you last time? [1]I have done a benchmark with the various version (the first 3 are lzw1: 0.39 lzw2: 0.17 lzw3: 0.21 lzw4: 0.17 I think your comment was about an older version of the first entry, that is non meant to be fast, just short and simple. I think the second version is enough. Do you agree? The third version should be more efficient but for such small files it's slower than the second. Bye, bearophile
May 16 2014
On Friday, 16 May 2014 at 15:20:04 UTC, bearophile wrote:Artur Skawina:Arguably, your code allocates a lot. To speed it up, arguably, you could store a global immutable string containing characters 0 to char.max + 1. This way, when you build your dictionary (each time you enter compress), you at least don't have to allocate the string, but rather, slice your global immutable. Ditto for the lines "w = [ch];", which allocates, you could instead do "w = gAllChars[ch .. ch + 1]". (or "dict[b] = [b]" etc...) Well, just a quick idea... I'll give it a shot (next week).Ugh. So how does it perform wrt the D version that I wrote for you last time? [1]I have done a benchmark with the various version (the first 3 lzw1: 0.39 lzw2: 0.17 lzw3: 0.21 lzw4: 0.17 I think your comment was about an older version of the first entry, that is non meant to be fast, just short and simple. I think the second version is enough. Do you agree? The third version should be more efficient but for such small files it's slower than the second. Bye, bearophile
May 16 2014
monarch_dodra:Arguably, your code allocates a lot.What version (1-2-3) do you mean?I'll give it a shot (next week).I think the LZW entries are OK :) So if you want to work on Rosettacode it's better to write an entry that is missing in D. Bye, bearophile
May 16 2014
On Friday, 16 May 2014 at 15:49:10 UTC, bearophile wrote:monarch_dodra:Any of the versions where you can see "[c]" or "[b]" where c/b is a char/byteArguably, your code allocates a lot.What version (1-2-3) do you mean?
May 16 2014
On 05/16/14 17:20, bearophile via Digitalmars-d-learn wrote:Artur Skawina:While I don't remember the numbers, I did test w/ gdc back then and both of your versions (from that thread) performed very poorly in a micro benchmark - I wasn't exaggerating when I said /an order of magnitude/, there really was a ~10 times perf difference. I didn't read the current entries on that page, so that might no longer apply, if you've modified them since. My point is that this third version that you announced here as almost- idiomatic-D is not even close to that goal; it's more or less a direct translation from C, and not like anything that would be written from scratch in D, hence the "ugh". Your own numbers above suggest that not even the "more efficient" claim is true.Ugh. So how does it perform wrt the D version that I wrote for you last time? [1]I have done a benchmark with the various version (the first 3 are the ones on lzw1: 0.39 lzw2: 0.17 lzw3: 0.21 lzw4: 0.17 I think your comment was about an older version of the first entry, that is non meant to be fast, just short and simple.I think the second version is enough. Do you agree?I don't see the point of the third ("C-in-D") one, other than to show how a direct C to D translation would look like. If /I/ was looking for a D code sample, then OUT[] compress(OUT=int, R)(R original) { IDAA!(OUT, const typeof(cast()original.front)[]) dictionary; OUT[] result; size_t l = 1; while (original.length>=l) { if (original.takeExactly(l) in dictionary) { ++l; continue; } result ~= dictionary[original.takeExactly(l-1)]; dictionary[original.takeExactly(l)] = cast(OUT)dictionary.length; original.popFrontN(l-1); l = 1; } return original.empty ? result : (result ~ dictionary[original]); } would have been much more useful than any of the currently available someone not familiar with D and ranges, but extremely inefficient and not something that anyone would like to use in real life. Many of the rosettacode entries are very misleading and unidiomatic, they give a very wrong picture of the language they're trying to show off. This is probably not specific to D, but true for most of the represented languages. artur
May 16 2014