digitalmars.D - Dev.to daily challenge Duplicate Encoder
- Jesse Phillips (15/15) Jun 24 2020 The Dev.to community does a Daily Challenge[1]. I didn't really
- CraigDillabaugh (3/19) Jun 24 2020 Neat. It would be nice if the axes in your images were labeled.
- CraigDillabaugh (11/27) Jun 24 2020 For your pointer based implementation you might be able to finish
- Vladimir Panteleev (30/32) Jun 24 2020 In `duplicateEncode_go`, the code is inconsistent in whether it
- Jesse Phillips (7/9) Jun 24 2020 It it was basically, what happens if I don't loop twice over but
- Jesse Phillips (4/13) Jun 24 2020 Yeah I noticed that too. Really just the first 'make it work'
- Jesse Phillips (4/15) Jun 25 2020 I've added those changes and included axis labels
The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the problem and instead took the top implementations done in different languages and implemented them in D[2]. I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change. I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach) This was just a fun little experiment. 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder
Jun 24 2020
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the problem and instead took the top implementations done in different languages and implemented them in D[2]. I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change. I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach) This was just a fun little experiment. 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoderNeat. It would be nice if the axes in your images were labeled. It is not immediately clear what the numbers mean.
Jun 24 2020
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the problem and instead took the top implementations done in different languages and implemented them in D[2]. I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change. I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach) This was just a fun little experiment. 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoderFor your pointer based implementation you might be able to finish with a single for loop over the array (psuedocode). for each char ch in str: if this is the first time you see ch, replace it with '(' and store a pointer to it. else if this is the second time replace it with ')' and follow your pointer back, replacing the first occurrence with ')' else replace it with ')' Likely results in somewhat uglier code.
Jun 24 2020
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:2. https://github.com/JesseKPhillips/devto-challenge259-dupencoderIn `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`. Replacing the type with `int[256]` results in roughly a 2x speedup. I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all. I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster: string duplicateEncode_pointer(string str) { import std.ascii : toLower; auto result = str.dup; char*[256] locMap; foreach(ref c; result) { auto p = &locMap[c.toLower()]; if (*p) **p = c = MANY; else { c = ONCE; *p = &c; } } return result.to!string; }
Jun 24 2020
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote:I also don't understand the choices that led to the duplicateEncode_pointer implementation.It it was basically, what happens if I don't loop twice over but instead just store where things need to change. I did want a single loop but when I realized I failed I switched to gnuplot This wasn't really an attempt to make efficient code, just a translation + some of my own curiosity.
Jun 24 2020
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote:On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:Yeah I noticed that too. Really just the first 'make it work' thing I did.2. https://github.com/JesseKPhillips/devto-challenge259-dupencoderIn `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`.
Jun 24 2020
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote:On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:I've added those changes and included axis labels https://github.com/JesseKPhillips/devto-challenge259-dupencoder2. https://github.com/JesseKPhillips/devto-challenge259-dupencoderReplacing the type with `int[256]` results in roughly a 2x speedup. I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all. I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster:
Jun 25 2020