www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Dev.to daily challenge Duplicate Encoder

reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
The Dev.to community does a Daily Challenge[1]. I didn't really 
set out to solve the  problem and instead took the top 
implementations done in different languages and implemented them 
in D[2].

I need to emphasize I reference languages but this is not 
performance of those languages all data is from D. Also I did not 
play too much with optimizations on the compiler (using LDC) but 
feel free to recommend a change.

I find the changes to Haskell's performance to be interesting as 
it went from very unreasonable to competing with Go (Ranges vs 
foreach)

This was just a fun little experiment.

1. 
https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder
Jun 24 2020
next sibling parent CraigDillabaugh <craig.dillabaugh gmail.com> writes:
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
 The Dev.to community does a Daily Challenge[1]. I didn't really 
 set out to solve the  problem and instead took the top 
 implementations done in different languages and implemented 
 them in D[2].

 I need to emphasize I reference languages but this is not 
 performance of those languages all data is from D. Also I did 
 not play too much with optimizations on the compiler (using 
 LDC) but feel free to recommend a change.

 I find the changes to Haskell's performance to be interesting 
 as it went from very unreasonable to competing with Go (Ranges 
 vs foreach)

 This was just a fun little experiment.

 1. 
 https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder
Neat. It would be nice if the axes in your images were labeled. It is not immediately clear what the numbers mean.
Jun 24 2020
prev sibling next sibling parent CraigDillabaugh <craig.dillabaugh gmail.com> writes:
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
 The Dev.to community does a Daily Challenge[1]. I didn't really 
 set out to solve the  problem and instead took the top 
 implementations done in different languages and implemented 
 them in D[2].

 I need to emphasize I reference languages but this is not 
 performance of those languages all data is from D. Also I did 
 not play too much with optimizations on the compiler (using 
 LDC) but feel free to recommend a change.

 I find the changes to Haskell's performance to be interesting 
 as it went from very unreasonable to competing with Go (Ranges 
 vs foreach)

 This was just a fun little experiment.

 1. 
 https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder
For your pointer based implementation you might be able to finish with a single for loop over the array (psuedocode). for each char ch in str: if this is the first time you see ch, replace it with '(' and store a pointer to it. else if this is the second time replace it with ')' and follow your pointer back, replacing the first occurrence with ')' else replace it with ')' Likely results in somewhat uglier code.
Jun 24 2020
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder
In `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`. Replacing the type with `int[256]` results in roughly a 2x speedup. I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all. I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster: string duplicateEncode_pointer(string str) { import std.ascii : toLower; auto result = str.dup; char*[256] locMap; foreach(ref c; result) { auto p = &locMap[c.toLower()]; if (*p) **p = c = MANY; else { c = ONCE; *p = &c; } } return result.to!string; }
Jun 24 2020
next sibling parent Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev 
wrote:
 I also don't understand the choices that led to the 
 duplicateEncode_pointer implementation.
It it was basically, what happens if I don't loop twice over but instead just store where things need to change. I did want a single loop but when I realized I failed I switched to gnuplot This wasn't really an attempt to make efficient code, just a translation + some of my own curiosity.
Jun 24 2020
prev sibling next sibling parent Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips 
 wrote:
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder
In `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`.
Yeah I noticed that too. Really just the first 'make it work' thing I did.
Jun 24 2020
prev sibling parent Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips 
 wrote:
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder
Replacing the type with `int[256]` results in roughly a 2x speedup. I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all. I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster:
I've added those changes and included axis labels https://github.com/JesseKPhillips/devto-challenge259-dupencoder
Jun 25 2020