digitalmars.D - Dev.to daily challenge Duplicate Encoder

Jesse Phillips (15/15) Jun 24 2020 The Dev.to community does a Daily Challenge[1]. I didn't really

CraigDillabaugh (3/19) Jun 24 2020 Neat. It would be nice if the axes in your images were labeled.
CraigDillabaugh (11/27) Jun 24 2020 For your pointer based implementation you might be able to finish
Vladimir Panteleev (30/32) Jun 24 2020 In `duplicateEncode_go`, the code is inconsistent in whether it

Jesse Phillips (7/9) Jun 24 2020 It it was basically, what happens if I don't loop twice over but
Jesse Phillips (4/13) Jun 24 2020 Yeah I noticed that too. Really just the first 'make it work'
Jesse Phillips (4/15) Jun 25 2020 I've added those changes and included axis labels

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

The Dev.to community does a Daily Challenge[1]. I didn't really 
set out to solve the  problem and instead took the top 
implementations done in different languages and implemented them 
in D[2].

I need to emphasize I reference languages but this is not 
performance of those languages all data is from D. Also I did not 
play too much with optimizations on the compiler (using LDC) but 
feel free to recommend a change.

I find the changes to Haskell's performance to be interesting as 
it went from very unreasonable to competing with Go (Ranges vs 
foreach)

This was just a fun little experiment.

1. 
https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder

Jun 24 2020

CraigDillabaugh <craig.dillabaugh gmail.com> writes:

On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
 The Dev.to community does a Daily Challenge[1]. I didn't really 
 set out to solve the  problem and instead took the top 
 implementations done in different languages and implemented 
 them in D[2].

 I need to emphasize I reference languages but this is not 
 performance of those languages all data is from D. Also I did 
 not play too much with optimizations on the compiler (using 
 LDC) but feel free to recommend a change.

 I find the changes to Haskell's performance to be interesting 
 as it went from very unreasonable to competing with Go (Ranges 
 vs foreach)

 This was just a fun little experiment.

 1. 
 https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder

Neat. It would be nice if the axes in your images were labeled. 
It is not immediately clear what the numbers mean.

Jun 24 2020

CraigDillabaugh <craig.dillabaugh gmail.com> writes:

On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
 The Dev.to community does a Daily Challenge[1]. I didn't really 
 set out to solve the  problem and instead took the top 
 implementations done in different languages and implemented 
 them in D[2].

 I need to emphasize I reference languages but this is not 
 performance of those languages all data is from D. Also I did 
 not play too much with optimizations on the compiler (using 
 LDC) but feel free to recommend a change.

 I find the changes to Haskell's performance to be interesting 
 as it went from very unreasonable to competing with Go (Ranges 
 vs foreach)

 This was just a fun little experiment.

 1. 
 https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder

For your pointer based implementation you might be able to finish 
with a single for loop over the array (psuedocode).

for each char ch in str:
      if this is the first time you see ch, replace it with '(' and
          store a pointer to it.
      else if this is the second time replace it with ')' and
          follow your pointer back, replacing the first occurrence 
with ')'
      else replace it with ')'

Likely results in somewhat uglier code.

Jun 24 2020

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder

In `duplicateEncode_go`, the code is inconsistent in whether it 
wants to process the string by code unit or code point. 
`occurences` [sic] is declared as `int[dchar]`, but later it uses 
a standard (`char`-wise) `foreach` loop, and the `std.ascii` 
variant of `toLower`.

Replacing the type with `int[256]` results in roughly a 2x 
speedup.

I think it might be possible to use an AVX (256-bit) register to 
hold state to avoid going to the RAM at all.

I also don't understand the choices that led to the 
duplicateEncode_pointer implementation. This version is much 
faster:

string duplicateEncode_pointer(string str) {
     import std.ascii : toLower;
     auto result = str.dup;
     char*[256] locMap;

     foreach(ref c; result)
     {
         auto p = &locMap[c.toLower()];
         if (*p)
             **p = c = MANY;
         else
         {
             c = ONCE;
             *p = &c;
         }
     }

     return result.to!string;
}

Jun 24 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev 
wrote:
 I also don't understand the choices that led to the 
 duplicateEncode_pointer implementation.

It it was basically, what happens if I don't loop twice over but 
instead just store where things need to change. I did want a 
single loop but when I realized I failed I switched to gnuplot

This wasn't really an attempt to make efficient code, just a 
translation + some of my own curiosity.

Jun 24 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips 
 wrote:
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder

 In `duplicateEncode_go`, the code is inconsistent in whether it 
 wants to process the string by code unit or code point. 
 `occurences` [sic] is declared as `int[dchar]`, but later it 
 uses a standard (`char`-wise) `foreach` loop, and the 
 `std.ascii` variant of `toLower`.

Yeah I noticed that too. Really just the first 'make it work' 
thing I did.

Jun 24 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips 
 wrote:
 2. 
 https://github.com/JesseKPhillips/devto-challenge259-dupencoder

 Replacing the type with `int[256]` results in roughly a 2x 
 speedup.

 I think it might be possible to use an AVX (256-bit) register 
 to hold state to avoid going to the RAM at all.

 I also don't understand the choices that led to the 
 duplicateEncode_pointer implementation. This version is much 
 faster:

I've added those changes and included axis labels

https://github.com/JesseKPhillips/devto-challenge259-dupencoder

Jun 25 2020

D Programming

C/C++ Programming

Other

digitalmars.D - Dev.to daily challenge Duplicate Encoder