digitalmars.D.learn - wordladder

digitalmars.D.learn - wordladder - code improvement

mark (89/89) Jan 30 2020 Below is a program that produces a wordladder.

dwdv (53/53) Jan 30 2020 From one noob to another: Not much of a difference, but levenshteinDist...

mark (13/13) Jan 31 2020 Thanks for your implementation.

dwdv (24/27) Jan 31 2020 Nah, it isn't, sorry for the noise, should have slept before sending the...

mark (4/4) Jan 31 2020 I forgot to mention: I know it isn't worth bothering with

mark <mark qtrac.eu> writes:

Below is a program that produces a wordladder.

The algorithm is probably suboptimal, but I don't care since I've 
implemented the same one in Python, Rust, Go, Java, and Nim, so I 
find it useful for language comparison purposes.

What I'd like some feedback on is how to improve the code 
(keeping the algorithm the same) to make it into more idiomatic D 
and to take the most advantage of D's features (while keeping it 
as understandable as possible).

For example, in the last function, compatibleWords(), I can't 
help thinking that I could avoid the foreach loop.

Also I'm still not really clear on the appropriateness of 
const/immutable/in in function arguments. The docs seem to 
discourage in, and other things I've read seem to favour const. 
Similarly, the appropriateness of const/immutable inside 
functions.

// wordladder.d
import core.time: MonoTime;
import std.algorithm: any, count, filter, map, sum, until;
import std.array: array, join;
import std.conv: dtext, to;
import std.functional: not;
import std.range: assocArray, repeat, zip;
import std.random: choice;
import std.stdio: File, write, writeln;
import std.uni: isAlpha, toUpper;

enum WORDFILE = "/usr/share/hunspell/en_GB.dic";
enum WORDSIZE = 4; // Should be even
enum STEPS = WORDSIZE;

alias WordList = string[];
alias WordSet = int[string]; // key = word; value = 0

void main() {
     immutable start = MonoTime.currTime;
     auto words = getWords(WORDFILE, WORDSIZE);
     int count = 1;
     WordList ladder;
     write("Try ");
     while (true) {
	write('.');
	ladder = generate(words, STEPS);
	if (ladder.length != 0)
	    break;
	++count;
     }
     writeln(' ', count);
     writeln(join(ladder, '\n'));
     writeln(MonoTime.currTime - start);
}

WordSet getWords(const string filename, const int wordsize) {
     return File(filename).byLine
	.map!(line => line.until!(not!isAlpha))
	.filter!(word => word.count == wordsize)
	.map!(word => word.to!string.toUpper)
	.assocArray(0.repeat);
}

WordList generate(WordSet allWords, const int steps) {
     WordList ladder;
     auto words = allWords.dup;
     auto compatibles = allWords.dup;
     auto prev = update(ladder, words, compatibles);
     for (int i = 0; i <= steps; ++i) {
	compatibles = compatibleWords(prev, words);
	if (compatibles.length == 0)
	    return [];
	prev = update(ladder, words, compatibles);
     }
     immutable first = dtext(ladder[0]);
     immutable last = dtext(ladder[$ - 1]);
     if (any!(t => t[0] == t[1])(zip(first, last)))
	return []; // Don't accept any vertical letters in common
     return ladder;
}

string update(ref WordList ladder, ref WordSet words,
	      const WordSet compatibles) {
     immutable word = compatibles.byKey.array.choice;
     ladder ~= word;
     words.remove(word);
     return word;
}

// Add words that are 1 letter different to prev
WordSet compatibleWords(const string prev, const WordSet words) {
     WordSet compatibles;
     immutable prevChars = dtext(prev);
     immutable size = prevChars.length - 1;
     foreach (word; words.byKey)
	if (sum(map!(t => int(t[0] == t[1]))
		     (zip(prevChars, dtext(word)))) == size)
	    compatibles[word] = 0;
     return compatibles;
}

Jan 30 2020

dwdv <dwdv posteo.de> writes:

 From one noob to another: Not much of a difference, but levenshteinDistance
seems to be a good fit 
here. About as fast as your solution, slightly lower memory usage.
byCodeUnit/byChar might shave off 
a few more ms.

For small scripts like these I'm usually not even bothering with const
correctness and selective 
imports. There's also import std; but that might lead to ambiguous imports.

---

import std.algorithm, std.array, std.conv, std.datetime.stopwatch,
        std.functional, std.random, std.range, std.stdio, std.uni;

enum WORDFILE = "/usr/share/hunspell/en_GB.dic";
enum WORDSIZE = 4;
enum STEPS = WORDSIZE;

// could be verified at compile-time
static assert(WORDSIZE % 2 == 0, "WORDSIZE should be even");

alias WordList = string[];
alias WordSet = bool[string];

void main() {
     auto sw = StopWatch(AutoStart.yes);
     auto words = getWords(WORDFILE, WORDSIZE);
     // would be slicker with proper destructuring support
     auto res = generate!(() => genLadder(words.dup, STEPS))
         .enumerate(1)
         .find!(a => !a[1].empty)
         .front;
     writeln("tries: ", res[0]);
     res[1].each!writeln;
     writeln(sw.peek); // or sw.peek.total!"msecs"
}

WordSet getWords(string filename, uint wordsize) {
     return File(filename).byLine
         .map!(line => line.until!(not!isAlpha))
         .filter!(word => word.count == wordsize)
         .map!(word => word.to!string.toUpper)
         .assocArray(true.repeat);
}

WordList genLadder(WordSet words, uint steps) {
     WordList ladder;

     string update(WordList comps) {
         ladder ~= comps.choice;
         words.remove(ladder.back);
         return ladder.back;
     }
     WordList compWords(string prev) {
         return words.byKey.filter!(word => levenshteinDistance(word, prev) ==
1).array;
     }

     auto prev = update(words.byKey.array);
     foreach (comps; generate!(() => compWords(prev)).take(steps + 1)) {
         if (comps.empty) return comps;
         prev = update(comps);
     }

     if (levenshteinDistance(ladder.front, ladder.back) < WORDSIZE)
         return [];
     return ladder;
}

Jan 30 2020

mark <mark qtrac.eu> writes:

Thanks for your implementation.

I can't use the levenshtien distance because although it is a 
better solution, I want to keep the implementation as compatible 
with those in the other languages as possible.

Your main() is much shorter than mine, but doesn't match the 
behaviour (and again I want to keep the same as the other 
implementations). Nonetheless, I didn't know about generate!()(), 
so I've learnt from yours.

Nor did I know about the StopWatch which I'm now using, plus I've 
added your static assert, and also copied your idea of making 
update() an inner function (and now only needing one parameter).

I still need to study your genLadder() more carefully to 
understand it because it is so compact.

Jan 31 2020

dwdv <dwdv posteo.de> writes:

On 2020-01-31 09:44, mark via Digitalmars-d-learn wrote:
 I can't use the levenshtien distance because although it is a better solution,
[...]

Nah, it isn't, sorry for the noise, should have slept before sending the
message, was thinking of 
hamming distance:

auto a = "abcd";
auto b = "bcda";

auto hammingDistance(string a, string b) {
     return zip(a, b).count!(t => t[0] != t[1]);
}

levenshteinDistance(a, b).writeln; // => 2
hammingDistance(a, b).writeln;     // => 4

 main() [...] doesn't match the behaviour

There's also std.range.tee if you ever need to perform additional side-effects
in a range pipeline, 
which restores the original dot printing:

write("Try ");
auto res = generate!(() => genLadder(words.dup, STEPS))
     .enumerate(1)
     .tee!(_ => write('.'), No.pipeOnPop) // No.pipeOnPop ensures we're not one
dot short
     .find!(a => !a[1].empty)
     .front;
writeln(" ", res[0]);

 genLadder() [...] is so compact.

Thinking about it, you could even eliminate the prev variable all together when
using ladder.back in 
compWords.

Regarding const correctness, this article and thread might contain useful
information: 
https://forum.dlang.org/thread/2735451.YHZktzbKJo lyonel

By the way, keep the documentation improvements coming, much appreciated!

Jan 31 2020

mark <mark qtrac.eu> writes:

I forgot to mention: I know it isn't worth bothering with 
const/immutable for this tiny example. But I want to learn how to 
write large D programs, so I need to get into the right habits 
and know the right things.

Jan 31 2020

D Programming

C/C++ Programming

Other

digitalmars.D.learn - wordladder - code improvement