www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - wordladder - code improvement

reply mark <mark qtrac.eu> writes:
Below is a program that produces a wordladder.

The algorithm is probably suboptimal, but I don't care since I've 
implemented the same one in Python, Rust, Go, Java, and Nim, so I 
find it useful for language comparison purposes.

What I'd like some feedback on is how to improve the code 
(keeping the algorithm the same) to make it into more idiomatic D 
and to take the most advantage of D's features (while keeping it 
as understandable as possible).

For example, in the last function, compatibleWords(), I can't 
help thinking that I could avoid the foreach loop.

Also I'm still not really clear on the appropriateness of 
const/immutable/in in function arguments. The docs seem to 
discourage in, and other things I've read seem to favour const. 
Similarly, the appropriateness of const/immutable inside 
functions.

// wordladder.d
import core.time: MonoTime;
import std.algorithm: any, count, filter, map, sum, until;
import std.array: array, join;
import std.conv: dtext, to;
import std.functional: not;
import std.range: assocArray, repeat, zip;
import std.random: choice;
import std.stdio: File, write, writeln;
import std.uni: isAlpha, toUpper;

enum WORDFILE = "/usr/share/hunspell/en_GB.dic";
enum WORDSIZE = 4; // Should be even
enum STEPS = WORDSIZE;

alias WordList = string[];
alias WordSet = int[string]; // key = word; value = 0

void main() {
     immutable start = MonoTime.currTime;
     auto words = getWords(WORDFILE, WORDSIZE);
     int count = 1;
     WordList ladder;
     write("Try ");
     while (true) {
	write('.');
	ladder = generate(words, STEPS);
	if (ladder.length != 0)
	    break;
	++count;
     }
     writeln(' ', count);
     writeln(join(ladder, '\n'));
     writeln(MonoTime.currTime - start);
}

WordSet getWords(const string filename, const int wordsize) {
     return File(filename).byLine
	.map!(line => line.until!(not!isAlpha))
	.filter!(word => word.count == wordsize)
	.map!(word => word.to!string.toUpper)
	.assocArray(0.repeat);
}

WordList generate(WordSet allWords, const int steps) {
     WordList ladder;
     auto words = allWords.dup;
     auto compatibles = allWords.dup;
     auto prev = update(ladder, words, compatibles);
     for (int i = 0; i <= steps; ++i) {
	compatibles = compatibleWords(prev, words);
	if (compatibles.length == 0)
	    return [];
	prev = update(ladder, words, compatibles);
     }
     immutable first = dtext(ladder[0]);
     immutable last = dtext(ladder[$ - 1]);
     if (any!(t => t[0] == t[1])(zip(first, last)))
	return []; // Don't accept any vertical letters in common
     return ladder;
}

string update(ref WordList ladder, ref WordSet words,
	      const WordSet compatibles) {
     immutable word = compatibles.byKey.array.choice;
     ladder ~= word;
     words.remove(word);
     return word;
}

// Add words that are 1 letter different to prev
WordSet compatibleWords(const string prev, const WordSet words) {
     WordSet compatibles;
     immutable prevChars = dtext(prev);
     immutable size = prevChars.length - 1;
     foreach (word; words.byKey)
	if (sum(map!(t => int(t[0] == t[1]))
		     (zip(prevChars, dtext(word)))) == size)
	    compatibles[word] = 0;
     return compatibles;
}
Jan 30 2020
parent reply dwdv <dwdv posteo.de> writes:
 From one noob to another: Not much of a difference, but levenshteinDistance
seems to be a good fit 
here. About as fast as your solution, slightly lower memory usage.
byCodeUnit/byChar might shave off 
a few more ms.

For small scripts like these I'm usually not even bothering with const
correctness and selective 
imports. There's also import std; but that might lead to ambiguous imports.

---

import std.algorithm, std.array, std.conv, std.datetime.stopwatch,
        std.functional, std.random, std.range, std.stdio, std.uni;

enum WORDFILE = "/usr/share/hunspell/en_GB.dic";
enum WORDSIZE = 4;
enum STEPS = WORDSIZE;

// could be verified at compile-time
static assert(WORDSIZE % 2 == 0, "WORDSIZE should be even");

alias WordList = string[];
alias WordSet = bool[string];

void main() {
     auto sw = StopWatch(AutoStart.yes);
     auto words = getWords(WORDFILE, WORDSIZE);
     // would be slicker with proper destructuring support
     auto res = generate!(() => genLadder(words.dup, STEPS))
         .enumerate(1)
         .find!(a => !a[1].empty)
         .front;
     writeln("tries: ", res[0]);
     res[1].each!writeln;
     writeln(sw.peek); // or sw.peek.total!"msecs"
}

WordSet getWords(string filename, uint wordsize) {
     return File(filename).byLine
         .map!(line => line.until!(not!isAlpha))
         .filter!(word => word.count == wordsize)
         .map!(word => word.to!string.toUpper)
         .assocArray(true.repeat);
}

WordList genLadder(WordSet words, uint steps) {
     WordList ladder;

     string update(WordList comps) {
         ladder ~= comps.choice;
         words.remove(ladder.back);
         return ladder.back;
     }
     WordList compWords(string prev) {
         return words.byKey.filter!(word => levenshteinDistance(word, prev) ==
1).array;
     }

     auto prev = update(words.byKey.array);
     foreach (comps; generate!(() => compWords(prev)).take(steps + 1)) {
         if (comps.empty) return comps;
         prev = update(comps);
     }

     if (levenshteinDistance(ladder.front, ladder.back) < WORDSIZE)
         return [];
     return ladder;
}
Jan 30 2020
next sibling parent reply mark <mark qtrac.eu> writes:
Thanks for your implementation.

I can't use the levenshtien distance because although it is a 
better solution, I want to keep the implementation as compatible 
with those in the other languages as possible.

Your main() is much shorter than mine, but doesn't match the 
behaviour (and again I want to keep the same as the other 
implementations). Nonetheless, I didn't know about generate!()(), 
so I've learnt from yours.

Nor did I know about the StopWatch which I'm now using, plus I've 
added your static assert, and also copied your idea of making 
update() an inner function (and now only needing one parameter).

I still need to study your genLadder() more carefully to 
understand it because it is so compact.
Jan 31 2020
parent dwdv <dwdv posteo.de> writes:
On 2020-01-31 09:44, mark via Digitalmars-d-learn wrote:
 I can't use the levenshtien distance because although it is a better solution,
[...]
Nah, it isn't, sorry for the noise, should have slept before sending the message, was thinking of hamming distance: auto a = "abcd"; auto b = "bcda"; auto hammingDistance(string a, string b) { return zip(a, b).count!(t => t[0] != t[1]); } levenshteinDistance(a, b).writeln; // => 2 hammingDistance(a, b).writeln; // => 4
 main() [...] doesn't match the behaviour
There's also std.range.tee if you ever need to perform additional side-effects in a range pipeline, which restores the original dot printing: write("Try "); auto res = generate!(() => genLadder(words.dup, STEPS)) .enumerate(1) .tee!(_ => write('.'), No.pipeOnPop) // No.pipeOnPop ensures we're not one dot short .find!(a => !a[1].empty) .front; writeln(" ", res[0]);
 genLadder() [...] is so compact.
Thinking about it, you could even eliminate the prev variable all together when using ladder.back in compWords. Regarding const correctness, this article and thread might contain useful information: https://forum.dlang.org/thread/2735451.YHZktzbKJo lyonel By the way, keep the documentation improvements coming, much appreciated!
Jan 31 2020
prev sibling parent mark <mark qtrac.eu> writes:
I forgot to mention: I know it isn't worth bothering with 
const/immutable for this tiny example. But I want to learn how to 
write large D programs, so I need to get into the right habits 
and know the right things.
Jan 31 2020