digitalmars.D.learn - Associative Array potential performance pitfall?
On my machine the following D code compiled with release flag and LDC performs over 230ms while the similar Go code performs under 120ms.

string smallestRepr(const string arg) {
    import std.format : format;

    const repeated = format!"%s%s"(arg, arg);
    string result;
    result.reserve(arg.length);
    result = arg;

    foreach (i; 1 .. arg.length) {
        const slice = repeated[i .. i + arg.length];
        if (slice < result)
            result = slice;
    }
    return result;
}

unittest {
    assert("cba".smallestRepr() == "acb");
}

void main(const string[] args) {
    string[][const string] wordTable;
    import std.stdio : writeln, lines, File;

    foreach (string word; File(args[1]).lines()) {
        word = word[0 .. $ - 1]; // strip the newline character
        const string key = word.smallestRepr();
        if (key in wordTable)
            wordTable[key] ~= word;
        else
            wordTable[key] = [word];
    }

    foreach (key; wordTable.keys) {
        if (wordTable[key].length == 4) {
            writeln(wordTable[key]);
            break;
        }
    }
}

---

package main

import (
    "bufio"
    "fmt"
    "os"
)

func canonicalize(s string) string {
    c := s + s[:len(s)-1]
    best := s
    for i := 1; i < len(s); i++ {
        if c[i:i+len(s)] < best {
            best = c[i : i+len(s)]
        }
    }
    return best
}

func main() {
    seen := make(map[string][]string)
    inputFile, err := os.Open(os.Args[1])
    defer inputFile.Close()
    if err != nil {
        os.Exit(1)
    }
    scanner := bufio.NewScanner(inputFile)
    for scanner.Scan() {
        word := scanner.Text()
        n := canonicalize(word)
        seen[n] = append(seen[n], word)
    }
    for _, words := range seen {
        if len(words) == 4 {
            fmt.Printf("%v\n", words)
            break
        }
    }
}

The input file: https://raw.githubusercontent.com/dolph/dictionary/master/enable1.txt

Problem description: https://www.reddit.com/r/dailyprogrammer/comments/ffxabb/20200309_challenge_383_easy_necklace_matching

Where am I losing performance?
Mar 12 2020
On Fri, Mar 13, 2020 at 03:40:11AM +0000, Adnan via Digitalmars-d-learn wrote:
> On my machine the following D code compiled with release flag and LDC
> performs over 230ms while the similar Go code performs under 120ms.
>
> string smallestRepr(const string arg) {
>     import std.format : format;
>
>     const repeated = format!"%s%s"(arg, arg);

.format is dog slow, and not exactly something you want to be calling every iteration. Try instead:

	auto repeated = arg ~ arg;

>     string result;
>     result.reserve(arg.length);
>     result = arg;

This does not do what you think it does. Assigning to a string is a shallow copy; the call to .reserve is superfluous. In fact, it's triggering another allocation, only to throw it away immediately. Just write:

	auto result = arg;

>     foreach (i; 1 .. arg.length) {
>         const slice = repeated[i .. i + arg.length];
>         if (slice < result)
>             result = slice;
>     }
>     return result;
> }
[...]
> Where am I losing performance?

1) Calling .format -- it's dog slow. Just use ~.

2) Calling .reserve only to throw away the results immediately. Just skip it.

I doubt your AA is the performance bottleneck here. Try the following implementation of .smallestRepr that avoids (1) and (2):

	string smallestRepr(const string arg) {
		auto repeated = arg ~ arg;
		string result = arg;

		foreach (i; 1 .. arg.length) {
			const slice = repeated[i .. i + arg.length];
			if (slice < result)
				result = slice;
		}
		return result;
	}

Note that `arg ~ arg` may allocate, but it also may not if the current buffer for `arg` is big enough to accommodate both. So you might actually save on some allocations here, that you lose out from calling .format.

On my machine, a quick benchmark shows a 2x performance boost.


T

-- 
That's not a bug; that's a feature!
Mar 13 2020
On 3/13/20 5:24 AM, H. S. Teoh wrote:
> Note that `arg ~ arg` may allocate, but it also may not if the current
> buffer for `arg` is big enough to accommodate both.

That always allocates. Only appending may avoid allocation:

	arg ~= arg;

But, I would instead use ranges if possible to avoid all allocations.

-Steve
Mar 13 2020
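Steve's suggestion to use ranges might be sketched like this (my own illustration, not code from the thread; the name smallestRotation is hypothetical). std.range.cycle repeats the string lazily, so each rotation can be compared with std.algorithm.comparison.cmp without ever building the doubled copy:

```d
import std.algorithm.comparison : cmp;
import std.range : cycle, drop, take;

// Return the index of the lexicographically smallest rotation of arg,
// comparing rotations lazily instead of slicing a doubled string.
size_t smallestRotation(const string arg)
{
    size_t best = 0;
    foreach (i; 1 .. arg.length)
    {
        // cycle() repeats arg endlessly; drop + take yields rotation i
        // as a lazy range, with no intermediate allocation.
        auto candidate = arg.cycle.drop(i).take(arg.length);
        auto current = arg.cycle.drop(best).take(arg.length);
        if (cmp(candidate, current) < 0)
            best = i;
    }
    return best;
}

unittest
{
    // Rotations of "cba" are "cba", "bac", "acb"; the smallest starts at 2.
    assert(smallestRotation("cba") == 2);
    assert(smallestRotation("abc") == 0);
}
```

The trade-off is that the caller still needs to materialize the winning rotation (e.g. with a slice of `arg ~ arg` or an .idup) before using it as an AA key, so this only removes allocations from the search itself.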
On Fri, Mar 13, 2020 at 09:30:16AM -0400, Steven Schveighoffer via Digitalmars-d-learn wrote:
> On 3/13/20 5:24 AM, H. S. Teoh wrote:
>> Note that `arg ~ arg` may allocate, but it also may not if the
>> current buffer for `arg` is big enough to accommodate both.
>
> That always allocates. Only appending may avoid allocation:
>
> 	arg ~= arg;

Ah I see. Mea culpa.

> But, I would instead use ranges if possible to avoid all allocations.
[...]

Initially I thought about avoiding all allocations, but I realized that no matter what, you need to allocate the key for the AA anyway, so I didn't go that route.

That said, though, we could probably reduce the number of allocations by various means, such as having the function return char[] instead of string, and calling .idup on-demand as opposed to allocating up-front. This would allow us to reuse a static buffer for `repeated` instead of incurring an allocation each time. Given what the algorithm is doing, this should save some number of allocations if done correctly.


T

-- 
Creativity is not an excuse for sloppiness.
Mar 13 2020
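The static-buffer idea described above might look something like this sketch (my own illustration under the stated assumptions; smallestReprBuffered is a hypothetical name). The function returns a char[] slice into a reused thread-local buffer, and the caller calls .idup only when it actually stores a new key:

```d
// Reuse one growing buffer for the doubled string across calls, so the
// search itself allocates only when a longer word is seen.
char[] smallestReprBuffered(const string arg)
{
    static char[] buf;                      // thread-local, reused
    if (buf.length < arg.length * 2)
        buf.length = arg.length * 2;        // grow only when needed

    buf[0 .. arg.length] = arg[];           // build arg ~ arg in place
    buf[arg.length .. arg.length * 2] = arg[];
    auto repeated = buf[0 .. arg.length * 2];

    char[] result = repeated[0 .. arg.length];
    foreach (i; 1 .. arg.length)
    {
        auto slice = repeated[i .. i + arg.length];
        if (slice < result)
            result = slice;
    }
    // NOTE: the result aliases the static buffer and is overwritten by
    // the next call; the caller must .idup it before storing it.
    return result;
}

unittest
{
    assert(smallestReprBuffered("cba") == "acb");
    // Caller side: allocate the immutable key on demand.
    string key = smallestReprBuffered("cba").idup;
    assert(key == "acb");
}
```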
On Friday, 13 March 2020 at 03:40:11 UTC, Adnan wrote:
> Where am I losing performance?

It is nonsensical to say without measuring. Humans are notoriously bad at predicting performance issues. Wrap all your hardworking code into a loop with like 100 iterations, compile and run:

$ perf record -g ./app file.txt && perf report

Measure dmd debug build first, then ldc2 -release -O.

Besides what H. S. Teoh said, these things help:
- disable GC before doing the job and enable it afterwards
- use .byLineCopy instead of lines(), it's faster
- use .byKeyValue when printing the results, it's lazy and returns both key and value

90ms after vs 215ms before
Mar 13 2020
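Applying those three tips to the original main loop might look like the following sketch (my own illustration, not code from the thread; the name run is hypothetical, and smallestRepr is H. S. Teoh's fixed version from earlier in the thread):

```d
import core.memory : GC;
import std.stdio : File, writeln;

// H. S. Teoh's version: plain concatenation, no format, no reserve.
string smallestRepr(const string arg)
{
    auto repeated = arg ~ arg;
    string result = arg;
    foreach (i; 1 .. arg.length)
    {
        const slice = repeated[i .. i + arg.length];
        if (slice < result)
            result = slice;
    }
    return result;
}

void run(string path)
{
    string[][string] wordTable;

    GC.disable();               // no collections during the hot loop
    scope (exit) GC.enable();

    // byLineCopy strips the newline and gives each line its own
    // allocation, so the word can be stored in the AA directly.
    foreach (word; File(path).byLineCopy)
        wordTable[word.smallestRepr] ~= word;   // inserts if key is new

    // byKeyValue iterates lazily instead of building a .keys array.
    foreach (entry; wordTable.byKeyValue)
    {
        if (entry.value.length == 4)
        {
            writeln(entry.value);
            break;
        }
    }
}
```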