digitalmars.D - Reserving/Preallocating associative array?
Hello,

I want to load a large text file containing two numeric fields into an associative array. The file looks like:

1 40
4 2
42 11
...

And it has 11M lines. My code looks like this:

===
void main() {
    size_t[size_t] unions;
    auto f = File("input.txt");
    foreach (line; f.byLine()) {
        auto fields = line.split();
        size_t i = to!size_t(fields[0]);
        size_t j = to!size_t(fields[1]);
        unions[i] = j; // <-- here be question
    }
}
===

This is just test code to illustrate my question (though general comments are welcome - I'm new to D).

Commenting out the highlighted line (not populating the hash), the program completes in 25 seconds. Keeping the highlighted line in, the program takes ~3.5 minutes.

Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

Many thanks,
 -gordon
Dec 24 2013
Gordon:

> void main() {
>     size_t[size_t] unions;
>     auto f = File("input.txt");
>     foreach (line; f.byLine()) {
>         auto fields = line.split();
>         size_t i = to!size_t(fields[0]);
>         size_t j = to!size_t(fields[1]);
>         unions[i] = j; // <-- here be question
>     }
> }
> [...]
> Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

Built-in associative arrays don't yet have a "reserve".

Some ideas to speed up your code:
- Try to use parse instead of split + to!size_t.
- Try to use byLineFast, in the attachment here (the code is not mine): https://d.puremagic.com/issues/attachment.cgi?id=1305
- Try to disable the GC before the associative array creation and re-enable it once it's built (this could double your max memory usage or more). To disable it, import GC from core.memory and call GC.disable; and GC.enable;.

Bye,
bearophile
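A minimal sketch of that third suggestion; the scope(exit) guard is an addition to the original advice, assuming parsing may throw and the GC should be re-enabled regardless:

import core.memory : GC;

void main() {
    GC.disable();
    scope(exit) GC.enable(); // runs even if an exception is thrown below
    // ... build the associative array here ...
}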
Dec 24 2013
> Some ideas to speed up your code:
> - Try to use parse instead of split + to!size_t.
> - Try to use byLineFast [...]
> - Try to disable the GC [...]

This code is almost 2.5X faster on my PC (but I am using only 300_000 lines of text):

void main() {
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast) {
        line.munch(" \t"); // skip whitespace
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip whitespace
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;
}

Bye,
bearophile
Dec 24 2013
Hello bearophile,

On Tuesday, 24 December 2013 at 23:13:59 UTC, bearophile wrote:
> This code is almost 2.5X faster on my PC (but I am using only 300_000 lines of text):

You're just too fast for me...

After incorporating your three suggestions, the entire file (11M lines) loads in just 25 seconds (down from 3.5 minutes). AMAZING!

Many thanks,
 -gordon
Dec 24 2013
Out of curiosity, do you know which one of his 3 suggestions brought you the highest speed boost? What happens if you do not disable the GC, for example?
Dec 24 2013
On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud wrote:
> Out of curiosity, do you know which one of his 3 suggestions brought you the highest speed boost? What happens if you do not disable the GC, for example?

Good question, I did test each one incrementally:

1. Switching from "split+to!size_t" to "parse" (and commenting out the "union") reduced the running time from ~26s to ~20s (on average).
2. Switching from "byLine" to "byLineFast" (and commenting out the "union") reduced the running time from ~20s to ~14s (on average).
3. With "parse" and "byLineFast", with the GC still on, and populating the "union", the program took about 2m45s.
4. Adding "GC.disable" brought it down to 25s.

HTH,
 -gordon
Dec 25 2013
Am Wed, 25 Dec 2013 16:12:09 +0000
schrieb "Gordon" <me home.com>:

> Good question, I did test each one incrementally:
> [...]
> 3. With "parse" and "byLineFast", with the GC still on, and populating the "union", the program took about 2m45s.
> 4. Adding "GC.disable" brought it down to 25s.

So for the record: the garbage collector slowed down this piece of code by a factor of 6.6.

-- 
Marco
Dec 26 2013
Am 25.12.2013 17:12, schrieb Gordon:
> Good question, I did test each one incrementally:
> [...]

So I was interested in how far you can take this. The version bearophile posted runs in 14 seconds (48420527 ticks) on my machine.

I created a new version using my own hashmap implementation and manual memory management (no GC present at all). This version runs in 12 seconds (41614375 ticks).

Preallocating all the entries for the hashmap brings quite a boost: it brings it down to 9 seconds (32926550 ticks).

If you replace the default hash function D uses with the MurmurHash function, it brings it down even more, to 8 seconds (29425713 ticks).

I also tried not using any hash function at all, but that made the time skyrocket. I stopped at that point, because by then most of the time was spent looking for a free entry in the hashmap, which pretty much comes down to cache misses. At this point the profiler looked something like this:

50%  find free entry in hashmap
21%  parse uint
9%   find end of line + tab
3.5% read data from disk

The ticks in my measurements were obtained via QueryPerformanceCounter.

Kind Regards
Benjamin Thaut
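For the built-in AA, a comparable experiment is possible without a custom hashmap by wrapping the key in a struct whose toHash applies MurmurHash3's 32-bit finalizer (fmix32). A minimal sketch, assuming integer keys; the wrapper is illustrative and is not the implementation benchmarked above:

import std.stdio : writeln;

// Hypothetical key wrapper: makes the built-in AA hash keys with
// MurmurHash3's 32-bit finalizer instead of druntime's default.
struct MurmurKey {
    size_t value;

    size_t toHash() const pure nothrow @safe {
        uint h = cast(uint) value;
        h ^= h >> 16;
        h *= 0x85eb_ca6b;
        h ^= h >> 13;
        h *= 0xc2b2_ae35;
        h ^= h >> 16;
        return h;
    }

    bool opEquals(const MurmurKey o) const pure nothrow @safe {
        return value == o.value;
    }
}

void main() {
    size_t[MurmurKey] unions;
    unions[MurmurKey(1)] = 40;
    writeln(unions[MurmurKey(1)]); // prints 40
}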
Dec 27 2013
Benjamin Thaut:

> If you replace the default hash function D uses with the MurmurHash function, it brings it down even more, to 8 seconds (29425713 ticks)

What compiler are you using? ldc2? And is it a good idea to put MurmurHash in D?

> 50% find free entry in hashmap
> 21% parse uint

Perhaps the parse-uint time can be improved. A lightly tested function to try:

uint toUint(const(char)[] txt) pure nothrow {
    auto p = txt.ptr;
    const q = p + txt.length;
    // Skip leading non-digits.
    while (p < q && (*p < '0' || *p > '9'))
        p++;
    uint result = 0;
    while (p < q && *p >= '0' && *p <= '9') {
        result = (result * 10) + (*p - '0');
        p++;
    }
    return result;
}

void main() {
    import std.stdio;
    "".toUint.writeln;
    "x".toUint.writeln;
    "-1".toUint.writeln;
    " 125 ".toUint.writeln;
    " 10000".toUint.writeln;
    " 1000000 ".toUint.writeln;
    " 10000000000000 ".toUint.writeln;
}

Its output (it's not a safe conversion; note the overflow on the last value):

0
0
1
125
10000
1000000
1316134912

Bye,
bearophile
Dec 27 2013
Am 27.12.2013 12:02, schrieb bearophile:
> What compiler are you using? ldc2?

dmd 2.064.2

> And is it a good idea to put MurmurHash in D?

I don't know about the properties of the MurmurHash function. I only know that it is cheaper to compute than the hash function D uses. To decide whether it would be better than the currently chosen hash function, an in-depth analysis would be required.

> Perhaps the parse-uint time can be improved. A lightly tested function to try:
> [...]

The way I parse the uint already looks that way. I'm not using std.conv:
https://github.com/Ingrater/thBase/blob/master/src/thBase/conv.d#L22

Kind Regards
Benjamin Thaut
Dec 27 2013
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
> So I was interested in how far you can take this. The version bearophile posted runs in 14 seconds (48420527 ticks) on my machine.
> <...>
> At this point the profiler looked something like this:
> 50%  find free entry in hashmap
> 21%  parse uint
> 9%   find end of line + tab
> 3.5% read data from disk

This is turning into a very interesting (and useful) discussion. Thank you!

I've created a generic "FileSlurp" library, to load a file (using "byLineFast") and use a delegate to store the data (so I can later use any improved methods you suggest). It's available here, comments are welcome:
http://code.dlang.org/packages/fileslurp

Benjamin, can you point me to your hashmap implementation? I could perhaps use it to improve the timings even more.

bearophile, who is "JM" who wrote "byLineFast"? I'm using his code and would like to properly acknowledge him/her.

Thanks!
 -gordon
Dec 27 2013
Gordon:

> Who is "JM" who wrote "byLineFast"? I'm using his code and would like to properly acknowledge him/her.

It should be Juan Manuel Cabo:
http://permalink.gmane.org/gmane.comp.lang.d.general/117750

Bye,
bearophile
Dec 27 2013
Am 27.12.2013 17:49, schrieb Gordon:
> Benjamin, can you point me to your hashmap implementation? I could perhaps use it to improve the timings even more.

https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d

This implementation depends on my own allocator design, but it should be possible to remove that dependency quite easily by replacing all allocations/frees with malloc/free or GC.malloc/nothing. Just make sure that memory is not initialized beforehand (as new ubyte[sizeInBytes] would do), because that also has a considerable performance impact. Also, when allocating with GC.malloc you should specify the GC.BlkAttr.NO_SCAN flag, because you know that your data does not contain any pointers. That way the GC will not scan that huge memory block, and it should speed up collections a lot.

To improve the performance of the hashmap you can try:

- Specifying different hashing functions (via the hash policy). See https://github.com/Ingrater/thBase/blob/master/src/thBase/policies/hashing.d for more examples of hashing policies.

- Changing the number of always-free entries in the hashmap (currently 25%); for that, change lines 233, 207 and 210 (not factored into a constant yet). Reducing the free entries might result in fewer cache misses but more linear search, as this is a linear-probing hashmap. Increasing the free entries might reduce the amount of linear search, but increase cache misses and memory usage.

- Computing a series of prime numbers, in which each prime is at least twice as big as the previous one, and using those as the possible sizes for the hashmap (a small sketch follows at the end of this post). Prime-number sizes give a better distribution of the items within the hashmap, and therefore reduce the amount of linear searching necessary, which improves hashmap performance.

- When searching for the next free spot in the hashmap, it currently just adds 1 to the previous index found (line 301). It is also possible to rehash the previous index (just put it into the hashing function again), which would give a new index more distant in the hashmap from the current one. This would again improve the distribution of the items in the hashmap and thus reduce linear search time, but it may increase the number of cache misses, as it no longer does linear memory access.

In case you are interested in my implementation of your problem, see here:
http://dpaste.dzfl.pl/8d0d174e

You won't be able to compile the code unless you set up my custom D environment, which I would not recommend unless you really hate the D garbage collector ;-)

Kind Regards
Benjamin Thaut
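A minimal sketch of such a prime-size progression, using a naive primality test; the starting size and the double-then-next-prime scheme are illustrative choices, not taken from the hashmap linked above:

import std.stdio : write, writeln;

// Naive primality test; fine for computing a size table once.
bool isPrime(size_t n) pure nothrow @safe {
    if (n < 4)
        return n > 1;
    if (n % 2 == 0)
        return false;
    for (size_t d = 3; d * d <= n; d += 2)
        if (n % d == 0)
            return false;
    return true;
}

// Smallest prime >= n.
size_t nextPrime(size_t n) pure nothrow @safe {
    while (!isPrime(n))
        ++n;
    return n;
}

void main() {
    // Each capacity is the next prime at least twice the previous one.
    size_t size = 53;
    foreach (_; 0 .. 10) {
        write(size, " ");
        size = nextPrime(size * 2);
    }
    writeln(); // 53 107 223 449 907 1823 3659 7321 14653 29311
}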
Dec 27 2013
Also, Gordon, what is keeping you from converting the textual representation to a binary one? I mean, if you need to read that file in often, you can easily preprocess it into a binary blob. You could even modify my hashmap implementation to support binary serialization, by just writing the entire array to a binary file and loading it again. That would be fast...

Kind Regards
Benjamin Thaut
Dec 27 2013
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
> Also, Gordon, what is keeping you from converting the textual representation to a binary one? [...] That would be fast...

This is intended to be a general-purpose CLI program to handle certain text files (from other sources), not an in-house application where I can restructure my data. Text files are still the most ubiquitous way to share data (especially without limiting the languages/tools others will use to process them).

But out of curiosity - how would you serialize it?
Dec 27 2013
Am 27.12.2013 20:55, schrieb Gordon:
> But out of curiosity - how would you serialize it?

Given my hashmap implementation, I would do something like:

// method of the hashmap
void serializeTo(FILE* f)
{
    fwrite(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
    ulong len = m_Data.length;
    fwrite(&len, typeof(len).sizeof, 1, f);
    fwrite(m_Data.ptr, Pair.sizeof, len, f);
}

// construct from a file
this(FILE* f)
{
    fread(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
    ulong len;
    fread(&len, typeof(len).sizeof, 1, f);
    m_Data = (cast(Pair*) malloc(Pair.sizeof * len))[0 .. len];
    fread(m_Data.ptr, Pair.sizeof, len, f);
}

The deserialization would be just one allocation and one big binary read from a file, which should be quite fast.

Kind Regards
Benjamin Thaut
Dec 27 2013
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
> At this point the profiler looked something like this:
> 50%  find free entry in hashmap

When I'm building large immutable hash tables, I do this:

1. Read all the unordered key-value pairs into an array of Tuple!(Key, Value).
2. Determine the number of buckets you want (B).
3. Schwartz-sort the array by hash(key) % B (a radix sort will probably be efficient here, but any will do).

This groups all items with the same bucket index together. You can then easily generate the array of bucket pointers by iterating over the array (a sketch follows below).

Of course, you need your own hash table implementation to do this. You cannot do it using the built-in hash tables.
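A sketch of that construction for integer keys, using std.algorithm's schwartzSort and a placeholder modulo hash; the names (Pair, FrozenTable, buildFrozen) are illustrative, not from an existing library:

import std.algorithm : schwartzSort;
import std.stdio : writeln;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, "key", size_t, "value");

// Placeholder hash: identity modulo bucket count. Swap in a real hash
// for keys with a poor low-bit distribution.
size_t bucketOf(size_t key, size_t nBuckets) pure nothrow @safe
{
    return key % nBuckets;
}

// Immutable table: pairs sorted by bucket index, plus per-bucket offsets.
struct FrozenTable
{
    Pair[] pairs;   // entries of each bucket are adjacent
    size_t[] start; // start[b] .. start[b + 1] delimits bucket b's slice
}

FrozenTable buildFrozen(Pair[] pairs, size_t nBuckets)
{
    // Schwartz-sort by bucket index, so each bucket becomes one contiguous run.
    pairs.schwartzSort!(p => bucketOf(p.key, nBuckets));

    // Single pass to record where each bucket's run begins.
    auto start = new size_t[nBuckets + 1];
    size_t b = 0;
    foreach (i, ref p; pairs)
    {
        immutable pb = bucketOf(p.key, nBuckets);
        while (b <= pb)
            start[b++] = i;
    }
    while (b <= nBuckets)
        start[b++] = pairs.length;
    return FrozenTable(pairs, start);
}

// Lookup: linearly scan only the slice that belongs to the key's bucket.
bool lookup(const ref FrozenTable t, size_t key, size_t nBuckets, out size_t value)
{
    immutable b = bucketOf(key, nBuckets);
    foreach (ref p; t.pairs[t.start[b] .. t.start[b + 1]])
        if (p.key == key)
        {
            value = p.value;
            return true;
        }
    return false;
}

void main()
{
    enum B = 8;
    auto table = buildFrozen([Pair(1, 40), Pair(4, 2), Pair(42, 11)], B);
    size_t v;
    if (lookup(table, 42, B, v))
        writeln(v); // prints 11
}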
Dec 27 2013
> GC.disable;
> size_t[size_t] unions;
> foreach (line; "input.txt".File.byLineFast) {
>     line.munch(" \t"); // skip whitespace
>     immutable i = line.parse!size_t;
>     line.munch(" \t"); // skip whitespace
>     immutable j = line.parse!size_t;
>     unions[i] = j;
> }
> GC.enable;

A question for people who know something about the D garbage collector: could some logic be added to the GC to detect situations where you allocate many times very frequently, and disable or rarefy the collections, to increase the code performance a lot? (I think "recently" the Python garbage collector added such an optimization.) This way, manually enabling and disabling the GC would no longer be needed.

Bye,
bearophile
Dec 26 2013
On Thursday, 26 December 2013 at 14:09:29 UTC, bearophile wrote:
> A question for people who know something about the D garbage collector: could some logic be added to the GC to detect situations where you allocate many times very frequently, and disable or rarefy the collections, to increase the code performance a lot? [...]

I think it's possible. During the sweep phase, where the GC returns unmarked memory blocks (garbage) to the free lists, it can calculate the total amount of memory freed at this stage. Probably this stat is already being calculated. It also knows the total size of its heap, so after a collection it can know what percentage of the heap was freed. When the program is in a state where it mostly allocates, this percentage will be very low compared to the usual case. When the GC sees it is so low, it can increase the effective threshold for the next collection, making collections less frequent.

I think the actual parameter would be the size of the pool allocated from system memory. AFAIR, currently the GC is triggered when there are not enough free blocks in the current pool, and if after a collection there is still not enough free space, a new pool gets allocated. Its size is currently constant, making long allocating loops run in O(n^2). But if the pool size were to grow in cases where the percentage of collected memory is low, and shrink back when the percentage of freed space is bigger again, then the GC would adapt to the allocation rate and such scenarios would work faster.
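A sketch of that adaptive policy; this is illustrative pseudocode only, not druntime's actual GC code, and all names and thresholds are invented:

struct AdaptivePoolPolicy
{
    size_t poolSize = 16 * 1024 * 1024; // size of the next pool to allocate
    enum double lowYield = 0.10;        // "mostly allocating" if <10% of the heap was freed
    enum double highYield = 0.50;

    // Called after each sweep with the freed amount and the total heap size.
    void afterCollection(size_t freedBytes, size_t heapBytes)
    {
        immutable yield = heapBytes ? cast(double) freedBytes / heapBytes : 0.0;
        if (yield < lowYield)
            poolSize *= 2; // collections barely pay off: bigger pools, rarer collections
        else if (yield > highYield && poolSize > 16 * 1024 * 1024)
            poolSize /= 2; // plenty was reclaimed: shrink back toward the default
    }
}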
Dec 26 2013
Gordon:

> The file looks like:
> 1 40
> 4 2
> 42 11
> ...
> And it has 11M lines.

What's the interval of the numbers in the first column?

Bye,
bearophile
Dec 24 2013
On 12/24/13 2:28 PM, Gordon wrote:
> I want to load a large text file containing two numeric fields into an associative array.
> [...]
> Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

void main() {
    size_t[size_t] unions;
    foreach (e; "input.txt".slurp!(size_t, size_t)("%s %s").sort.uniq) {
        unions[e[0]] = e[1];
    }
}

Andrei
Dec 24 2013
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
> void main() {
>     size_t[size_t] unions;
>     foreach (e; "input.txt".slurp!(size_t, size_t)("%s %s").sort.uniq) {
>         unions[e[0]] = e[1];
>     }
> }

Thanks for this very elegant solution!

For completeness (since we're dealing with timing):
1. Running the above code with garbage collection enabled takes 1m45s.
2. Running it with the GC disabled takes 50s.
3. Running it with the GC disabled and without sort+uniq (my data is already uniq'd) takes 40s.

This is beautiful, idiomatic D code, but for now I think I'll stick with the more verbose code above. One reason is that I want to provide helpful and verbose error messages for invalid input (e.g. "Invalid numeric value found in field 3 line 1000") - and with the "slurp" method, any input error will result in a not-so-helpful exception (e.g. "std.conv.ConvException /usr/include/dmd/phobos/std/conv.d(2009): Unexpected 'H' when converting from type char[] to type ulong").

Many thanks,
 -gordon
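For reference, a sketch of the more verbose approach with per-line error context (hypothetical code, not the exact program used for the timings above):

import std.array : split;
import std.conv : to, ConvException;
import std.exception : enforce;
import std.stdio : File;
import std.string : format;

void main() {
    size_t[size_t] unions;
    size_t lineNo;
    foreach (line; File("input.txt").byLine()) {
        ++lineNo;
        auto fields = line.split();
        enforce(fields.length == 2,
            format("Expected 2 fields in line %s, found %s", lineNo, fields.length));
        try
            unions[fields[0].to!size_t] = fields[1].to!size_t;
        catch (ConvException)
            throw new Exception(
                format("Invalid numeric value in line %s: '%s'", lineNo, line));
    }
}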
Dec 25 2013
On 12/25/13 8:29 AM, Gordon wrote:
> For completeness (since we're dealing with timing):
> [...]

Thanks for the numbers!

> This is beautiful, idiomatic D code, but for now I think I'll stick with the more verbose code above. One reason is that I want to provide helpful and verbose error messages for invalid input [...]

Thanks, I added https://d.puremagic.com/issues/show_bug.cgi?id=11816 to keep track of this.

Andrei
Dec 25 2013
On Wed, Dec 25, 2013 at 5:47 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 12/25/13 8:29 AM, Gordon wrote:
>> For completeness (since we're dealing with timing):
>> [...]
>
> Thanks for the numbers!

Yes, indeed, thanks. I would have thought sorting (particularly the built-in sort) + uniq would 'cost' more.
Dec 25 2013
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
> On 12/24/13 2:28 PM, Gordon wrote:
>> [...]
>
> void main() {
>     size_t[size_t] unions;
>     foreach (e; "input.txt".slurp!(size_t, size_t)("%s %s").sort.uniq) {
>         unions[e[0]] = e[1];
>     }
> }

Watch out for the parentheses on sort. As bearophile likes to point out frequently, without parentheses you are calling the built-in sort, not the std.algorithm one.

Gordon, you may find this has better performance if you add () to sort.
Dec 25 2013
On 12/25/13 10:25 AM, John Colvin wrote:
> Watch out for the parentheses on sort. As bearophile likes to point out frequently, without parentheses you are calling the built-in sort, not the std.algorithm one.

Ew, good point. Thanks.

> Gordon, you may find this has better performance if you add () to sort.

Also, can you share your data somewhere, if it's not confidential?

Andrei
Dec 25 2013
On Wed, Dec 25, 2013 at 10:51:23AM -0800, Andrei Alexandrescu wrote:
> On 12/25/13 10:25 AM, John Colvin wrote:
>> Watch out for the parentheses on sort. [...]
>
> Ew, good point. Thanks.
[...]

Is the built-in sort deprecated yet? If it is, when can we finally say bye-bye to it?

T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
Dec 25 2013
H. S. Teoh:

> Is the built-in sort deprecated yet? If it is, when can we finally say bye-bye to it?

https://d.puremagic.com/issues/show_bug.cgi?id=10318

Bye,
bearophile
Dec 25 2013
On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei Alexandrescu wrote:
> Also, can you share your data somewhere, if it's not confidential?

It will be related to this research: http://www.familinx.org .
The data is available in the "download" section as an SQL dump (which can easily be converted to plain tabular files).

I've also added verbose errors to "std.file.slurp()", which (IMHO) provide all the necessary information when an error occurs. Available here:
https://github.com/agordon/phobos/compare/file.slurp.verbose

But this has led me to detect some "formattedRead" quirks - I'll start a new thread about those.

Thanks,
 -gordon
Dec 25 2013
Am 25.12.2013 21:26, schrieb Gordon:
> It will be related to this research: http://www.familinx.org .
> The data is available in the "download" section as an SQL dump (which can easily be converted to plain tabular files).
> [...]

Did you use D to convert that SQL dump to tabular data? If so, could you post the small snippet? If not, did you convert the "relationship" table?

Kind Regards
Benjamin Thaut
Dec 26 2013
On Thursday, 26 December 2013 at 18:07:38 UTC, Benjamin Thaut wrote:
> Did you use D to convert that SQL dump to tabular data? If so, could you post the small snippet? If not, did you convert the "relationship" table?

No, I use "D" for some internal experimentation. To get the data as tabular files, I loaded the SQL dump into a MySQL server, then used "mysqldump --tab" to export the tables I needed.

HTH,
 -gordon
Dec 26 2013
On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
> I want to load a large text file containing two numeric fields into an associative array.
> [...]
> Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

Using an OrderedAA improves speed 3x:
https://github.com/Kozzi11/Trash/tree/master/util

import util.orderedaa;

int main(string[] args)
{
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    OrderedAA!(size_t, size_t, 1_000_007) unions;
    //size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast)
    {
        line.munch(" \t"); // skip whitespace
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip whitespace
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;

    return 0;
}
Dec 27 2013
Am 27.12.2013 19:25, schrieb Daniel Kozak:
> Using an OrderedAA improves speed 3x:
> https://github.com/Kozzi11/Trash/tree/master/util

A possible downside of this implementation, though, is that because you are using a doubly linked list per index, there will be more cache misses during a read operation compared to a linear-probing hashmap. Did you try using an array per index instead of a linked list, and measure whether that makes any difference?

Kind Regards
Benjamin Thaut
Dec 27 2013
Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results:

My implementation. Power-of-two size (25% free):
building hashmap: 8 seconds (28880605 ticks)
looking entries up: 0 seconds (31685 ticks)

My implementation. Prime-number size (25% free):
building hashmap: 8 seconds (26802252 ticks)
looking entries up: 0 seconds (27719 ticks)

My implementation. Prime-number size (10% free):
building hashmap: 8 seconds (29323952 ticks)
looking entries up: 0 seconds (32768 ticks)

My implementation. Prime-number size (20% free):
26877474 entries
building hashmap: 8 seconds (28895211 ticks)
sum 74128
looking entries up: 0 seconds (33222 ticks)

My implementation. Prime-number size (50% free):
building hashmap: 8 seconds (27738475 ticks)
looking entries up: 0 seconds (25696 ticks)

OrderedAA (implementation of Daniel Kozak):
building array: 13 seconds (45663219 ticks)
lookup: 0 seconds (236323 ticks)

Built-in AA:
building array: 14 seconds (47175035 ticks)
lookup: 0 seconds (33692 ticks)

You can see that both my implementation and Daniel Kozak's outperform the built-in AA when filling the AA. Both my version and Daniel Kozak's preallocate the internal array, although Daniel Kozak's version does one additional allocation per entry to build the doubly linked lists. When not preallocating, my implementation still outperforms the built-in AA.

When looking up data, my implementation outperforms both other implementations, though it is only slightly faster than the built-in one. Daniel Kozak's version is 8 times slower during lookup compared to my implementation. I think this is mostly due to cache misses caused by the linked-list storage of the entries.

It is also interesting to note that using prime-number sizes for the internal array of the AA slightly improved the lookup performance in my implementation.

A certain portion of all entries in the internal array is always kept free. Reducing this amount leads to worse lookup performance; increasing it gives better performance. You can gain a few percent of speed if you are willing to waste 50% memory. My measurements show that 25% free entries seems to be the sweet spot.
Dec 28 2013
On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut wrote:
> Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results:
> [...]
> My measurements show that 25% free entries seems to be the sweet spot.

This is cool! Can you post your code and the data you used for this test somewhere? I would really like to test it on my new AA implementation :)
Dec 30 2013
Am 30.12.2013 15:06, schrieb Daniel Kozak:
> This is cool! Can you post your code and the data you used for this test somewhere? I would really like to test it on my new AA implementation :)

Here is the post that describes how to get the data:
http://forum.dlang.org/post/yglwnmawrvbhpszdsiqs forum.dlang.org

And here is the post on how to convert it to tabular text data:
http://forum.dlang.org/post/rhyycuwlxsoqbtpzpgzd forum.dlang.org

After dumping the "relationship" table from MySQL, I just removed the first column, because it only contains the ID of the entry. I then used the remaining two columns for profiling. You will have to get the data yourself, as I can't repost it here (because it's not my data).

Here is the post that describes my implementation:
http://forum.dlang.org/post/l9kcm2$2i4i$1 digitalmars.com

For profiling the lookups, I generated 100_000 random numbers in the range (0, 200_000] and saved them to a file. I then profile the lookup of those 100_000 random numbers. Out of all these numbers, roughly 75% are valid indices and the rest are invalid, to profile the lookup of both valid and invalid keys.

For profiling your implementation, I just used the code you posted.

Kind Regards
Benjamin Thaut
Dec 30 2013