digitalmars.D - Reserving/Preallocating associative array?
Hello,

I want to load a large text file containing two numeric fields into an associative array. The file looks like:

1 40
4 2
42 11
...

And it has 11M lines. My code looks like this:

===
void main() {
    size_t[size_t] unions;
    auto f = File("input.txt");
    foreach (line; f.byLine()) {
        auto fields = line.split();
        size_t i = to!size_t(fields[0]);
        size_t j = to!size_t(fields[1]);
        unions[i] = j; // <-- here be question
    }
}
===

This is just test code to illustrate my question (though general comments are welcome - I'm new to D).

Commenting out the highlighted line (not populating the hash), the program completes in 25 seconds. Keeping the highlighted line in, the program takes ~3.5 minutes.

Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

Many thanks,
 -gordon
Dec 24 2013
Gordon:

> void main() {
>     size_t[size_t] unions;
>     auto f = File("input.txt");
>     foreach (line; f.byLine()) {
>         auto fields = line.split();
>         size_t i = to!size_t(fields[0]);
>         size_t j = to!size_t(fields[1]);
>         unions[i] = j; // <-- here be question
>     }
> }
> [...]
> Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

Built-in associative arrays don't yet have a "reserve".

Some ideas to speed up your code:
- Try to use parse instead of split + to!size_t.
- Try to use byLineFast, in the attachment here (the code is not mine): https://d.puremagic.com/issues/attachment.cgi?id=1305
- Try to disable the GC before the associative array creation and re-enable it once it's built (this could double your max memory usage or more). To disable it, import GC from core.memory and call GC.disable; and GC.enable;.

Bye,
bearophile
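A minimal sketch of that third suggestion; the scope(exit) guard is an addition to the original advice, assuming parsing may throw and the GC should be re-enabled regardless:

import core.memory : GC;

void main() {
    GC.disable();
    scope(exit) GC.enable(); // runs even if an exception is thrown below
    // ... build the associative array here ...
}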
Dec 24 2013
> Some ideas to speed up your code:
> - Try to use parse instead of split + to!size_t.
> - Try to use byLineFast [...]
> - Try to disable the GC [...]

This code is almost 2.5X faster on my PC (but I am using only 300_000 lines of text):

void main() {
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast) {
        line.munch(" \t"); // skip whitespace
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip whitespace
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;
}

Bye,
bearophile
Dec 24 2013
Hello bearophile,

On Tuesday, 24 December 2013 at 23:13:59 UTC, bearophile wrote:
> This code is almost 2.5X faster on my PC (but I am using only 300_000 lines of text):

You're just too fast for me...

After incorporating your three suggestions, the entire file (11M lines) loads in just 25 seconds (down from 3.5 minutes). AMAZING!

Many thanks,
 -gordon
Dec 24 2013
Out of curiosity, do you know which one of his 3 suggestions brought you the highest speed boost? What happens if you do not disable the GC, for example?
Dec 24 2013
On Wednesday, 25 December 2013 at 08:27:53 UTC, Philippe Sigaud wrote:
> Out of curiosity, do you know which one of his 3 suggestions brought you the highest speed boost? What happens if you do not disable the GC, for example?

Good question, I did test each one incrementally:

1. Switching from "split+to!size_t" to "parse" (and commenting out the "union") reduced the running time from ~26s to ~20s (on average).
2. Switching from "byLine" to "byLineFast" (and commenting out the "union") reduced the running time from ~20s to ~14s (on average).
3. With "parse" and "byLineFast", with the GC still on, and populating the "union", the program took about 2m45s.
4. Adding "GC.disable" brought it down to 25s.

HTH,
 -gordon
Dec 25 2013
Am Wed, 25 Dec 2013 16:12:09 +0000
schrieb "Gordon" <me home.com>:

> Good question, I did test each one incrementally:
> [...]
> 3. With "parse" and "byLineFast", with the GC still on, and populating the "union", the program took about 2m45s.
> 4. Adding "GC.disable" brought it down to 25s.

So for the record: the garbage collector slowed down this piece of code by a factor of 6.6.

-- 
Marco
Dec 26 2013
Am 25.12.2013 17:12, schrieb Gordon:
> Good question, I did test each one incrementally:
> [...]

So I was interested in how far you can take this. The version bearophile posted runs in 14 seconds (48420527 ticks) on my machine.

I created a new version using my own hashmap implementation and manual memory management (no GC present at all). This version runs in 12 seconds (41614375 ticks).

Preallocating all the entries for the hashmap brings quite a boost: it brings it down to 9 seconds (32926550 ticks).

If you replace the default hash function D uses with the MurmurHash function, it brings it down even more, to 8 seconds (29425713 ticks).

I also tried not using any hash function at all, but that made the time skyrocket. I stopped at that point, because by then most of the time was spent looking for a free entry in the hashmap, which pretty much comes down to cache misses. At this point the profiler looked something like this:

50%  find free entry in hashmap
21%  parse uint
9%   find end of line + tab
3.5% read data from disk

The ticks in my measurements were obtained via QueryPerformanceCounter.

Kind Regards
Benjamin Thaut
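For the built-in AA, a comparable experiment is possible without a custom hashmap by wrapping the key in a struct whose toHash applies MurmurHash3's 32-bit finalizer (fmix32). A minimal sketch, assuming integer keys; the wrapper is illustrative and is not the implementation benchmarked above:

import std.stdio : writeln;

// Hypothetical key wrapper: makes the built-in AA hash keys with
// MurmurHash3's 32-bit finalizer instead of druntime's default.
struct MurmurKey {
    size_t value;

    size_t toHash() const pure nothrow @safe {
        uint h = cast(uint) value;
        h ^= h >> 16;
        h *= 0x85eb_ca6b;
        h ^= h >> 13;
        h *= 0xc2b2_ae35;
        h ^= h >> 16;
        return h;
    }

    bool opEquals(const MurmurKey o) const pure nothrow @safe {
        return value == o.value;
    }
}

void main() {
    size_t[MurmurKey] unions;
    unions[MurmurKey(1)] = 40;
    writeln(unions[MurmurKey(1)]); // prints 40
}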
Dec 27 2013
Benjamin Thaut:

> If you replace the default hash function D uses with the MurmurHash function, it brings it down even more, to 8 seconds (29425713 ticks)

What compiler are you using? ldc2? And is it a good idea to put MurmurHash in D?

> 50% find free entry in hashmap
> 21% parse uint

Perhaps the parse-uint time can be improved. A lightly tested function to try:

uint toUint(const(char)[] txt) pure nothrow {
    auto p = txt.ptr;
    const q = p + txt.length;
    // Skip leading non-digits.
    while (p < q && (*p < '0' || *p > '9'))
        p++;
    uint result = 0;
    while (p < q && *p >= '0' && *p <= '9') {
        result = (result * 10) + (*p - '0');
        p++;
    }
    return result;
}

void main() {
    import std.stdio;
    "".toUint.writeln;
    "x".toUint.writeln;
    "-1".toUint.writeln;
    " 125 ".toUint.writeln;
    " 10000".toUint.writeln;
    " 1000000 ".toUint.writeln;
    " 10000000000000 ".toUint.writeln;
}

Its output (it's not a safe conversion; note the overflow on the last value):

0
0
1
125
10000
1000000
1316134912

Bye,
bearophile
Dec 27 2013
Am 27.12.2013 12:02, schrieb bearophile:
> What compiler are you using? ldc2?

dmd 2.064.2

> And is it a good idea to put MurmurHash in D?

I don't know about the properties of the MurmurHash function. I only know that it is cheaper to compute than the hash function D uses. To decide whether it would be better than the currently chosen hash function, an in-depth analysis would be required.

> Perhaps the parse-uint time can be improved. A lightly tested function to try:
> [...]

The way I parse the uint already looks that way. I'm not using std.conv:
https://github.com/Ingrater/thBase/blob/master/src/thBase/conv.d#L22

Kind Regards
Benjamin Thaut
Dec 27 2013
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
> So I was interested in how far you can take this. The version bearophile posted runs in 14 seconds (48420527 ticks) on my machine.
> <...>
> At this point the profiler looked something like this:
> 50%  find free entry in hashmap
> 21%  parse uint
> 9%   find end of line + tab
> 3.5% read data from disk

This is turning into a very interesting (and useful) discussion. Thank you!

I've created a generic "FileSlurp" library, to load a file (using "byLineFast") and use a delegate to store the data (so I can later use any improved methods you suggest). It's available here, comments are welcome:
http://code.dlang.org/packages/fileslurp

Benjamin, can you point me to your hashmap implementation? I could perhaps use it to improve the timings even more.

bearophile, who is "JM" who wrote "byLineFast"? I'm using his code and would like to properly acknowledge him/her.

Thanks!
 -gordon
Dec 27 2013
Gordon:

> Who is "JM" who wrote "byLineFast"? I'm using his code and would like to properly acknowledge him/her.

It should be Juan Manuel Cabo:
http://permalink.gmane.org/gmane.comp.lang.d.general/117750

Bye,
bearophile
Dec 27 2013
Am 27.12.2013 17:49, schrieb Gordon:
> Benjamin, can you point me to your hashmap implementation? I could perhaps use it to improve the timings even more.

https://github.com/Ingrater/druntime/blob/merge64/src/core/hashmap.d

This implementation depends on my own allocator design, but it should be possible to remove that dependency quite easily by replacing all allocations/frees with malloc/free or GC.malloc/nothing. Just make sure that memory is not initialized beforehand (as new ubyte[sizeInBytes] would do), because that also has a considerable performance impact. Also, when allocating with GC.malloc you should specify the GC.BlkAttr.NO_SCAN flag, because you know that your data does not contain any pointers. That way the GC will not scan that huge memory block, and it should speed up collections a lot.

To improve the performance of the hashmap you can try:

- Specifying different hashing functions (via the hash policy). See https://github.com/Ingrater/thBase/blob/master/src/thBase/policies/hashing.d for more examples of hashing policies.

- Changing the number of always-free entries in the hashmap (currently 25%); for that, change lines 233, 207 and 210 (not factored into a constant yet). Reducing the free entries might result in fewer cache misses but more linear search, as this is a linear-probing hashmap. Increasing the free entries might reduce the amount of linear search, but increase cache misses and memory usage.

- Computing a series of prime numbers, in which each prime is at least twice as big as the previous one, and using those as the possible sizes for the hashmap (a small sketch follows at the end of this post). Prime-number sizes give a better distribution of the items within the hashmap, and therefore reduce the amount of linear searching necessary, which improves hashmap performance.

- When searching for the next free spot in the hashmap, it currently just adds 1 to the previous index found (line 301). It is also possible to rehash the previous index (just put it into the hashing function again), which would give a new index more distant in the hashmap from the current one. This would again improve the distribution of the items in the hashmap and thus reduce linear search time, but it may increase the number of cache misses, as it no longer does linear memory access.

In case you are interested in my implementation of your problem, see here:
http://dpaste.dzfl.pl/8d0d174e

You won't be able to compile the code unless you set up my custom D environment, which I would not recommend unless you really hate the D garbage collector ;-)

Kind Regards
Benjamin Thaut
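A minimal sketch of such a prime-size progression, using a naive primality test; the starting size and the double-then-next-prime scheme are illustrative choices, not taken from the hashmap linked above:

import std.stdio : write, writeln;

// Naive primality test; fine for computing a size table once.
bool isPrime(size_t n) pure nothrow @safe {
    if (n < 4)
        return n > 1;
    if (n % 2 == 0)
        return false;
    for (size_t d = 3; d * d <= n; d += 2)
        if (n % d == 0)
            return false;
    return true;
}

// Smallest prime >= n.
size_t nextPrime(size_t n) pure nothrow @safe {
    while (!isPrime(n))
        ++n;
    return n;
}

void main() {
    // Each capacity is the next prime at least twice the previous one.
    size_t size = 53;
    foreach (_; 0 .. 10) {
        write(size, " ");
        size = nextPrime(size * 2);
    }
    writeln(); // 53 107 223 449 907 1823 3659 7321 14653 29311
}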
Dec 27 2013
Also, Gordon, what is keeping you from converting the textual representation to a binary one? I mean, if you need to read that file in often, you can easily preprocess it into a binary blob. You could even modify my hashmap implementation to support binary serialization, by just writing the entire array to a binary file and loading it again. That would be fast...

Kind Regards
Benjamin Thaut
Dec 27 2013
On Friday, 27 December 2013 at 17:26:42 UTC, Benjamin Thaut wrote:
> Also, Gordon, what is keeping you from converting the textual representation to a binary one? [...] That would be fast...

This is intended to be a general-purpose CLI program to handle certain text files (from other sources), not an in-house application where I can restructure my data. Text files are still the most ubiquitous way to share data (especially without limiting the languages/tools others will use to process them).

But out of curiosity - how would you serialize it?
Dec 27 2013
Am 27.12.2013 20:55, schrieb Gordon:
> But out of curiosity - how would you serialize it?

Given my hashmap implementation, I would do something like:

// method of the hashmap
void serializeTo(FILE* f)
{
    fwrite(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
    ulong len = m_Data.length;
    fwrite(&len, typeof(len).sizeof, 1, f);
    fwrite(m_Data.ptr, Pair.sizeof, len, f);
}

// construct from a file
this(FILE* f)
{
    fread(&m_FullCount, typeof(m_FullCount).sizeof, 1, f);
    ulong len;
    fread(&len, typeof(len).sizeof, 1, f);
    m_Data = (cast(Pair*) malloc(Pair.sizeof * len))[0 .. len];
    fread(m_Data.ptr, Pair.sizeof, len, f);
}

The deserialization would be just one allocation and one big binary read from a file, which should be quite fast.

Kind Regards
Benjamin Thaut
Dec 27 2013
On Friday, 27 December 2013 at 09:37:09 UTC, Benjamin Thaut wrote:
> At this point the profiler looked something like this:
> 50%  find free entry in hashmap

When I'm building large immutable hash tables, I do this:

1. Read all the unordered key-value pairs into an array of Tuple!(Key, Value).
2. Determine the number of buckets you want (B).
3. Schwartz-sort the array by hash(key) % B (a radix sort will probably be efficient here, but any will do).

This groups all items with the same bucket index together. You can then easily generate the array of bucket pointers by iterating over the array (a sketch follows below).

Of course, you need your own hash table implementation to do this. You cannot do it using the built-in hash tables.
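A sketch of that construction for integer keys, using std.algorithm's schwartzSort and a placeholder modulo hash; the names (Pair, FrozenTable, buildFrozen) are illustrative, not from an existing library:

import std.algorithm : schwartzSort;
import std.stdio : writeln;
import std.typecons : Tuple;

alias Pair = Tuple!(size_t, "key", size_t, "value");

// Placeholder hash: identity modulo bucket count. Swap in a real hash
// for keys with a poor low-bit distribution.
size_t bucketOf(size_t key, size_t nBuckets) pure nothrow @safe
{
    return key % nBuckets;
}

// Immutable table: pairs sorted by bucket index, plus per-bucket offsets.
struct FrozenTable
{
    Pair[] pairs;   // entries of each bucket are adjacent
    size_t[] start; // start[b] .. start[b + 1] delimits bucket b's slice
}

FrozenTable buildFrozen(Pair[] pairs, size_t nBuckets)
{
    // Schwartz-sort by bucket index, so each bucket becomes one contiguous run.
    pairs.schwartzSort!(p => bucketOf(p.key, nBuckets));

    // Single pass to record where each bucket's run begins.
    auto start = new size_t[nBuckets + 1];
    size_t b = 0;
    foreach (i, ref p; pairs)
    {
        immutable pb = bucketOf(p.key, nBuckets);
        while (b <= pb)
            start[b++] = i;
    }
    while (b <= nBuckets)
        start[b++] = pairs.length;
    return FrozenTable(pairs, start);
}

// Lookup: linearly scan only the slice that belongs to the key's bucket.
bool lookup(const ref FrozenTable t, size_t key, size_t nBuckets, out size_t value)
{
    immutable b = bucketOf(key, nBuckets);
    foreach (ref p; t.pairs[t.start[b] .. t.start[b + 1]])
        if (p.key == key)
        {
            value = p.value;
            return true;
        }
    return false;
}

void main()
{
    enum B = 8;
    auto table = buildFrozen([Pair(1, 40), Pair(4, 2), Pair(42, 11)], B);
    size_t v;
    if (lookup(table, 42, B, v))
        writeln(v); // prints 11
}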
Dec 27 2013
> GC.disable;
> size_t[size_t] unions;
> foreach (line; "input.txt".File.byLineFast) {
>     line.munch(" \t"); // skip whitespace
>     immutable i = line.parse!size_t;
>     line.munch(" \t"); // skip whitespace
>     immutable j = line.parse!size_t;
>     unions[i] = j;
> }
> GC.enable;

A question for people who know something about the D garbage collector: could some logic be added to the GC to detect situations where you allocate many times very frequently, and disable or rarefy the collections, to increase the code performance a lot? (I think "recently" the Python garbage collector added such an optimization.) This way, manually enabling and disabling the GC would no longer be needed.

Bye,
bearophile
Dec 26 2013
On Thursday, 26 December 2013 at 14:09:29 UTC, bearophile wrote:
> A question for people who know something about the D garbage collector: could some logic be added to the GC to detect situations where you allocate many times very frequently, and disable or rarefy the collections, to increase the code performance a lot? [...]

I think it's possible. During the sweep phase, where the GC returns unmarked memory blocks (garbage) to the free lists, it can calculate the total amount of memory freed at this stage. Probably this stat is already being calculated. It also knows the total size of its heap, so after a collection it can know what percentage of the heap was freed. When the program is in a state where it mostly allocates, this percentage will be very low compared to the usual case. When the GC sees it is so low, it can increase the effective threshold for the next collection, making collections less frequent.

I think the actual parameter would be the size of the pool allocated from system memory. AFAIR, currently the GC is triggered when there are not enough free blocks in the current pool, and if after a collection there is still not enough free space, a new pool gets allocated. Its size is currently constant, making long allocating loops run in O(n^2). But if the pool size were to grow in cases where the percentage of collected memory is low, and shrink back when the percentage of freed space is bigger again, then the GC would adapt to the allocation rate and such scenarios would work faster.
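A sketch of that adaptive policy; this is illustrative pseudocode only, not druntime's actual GC code, and all names and thresholds are invented:

struct AdaptivePoolPolicy
{
    size_t poolSize = 16 * 1024 * 1024; // size of the next pool to allocate
    enum double lowYield = 0.10;        // "mostly allocating" if <10% of the heap was freed
    enum double highYield = 0.50;

    // Called after each sweep with the freed amount and the total heap size.
    void afterCollection(size_t freedBytes, size_t heapBytes)
    {
        immutable yield = heapBytes ? cast(double) freedBytes / heapBytes : 0.0;
        if (yield < lowYield)
            poolSize *= 2; // collections barely pay off: bigger pools, rarer collections
        else if (yield > highYield && poolSize > 16 * 1024 * 1024)
            poolSize /= 2; // plenty was reclaimed: shrink back toward the default
    }
}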
Dec 26 2013
Gordon:

> The file looks like:
> 1 40
> 4 2
> 42 11
> ...
> And it has 11M lines.

What's the interval of the numbers in the first column?

Bye,
bearophile
Dec 24 2013
On 12/24/13 2:28 PM, Gordon wrote:
> I want to load a large text file containing two numeric fields into an associative array.
> [...]
> Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

void main() {
    size_t[size_t] unions;
    foreach (e; "input.txt".slurp!(size_t, size_t)("%s %s").sort.uniq) {
        unions[e[0]] = e[1];
    }
}

Andrei
Dec 24 2013
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
> void main() {
>     size_t[size_t] unions;
>     foreach (e; "input.txt".slurp!(size_t, size_t)("%s %s").sort.uniq) {
>         unions[e[0]] = e[1];
>     }
> }

Thanks for this very elegant solution!

For completeness (since we're dealing with timing):
1. Running the above code with garbage collection enabled takes 1m45s.
2. Running it with the GC disabled takes 50s.
3. Running it with the GC disabled and without sort+uniq (my data is already uniq'd) takes 40s.

This is beautiful, idiomatic D code, but for now I think I'll stick with the more verbose code above. One reason is that I want to provide helpful and verbose error messages for invalid input (e.g. "Invalid numeric value found in field 3 line 1000") - and with the "slurp" method, any input error will result in a not-so-helpful exception (e.g. "std.conv.ConvException /usr/include/dmd/phobos/std/conv.d(2009): Unexpected 'H' when converting from type char[] to type ulong").

Many thanks,
 -gordon
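For reference, a sketch of the more verbose approach with per-line error context (hypothetical code, not the exact program used for the timings above):

import std.array : split;
import std.conv : to, ConvException;
import std.exception : enforce;
import std.stdio : File;
import std.string : format;

void main() {
    size_t[size_t] unions;
    size_t lineNo;
    foreach (line; File("input.txt").byLine()) {
        ++lineNo;
        auto fields = line.split();
        enforce(fields.length == 2,
            format("Expected 2 fields in line %s, found %s", lineNo, fields.length));
        try
            unions[fields[0].to!size_t] = fields[1].to!size_t;
        catch (ConvException)
            throw new Exception(
                format("Invalid numeric value in line %s: '%s'", lineNo, line));
    }
}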
Dec 25 2013
On 12/25/13 8:29 AM, Gordon wrote:
> For completeness (since we're dealing with timing):
> [...]

Thanks for the numbers!

> This is beautiful, idiomatic D code, but for now I think I'll stick with the more verbose code above. One reason is that I want to provide helpful and verbose error messages for invalid input [...]

Thanks, I added https://d.puremagic.com/issues/show_bug.cgi?id=11816 to keep track of this.

Andrei
Dec 25 2013
On Wed, Dec 25, 2013 at 5:47 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 12/25/13 8:29 AM, Gordon wrote:
>> For completeness (since we're dealing with timing):
>> [...]
>
> Thanks for the numbers!

Yes, indeed, thanks. I would have thought sorting (particularly the built-in sort) + uniq would 'cost' more.
Dec 25 2013
On Tuesday, 24 December 2013 at 23:52:49 UTC, Andrei Alexandrescu wrote:
> On 12/24/13 2:28 PM, Gordon wrote:
>> [...]
>
> void main() {
>     size_t[size_t] unions;
>     foreach (e; "input.txt".slurp!(size_t, size_t)("%s %s").sort.uniq) {
>         unions[e[0]] = e[1];
>     }
> }

Watch out for the parentheses on sort. As bearophile likes to point out frequently, without parentheses you are calling the built-in sort, not the std.algorithm one.

Gordon, you may find this has better performance if you add () to sort.
Dec 25 2013
On 12/25/13 10:25 AM, John Colvin wrote:
> Watch out for the parentheses on sort. As bearophile likes to point out frequently, without parentheses you are calling the built-in sort, not the std.algorithm one.

Ew, good point. Thanks.

> Gordon, you may find this has better performance if you add () to sort.

Also, can you share your data somewhere, if it's not confidential?

Andrei
Dec 25 2013
On Wed, Dec 25, 2013 at 10:51:23AM -0800, Andrei Alexandrescu wrote:
> On 12/25/13 10:25 AM, John Colvin wrote:
>> Watch out for the parentheses on sort. [...]
>
> Ew, good point. Thanks.
[...]

Is the built-in sort deprecated yet? If it is, when can we finally say bye-bye to it?

T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
Dec 25 2013
H. S. Teoh:

> Is the built-in sort deprecated yet? If it is, when can we finally say bye-bye to it?

https://d.puremagic.com/issues/show_bug.cgi?id=10318

Bye,
bearophile
Dec 25 2013
On Wednesday, 25 December 2013 at 18:51:22 UTC, Andrei Alexandrescu wrote:
> Also, can you share your data somewhere, if it's not confidential?

It will be related to this research: http://www.familinx.org .
The data is available in the "download" section as an SQL dump (which can easily be converted to plain tabular files).

I've also added verbose errors to "std.file.slurp()", which (IMHO) provide all the necessary information when an error occurs. Available here:
https://github.com/agordon/phobos/compare/file.slurp.verbose

But this has led me to detect some "formattedRead" quirks - I'll start a new thread about those.

Thanks,
 -gordon
Dec 25 2013
Am 25.12.2013 21:26, schrieb Gordon:
> It will be related to this research: http://www.familinx.org .
> The data is available in the "download" section as an SQL dump (which can easily be converted to plain tabular files).
> [...]

Did you use D to convert that SQL dump to tabular data? If so, could you post the small snippet? If not, did you convert the "relationship" table?

Kind Regards
Benjamin Thaut
Dec 26 2013
On Thursday, 26 December 2013 at 18:07:38 UTC, Benjamin Thaut wrote:
> Did you use D to convert that SQL dump to tabular data? If so, could you post the small snippet? If not, did you convert the "relationship" table?

No, I use "D" for some internal experimentation. To get the data as tabular files, I loaded the SQL dump into a MySQL server, then used "mysqldump --tab" to export the tables I needed.

HTH,
 -gordon
Dec 26 2013
On Tuesday, 24 December 2013 at 22:28:21 UTC, Gordon wrote:
> I want to load a large text file containing two numeric fields into an associative array.
> [...]
> Is there a way to speed up the loading? Perhaps by reserving memory in the hash before populating it? Or another trick?

Using an OrderedAA improves speed 3x:
https://github.com/Kozzi11/Trash/tree/master/util

import util.orderedaa;

int main(string[] args)
{
    import std.stdio, std.conv, std.string, core.memory;
    import bylinefast;

    GC.disable;
    OrderedAA!(size_t, size_t, 1_000_007) unions;
    //size_t[size_t] unions;
    foreach (line; "input.txt".File.byLineFast)
    {
        line.munch(" \t"); // skip whitespace
        immutable i = line.parse!size_t;
        line.munch(" \t"); // skip whitespace
        immutable j = line.parse!size_t;
        unions[i] = j;
    }
    GC.enable;

    return 0;
}
Dec 27 2013
Am 27.12.2013 19:25, schrieb Daniel Kozak:
> Using an OrderedAA improves speed 3x:
> https://github.com/Kozzi11/Trash/tree/master/util

A possible downside of this implementation, though, is that because you are using a doubly linked list per index, there will be more cache misses during a read operation compared to a linear-probing hashmap. Did you try using an array per index instead of a linked list, and measure whether that makes any difference?

Kind Regards
Benjamin Thaut
Dec 27 2013
Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results:

My implementation. Power-of-two size (25% free):
building hashmap: 8 seconds (28880605 ticks)
looking entries up: 0 seconds (31685 ticks)

My implementation. Prime-number size (25% free):
building hashmap: 8 seconds (26802252 ticks)
looking entries up: 0 seconds (27719 ticks)

My implementation. Prime-number size (10% free):
building hashmap: 8 seconds (29323952 ticks)
looking entries up: 0 seconds (32768 ticks)

My implementation. Prime-number size (20% free):
26877474 entries
building hashmap: 8 seconds (28895211 ticks)
sum 74128
looking entries up: 0 seconds (33222 ticks)

My implementation. Prime-number size (50% free):
building hashmap: 8 seconds (27738475 ticks)
looking entries up: 0 seconds (25696 ticks)

OrderedAA (implementation of Daniel Kozak):
building array: 13 seconds (45663219 ticks)
lookup: 0 seconds (236323 ticks)

Built-in AA:
building array: 14 seconds (47175035 ticks)
lookup: 0 seconds (33692 ticks)

You can see that both my implementation and Daniel Kozak's outperform the built-in AA when filling the AA. Both my version and Daniel Kozak's preallocate the internal array, although Daniel Kozak's version does one additional allocation per entry to build the doubly linked lists. When not preallocating, my implementation still outperforms the built-in AA.

When looking up data, my implementation outperforms both other implementations, though it is only slightly faster than the built-in one. Daniel Kozak's version is 8 times slower during lookup compared to my implementation. I think this is mostly due to cache misses caused by the linked-list storage of the entries.

It is also interesting to note that using prime-number sizes for the internal array of the AA slightly improved the lookup performance in my implementation.

A certain portion of all entries in the internal array is always kept free. Reducing this amount leads to worse lookup performance; increasing it gives better performance. You can gain a few percent of speed if you are willing to waste 50% memory. My measurements show that 25% free entries seems to be the sweet spot.
Dec 28 2013
On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut wrote:
> Because I was always curious to do some hashmap profiling with real data, I did some more. Here are the results:
> [...]
> My measurements show that 25% free entries seems to be the sweet spot.

This is cool! Can you post your code and the data you used for this test somewhere? I would really like to test it on my new AA implementation :)
Dec 30 2013
Am 30.12.2013 15:06, schrieb Daniel Kozak:
> This is cool! Can you post your code and the data you used for this test somewhere? I would really like to test it on my new AA implementation :)

Here is the post that describes how to get the data:
http://forum.dlang.org/post/yglwnmawrvbhpszdsiqs forum.dlang.org

And here is the post on how to convert it to tabular text data:
http://forum.dlang.org/post/rhyycuwlxsoqbtpzpgzd forum.dlang.org

After dumping the "relationship" table from MySQL, I just removed the first column, because it only contains the ID of the entry. I then used the remaining two columns for profiling. You will have to get the data yourself, as I can't repost it here (because it's not my data).

Here is the post that describes my implementation:
http://forum.dlang.org/post/l9kcm2$2i4i$1 digitalmars.com

For profiling the lookups, I generated 100_000 random numbers in the range (0, 200_000] and saved them to a file. I then profile the lookup of those 100_000 random numbers. Out of all these numbers, roughly 75% are valid indices and the rest are invalid, to profile the lookup of both valid and invalid keys.

For profiling your implementation, I just used the code you posted.

Kind Regards
Benjamin Thaut
Dec 30 2013