digitalmars.D - Speed of hash tables compared to lua
- Domingo (65/65) Sep 13 2020 Hello !
- Domingo (34/45) Sep 13 2020 After writing the previous message I realized the test was not
- Domingo (32/81) Sep 13 2020 And here is the C++ using unordered map, c++ speed is equal to
- ikod (6/37) Sep 13 2020 Default runtime map provide some properties which may be
- Daniel Kozak (2/19) Sep 13 2020 lua takes 489ms
- Daniel Kozak (8/13) Sep 13 2020 I have tried emsi_containers, memutils and ikod-containers.
- Steven Schveighoffer (11/32) Sep 13 2020 The allocation patterns definitely affect the builtin containers. A long...
- FeepingCreature (6/11) Sep 15 2020 This reinforces my increasing conviction that
- James Blachly (3/9) Sep 13 2020 ikod-containers is nice. Don't forget also dklib's khash =)
- Daniel Kozak (11/20) Sep 13 2020 Does not work, first I have to change
- mw (3/6) Sep 13 2020 Create a github repo, so everyone can run the benchmark, ... and
- Daniel Kozak (5/14) Sep 13 2020 But this line will fix that for me:
- James Blachly (4/30) Sep 14 2020 Hmm, thank you for that. We are using extensively internally and have
Hello ! Doing some tests I found that D hash tables are a lot slower than Lua tables see bellow, D is using 2 times more memory and 5 times more time compared to Lua, when comparing with Luajit it's unbelievable worse. I would expect a compiled language like D to use less resources than a scripting language, anyone have experienced something similar ? Does the community know this ? I hope this simple test would help to test and improve the situation ! ===== local dsize = 6000008; local d = {}; for i=1, dsize do if ((i % 2) == 0) then d[i] = 0.0000000123; else d[i] = 12345.0000000001; end end local sum = 0.0; for i=1, dsize do sum = sum + d[i]; end print(string.format("count: %d, sum: %35.15f\n", #d, sum)); ===== /usr/bin/time lua sum-test.lua count: 6000008, sum: 37035049380.000160217285156 0.39user 0.03system 0:00.43elapsed 99%CPU (0avgtext+0avgdata 133528maxresident)k 520inputs+0outputs (1major+32908minor)pagefaults 0swaps ===== /usr/bin/time luajit sum-test.lua count: 6000008, sum: 37035049380.000160217285156 0.07user 0.02system 0:00.10elapsed 97%CPU (0avgtext+0avgdata 67824maxresident)k 984inputs+0outputs (3major+18560minor)pagefaults 0swaps ===== ===== import std.stdio; int main(char[][] argv) { immutable auto DSIZE = 6000008; double[int] d; for (int i = 0; i < DSIZE; ++i) { d[i] = (i % 2) == 0 ? 0.0000000123 : 12345.0000000001; } //d.rehash; double sum = 0.0; for (int i = 0; i < DSIZE; ++i) { sum += d[i]; } writefln("count: %d, sum: %35.15f\n", DSIZE, sum); return 0; } ===== /usr/bin/time ./sum_test_aa count: 6000008, sum: 37035049380.000160217285156 1.96user 0.10system 0:02.04elapsed 101%CPU (0avgtext+0avgdata 272888maxresident)k 0inputs+0outputs (0major+68226minor)pagefaults 0swaps ===== Cheers !
Sep 13 2020
On Sunday, 13 September 2020 at 09:50:27 UTC, Domingo wrote:Hello ! Doing some tests I found that D hash tables are a lot slower than Lua tables see bellow, D is using 2 times more memory and 5 times more time compared to Lua, when comparing with Luajit it's unbelievable worse. I would expect a compiled language like D to use less resources than a scripting language, anyone have experienced something similar ? Does the community know this ? I hope this simple test would help to test and improve the situation !After writing the previous message I realized the test was not exactly fair, Lua was probably using the array functionality of it's table so I changed it a bit and now Lua uses as much memory as D but still is faster by a good margin (~4). ===== local dsize = 6000008; local d = {}; for i=dsize, 1, -1 do if ((i % 2) == 0) then d[i] = 0.0000000123; else d[i] = 12345.0000000001; end end local sum = 0.0; for i=dsize, 1, -1 do sum = sum + d[i]; end print(string.format("count: %d, sum: %35.15f\n", #d, sum)); ===== ===== /usr/bin/time lua sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.63user 0.11system 0:00.74elapsed 100%CPU (0avgtext+0avgdata 264500maxresident)k 0inputs+0outputs (0major+98435minor)pagefaults 0swaps ===== ===== /usr/bin/time luajit sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.48user 0.04system 0:00.52elapsed 100%CPU (0avgtext+0avgdata 117240maxresident)k 0inputs+0outputs (0major+41086minor)pagefaults 0swaps =====
Sep 13 2020
On Sunday, 13 September 2020 at 10:06:56 UTC, Domingo wrote:On Sunday, 13 September 2020 at 09:50:27 UTC, Domingo wrote:And here is the C++ using unordered map, c++ speed is equal to Lua but uses more memory: ===== #include <unordered_map> #include <string> #include <cstdio> int main() { const int DSIZE = 6000008; // Create an empty unordered_map std::unordered_map<int, double> intMap; for (int i = 0; i < DSIZE; ++i) { intMap.insert( { i, (i % 2) == 0 ? 0.0000000123 : 12345.0000000001 }); } double sum = 0.0; for (std::pair<int, double> element : intMap) sum += element.second; printf("count: %d, sum: %35.15f\n", DSIZE, sum); return 0; } ===== ===== g++ -O2 -o sum_test_aa-cpp sum_test_aa.cpp /usr/bin/time ./sum_test_aa-cpp count: 6000008, sum: 37035049380.000160217285156 0.49user 0.09system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 330508maxresident)k 0inputs+0outputs (0major+93595minor)pagefaults 0swaps =====Hello ! Doing some tests I found that D hash tables are a lot slower than Lua tables see bellow, D is using 2 times more memory and 5 times more time compared to Lua, when comparing with Luajit it's unbelievable worse. I would expect a compiled language like D to use less resources than a scripting language, anyone have experienced something similar ? Does the community know this ? I hope this simple test would help to test and improve the situation !After writing the previous message I realized the test was not exactly fair, Lua was probably using the array functionality of it's table so I changed it a bit and now Lua uses as much memory as D but still is faster by a good margin (~4). ===== local dsize = 6000008; local d = {}; for i=dsize, 1, -1 do if ((i % 2) == 0) then d[i] = 0.0000000123; else d[i] = 12345.0000000001; end end local sum = 0.0; for i=dsize, 1, -1 do sum = sum + d[i]; end print(string.format("count: %d, sum: %35.15f\n", #d, sum)); ===== ===== /usr/bin/time lua sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.63user 0.11system 0:00.74elapsed 100%CPU (0avgtext+0avgdata 264500maxresident)k 0inputs+0outputs (0major+98435minor)pagefaults 0swaps ===== ===== /usr/bin/time luajit sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.48user 0.04system 0:00.52elapsed 100%CPU (0avgtext+0avgdata 117240maxresident)k 0inputs+0outputs (0major+41086minor)pagefaults 0swaps =====
Sep 13 2020
On Sunday, 13 September 2020 at 10:20:06 UTC, Domingo wrote:And here is the C++ using unordered map, c++ speed is equal to Lua but uses more memory: ===== #include <unordered_map> #include <string> #include <cstdio> int main() { const int DSIZE = 6000008; // Create an empty unordered_map std::unordered_map<int, double> intMap; for (int i = 0; i < DSIZE; ++i) { intMap.insert( { i, (i % 2) == 0 ? 0.0000000123 : 12345.0000000001 }); } double sum = 0.0; for (std::pair<int, double> element : intMap) sum += element.second; printf("count: %d, sum: %35.15f\n", DSIZE, sum); return 0; } ===== ===== g++ -O2 -o sum_test_aa-cpp sum_test_aa.cpp /usr/bin/time ./sum_test_aa-cpp count: 6000008, sum: 37035049380.000160217285156 0.49user 0.09system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 330508maxresident)k 0inputs+0outputs (0major+93595minor)pagefaults 0swaps =====Default runtime map provide some properties which may be unavailable for other implementations. Its speed may be affected by GC, it uses to keep table items. You may include some hash map implementations from the dub repo in your benchmark.
Sep 13 2020
On Sun, Sep 13, 2020 at 10:50 PM Daniel Kozak <kozzi11 gmail.com> wrote:On Sun, Sep 13, 2020 at 1:25 PM ikod via Digitalmars-d < digitalmars-d puremagic.com> wrote:lua takes 489msDefault runtime map provide some properties which may be unavailable for other implementations. Its speed may be affected by GC, it uses to keep table items. You may include some hash map implementations from the dub repo in your benchmark.I have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242ms
Sep 13 2020
On Sun, Sep 13, 2020 at 1:25 PM ikod via Digitalmars-d < digitalmars-d puremagic.com> wrote:Default runtime map provide some properties which may be unavailable for other implementations. Its speed may be affected by GC, it uses to keep table items. You may include some hash map implementations from the dub repo in your benchmark.I have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242ms
Sep 13 2020
On 9/13/20 4:50 PM, Daniel Kozak wrote:On Sun, Sep 13, 2020 at 1:25 PM ikod via Digitalmars-d <digitalmars-d puremagic.com <mailto:digitalmars-d puremagic.com>> wrote: Default runtime map provide some properties which may be unavailable for other implementations. Its speed may be affected by GC, it uses to keep table items. You may include some hash map implementations from the dub repo in your benchmark. I have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242msThe allocation patterns definitely affect the builtin containers. A long time ago, dcollections kicked the pants off of builtin AAs for most patterns, basically by using different memory allocation strategies (for example, allocating blocks of elements at a time, and using C malloc when no pointers were detected in the type). Builtin AA's will always have to be slower, because they must allocate a separate piece of memory for each bucket to retain the property that you can get a pointer to any element, and that memory will survive anything the AA can do. -Steve
Sep 13 2020
On Sunday, 13 September 2020 at 21:29:53 UTC, Steven Schveighoffer wrote:Builtin AA's will always have to be slower, because they must allocate a separate piece of memory for each bucket to retain the property that you can get a pointer to any element, and that memory will survive anything the AA can do. -SteveThis reinforces my increasing conviction that referentiability-by-default is on a level with nullability-by-default and mutability-by-default in the "Great Collection of C-Derived Language Design Errors".
Sep 15 2020
On 9/13/20 4:50 PM, Daniel Kozak wrote:I have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242msikod-containers is nice. Don't forget also dklib's khash =) https://github.com/blachlylab/dklib/
Sep 13 2020
On Mon, Sep 14, 2020 at 3:55 AM James Blachly via Digitalmars-d < digitalmars-d puremagic.com> wrote:On 9/13/20 4:50 PM, Daniel Kozak wrote:Does not work, first I have to change auto map = khash!(keytype, valuetype); to auto map = khash!(keytype, valuetype)(); Because I get (source/app.d(20,11): Error: type khash!(uint, double, true, true) has no value) And after that it takes 262ms but end up with SIGSEGV. But it print right sum, so I guess it is something with freeing when try to dealocate hashmapI have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242msikod-containers is nice. Don't forget also dklib's khash =) https://github.com/blachlylab/dklib/
Sep 13 2020
On Monday, 14 September 2020 at 05:25:03 UTC, Daniel Kozak wrote:Does not work, first I have to change ... And after that it takes 262ms but end up with SIGSEGV.Create a github repo, so everyone can run the benchmark, ... and fix it?
Sep 13 2020
On Monday, 14 September 2020 at 05:31:57 UTC, mw wrote:On Monday, 14 September 2020 at 05:25:03 UTC, Daniel Kozak wrote:Nice idea! It also should contain memory measurements and feature matrix (as feature set can affect speed). It also would be nice to have benchmarks on some different key/value types or sizes, like class as key or value, struct(fat struct) as key or value, etc..Does not work, first I have to change ... And after that it takes 262ms but end up with SIGSEGV.Create a github repo, so everyone can run the benchmark, ... and fix it?
Sep 13 2020
On Monday, 14 September 2020 at 06:02:49 UTC, ikod wrote:On Monday, 14 September 2020 at 05:31:57 UTC, mw wrote:I added benchmark folder to my project https://github.com/ikod/ikod-containers/tree/master/bench Please add your implementations Thanks!On Monday, 14 September 2020 at 05:25:03 UTC, Daniel Kozak wrote:Nice idea! It also should contain memory measurements and feature matrix (as feature set can affect speed). It also would be nice to have benchmarks on some different key/value types or sizes, like class as key or value, struct(fat struct) as key or value, etc..Does not work, first I have to change ... And after that it takes 262ms but end up with SIGSEGV.Create a github repo, so everyone can run the benchmark, ... and fix it?
Sep 24 2020
On Mon, Sep 14, 2020 at 7:25 AM Daniel Kozak <kozzi11 gmail.com> wrote:Does not work, first I have to change auto map = khash!(keytype, valuetype); to auto map = khash!(keytype, valuetype)(); Because I get (source/app.d(20,11): Error: type khash!(uint, double, true, true) has no value) And after that it takes 262ms but end up with SIGSEGV. But it print right sum, so I guess it is something with freeing when try to dealocate hashmapBut this line will fix that for me: extern(C) __gshared string[] rt_options = [ "gcopt=parallel:0 cleanup:none" ]; So it is something with GC trying to free already freed memory I geuss
Sep 13 2020
On 9/14/20 1:28 AM, Daniel Kozak wrote:On Mon, Sep 14, 2020 at 7:25 AM Daniel Kozak <kozzi11 gmail.com <mailto:kozzi11 gmail.com>> wrote: Does not work, first I have to change auto map = khash!(keytype, valuetype); to auto map = khash!(keytype, valuetype)(); Because I get (source/app.d(20,11): Error: type khash!(uint, double, true, true) has no value) And after that it takes 262ms but end up with SIGSEGV. But it print right sum, so I guess it is something with freeing when try to dealocate hashmap But this line will fix that for me: extern(C) __gshared string[] rt_options = [ "gcopt=parallel:0 cleanup:none" ]; So it is something with GC trying to free already freed memory I geussHmm, thank you for that. We are using extensively internally and have not had segfault. Can you share the benchmarking code? I will of course update the readme to include the omitted function call ();
Sep 14 2020