www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Speed of hash tables compared to lua

reply Domingo <mingodad gmail.com> writes:
Hello !
Doing some tests I found that D hash tables are a lot slower than 
Lua tables see bellow, D is using 2 times more memory and 5 times 
more time compared to Lua, when comparing with Luajit it's 
unbelievable worse.

I would expect a compiled language like D to use less resources 
than a scripting language, anyone have experienced something 
similar ?

Does the community know this ?

I hope this simple test would help to test and improve the 
situation !

=====
local  dsize = 6000008;
local d = {};

for i=1, dsize
do
	if ((i % 2) == 0) then  d[i] = 0.0000000123;
	else d[i] = 12345.0000000001; end
end

local sum = 0.0;
for i=1, dsize
do
       sum = sum + d[i];
end

print(string.format("count: %d, sum: %35.15f\n", #d, sum));
=====
/usr/bin/time lua sum-test.lua
count: 6000008, sum:         37035049380.000160217285156

0.39user 0.03system 0:00.43elapsed 99%CPU (0avgtext+0avgdata 
133528maxresident)k
520inputs+0outputs (1major+32908minor)pagefaults 0swaps

=====
/usr/bin/time luajit sum-test.lua
count: 6000008, sum:         37035049380.000160217285156

0.07user 0.02system 0:00.10elapsed 97%CPU (0avgtext+0avgdata 
67824maxresident)k
984inputs+0outputs (3major+18560minor)pagefaults 0swaps
=====

=====
import std.stdio;

int main(char[][] argv)
{
	immutable auto DSIZE = 6000008;
	double[int] d;

	for (int i = 0; i < DSIZE; ++i)
	{
	      d[i] = (i % 2) == 0 ? 0.0000000123 : 12345.0000000001;
	}
	//d.rehash;

	double sum = 0.0;
	for (int i = 0; i < DSIZE; ++i)
	{
	      sum += d[i];
	}

	writefln("count: %d, sum: %35.15f\n", DSIZE, sum);
	return 0;
}
=====
/usr/bin/time ./sum_test_aa
count: 6000008, sum:         37035049380.000160217285156

1.96user 0.10system 0:02.04elapsed 101%CPU (0avgtext+0avgdata 
272888maxresident)k
0inputs+0outputs (0major+68226minor)pagefaults 0swaps
=====
Cheers !
Sep 13 2020
parent reply Domingo <mingodad gmail.com> writes:
On Sunday, 13 September 2020 at 09:50:27 UTC, Domingo wrote:
 Hello !
 Doing some tests I found that D hash tables are a lot slower 
 than Lua tables see bellow, D is using 2 times more memory and 
 5 times more time compared to Lua, when comparing with Luajit 
 it's unbelievable worse.

 I would expect a compiled language like D to use less resources 
 than a scripting language, anyone have experienced something 
 similar ?

 Does the community know this ?

 I hope this simple test would help to test and improve the 
 situation !
After writing the previous message I realized the test was not exactly fair, Lua was probably using the array functionality of it's table so I changed it a bit and now Lua uses as much memory as D but still is faster by a good margin (~4). ===== local dsize = 6000008; local d = {}; for i=dsize, 1, -1 do if ((i % 2) == 0) then d[i] = 0.0000000123; else d[i] = 12345.0000000001; end end local sum = 0.0; for i=dsize, 1, -1 do sum = sum + d[i]; end print(string.format("count: %d, sum: %35.15f\n", #d, sum)); ===== ===== /usr/bin/time lua sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.63user 0.11system 0:00.74elapsed 100%CPU (0avgtext+0avgdata 264500maxresident)k 0inputs+0outputs (0major+98435minor)pagefaults 0swaps ===== ===== /usr/bin/time luajit sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.48user 0.04system 0:00.52elapsed 100%CPU (0avgtext+0avgdata 117240maxresident)k 0inputs+0outputs (0major+41086minor)pagefaults 0swaps =====
Sep 13 2020
parent reply Domingo <mingodad gmail.com> writes:
On Sunday, 13 September 2020 at 10:06:56 UTC, Domingo wrote:
 On Sunday, 13 September 2020 at 09:50:27 UTC, Domingo wrote:
 Hello !
 Doing some tests I found that D hash tables are a lot slower 
 than Lua tables see bellow, D is using 2 times more memory and 
 5 times more time compared to Lua, when comparing with Luajit 
 it's unbelievable worse.

 I would expect a compiled language like D to use less 
 resources than a scripting language, anyone have experienced 
 something similar ?

 Does the community know this ?

 I hope this simple test would help to test and improve the 
 situation !
After writing the previous message I realized the test was not exactly fair, Lua was probably using the array functionality of it's table so I changed it a bit and now Lua uses as much memory as D but still is faster by a good margin (~4). ===== local dsize = 6000008; local d = {}; for i=dsize, 1, -1 do if ((i % 2) == 0) then d[i] = 0.0000000123; else d[i] = 12345.0000000001; end end local sum = 0.0; for i=dsize, 1, -1 do sum = sum + d[i]; end print(string.format("count: %d, sum: %35.15f\n", #d, sum)); ===== ===== /usr/bin/time lua sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.63user 0.11system 0:00.74elapsed 100%CPU (0avgtext+0avgdata 264500maxresident)k 0inputs+0outputs (0major+98435minor)pagefaults 0swaps ===== ===== /usr/bin/time luajit sum-test-rev.lua count: 6000008, sum: 37035049380.000160217285156 0.48user 0.04system 0:00.52elapsed 100%CPU (0avgtext+0avgdata 117240maxresident)k 0inputs+0outputs (0major+41086minor)pagefaults 0swaps =====
And here is the C++ using unordered map, c++ speed is equal to Lua but uses more memory: ===== #include <unordered_map> #include <string> #include <cstdio> int main() { const int DSIZE = 6000008; // Create an empty unordered_map std::unordered_map<int, double> intMap; for (int i = 0; i < DSIZE; ++i) { intMap.insert( { i, (i % 2) == 0 ? 0.0000000123 : 12345.0000000001 }); } double sum = 0.0; for (std::pair<int, double> element : intMap) sum += element.second; printf("count: %d, sum: %35.15f\n", DSIZE, sum); return 0; } ===== ===== g++ -O2 -o sum_test_aa-cpp sum_test_aa.cpp /usr/bin/time ./sum_test_aa-cpp count: 6000008, sum: 37035049380.000160217285156 0.49user 0.09system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 330508maxresident)k 0inputs+0outputs (0major+93595minor)pagefaults 0swaps =====
Sep 13 2020
parent reply ikod <igor.khasilev gmail.com> writes:
On Sunday, 13 September 2020 at 10:20:06 UTC, Domingo wrote:
 And here is the C++ using unordered map, c++ speed is equal to 
 Lua but uses more memory:

 =====
 #include <unordered_map>
 #include <string>
 #include <cstdio>
 int main()
 {
 	const int DSIZE = 6000008;
 	// Create an empty unordered_map
 	std::unordered_map<int, double> intMap;

 	for (int i = 0; i < DSIZE; ++i)
 	{
 		intMap.insert( { i, (i % 2) == 0 ? 0.0000000123 : 
 12345.0000000001 });
 	}

 	double sum = 0.0;
 	for (std::pair<int, double> element : intMap)
 		sum += element.second;

 	printf("count: %d, sum: %35.15f\n", DSIZE, sum);
 	return 0;
 }
 =====
 =====
 g++ -O2 -o sum_test_aa-cpp sum_test_aa.cpp

 /usr/bin/time ./sum_test_aa-cpp
 count: 6000008, sum:         37035049380.000160217285156
 0.49user 0.09system 0:00.59elapsed 99%CPU (0avgtext+0avgdata 
 330508maxresident)k
 0inputs+0outputs (0major+93595minor)pagefaults 0swaps

 =====
Default runtime map provide some properties which may be unavailable for other implementations. Its speed may be affected by GC, it uses to keep table items. You may include some hash map implementations from the dub repo in your benchmark.
Sep 13 2020
next sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Sep 13, 2020 at 10:50 PM Daniel Kozak <kozzi11 gmail.com> wrote:

 On Sun, Sep 13, 2020 at 1:25 PM ikod via Digitalmars-d <
 digitalmars-d puremagic.com> wrote:

 Default runtime map provide some properties which may be
 unavailable for other implementations. Its speed may be affected
 by GC, it uses to keep table items.

 You may include some hash map implementations from the dub repo
 in your benchmark.
I have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242ms
lua takes 489ms
Sep 13 2020
prev sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Sep 13, 2020 at 1:25 PM ikod via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 Default runtime map provide some properties which may be
 unavailable for other implementations. Its speed may be affected
 by GC, it uses to keep table items.

 You may include some hash map implementations from the dub repo
 in your benchmark.
I have tried emsi_containers, memutils and ikod-containers. default D's AA take 1s on my pc emsi_containers take 3s realy slow memutils was fast but does not realy work (seems it is broken because there has been nan at d[0] and many other indexes) ikod-containers take 242ms
Sep 13 2020
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/13/20 4:50 PM, Daniel Kozak wrote:
 
 
 On Sun, Sep 13, 2020 at 1:25 PM ikod via Digitalmars-d 
 <digitalmars-d puremagic.com <mailto:digitalmars-d puremagic.com>> wrote:
 
 
     Default runtime map provide some properties which may be
     unavailable for other implementations. Its speed may be affected
     by GC, it uses to keep table items.
 
     You may include some hash map implementations from the dub repo
     in your benchmark.
 
 
 I have tried emsi_containers, memutils and  ikod-containers.
 default D's AA take 1s on my pc
 emsi_containers take 3s realy slow
 memutils was fast but does not realy work (seems it is broken because 
 there has been nan at d[0] and many  other indexes)
 ikod-containers take 242ms
 
The allocation patterns definitely affect the builtin containers. A long time ago, dcollections kicked the pants off of builtin AAs for most patterns, basically by using different memory allocation strategies (for example, allocating blocks of elements at a time, and using C malloc when no pointers were detected in the type). Builtin AA's will always have to be slower, because they must allocate a separate piece of memory for each bucket to retain the property that you can get a pointer to any element, and that memory will survive anything the AA can do. -Steve
Sep 13 2020
parent FeepingCreature <feepingcreature gmail.com> writes:
On Sunday, 13 September 2020 at 21:29:53 UTC, Steven 
Schveighoffer wrote:
 Builtin AA's will always have to be slower, because they must 
 allocate a separate piece of memory for each bucket to retain 
 the property that you can get a pointer to any element, and 
 that memory will survive anything the AA can do.

 -Steve
This reinforces my increasing conviction that referentiability-by-default is on a level with nullability-by-default and mutability-by-default in the "Great Collection of C-Derived Language Design Errors".
Sep 15 2020
prev sibling parent reply James Blachly <james.blachly gmail.com> writes:
On 9/13/20 4:50 PM, Daniel Kozak wrote:
 I have tried emsi_containers, memutils and  ikod-containers.
 default D's AA take 1s on my pc
 emsi_containers take 3s realy slow
 memutils was fast but does not realy work (seems it is broken because 
 there has been nan at d[0] and many  other indexes)
 ikod-containers take 242ms
ikod-containers is nice. Don't forget also dklib's khash =) https://github.com/blachlylab/dklib/
Sep 13 2020
next sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Mon, Sep 14, 2020 at 3:55 AM James Blachly via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On 9/13/20 4:50 PM, Daniel Kozak wrote:
 I have tried emsi_containers, memutils and  ikod-containers.
 default D's AA take 1s on my pc
 emsi_containers take 3s realy slow
 memutils was fast but does not realy work (seems it is broken because
 there has been nan at d[0] and many  other indexes)
 ikod-containers take 242ms
ikod-containers is nice. Don't forget also dklib's khash =) https://github.com/blachlylab/dklib/
Does not work, first I have to change auto map = khash!(keytype, valuetype); to auto map = khash!(keytype, valuetype)(); Because I get (source/app.d(20,11): Error: type khash!(uint, double, true, true) has no value) And after that it takes 262ms but end up with SIGSEGV. But it print right sum, so I guess it is something with freeing when try to dealocate hashmap
Sep 13 2020
parent reply mw <mingwu gmail.com> writes:
On Monday, 14 September 2020 at 05:25:03 UTC, Daniel Kozak wrote:
 Does not work, first I have to change
 ...
 And after that it takes 262ms but end up with SIGSEGV.
Create a github repo, so everyone can run the benchmark, ... and fix it?
Sep 13 2020
parent reply ikod <igor.khasilev gmail.com> writes:
On Monday, 14 September 2020 at 05:31:57 UTC, mw wrote:
 On Monday, 14 September 2020 at 05:25:03 UTC, Daniel Kozak 
 wrote:
 Does not work, first I have to change
 ...
 And after that it takes 262ms but end up with SIGSEGV.
Create a github repo, so everyone can run the benchmark, ... and fix it?
Nice idea! It also should contain memory measurements and feature matrix (as feature set can affect speed). It also would be nice to have benchmarks on some different key/value types or sizes, like class as key or value, struct(fat struct) as key or value, etc..
Sep 13 2020
parent ikod <igor.khasilev gmail.com> writes:
On Monday, 14 September 2020 at 06:02:49 UTC, ikod wrote:
 On Monday, 14 September 2020 at 05:31:57 UTC, mw wrote:
 On Monday, 14 September 2020 at 05:25:03 UTC, Daniel Kozak 
 wrote:
 Does not work, first I have to change
 ...
 And after that it takes 262ms but end up with SIGSEGV.
Create a github repo, so everyone can run the benchmark, ... and fix it?
Nice idea! It also should contain memory measurements and feature matrix (as feature set can affect speed). It also would be nice to have benchmarks on some different key/value types or sizes, like class as key or value, struct(fat struct) as key or value, etc..
I added benchmark folder to my project https://github.com/ikod/ikod-containers/tree/master/bench Please add your implementations Thanks!
Sep 24 2020
prev sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Mon, Sep 14, 2020 at 7:25 AM Daniel Kozak <kozzi11 gmail.com> wrote:

 Does not work, first I have to change

 auto map = khash!(keytype, valuetype);

 to

 auto map = khash!(keytype, valuetype)();

 Because I get (source/app.d(20,11): Error: type khash!(uint, double, true,
 true) has no value)

 And after that it takes 262ms but end up with SIGSEGV.
 But it print right sum, so I guess it is something with freeing when try
 to dealocate hashmap
But this line will fix that for me: extern(C) __gshared string[] rt_options = [ "gcopt=parallel:0 cleanup:none" ]; So it is something with GC trying to free already freed memory I geuss
Sep 13 2020
parent James Blachly <james.blachly gmail.com> writes:
On 9/14/20 1:28 AM, Daniel Kozak wrote:
 On Mon, Sep 14, 2020 at 7:25 AM Daniel Kozak <kozzi11 gmail.com 
 <mailto:kozzi11 gmail.com>> wrote:
 
 
     Does not work, first I have to change
 
     auto map = khash!(keytype, valuetype);
 
     to
 
     auto map = khash!(keytype, valuetype)();
 
     Because I get (source/app.d(20,11): Error: type khash!(uint, double,
     true, true) has no value)
 
     And after that it takes 262ms but end up with SIGSEGV.
     But it print right sum, so I guess it is something with freeing when
     try to dealocate hashmap
 
 
 But this line will fix that for me:
 
 extern(C) __gshared string[] rt_options = [ "gcopt=parallel:0 
 cleanup:none" ];
 
 So it is something with GC trying to free already freed memory I geuss
Hmm, thank you for that. We are using extensively internally and have not had segfault. Can you share the benchmarking code? I will of course update the readme to include the omitted function call ();
Sep 14 2020