www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Cost of assoc array?

reply "Chris" <wendlec tcd.ie> writes:
I have code that uses the following:

string[][size_t] myArray;

1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];

However, I've found out that I never need an assoc array and a 
"linear" array would be just fine, as in

2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];

Is there any huge difference as regards performance and memory 
footprint between the two? Or is 2. basically 1. under the hood?
May 14 2014
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Chris:

 Is there any huge difference as regards performance and memory 
 footprint between the two? Or is 2. basically 1. under the hood?
An associative array is a rather more complex data structure, so if you don't need it, use something simpler. There is difference in both the amount of memory used and performance (in many cases such difference doesn't matter). In D there are also differences in the way they are initialized from a null or fat null. Bye, bearophile
May 14 2014
parent reply "Chris" <wendlec tcd.ie> writes:
On Wednesday, 14 May 2014 at 10:20:51 UTC, bearophile wrote:
 Chris:

 Is there any huge difference as regards performance and memory 
 footprint between the two? Or is 2. basically 1. under the 
 hood?
An associative array is a rather more complex data structure, so if you don't need it, use something simpler. There is difference in both the amount of memory used and performance (in many cases such difference doesn't matter). In D there are also differences in the way they are initialized from a null or fat null. Bye, bearophile
Thanks. Do you mean the difference is negligible in many cases? I'm not sure, because changing it would be a breaking change in the old code, meaning I would have to change various methods.
May 14 2014
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 14.05.2014 12:33, schrieb Chris:
 On Wednesday, 14 May 2014 at 10:20:51 UTC, bearophile wrote:
 Chris:

 Is there any huge difference as regards performance and memory
 footprint between the two? Or is 2. basically 1. under the
 hood?
An associative array is a rather more complex data structure, so if you don't need it, use something simpler. There is difference in both the amount of memory used and performance (in many cases such difference doesn't matter). In D there are also differences in the way they are initialized from a null or fat null. Bye, bearophile
Thanks. Do you mean the difference is negligible in many cases? I'm not sure, because changing it would be a breaking change in the old code, meaning I would have to change various methods.
a simple array would be faster because no access is generate for your key - just plain access, just don't use assoc arrays if you don't need key based access read more manuals about hashmaps and stuff and how to do benchmarking - helps alot in the future
May 14 2014
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Chris:

 Do you mean the difference is negligible in many cases?
Yes, but you have to profile the code (or reason about it well, with knowledge of the data structures and their usage patterns) if you want to know what your case is. Bye, bearophile
May 14 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
 I have code that uses the following:

 string[][size_t] myArray;

 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];

 However, I've found out that I never need an assoc array and a 
 "linear" array would be just fine, as in

 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];

 Is there any huge difference as regards performance and memory 
 footprint between the two? Or is 2. basically 1. under the hood?
If you don't need the features of associative arrays, don't use them. Normal arrays are much simpler, faster and (due to some outstanding problems with associative arrays in D) less bug-prone. Associative arrays, by definition, require a lot more work behind the scenes for both reading and writing.
May 14 2014
parent reply "Chris" <wendlec tcd.ie> writes:
On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:
 On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
 I have code that uses the following:

 string[][size_t] myArray;

 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];

 However, I've found out that I never need an assoc array and a 
 "linear" array would be just fine, as in

 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];

 Is there any huge difference as regards performance and memory 
 footprint between the two? Or is 2. basically 1. under the 
 hood?
If you don't need the features of associative arrays, don't use them. Normal arrays are much simpler, faster and (due to some outstanding problems with associative arrays in D) less bug-prone. Associative arrays, by definition, require a lot more work behind the scenes for both reading and writing.
The question is, if they are _much_ faster. With this type (string[][size_t]) I haven't encountered any bugs yet. On the other hand, it introduces some rather stilted logic sometimes, as in foreach (size_t i; 0..myArray.length) { // do something with myArray[i]; } because it's not sorted. This, or I sort it first. Anyway, there's always an overhead associated with associative arrays. I'll have to see how big this breaking change would be, and decide, if it's worth it. Profiling is not really feasible, because for this to work properly, I would have to introduce the change first to be able to compare both. Nothing worse than carefully changing things only to find out, it doesn't really speed up things.
May 14 2014
next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 14.05.2014 15:20, schrieb Chris:
 Profiling is not really feasible, because for this to work
 properly, I would have to introduce the change first to be able
 to compare both. Nothing worse than carefully changing things
 only to find out, it doesn't really speed up things.
why not using an alias for easier switch between the versions? alias string[][size_t] my_array_type or alias string[][] my_array_type an do an search&replace of "string[][size_t]" with "my_array_type" thats it - still too hard :)
May 14 2014
parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 14 May 2014 at 13:31:53 UTC, dennis luehring wrote:
 Am 14.05.2014 15:20, schrieb Chris:
 Profiling is not really feasible, because for this to work
 properly, I would have to introduce the change first to be able
 to compare both. Nothing worse than carefully changing things
 only to find out, it doesn't really speed up things.
why not using an alias for easier switch between the versions? alias string[][size_t] my_array_type or alias string[][] my_array_type an do an search&replace of "string[][size_t]" with "my_array_type" thats it - still too hard :)
Talking about not seeing the forest for the trees ... I'll try that. :)
May 14 2014
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 14 May 2014 at 13:20:40 UTC, Chris wrote:
 On Wednesday, 14 May 2014 at 11:13:10 UTC, John Colvin wrote:
 On Wednesday, 14 May 2014 at 09:38:10 UTC, Chris wrote:
 I have code that uses the following:

 string[][size_t] myArray;

 1. myArray = [0:["t", "o", "m"], 1:["s", "m", "i", "th"]];

 However, I've found out that I never need an assoc array and 
 a "linear" array would be just fine, as in

 2. myArray = [["t", "o", "m"], ["s", "m", "i", "th"]];

 Is there any huge difference as regards performance and 
 memory footprint between the two? Or is 2. basically 1. under 
 the hood?
If you don't need the features of associative arrays, don't use them. Normal arrays are much simpler, faster and (due to some outstanding problems with associative arrays in D) less bug-prone. Associative arrays, by definition, require a lot more work behind the scenes for both reading and writing.
The question is, if they are _much_ faster. With this type (string[][size_t]) I haven't encountered any bugs yet. On the other hand, it introduces some rather stilted logic sometimes, as in foreach (size_t i; 0..myArray.length) { // do something with myArray[i]; } because it's not sorted. This, or I sort it first. Anyway, there's always an overhead associated with associative arrays. I'll have to see how big this breaking change would be, and decide, if it's worth it. Profiling is not really feasible, because for this to work properly, I would have to introduce the change first to be able to compare both. Nothing worse than carefully changing things only to find out, it doesn't really speed up things.
Yes, they are much faster. Normal array indexing is equivalent to *(myArray.ptr + index) plus an optional bounds check, whereas associative array indexing is a much, much larger job. Why were you using associative arrays in the first place? Unless your keys are somehow sparse* or of a non-integer type there isn't any reason to. * How I see that constraint in that context: (maxKey - minKey) / nElements > 1 + epsilon where epsilon is the maximum proportional wasted space you could afford in a normal array (emptyElements / usedElements). Bear in mind the memory overhead of associative arrays is itself non-zero. Also, while normal arrays tend to be more cache friendly than associative arrays, this isn't true for very sparse arrays.
May 14 2014
parent "Chris" <wendlec tcd.ie> writes:
On Wednesday, 14 May 2014 at 13:44:40 UTC, John Colvin wrote:
 Yes, they are much faster. Normal array indexing is equivalent 
 to *(myArray.ptr + index) plus an optional bounds check, 
 whereas associative array indexing is a much, much larger job.

 Why were you using associative arrays in the first place? 
 Unless your keys are somehow sparse* or of a non-integer type 
 there isn't any reason to.
It is very old code (dmd 2.052 times). When I set out to write it, I was a) new to the language and b) I could not yet tell whether or not I would need it (I think it also had to do with the way D was back then, but I don't remember my exact reasoning). It has always been in the back of my mind to change it into a normal array. It's basically a code corpse.
 * How I see that constraint in that context:

 (maxKey - minKey) / nElements > 1 + epsilon
 where epsilon is the maximum proportional wasted space you 
 could afford in a normal array (emptyElements / usedElements). 
 Bear in mind the memory overhead of associative arrays is 
 itself non-zero.

 Also, while normal arrays tend to be more cache friendly than 
 associative arrays, this isn't true for very sparse arrays.
May 14 2014
prev sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Chris:

 foreach (size_t i; 0..myArray.length) {
   // do something with myArray[i];
 }
There are various better ways to use a foreach on an array: foreach (immutable x; myArray) { foreach (ref const x; myArray) { foreach (ref x; myArray) { Bye, bearophile
May 14 2014
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 14 May 2014 at 13:49:22 UTC, bearophile wrote:
 Chris:

 foreach (size_t i; 0..myArray.length) {
  // do something with myArray[i];
 }
There are various better ways to use a foreach on an array: foreach (immutable x; myArray) { foreach (ref const x; myArray) { foreach (ref x; myArray) { Bye, bearophile
His code was for an associative array.
May 14 2014