www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - log(n) amortized growth for containers

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and 
Randomized Slide to Front.

Here's a quick thought on growth schedule for arrays. The classic 
approach is to grow by a geometric schedule. For example: 1, 2, 4, 8, 
16, etc. That way, n memory moves are necessary for achieving a capacity 
equal to n, so on average we spend a constant amount of time per 
appended element.

The main disadvantage with this growth schedule is there are on average 
n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the 
memory allocator (long discussion) so a smaller one (such as the golden 
cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The 
latter is considered scalable all right and the hope is to get a 
corresponding reduction in the slack memory space.

To figure a growth schedule that leads to log(n) work per element, a 
differential equation must be solved:

s(n) / s'(n) = log(x)

where s(n) will be the (continuous extension of the) growth schedule's 
sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other word, s(n) is how 
many elements we moved around, and s'(n) is the capacity after n steps. 
The division gives us how much work was done per element.

To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic 
behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work 
per element.

So I entered the equation above in Wolfram:

https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

So the growth schedule is exponential of the LogIntegral(n). Let's write 
it as:

f(n) = 2^^li(n)

Not too nice, but easy to tabulate for the relevant ranges. Let's take a 
look at the slack space at capacity f(n):

slack(f(n)) = f(n) - f(n - 1)

Sadly the expression doesn't seem to simplify and I couldn't find a good 
online plotter that can plot 2^^li(x). Here's what Mathematica free shows:

https://www.wolframalpha.com/input/?i=2%5Eli%28x%29+-+2%5Eli%28x+-+1%29

Compare with:

https://www.wolframalpha.com/input/?i=2%5Ex+-+2%5E%28x+-+1%29

The slack elements seem to grow considerably slower, which is great.

Does anyone know of a plotter that supports logarithmic integral? Of 
course better ideas about all of the above are welcome!


Thanks,

Andrei
Dec 10 2015
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/10/15 10:55 PM, Andrei Alexandrescu wrote:
 So the growth schedule is exponential of the LogIntegral(n). Let's write
 it as:

 f(n) = 2^^li(n)
I was wrong here, that's the number of total moves. The growth schedule is proportional to the derivative of that: f(n) = 2^^li(n)/log(n) Andrei
Dec 10 2015
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
 Here's a quick thought on growth schedule for arrays. The classic
 approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
 16, etc. That way, n memory moves are necessary for achieving a capacity
 equal to n, so on average we spend a constant amount of time per
 appended element.

 The main disadvantage with this growth schedule is there are on average
 n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
 memory allocator (long discussion) so a smaller one (such as the golden
 cut or less) would be better.

 Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
 latter is considered scalable all right and the hope is to get a
 corresponding reduction in the slack memory space. ...
E.g. the following sequence of capacities leads to amortized Θ(log n). c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
Dec 11 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/11/15 9:02 PM, Timon Gehr wrote:
 On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
 Here's a quick thought on growth schedule for arrays. The classic
 approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
 16, etc. That way, n memory moves are necessary for achieving a capacity
 equal to n, so on average we spend a constant amount of time per
 appended element.

 The main disadvantage with this growth schedule is there are on average
 n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
 memory allocator (long discussion) so a smaller one (such as the golden
 cut or less) would be better.

 Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
 latter is considered scalable all right and the hope is to get a
 corresponding reduction in the slack memory space. ...
E.g. the following sequence of capacities leads to amortized Θ(log n). c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
How do you prove that? -- Andrei
Dec 11 2015
parent Marco Nembrini <marco.nembrini.co gmail.com> writes:
On Saturday, 12 December 2015 at 02:12:13 UTC, Andrei 
Alexandrescu wrote:
 On 12/11/15 9:02 PM, Timon Gehr wrote:
 On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
 Here's a quick thought on growth schedule for arrays. The 
 classic
 approach is to grow by a geometric schedule. For example: 1, 
 2, 4, 8,
 16, etc. That way, n memory moves are necessary for achieving 
 a capacity
 equal to n, so on average we spend a constant amount of time 
 per
 appended element.

 The main disadvantage with this growth schedule is there are 
 on average
 n/2 slack memory slots. Also, a growth factor of 2 is 
 unfriendly to the
 memory allocator (long discussion) so a smaller one (such as 
 the golden
 cut or less) would be better.

 Anyhow, I was thinking why O(1) amortized growth and not 
 O(log n)? The
 latter is considered scalable all right and the hope is to 
 get a
 corresponding reduction in the slack memory space. ...
E.g. the following sequence of capacities leads to amortized Θ(log n). c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
How do you prove that? -- Andrei
Forget about going to continuous extensions, and substitute the derivative of s(n) with (s(n+1) - s(n)) / 1. Plugging into s(n) / s'(n) = log(n) and solving for s(n+1) gives you a recursion for s: s(n+1) = s(n) + s(n) / log(n). Then you decide what you want for s(0) and fix small things like log(0) not being defined or s(n) / log(n) not always being a whole number. You'll notice that Timon has log(c(k)) and not log(k) in his expression, which I think comes from some confusion in your original post between the number of elements n ( which is c(k)?) and the number of times you have to do move elements (k used by Timon)
Dec 12 2015
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
I've been using the same mechanism as jemalloc in SDC's runtime 
and it bucket basically by keeping 2 bits of precision + shift. 
It goes as :

4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, ...

It work quite well in practice.
Dec 11 2015
prev sibling next sibling parent reply Jimmy Cao <jc2462 cornell.edu> writes:
On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu 
wrote:
 Does anyone know of a plotter that supports logarithmic 
 integral? Of course better ideas about all of the above are 
 welcome!
Yes, my go-to for my university plotting needs is SageMath. Here is the plot of the two functions, the red one is without the li(x). The vertical axis scale is logarithmic. https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage
Dec 13 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/13/15 2:56 PM, Jimmy Cao wrote:
 On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
 Does anyone know of a plotter that supports logarithmic integral? Of
 course better ideas about all of the above are welcome!
Yes, my go-to for my university plotting needs is SageMath. Here is the plot of the two functions, the red one is without the li(x). The vertical axis scale is logarithmic. https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage
Thanks, great! Now, this is going to be difficult to evaluate in the real world. Microbenchmarks are unlikely to be useful - most just grow a vector up to wazoo, which works faster with a more aggressive growth schedule. A better test is in a real application where several arrays compete for data cache. Andrei
Dec 13 2015
prev sibling parent Jimmy Cao <jc2462 cornell.edu> writes:
On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu 
wrote:
 To figure a growth schedule that leads to log(n) work per 
 element, a differential equation must be solved:

 s(n) / s'(n) = log(x)

 where s(n) will be the (continuous extension of the) growth 
 schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other 
 word, s(n) is how many elements we moved around, and s'(n) is 
 the capacity after n steps. The division gives us how much work 
 was done per element.

 To verify, if s(n) = n, you get O(n) work per element, i.e. 
 quadratic behavior for linear capacity growth. If s(n) = 2^n, 
 you get O(1) work per element.

 So I entered the equation above in Wolfram:

 https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29
I like this approach to amortized analysis. I have a question, though, the diff eq you have is s(n) / s'(n) = log(x) But the WolframAlpha query solves for effectively: s(n) / s'(n) = log(n) Why does x = n? Shouldn't x be s'(n), so that s(n)/s'(n), how much work was done per element, is bounded by the number of elements appended (that is about the capacity s'(n))? I thought number of elements != n, but rather something between s'(n) and s'(n-1)?
Dec 13 2015