digitalmars.D - log(n) amortized growth for containers

Andrei Alexandrescu (42/42) Dec 10 2015 Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and

Andrei Alexandrescu (5/8) Dec 10 2015 I was wrong here, that's the number of total moves. The growth schedule
Timon Gehr (3/15) Dec 11 2015 E.g. the following sequence of capacities leads to amortized Θ(log n).

Andrei Alexandrescu (2/20) Dec 11 2015 How do you prove that? -- Andrei

Marco Nembrini (15/48) Dec 12 2015 Forget about going to continuous extensions, and substitute the

deadalnix (5/5) Dec 11 2015 I've been using the same mechanism as jemalloc in SDC's runtime
Jimmy Cao (6/9) Dec 13 2015 Yes, my go-to for my university plotting needs is SageMath.

Andrei Alexandrescu (8/16) Dec 13 2015 Thanks, great!

Jimmy Cao (13/26) Dec 13 2015 I like this approach to amortized analysis. I have a question,

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and 
Randomized Slide to Front.

Here's a quick thought on growth schedule for arrays. The classic 
approach is to grow by a geometric schedule. For example: 1, 2, 4, 8, 
16, etc. That way, n memory moves are necessary for achieving a capacity 
equal to n, so on average we spend a constant amount of time per 
appended element.

The main disadvantage with this growth schedule is there are on average 
n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the 
memory allocator (long discussion) so a smaller one (such as the golden 
cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The 
latter is considered scalable all right and the hope is to get a 
corresponding reduction in the slack memory space.

To figure a growth schedule that leads to log(n) work per element, a 
differential equation must be solved:

s(n) / s'(n) = log(x)

where s(n) will be the (continuous extension of the) growth schedule's 
sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other word, s(n) is how 
many elements we moved around, and s'(n) is the capacity after n steps. 
The division gives us how much work was done per element.

To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic 
behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work 
per element.

So I entered the equation above in Wolfram:

https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

So the growth schedule is exponential of the LogIntegral(n). Let's write 
it as:

f(n) = 2^^li(n)

Not too nice, but easy to tabulate for the relevant ranges. Let's take a 
look at the slack space at capacity f(n):

slack(f(n)) = f(n) - f(n - 1)

Sadly the expression doesn't seem to simplify and I couldn't find a good 
online plotter that can plot 2^^li(x). Here's what Mathematica free shows:

https://www.wolframalpha.com/input/?i=2%5Eli%28x%29+-+2%5Eli%28x+-+1%29

Compare with:

https://www.wolframalpha.com/input/?i=2%5Ex+-+2%5E%28x+-+1%29

The slack elements seem to grow considerably slower, which is great.

Does anyone know of a plotter that supports logarithmic integral? Of 
course better ideas about all of the above are welcome!


Thanks,

Andrei

Dec 10 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/10/15 10:55 PM, Andrei Alexandrescu wrote:
 So the growth schedule is exponential of the LogIntegral(n). Let's write
 it as:

 f(n) = 2^^li(n)

I was wrong here, that's the number of total moves. The growth schedule 
is proportional to the derivative of that:

f(n) = 2^^li(n)/log(n)


Andrei

Dec 10 2015

Timon Gehr <timon.gehr gmx.ch> writes:

On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
 Here's a quick thought on growth schedule for arrays. The classic
 approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
 16, etc. That way, n memory moves are necessary for achieving a capacity
 equal to n, so on average we spend a constant amount of time per
 appended element.

 The main disadvantage with this growth schedule is there are on average
 n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
 memory allocator (long discussion) so a smaller one (such as the golden
 cut or less) would be better.

 Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
 latter is considered scalable all right and the hope is to get a
 corresponding reduction in the slack memory space. ...


E.g. the following sequence of capacities leads to amortized Θ(log n).

c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

Dec 11 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/11/15 9:02 PM, Timon Gehr wrote:
 On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
 Here's a quick thought on growth schedule for arrays. The classic
 approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
 16, etc. That way, n memory moves are necessary for achieving a capacity
 equal to n, so on average we spend a constant amount of time per
 appended element.

 The main disadvantage with this growth schedule is there are on average
 n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
 memory allocator (long discussion) so a smaller one (such as the golden
 cut or less) would be better.

 Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
 latter is considered scalable all right and the hope is to get a
 corresponding reduction in the slack memory space. ...


 E.g. the following sequence of capacities leads to amortized Θ(log n).

 c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

How do you prove that? -- Andrei

Dec 11 2015

Marco Nembrini <marco.nembrini.co gmail.com> writes:

On Saturday, 12 December 2015 at 02:12:13 UTC, Andrei 
Alexandrescu wrote:
 On 12/11/15 9:02 PM, Timon Gehr wrote:
 On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
 Here's a quick thought on growth schedule for arrays. The 
 classic
 approach is to grow by a geometric schedule. For example: 1, 
 2, 4, 8,
 16, etc. That way, n memory moves are necessary for achieving 
 a capacity
 equal to n, so on average we spend a constant amount of time 
 per
 appended element.

 The main disadvantage with this growth schedule is there are 
 on average
 n/2 slack memory slots. Also, a growth factor of 2 is 
 unfriendly to the
 memory allocator (long discussion) so a smaller one (such as 
 the golden
 cut or less) would be better.

 Anyhow, I was thinking why O(1) amortized growth and not 
 O(log n)? The
 latter is considered scalable all right and the hope is to 
 get a
 corresponding reduction in the slack memory space. ...


 E.g. the following sequence of capacities leads to amortized 
 Θ(log n).

 c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

 How do you prove that? -- Andrei

Forget about going to continuous extensions, and substitute the 
derivative of s(n) with (s(n+1) - s(n)) / 1.
Plugging into s(n) / s'(n) = log(n) and solving for s(n+1) gives 
you a recursion for s:
s(n+1) = s(n) + s(n) / log(n).
Then you decide what you want for s(0) and fix small things like 
log(0) not being defined or s(n) / log(n) not always being a 
whole number.

You'll notice that Timon has log(c(k)) and not log(k) in his 
expression, which I think comes from some confusion in your 
original post between the number of elements n ( which is c(k)?) 
and the number of times you have to do move elements (k used by 
Timon)

Dec 12 2015

deadalnix <deadalnix gmail.com> writes:

I've been using the same mechanism as jemalloc in SDC's runtime 
and it bucket basically by keeping 2 bits of precision + shift. 
It goes as :

4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, ...

It work quite well in practice.

Dec 11 2015

Jimmy Cao <jc2462 cornell.edu> writes:

On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu 
wrote:
 Does anyone know of a plotter that supports logarithmic 
 integral? Of course better ideas about all of the above are 
 welcome!

Yes, my go-to for my university plotting needs is SageMath.

Here is the plot of the two functions, the red one is without the 
li(x).  The vertical axis scale is logarithmic.

https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage

Dec 13 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/13/15 2:56 PM, Jimmy Cao wrote:
 On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
 Does anyone know of a plotter that supports logarithmic integral? Of
 course better ideas about all of the above are welcome!

 Yes, my go-to for my university plotting needs is SageMath.

 Here is the plot of the two functions, the red one is without the
 li(x).  The vertical axis scale is logarithmic.

 https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage

Thanks, great!

Now, this is going to be difficult to evaluate in the real world. 
Microbenchmarks are unlikely to be useful - most just grow a vector up 
to wazoo, which works faster with a more aggressive growth schedule. A 
better test is in a real application where several arrays compete for 
data cache.


Andrei

Dec 13 2015

Jimmy Cao <jc2462 cornell.edu> writes:

On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu 
wrote:
 To figure a growth schedule that leads to log(n) work per 
 element, a differential equation must be solved:

 s(n) / s'(n) = log(x)

 where s(n) will be the (continuous extension of the) growth 
 schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other 
 word, s(n) is how many elements we moved around, and s'(n) is 
 the capacity after n steps. The division gives us how much work 
 was done per element.

 To verify, if s(n) = n, you get O(n) work per element, i.e. 
 quadratic behavior for linear capacity growth. If s(n) = 2^n, 
 you get O(1) work per element.

 So I entered the equation above in Wolfram:

 https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

I like this approach to amortized analysis.  I have a question, 
though,

the diff eq you have is

s(n) / s'(n) = log(x)

But the WolframAlpha query solves for effectively:

s(n) / s'(n) = log(n)

Why does x = n?

Shouldn't x be s'(n), so that s(n)/s'(n), how much work was done 
per element, is bounded by the number of elements appended (that 
is about the capacity s'(n))?  I thought number of elements != n, 
but rather something between s'(n) and s'(n-1)?

Dec 13 2015

D Programming

C/C++ Programming

Other

digitalmars.D - log(n) amortized growth for containers