## digitalmars.D - log(n) amortized growth for containers

- Andrei Alexandrescu (42/42) Dec 10 2015 Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and
- Andrei Alexandrescu (5/8) Dec 10 2015 I was wrong here, that's the number of total moves. The growth schedule
- Timon Gehr (3/15) Dec 11 2015 E.g. the following sequence of capacities leads to amortized Θ(log n).
- Andrei Alexandrescu (2/20) Dec 11 2015 How do you prove that? -- Andrei
- Marco Nembrini (15/48) Dec 12 2015 Forget about going to continuous extensions, and substitute the
- deadalnix (5/5) Dec 11 2015 I've been using the same mechanism as jemalloc in SDC's runtime
- Jimmy Cao (6/9) Dec 13 2015 Yes, my go-to for my university plotting needs is SageMath.
- Andrei Alexandrescu (8/16) Dec 13 2015 Thanks, great!
- Jimmy Cao (13/26) Dec 13 2015 I like this approach to amortized analysis. I have a question,

Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and Randomized Slide to Front. Here's a quick thought on growth schedules for arrays.

The classic approach is to grow by a geometric schedule, for example: 1, 2, 4, 8, 16, etc. That way, n memory moves are necessary for achieving a capacity equal to n, so on average we spend a constant amount of time per appended element. The main disadvantage of this growth schedule is that there are on average n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the memory allocator (long discussion), so a smaller one (such as the golden cut or less) would be better.

Anyhow, I was thinking: why O(1) amortized growth and not O(log n)? The latter is considered scalable all right, and the hope is to get a corresponding reduction in the slack memory space.

To figure a growth schedule that leads to log(n) work per element, a differential equation must be solved:

s(n) / s'(n) = log(x)

where s(n) is the (continuous extension of the) growth schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other words, s(n) is how many elements we moved around, and s'(n) is the capacity after n steps. The division gives us how much work was done per element. To verify: if s(n) = n, you get O(n) work per element, i.e. quadratic behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work per element.

So I entered the equation above in Wolfram: https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

So the growth schedule is exponential of the LogIntegral(n). Let's write it as:

f(n) = 2^^li(n)

Not too nice, but easy to tabulate for the relevant ranges. Let's take a look at the slack space at capacity f(n):

slack(f(n)) = f(n) - f(n - 1)

Sadly the expression doesn't seem to simplify, and I couldn't find a good online plotter that can plot 2^^li(x). Here's what the free Wolfram|Alpha shows: https://www.wolframalpha.com/input/?i=2%5Eli%28x%29+-+2%5Eli%28x+-+1%29 Compare with: https://www.wolframalpha.com/input/?i=2%5Ex+-+2%5E%28x+-+1%29

The slack elements seem to grow considerably slower, which is great. Does anyone know of a plotter that supports the logarithmic integral? Of course, better ideas about all of the above are welcome!

Thanks, Andrei
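To make the baseline concrete, here is a quick Python sketch (hypothetical, not production code) that simulates appends under the doubling schedule and measures both the amortized moves per element and the average slack:

```python
def simulate(n_appends, grow):
    """Simulate n_appends pushes; grow(cap) returns the next capacity."""
    cap, size, moves, slack_sum = 0, 0, 0, 0
    for _ in range(n_appends):
        if size == cap:
            moves += size           # regrowth copies every live element
            cap = grow(cap)
        size += 1
        slack_sum += cap - size     # unused slots after this append
    return moves / n_appends, slack_sum / n_appends

# Doubling schedule: amortized moves per element stay O(1).
moves_per_elem, avg_slack = simulate(1 << 20, lambda c: max(1, 2 * c))
```

With a growth factor of 2 the moves-per-element figure hovers around 1, while the average slack is a constant fraction of n; a log(n) schedule should trade a bit of the former for a reduction in the latter.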

Dec 10 2015

On 12/10/15 10:55 PM, Andrei Alexandrescu wrote:

> So the growth schedule is exponential of the LogIntegral(n). Let's write it as: f(n) = 2^^li(n)

I was wrong here, that's the number of total moves. The growth schedule is proportional to the derivative of that:

f(n) = 2^^li(n) / log(n)

Andrei
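On the plotting problem: li is just an integral, so it can be tabulated with nothing beyond the standard library, using li(x) = li(2) + ∫₂ˣ dt/ln t for x ≥ 2. A hypothetical Python sketch (the constant li(2) is hard-coded; trapezoidal integration is one arbitrary choice among many):

```python
import math

LI_2 = 1.04516378  # li(2), the offset of the logarithmic integral

def li(x, steps=20000):
    """Approximate li(x) for x >= 2 by trapezoidal integration of 1/ln t."""
    if x <= 2:
        return LI_2
    h = (x - 2) / steps
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    for i in range(1, steps):
        total += 1 / math.log(2 + i * h)
    return LI_2 + total * h

# Tabulate total moves s(n) = 2^li(n) and capacity f(n) = s(n)/log2(n).
for n in (4, 8, 16, 32, 64):
    s = 2 ** li(n)
    print(f"n={n:3d}  moves~{s:12.1f}  capacity~{s / math.log2(n):12.1f}")
```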

Dec 10 2015

On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:

> Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The latter is considered scalable all right and the hope is to get a corresponding reduction in the slack memory space. [...]

E.g. the following sequence of capacities leads to amortized Θ(log n):

c(0) = 0, c(k+1) = c(k) + c(k)/⌊log₂(c(k)+2)⌋ + 1.
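For what it's worth, a quick numerical check (a hypothetical Python sketch; the division is read as floor division, which may differ from the intended rounding) suggests the amortized cost of that schedule does track log₂ n:

```python
import math

def amortized_cost(target):
    """Grow capacity by c(k+1) = c(k) + c(k)//floor(log2(c(k)+2)) + 1
    until it reaches target; return (capacity, total moves / capacity)."""
    c, moves = 0, 0
    while c < target:
        moves += c                  # each regrowth copies all c elements
        c = c + c // int(math.log2(c + 2)) + 1
    return c, moves / c

# The per-element cost lands within a small constant factor of log2(capacity).
cap, cost = amortized_cost(10**6)
```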

Dec 11 2015

On 12/11/15 9:02 PM, Timon Gehr wrote:

> E.g. the following sequence of capacities leads to amortized Θ(log n). c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

How do you prove that? -- Andrei

Dec 11 2015

On Saturday, 12 December 2015 at 02:12:13 UTC, Andrei Alexandrescu wrote:

> How do you prove that? -- Andrei

Forget about going to continuous extensions, and substitute the derivative of s(n) with (s(n+1) - s(n)) / 1. Plugging into s(n) / s'(n) = log(n) and solving for s(n+1) gives you a recursion for s:

s(n+1) = s(n) + s(n) / log(n)

Then you decide what you want for s(0) and fix small things like log(0) not being defined or s(n) / log(n) not always being a whole number. You'll notice that Timon has log(c(k)) and not log(k) in his expression, which I think comes from some confusion in the original post between the number of elements n (which is c(k)?) and the number of times you have to move elements (the k used by Timon).
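That substitution is easy to sanity-check numerically. A hypothetical Python sketch (seeding at n = 2 to dodge log(1) = 0, with log₂ standing in for log):

```python
import math

# Discrete version of s(n)/s'(n) = log(n):
# s'(n) ~ s(n+1) - s(n)  =>  s(n+1) = s(n) + s(n)/log(n)
s = [0.0, 0.0, 1.0]                  # s(0), s(1), s(2) seeded by hand
for n in range(2, 1000):
    s.append(s[n] + s[n] / math.log2(n))

# Implied capacity f(n) = s(n+1) - s(n); its step ratio tends toward 1,
# i.e. the schedule is sub-geometric.
f = [s[n + 1] - s[n] for n in range(2, 999)]
ratios = [f[i + 1] / f[i] for i in range(len(f) - 1)]
```

By construction the work per element, s(n) / (s(n+1) - s(n)), equals log₂(n) at every step, while the growth factor between consecutive capacities shrinks toward 1.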

Dec 12 2015

I've been using the same mechanism as jemalloc in SDC's runtime: it buckets sizes basically by keeping 2 bits of precision plus a shift. It goes: 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, ... It works quite well in practice.
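For reference, that bucketing rule is easy to reproduce. A hypothetical Python sketch (not SDC's actual code, which is in D) that rounds a request up to the next size class using 2 bits of precision plus a shift:

```python
def size_class(n):
    """Round n up to the next jemalloc-style size class: within each
    power-of-two interval there are 4 classes (2 bits of precision),
    spaced by a power-of-two step (the shift)."""
    if n <= 4:
        return 4
    step = 1 << (n.bit_length() - 3)   # 4 steps per doubling
    return (n + step - 1) // step * step

# Distinct classes up to 64: 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, ...
classes = sorted({size_class(n) for n in range(1, 65)})
```

The worst-case slack is thus bounded by a quarter of the capacity, instead of half as with plain doubling.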

Dec 11 2015

On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:

> Does anyone know of a plotter that supports logarithmic integral? Of course better ideas about all of the above are welcome!

Yes, my go-to for my university plotting needs is SageMath. Here is the plot of the two functions; the red one is without the li(x). The vertical axis scale is logarithmic. https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage

Dec 13 2015

On 12/13/15 2:56 PM, Jimmy Cao wrote:

> Yes, my go-to for my university plotting needs is SageMath. Here is the plot of the two functions, the red one is without the li(x). The vertical axis scale is logarithmic. [...]

Thanks, great!

Now, this is going to be difficult to evaluate in the real world. Microbenchmarks are unlikely to be useful - most just grow a vector up to wazoo, which works faster with a more aggressive growth schedule. A better test is in a real application where several arrays compete for data cache.

Andrei

Dec 13 2015

On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:

> To figure a growth schedule that leads to log(n) work per element, a differential equation must be solved: s(n) / s'(n) = log(x) where s(n) will be the (continuous extension of the) growth schedule's sum [...] So I entered the equation above in Wolfram: https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

I like this approach to amortized analysis. I have a question, though: the diff eq you have is

s(n) / s'(n) = log(x)

but the WolframAlpha query effectively solves

s(n) / s'(n) = log(n)

Why does x = n? Shouldn't x be s'(n), so that s(n)/s'(n), how much work was done per element, is bounded by the number of elements appended (that is, about the capacity s'(n))? I thought the number of elements != n, but rather something between s'(n) and s'(n-1)?

Dec 13 2015