digitalmars.D.learn - Efficiency of using array as stack

• Ivan Kazmenko (109/109) Mar 23 2013 Hello again!
• bearophile (6/7) Mar 23 2013 I think here D arrays are working according to their specs.
• Ivan Kazmenko (35/37) Mar 23 2013 Thank you for the suggestion! This seems to work fast enough and
• H. S. Teoh (68/72) Mar 23 2013 Using arrays as stack *can* be efficient -- but you should not be
• Ivan Kazmenko (17/22) Mar 23 2013 That looks good. However, I'd relocate on shrinking too if we
• Jonathan M Davis (36/39) Mar 23 2013 You might want to check out this article where someone ran into similar
• Ivan Kazmenko (67/73) Mar 23 2013 That was a good reading, thank you! The whole matter became
• Jonathan M Davis (30/35) Mar 23 2013 I'd start by looking in src/Object_.d, but it looks like the information...
• Steven Schveighoffer (58/123) Mar 25 2013 If you happened to get that accidental blank send, sorry, I tried to
• Ivan Kazmenko (13/20) Mar 25 2013 So that's a relief, a dynamic array in D actually *is* an
"Ivan Kazmenko" <gassa mail.ru> writes:
```
Hello again!

Today, I had trouble using D built-in dynamic array as a stack:
it timed out badly.  I have then tried to reduce the problem down
to a minimal example.  I am using DMD 2.062.

Now, I have this sample program.  It creates an array of length
one million, and then keeps adding an element to the array and
removing it in a loop, also one million times.  Such a pattern,
and more complex ones, can easily occur whenever an array is used
as a stack.

What actually happens is constant relocation and movement,
resulting in 1_000_000 * 1_000_000 operations, which just looks
wrong.

-----
import core.memory;
import std.range;
import std.stdio;

void main ()
{
    version (NOGC) {GC.disable ();}
    int n = 1_000_000;
    auto s = array (iota (n));
    s.reserve (n * 2);
    foreach (i; 0..n)
    {
        s.length++;
        debug {writefln ("after ++: %s %s", s.capacity, &s[0]);}
        s.length--;
        debug {writefln ("after --: %s %s", s.capacity, &s[0]);}
    }
}
-----

Running debug build, we can see that the address of s[0] changes
after each increase of length, and the capacity is reduced to
zero after each decrease.  So, the line "s.reserve (n * 2)" fails
to hint to the compiler that I want to allocate the array once and
then use it without relocation, which is exactly what I would like
to be able to do!

I was wondering if garbage collection has something to do with
the inefficiency, but the answer is no.  If we choose the "NOGC"
version, it quickly fills some large portion of memory (1GB in my
local test) with the copies and stops responding.  If GC is
active, it just constantly swaps between two memory locations.

Now, in section 4.1.10 of TDPL ("Assigning to .length") it is
stated that "If the array shrinks <...>, D guarantees that the
array is not reallocated", and later, "that guarantee does not
also imply that further expansions of the array will avoid
reallocation", so, formally, everything is by the letter of the
law so far.

However, inserting another "s.reserve (n * 2)" just after
"s.length--" does not help either.  And I think the whole thing
does contradict the intent described in TDPL section 4.1.9
("Expanding").  Here it goes: "If there is no slack space left,
... The allocator may find an empty block to the right of the
current block and gobble it into the current block.  This
operation is known as coalescing".  Now, this quote and its
context give no formal guarantee that the memory allocator works
exactly like that, but they definitely sound like a good thing to
do.

I hope many would agree that reducing the length once does not at
all imply we want to reduce it further.  Frankly, I have thought
so far that D dynamic arrays can be used as queue and stack
implementations done "right", i.e., efficiently.

So my two questions are:

1. I would like to have a way to reduce the length of the array
without removing the guarantees of the preceding "s.reserve".  Is
it possible in the current D implementation?  How?

2. Ideally, I would like a dynamic array in D to act efficiently
as a stack.  That means, the amortized cost of N stack operations
should be O(N).  To achieve this, I would propose "lazy"
reduction of the space reserved for the array.  I suppose the
implementation guarantees that for expanding arrays as shown in
the example at http://dlang.org/arrays.html#resize : when
capacity is equal to zero, the memory manager roughly doubles the
allocated size of the array.  The very same trick could be used
for array shrinking: instead of reducing the capacity to zero
(i.e., guaranteeing a reallocation on the next expansion), the
memory manager could leave the allocated size equal to (for
example) min (prev, cur * 2), where prev is the allocated size
before the shrinking and cur is the size used after the shrinking.
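
To illustrate, here is a user-level sketch of such a lazily
shrinking stack (LazyStack and its members are names I made up;
a real implementation of the proposal would sit inside the
runtime's memory manager, not in user code):

-----
import std.algorithm.comparison : min;

struct LazyStack(E)
{
    private E[] store;   // the allocated block
    private size_t used; // elements actually in use

    void push(E e)
    {
        if (used == store.length)
            store.length = (used + 1) * 2; // grow with slack
        store[used++] = e;
    }

    E pop()
    {
        assert(used > 0);
        auto e = store[--used];
        // Lazy shrink: release memory only when usage drops below
        // a quarter of the allocation, keeping min(prev, cur * 2)
        // slots around for future pushes.
        if (used * 4 < store.length)
            store = store[0 .. min(store.length, used * 2)].dup;
        return e;
    }

    @property size_t length() const { return used; }
}
-----

With this policy, n pushes and pops cost amortized O(n), while the
allocation stays within a constant factor of the number of live
elements.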

I suspect that the above suggestion could conflict with some other
array usage patterns, because the array syntax actually deals with
array views, not arrays "in flesh" (in memory).  One case I can
imagine is the following:

-----
import std.range;
import std.stdio;

void main ()
{
    auto a = array (iota (30)); // [0, 1, ..., 29]
    auto b = a[10..$]; // [10, 11, ..., 29]
    b.length -= 10; // *** what is now b.capacity?
    b.length += 10; // now b should be [10, 11, ..., 19, 0, 0, ..., 0]
    writeln (b);
}
-----

Now, at line ***, if memory manager leaves b.capacity as 10, it
will point at the space occupied by the array a, which does not
sound right.  As I'm not a D expert (yet? ;) such investigations
are insightful), I don't know right now whether this problem is
already handled somehow.

But at least if we don't have any more views into the memory
after b (as in my first example, and generally true when an array
is used as a stack), the memory manager could detect that and
take an optimal decision regarding b.capacity.

Thank you for reading to this point, I confess that was rather
lengthy.

-----
Ivan Kazmenko.
```
Mar 23 2013
"bearophile" <bearophileHUGS lycos.com> writes:
```
Ivan Kazmenko:

> so, formally, everything is by the letter of the law so far.

I think here D arrays are working according to their specs.

Call assumeSafeAppend every time you want to append with no
reallocations.

Bye,
bearophile
```
Mar 23 2013
"Ivan Kazmenko" <gassa mail.ru> writes:
```
On Saturday, 23 March 2013 at 12:37:04 UTC, bearophile wrote:

> Call assumeSafeAppend every time you want to append with no
> reallocations.

Thank you for the suggestion!  This seems to work fast enough and
never relocate:

-----
import core.memory;
import std.range;
import std.stdio;

void main ()
{
    version (NOGC) {GC.disable ();}
    int n = 1_000_000;
    auto s = array (iota (n));
    s.reserve (n * 2);
    foreach (i; 0..n)
    {
        assumeSafeAppend (s);
        debug {writefln ("before ++: %s %s", s.capacity, &s[0]);}
        s.length++;
        debug {writefln ("after ++: %s %s", s.capacity, &s[0]);}
        s.length--;
        debug {writefln ("after --: %s %s", s.capacity, &s[0]);}
    }
}
-----

Still, I am missing something here.  After assumeSafeAppend(s),
the capacity of s is restored to the original value (in my case,
2_000_891).  That means the memory manager keeps track of the
original array (memory, not view) and knows its limits.  Can't
it, or the compiler, also know there are no more writable views
into that portion of memory and just optimize out the "capacity =
0" part?  Does it lack such information, or is it hard to account
for all possible scenarios?  Or is "capacity = 0" actually useful
for some other concern?

-----
Ivan Kazmenko.
```
Mar 23 2013
Ali Çehreli <acehreli yahoo.com> writes:
```
On 03/23/2013 03:28 PM, Ivan Kazmenko wrote:

> Still, I am missing something here. After assumeSafeAppend(s), the
> capacity of s is restored to the original value (in my case, 2_000_891).

assumeSafeAppend allows the programmer to say "I am sure there is no
other reference to the rest of this slice."

> That means the memory manager keeps track of the original array (memory,
> not view) and knows its limits.

Yes.

> Can't it, or the compiler, also know
> there are no more writable views into that portion of memory and just
> optimize out the "capacity = 0" part? Does it lack such information, or
> is it hard to account for all possible scenarios? Or is "capacity = 0"
> actually useful for some other concern?

Such solutions would be too slow or consume too large memory. For each
slice to know about what other slices are doing, they would have to
reference some common information about how the elements in that region
of memory are being shared.

int[] makeArray()
{
    auto result = new int[10];
    // ...
    return result;
}

void main()
{
    auto a = makeArray();
    auto b = a[0..$/2];    // first half
    auto c = a[$/2..$];    // second half

    --a.length; /* The runtime would have to somehow let 'c' know
                 * that 'c' can now safely append, while neither
                 * 'a' nor 'b' can do so.  */
}

Quoting the array article, here is D's choice: "Remember, the slice
doesn't know what other slices refer to that data, or really where it is
in the array (it could be a segment at the beginning or the middle)."

http://dlang.org/d-array-article.html

Ali
```
Mar 24 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
```
On Sat, Mar 23, 2013 at 01:10:56PM +0100, Ivan Kazmenko wrote:
> Hello again!
>
> Today, I had trouble using D built-in dynamic array as a stack: it
> timed out badly.

Using arrays as stack *can* be efficient -- but you should not be
appending/shrinking it constantly. Here's why.  Consider this code:

    int[] arr = [1,2,3];
    int[] barr = arr[0..1];
    barr ~= 4;
    writeln(arr[2]);    // what should this print?

In D, arrays are actually views into runtime-managed memory blocks. When
you shrink an array, the underlying memory block remains the same
(because the runtime system doesn't know immediately whether another
slice points to the data past the end). If you immediately append to the
array again, the system doesn't know whether it's safe to overwrite what
was there before (because another slice might be pointing to that data).
So, to be safe, it will reallocate the array.

To tell the runtime system that it's OK to overwrite what was there, you
should call assumeSafeAppend before appending to the array again.
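
For a concrete picture, the shrink-then-append cycle looks like this
(a minimal sketch; the variable names are mine, not from the code
above):

    void main()
    {
        int[] stack = [1, 2, 3];

        stack.length--;          // "pop": shrinking never reallocates
        assumeSafeAppend(stack); // promise: no other slice uses the popped slot
        stack ~= 4;              // the append can now reuse the same block

        assert(stack == [1, 2, 4]);
    }

Without the assumeSafeAppend call, the append would reallocate,
precisely because another slice might still refer to the old element 3.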

Alternatively, you should not shrink the array, but keep another counter
on how much of the array is being used, so that you can reuse the space
immediately without risking reallocation. Here's an example array-based
stack implementation that does this:

struct Stack(E)
{
    private E[] impl;
    private size_t curSize = 0;

    /**
     * Returns: true if the stack contains no elements, false
     * otherwise.
     */
    @property bool empty() { return (curSize == 0); }

    /**
     * Access the top element of the stack without popping it.
     * Precondition: The stack must not be empty.
     * Returns: The element at the top of the stack, by reference.
     */
    @property ref E top()
    {
        assert(curSize > 0);
        return impl[curSize-1];
    }

    /**
     * Pops an element off the stack.
     * Precondition: The stack must not be empty.
     * Returns: The popped element.
     */
    E pop()
    {
        assert(curSize > 0);
        auto elem = top();
        curSize--;
        return elem;
    }

    /// Pushes an element onto the stack.
    void push(E elem)
    {
        if (curSize == impl.length)
        {
            // Array not big enough to fit element; increase capacity.
            impl.length = (curSize + 1)*2;
        }
        assert(curSize < impl.length);
        impl[curSize] = elem;
        curSize++;
    }
}
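
For instance, assuming the Stack struct above is in scope, it could be
exercised like this (a quick sketch):

    void main()
    {
        Stack!int s;
        foreach (i; 0 .. 1_000_000)
            s.push(i);            // impl grows geometrically, amortized O(1)
        assert(s.top == 999_999);
        while (!s.empty)
            s.pop();              // only curSize moves; impl never shrinks
        assert(s.empty);
    }

Since pop never touches impl.length, no pop/push cycle can trigger a
reallocation once the backing array is big enough.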

T

--
Answer: Because it breaks the logical sequence of discussion.
Question: Why is top posting bad?
```
Mar 23 2013
"Ivan Kazmenko" <gassa mail.ru> writes:
```
On Saturday, 23 March 2013 at 13:55:57 UTC, H. S. Teoh wrote:
> Alternatively, you should not shrink the array, but keep
> another counter on how much of the array is being used,
> so that you can reuse the space immediately without
> risking reallocation. Here's an example array-based
> stack implementation that does this: <...>

That looks good.  However, I'd relocate on shrinking too if we
use less than say one fourth of the space.  Consider, for
example, one million values constantly moving between one million
stacks; the worst case space usage would be million-squared if we
only grow the internal arrays but never shrink them.  Or would
that be a very rare usage pattern requiring a custom
implementation anyway?

My problem was that I thought D2 is mature enough to have a
working stack out-of-the-box.  I initially searched for a stack
in std.container, found none there, and then realized a dynamic
array could do.  Well, it surely does the job, but with a quirk
(assumeSafeAppend sufficed for my usage) which is not obvious at
all for a newcomer like me.  So, it seems to me that such a
container (on top of an array) would be useful in Phobos.

-----
Ivan Kazmenko.
```
Mar 23 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
```
On Saturday, March 23, 2013 13:10:56 Ivan Kazmenko wrote:
> Today, I had trouble using D built-in dynamic array as a stack:
> it timed out badly.  I have then tried to reduce the problem down
> to a minimal example.  I am using DMD 2.062.

You might want to check out this article where someone ran into similar
issues:

https://www.semitwist.com/articles/article/view/don-t-use-arrays-as-stacks

And if you haven't read Steven's article on arrays, you should do so:

http://dlang.org/d-array-article.html

But basically, you need to wrap an array and then either

1. Use it as a holder for a pre-allocated chunk of memory for your stack
rather than the stack itself. You then get something like

void push(T elem)
{
    if(_stackLen == _arr.length)
        _arr ~= elem;
    else
        _arr[_stackLen] = elem;
    ++_stackLen;
}

void pop()
{
    --_stackLen;
}

2. Use the internal array as a stack, and use assumeSafeAppend when you pop
an element, but you have to guarantee that there are no other slices to the
array for that to work, or you'll end up stomping on those other slices.

void push(T elem)
{
    _arr ~= elem;
}

void pop()
{
    --_arr.length;
    assumeSafeAppend(_arr);
}

In either case, using an array directly as a stack and simply appending to it
to push and decrementing its length to pop isn't going to be particularly
efficient due to how slices work.

- Jonathan M Davis
```
Mar 23 2013
"Ivan Kazmenko" <gassa mail.ru> writes:
```
On Saturday, 23 March 2013 at 19:33:06 UTC, Jonathan M Davis
wrote:
> You might want to check out this article where someone ran into
> similar issues:
>
> https://www.semitwist.com/articles/article/view/don-t-use-arrays-as-stacks
>
> And if you haven't read Steven's article on arrays, you should
> do so:
>
> http://dlang.org/d-array-article.html

That was a good reading, thank you!  The whole matter became
clearer for me.

But now, I have one more example which confuses me, and the
articles don't seem to help right away with it.  In this example,
I am moving a "sliding window" of length n; now that we're done
with the stack, here is a simple usage pattern of the queue.
What I do is repeatedly (n times also) add an element to the back
of the array and then remove an element from the front of it,
keeping the whole queue constantly large.  Before the start, I
reserve c*n entries for my array, where c is a real number
between 1 and 2.  I track reallocations by monitoring the address
of an element of a which should remain still as long as the array
is not reallocated.

-----
import std.algorithm;
import std.array;
import std.range;
import std.stdio;

void main ()
{
    int n = 2_000_000;
    double c = 1.5;
    auto a = array (iota (n));
    a.reserve (cast (int) (n * c));
    auto prev = &a[$ - 1];
    int changes = 0;
    foreach (i; 0..n)
    {
        debug {writeln (a.capacity);}
        a.length++;
        a = a[1..$];
        auto next = &a[$ - i - 1];
        if (prev != next)
        {
            changes++;
            prev = next;
        }
    }
    writeln (changes);
}
-----

Now, if I set c to 2, there are no reallocations: all the work
goes with the 2*n pre-allocated elements.  But as I decrease c
down to 1, the number of reallocations grows *linearly*, roughly
1 per 2000 appends.  So for c = 1, there are 1044 reallocations
in total.  And that's still quadratic behavior, albeit divided by
a large constant (~2000)!

What happens (locally) is, once the array size is not enough, it
starts being grown by blocks of 1024 or 891 (?!) elements in a
repeating pattern; one of these reallocates, the other does not.
The textbook growth implementation suggests doubling the array
size, but it looks like D attempts something smarter here.
However, in my case, the result is ~8x slower behavior when not
pre-allocating.  The more is n, the slower is the version without
pre-allocation (c = 1) compared to the version with
pre-allocation (c = 2).  And this means that a D array is not
also an efficient queue out-of-the-box.  However, as in the case
with stack, it could perhaps be efficient with a bit of tweaking
(some custom tailored calls to reserve perhaps?).

So, what is the exact strategy of growing an array?  Maybe you
could just kindly point me to some more interesting reading in
druntime source?  And, well, now how do I implement a generally
efficient array-based queue in D?

-----
Ivan Kazmenko.
```
Mar 23 2013
Jonathan M Davis <jmdavisProg gmx.com> writes:
```
On Sunday, March 24, 2013 00:45:21 Ivan Kazmenko wrote:
> So, what is the exact strategy of growing an array?  Maybe you
> could just kindly point me to some more interesting reading in
> druntime source?

I'd start by looking in src/Object_.d, but it looks like the information that
you'd be looking for is probably in src/rt/lifetime.d, but I'd have to go
digging through the code to figure out what druntime currently does.
Regardless, remember that all of that is implementation-dependent, so if
you're relying on a particular behavior with regards to how much druntime
allocates when reallocating an array or anything like that, it could change on
you at any time. It probably doesn't get changed much, but there's zero
guarantee that it won't be, and if someone comes up with a way to improve how
that works, its implementation would change. So, relying on how that works is
probably not a good idea. You should be able to reliably know _when_ a
reallocation is going to occur by paying attention to capacity, but stuff like
how much memory gets allocated could change at any time.

> And, well, now how do I implement a generally
> efficient array-based queue in D?

By not removing elements from the front like that. You're just guaranteeing
that you're going to have to reallocate the array at some point, even if
you're always dealing with a small number of elements. I'd suggest making it a
circular queue and keeping track of where in the array the first element is as
well as the length. You'd have to reallocate if you ever reach the point that
the queue is full, and the first element in the queue was not the first element
in the array, but if your queue size never had to grow, you'd never have to
reallocate, whereas with the scheme that you proposed, you'll have to
reallocate every time that you've queued as many items as the original
capacity of the array.
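
A sketch of such a circular queue (the names and the initial block
size of 4 are mine; treat this as one possible layout, not a
reference implementation):

struct CircularQueue(E)
{
    private E[] store;   // backing block, treated as a ring
    private size_t head; // index of the front element
    private size_t count;

    void enqueue(E e)
    {
        if (count == store.length)
        {
            // Full (or never allocated): reallocate and unwrap the ring.
            auto bigger = new E[](count == 0 ? 4 : count * 2);
            foreach (i; 0 .. count)
                bigger[i] = store[(head + i) % store.length];
            store = bigger;
            head = 0;
        }
        store[(head + count) % store.length] = e;
        ++count;
    }

    E dequeue()
    {
        assert(count > 0);
        auto e = store[head];
        head = (head + 1) % store.length; // the front just moves along
        --count;
        return e;
    }

    @property bool empty() const { return count == 0; }
}

As long as the number of queued elements never exceeds the
allocation, a sliding window causes zero reallocations, because
dequeue only moves head.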

If you're going to try and use an array to implement another data structure, I
would strongly suggest that you use a wrapper struct, and that you generally
try and treat the array as a block of memory that you need to manage rather
than relying a lot on array magic.  The array magic is cool, but it's likely to
have performance characteristics which conflict with what you need for other
data structures (like stacks or queues).

- Jonathan M Davis
```
Mar 23 2013
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
```
If you happened to get that accidental blank send, sorry, I tried to
cancel it.

On Sat, 23 Mar 2013 19:45:21 -0400, Ivan Kazmenko <gassa mail.ru> wrote:

> On Saturday, 23 March 2013 at 19:33:06 UTC, Jonathan M Davis wrote:
>> You might want to check out this article where someone ran into similar
>> issues:
>>
>> https://www.semitwist.com/articles/article/view/don-t-use-arrays-as-stacks
>>
>> And if you haven't read Steven's article on arrays, you should do so:
>>
>> http://dlang.org/d-array-article.html
>
> That was a good reading, thank you!  The whole matter became clearer for
> me.
>
> But now, I have one more example which confuses me, and the articles
> don't seem to help right away with it.  In this example, I am moving a
> "sliding window" of length n; now that we're done with the stack, here
> is a simple usage pattern of the queue.  What I do is repeatedly (n
> times also) add an element to the back of the array and then remove an
> element from the front of it, keeping the whole queue constantly large.
> Before the start, I reserve c*n entries for my array, where c is a real
> number between 1 and 2.  I track reallocations by monitoring the address
> of an element of a which should remain still as long as the array is not
> reallocated.
>
> -----
> import std.algorithm;
> import std.array;
> import std.range;
> import std.stdio;
>
> void main ()
> {
>     int n = 2_000_000;
>     double c = 1.5;
>     auto a = array (iota (n));
>     a.reserve (cast (int) (n * c));
>     auto prev = &a[$ - 1];
>     int changes = 0;
>     foreach (i; 0..n)
>     {
>         debug {writeln (a.capacity);}
>         a.length++;
>         a = a[1..$];
>         auto next = &a[$ - i - 1];
>         if (prev != next)
>         {
>             changes++;
>             prev = next;
>         }
>     }
>     writeln (changes);
> }
> -----
>
> Now, if I set c to 2, there are no reallocations: all the work goes with
> the 2*n pre-allocated elements.  But as I decrease c down to 1, the
> number of reallocations grows *linearly*, roughly 1 per 2000 appends.
> So for c = 1, there are 1044 reallocations in total.  And that's still
> quadratic behavior, albeit divided by a large constant (~2000)!

So here is how the appender tries to add data:

1. if the block is less than a page, memory is tracked as a bin of
like-sized blocks that fit into a page.  Starting at 16-byte blocks, then
doubling until you reach half-page size.  So if you need something that
consumes less than 16-bytes, a 16-byte chunk is allocated out of a 16-byte
bin (which is a page that has nothing but 16-byte chunks in it).
2. When you reallocate to a larger block size (16 to 32 for instance), the
block MUST be moved into another bin, because only 16-byte blocks are
allowed in that bin.
3. When you get to PAGE size (4096 bytes) and larger, the appender takes a
different approach:
a. If the page following your block is unallocated, and it can simply
extend the block into that next page, it will do so.  This avoids copying
the data to another block, which arguably would be more expensive than
trying to double the capacity (not sure if this is true, but that's how it
works).
b. If not, it will reallocate into a new or existing empty block that
can hold the entire data, plus some additional space calculated by an
algorithm that is not quite double, but aims to amortize appending (if you
search in the lifetime.d file in druntime, look for newCapacity to find
the function that calculates this extra space).

HOWEVER, setting the length is NOT considered appending by the runtime.
The runtime takes a different approach when just setting the length -- it
does NOT add any extra capacity to try and amortize the allocations.  The
idea is that you set the length usually once, and then use the array
without appending.  It is more efficient if you are appending to give it
the elements to append rather than zero-init them first.

You will likely get better performance if you use:

   a ~= int.init;

instead of:

   a.length++;
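
Applied to the sliding-window loop from the quoted example, that
advice would look like this (a sketch; I have kept the original
variable names):

-----
import std.array : array;
import std.range : iota;
import std.stdio : writeln;

void main ()
{
    int n = 2_000_000;
    auto a = array (iota (n));
    auto prev = &a[$ - 1];
    int changes = 0;
    foreach (i; 0..n)
    {
        a ~= int.init;   // append path: gets amortizing extra capacity
        a = a[1..$];     // slide the window forward
        auto next = &a[$ - i - 1];
        if (prev != next)
        {
            changes++;
            prev = next;
        }
    }
    writeln (changes);   // should stay small, since growth is amortized
}
-----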

> What happens (locally) is, once the array size is not enough, it starts
> being grown by blocks of 1024 or 891 (?!) elements in a repeating
> pattern; one of these reallocates, the other does not.  The textbook
> growth implementation suggests doubling the array size, but it looks
> like D attempts something smarter here.  However, in my case, the result
> is ~8x slower behavior when not pre-allocating.  The larger n is, the
> slower the version without pre-allocation (c = 1) becomes compared to
> the version with pre-allocation (c = 2).  And this means that a D array
> is not an efficient queue out-of-the-box either.  However, as in the
> case with stack, it could perhaps be efficient with a bit of tweaking
> (some custom tailored calls to reserve, perhaps?).

growth of 1024 is when the new pages are tacked onto the end (4096 /
sizeof(int)), growth of 891 is interesting.  I can explain it though :)

you have allocated 2,000,001 ints at the time you increment length, or
8,000,004 bytes.  The page size is 4096.  So the block size you will need
to hold this is 8,003,584 (must be a multiple of pages).

So you now have 3580 extra bytes to grow into, divide by 4 (because
capacity is in terms of int elements), and you get... 896.

Hm... where are those extra 5 ints?  The answer is weird, but the way the
array runtime can 'tack on' extra pages requires we store capacity (see my
previously referenced array article for more explanation) at the
beginning, and we require 16-byte alignment.  So the capacity requires 16
extra bytes, even though it only uses 4 or 8 of them (depending on
architecture).  But wait, that's only 4 ints!  Where's that extra int?

Now, this is REALLY weird.  Because blocks in memory can be sequential,
and we consider pointer at the end of a block to actually be pointing at
the beginning of the next block (for GC and other reasons), we add an
extra byte of padding at the END of the block to buffer it from
accidentally pointing at the next block.  Consider if you had a
max-capacity slice, and you did sl[$..$], that would now be pointing at
the NEXT block (which may not even be allocated!) and you could wreak some
havoc by appending to that block or setting its length.  Since ints must
be 4-byte aligned, we can't use the last 4 bytes of the block because of
that one padding byte, so you lose one more int for that.

-Steve
```
Mar 25 2013
"Ivan Kazmenko" <gassa mail.ru> writes:
```
> You will likely get better performance if you use:
>
>    a ~= int.init;
>
> instead of:
>
>    a.length++;

So that's a relief, a dynamic array in D actually *is* an
efficient queue implementation out of the box, I just did it
improperly in the example.  Thank you for the correction.  With
that, the number of relocations in the queue example dropped down
to 2.

> growth of 1024 is when the new pages are tacked onto the end
> (4096 / sizeof(int)), growth of 891 is interesting.  I can
> explain it though :)

Thank you for the very detailed explanation!  Knowing how things
work down to the bare bones, and the rationale behind that, is
definitely helpful, both logically (to write better code) and
psychologically (the feeling of things in control).  I just hope
such quality explanations will be incorporated into some book or
manual someday.

-----
Ivan Kazmenko.
```
Mar 25 2013