digitalmars.D.learn - Builtin array and AA efficiency questions
- Random D user (57/57) Oct 15 2015 So I was doing some optimizations and I came up with couple basic
- H. S. Teoh via Digitalmars-d-learn (115/157) Oct 15 2015 It adjusts the size of the allocated block in the GC so that subsequent
- Random D user (38/52) Oct 15 2015 So how does capacity affect this? I mean what is exactly a GC
- H. S. Teoh via Digitalmars-d-learn (13/26) Oct 15 2015 Haha, the current AA implementation is far from being highly optimized.
- Kagamin (4/9) Oct 16 2015 Well, there was some work on alternative implementations of AA.
- Steven Schveighoffer (42/74) Oct 15 2015 Without more context, I would say no. assumeSafeAppend is an assumption,...
- Random D user (31/81) Oct 15 2015 Ah missed your post before replying to H.S. Teoh (I should
- anonymous (4/8) Oct 15 2015 No. "clear" is too harmless a name for it to involve an unsafe operation...
- Mike Parker (33/43) Oct 15 2015 There are a handful of attributes that can be set on memory
- Steven Schveighoffer (9/32) Oct 16 2015 Yes, this flag is ONLY set when new'ing an array, or otherwise
- Steven Schveighoffer (37/69) Oct 16 2015 Even if you know that you want to overwrite the data, you may not want
So I was doing some optimizations and I came up with a couple of basic questions...

A) What does assumeSafeAppend actually do?
A.1) Should I always call it before setting length if I want to have assumeSafeAppend semantics? (e.g. I don't know if it's called just before the function I'm in)
A.2) Or does it mark the array/slice itself as a "safe append" array? And I can call it once.
A.3) If A.2 is true, are there any conditions that revert it to the original behavior? (e.g. if I take a new slice of that array)

I read the array/slice article, but it seems that I still can't use them with confidence that they actually do what I want. I also tried to look into lifetime.d, but there are so many potential entry/exit/branch paths that without case-by-case debugging (and no debug symbols for phobos.lib) it's a bit too much.

What I'm trying to do is a reused buffer which only grows in capacity (and I want to overwrite all data). Preferably I'd manage the current active size of the buffer as array.length. A typical pattern for a buffer is:

    array.length = 100
    ...
    array.length = 0
    ... some appends ...
    array.length = 50
    ... etc.

There's just so much magic going on behind D arrays that it's a bit cumbersome to track manually what's actually happening: when it allocates and when it doesn't. So I already started writing my own Buffer type which gives me explicit control, but I wonder if there's a better way.

B.1) I have a temporary AA whose lifetime is limited to a known span (it might be a function, or a loop with a couple of functions). Is there a way to tell the runtime to immediately destroy and free the AA? I'd like to assist the GC by manually destroying some AAs that I know I don't need anymore. I don't really want to get rid of the GC; I just don't want it all batched into some big pile of GC cycle work, since I know right then and there that I'm done with it.
For arrays you can do:

    int[] arr;
    arr.length = 100;
    delete arr; // I assume this frees it

but for AAs:

    int[string] aa;
    delete aa; // gives compiler error: Error: cannot delete type int[string]

I could do aa.destroy(), but that just leaves the memory to the GC according to the docs. Maybe I should start writing my own hashmap type as well?

B.2) Is there a simple way to reuse the memory/object of the AA? I could just reuse a preallocated temp AA instead of allocating/freeing it.
Oct 15 2015
On Thu, Oct 15, 2015 at 04:47:35PM +0000, Random D user via Digitalmars-d-learn wrote:
> A) What does assumeSafeAppend actually do?

It adjusts the size of the allocated block in the GC so that subsequent appends will not reallocate. Basically, whenever you try to append to an array and the end of the array is not the same as the end of the allocated GC block, the GC will conservatively assume that somebody else has an array (i.e. slice) that points to the data between the end of the array and the end of the block, so it will allocate a new block and copy the array to the new block before appending the new data. Calling assumeSafeAppend "shrink-fits" the allocated GC block to the end of the array, so that the GC won't reallocate, but will simply extend the block to accommodate the new element.

> A.1) Should I always call it before setting length if I want to have assumeSafeAppend semantics? (e.g. I don't know if it's called just before the function I'm in)

Probably, otherwise the GC may sometimes reallocate when you don't want it to.

> A.2) Or does it mark the array/slice itself as a "safe append" array? And I can call it once.

Not that I know of.

> A.3) If A.2 is true, are there any conditions that revert it to the original behavior? (e.g. if I take a new slice of that array)
[...]
> What I'm trying to do is a reused buffer which only grows in capacity (and I want to overwrite all data). Preferably I'd manage the current active size of the buffer as array.length.
[...]
> There's just so much magic going on behind D arrays that it's a bit cumbersome to track manually what's actually happening: when it allocates and when it doesn't. So I already started writing my own Buffer type which gives me explicit control, but I wonder if there's a better way.

This is probably the best way to do it, since the built-in arrays do have a lot of "interesting" quirks that probably don't really do what you want.
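The shrink-fit behavior described at the top of this message can be sketched in a few lines of D (a minimal illustration of my own; exact block sizes and capacities are GC implementation details, so only pointer identity is asserted):

```d
void main()
{
    auto arr = new int[](4);
    arr ~= 5;                  // arr now ends at the end of the used data
    auto head = arr[0 .. 2];   // head ends before the used data does
    head ~= 42;                // would stomp arr[2], so the GC reallocates
    assert(head.ptr != arr.ptr);

    auto tail = arr[0 .. 2];
    tail.assumeSafeAppend();   // shrink-fit: declare everything past tail dead
    tail ~= 42;                // extends in place, overwriting arr[2]
    assert(tail.ptr == arr.ptr);
}
```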
The thought that occurs to me is that you could still use the built-in arrays as a base for your Buffer type, but with various operators overridden so that it doesn't reallocate unnecessarily. So you'd keep a T[] as the underlying array, but track .length separately, and override the ~ and ~= operators so that they update Buffer.length instead of the .length of the underlying array. Only when Buffer.length grows past the underlying .length would you increment .length, so that the GC reallocates as needed. Similarly, you might want to override the slicing operators as well so that they also return Buffer types instead of T[], so that the user doesn't accidentally get access to the raw T[] and cause unnecessary reallocations.

> B.1) I have a temporary AA whose lifetime is limited to a known span (it might be a function, or a loop with a couple of functions). Is there a way to tell the runtime to immediately destroy and free the AA? I'd like to assist the GC by manually destroying some AAs that I know I don't need anymore. I don't really want to get rid of the GC; I just don't want it all batched into some big pile of GC cycle work, since I know right then and there that I'm done with it.
> For arrays you can do:
>
>     int[] arr;
>     arr.length = 100;
>     delete arr; // I assume this frees it

Unfortunately, delete has been deprecated and may not be around for very much longer.

> but for AAs:
>
>     int[string] aa;
>     delete aa; // gives compiler error: Error: cannot delete type int[string]
>
> I could do aa.destroy(), but that just leaves it to the GC according to the docs.

Perhaps what you could do is to trigger a GC collection after setting the AA to null:

    aa = null;    // delete references to GC data
    GC.collect(); // run a collection cycle to free it

I'm not sure it's a good idea to run collection cycles too often, though; it will have a performance impact.

> Maybe I should start writing my own hashmap type as well?

If you want to manually delete data, you probably want to implement your own AA based on malloc/free instead of the GC. The nature of the GC doesn't lend itself well to manual management.

> B.2) Is there a simple way to reuse the memory/object of the AA? I could just reuse a preallocated temp AA instead of allocating/freeing it.

Not that I know of... unfortunately, the current AA implementation doesn't allow overriding the allocator; it's hardcoded to use the default GC. This may change in the distant future, but I don't see it happening anytime soon. The only thing I can think of is to implement this manually, e.g. by wrapping your AA in a type that keeps a size_t "generation counter", where any value in the AA found to belong to a generation that's already past is treated as if it doesn't exist yet.
Something like this:

    import core.exception : RangeError;

    struct AA(K, V)
    {
        static struct WrappedValue
        {
            size_t generation;
            V value;
        }
        WrappedValue[K] impl; // wrapped values keyed by K
        size_t curGeneration = 1;
        size_t length;

        // Overload aa[key]
        V opIndex(K key)
        {
            auto p = key in impl;
            if (p is null || p.generation < curGeneration)
                throw new RangeError();
            return p.value;
        }

        // Overload aa[key] = value
        void opIndexAssign(V value, K key)
        {
            auto p = key in impl;
            if (p is null || p.generation < curGeneration)
                length++;
            impl[key] = WrappedValue(curGeneration, value);
        }

        // Overload key in aa
        V* opBinaryRight(string op : "in")(K key)
        {
            auto p = key in impl;
            if (p is null || p.generation < curGeneration)
                return null;
            return &p.value;
        }

        // Overload aa.get(...)
        V get(K key, V defaultValue)
        {
            auto p = key in impl;
            if (p is null || p.generation < curGeneration)
                return defaultValue;
            return p.value;
        }

        // Overload aa.remove(...)
        void remove(K key)
        {
            auto p = key in impl;
            if (p is null)
                return;
            // Mark the entry as outdated so that it will be reused next
            // time. Don't actually call impl.remove(), so that the GC
            // won't collect its memory.
            if (p.generation == curGeneration)
                length--;
            p.generation = 0;
        }

        @property bool empty() { return length == 0; }

        void destroy()
        {
            // Invalidate all existing entries.
            curGeneration++;
            // N.B. doesn't remove any entries from impl (which would
            // cause reallocation on reuse).
            length = 0;
        }
    }

Of course, this assumes that your keys will have a lot of overlap between calls to .destroy, otherwise your AA will eat up a lot of memory unnecessarily. If that's not the case, it's probably better to just let the GC do its job, or implement your own AA with malloc/free.

T

-- 
Why waste time reinventing the wheel, when you could be reinventing the engine? -- Damian Conway
Oct 15 2015
Thanks for the thorough answer.

On Thursday, 15 October 2015 at 18:46:22 UTC, H. S. Teoh wrote:
> It adjusts the size of the allocated block in the GC so that subsequent appends will not reallocate.

So how does capacity affect this? I mean, what exactly is a GC block here? The shrink-to-fit bit was confusing, but after thinking about this for a few minutes, I guess there are at least three concepts:

    slice          0 .. length                 (managed by the slice instance, a (ptr, length) pair)
    allocation     0 .. max used/init size     (end of the 'GC block', also shared between slices; managed by the array runtime)
    raw mem block  0 .. capacity               (or whatever the GC set aside, like pages; managed by the GC, which knows the actual mem block)

So if slice.length != allocation.length, then the slice is not a mem-"owning" array (it's a reference). And assumeSafeAppend sets allocation.length to slice.length, i.e. shrinks to fit. (slice.length > allocation.length is not possible, because allocation.length = max(slice.length), so it always just shrinks.) Now that the slice is a mem-"owning" array, it owns the length, and growing the length happens without reallocation until it hits raw mem block.length (aka capacity). So basically the largest slice owns the memory allocation and its length. This is my understanding now. Although, I'll probably forget all this in 5..4..3..2...

> The thought that occurs to me is that you could still use the built-in arrays as a base for your Buffer type, but with various operators overridden so that it doesn't reallocate unnecessarily.

Right, so a custom array/buffer type it is. Seems the simplest solution. I've already started implementing this. Reusable arrays are everywhere.

> If you want to manually delete data, you probably want to implement your own AA based on malloc/free instead of the GC. The nature of the GC doesn't lend itself well to manual management.

I'll have to do this as well.
Although this one isn't that critical for me.

> The only thing I can think of is to implement this manually, e.g. by wrapping your AA in a type that keeps a size_t "generation counter", where any value in the AA found to belong to a generation that's already past is treated as if it doesn't exist yet. Something like this:

Right. Like a handle system, or an AA of ValueHandles in this case. But I'll probably just hack up some custom map and reuse its memory. Although I'm mostly doing this for perf (realloc) and not mem size, so it might be too much effort if the D AA is highly optimized.
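The slice/allocation/raw-block model recapped above can be checked with capacity (my own small sketch; capacity values depend on GC block sizes, so only the zero/non-zero distinction is asserted):

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    auto s = a[0 .. 2];
    // s.length != "allocation length", so s is a non-owning view:
    assert(s.capacity == 0);      // any append to s will reallocate
    // a extends to the end of the used data, so it "owns" the allocation:
    assert(a.capacity >= a.length);
}
```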
Oct 15 2015
On Thu, Oct 15, 2015 at 09:00:36PM +0000, Random D user via Digitalmars-d-learn wrote:
> But I'll probably just hack up some custom map and reuse its memory. Although I'm mostly doing this for perf (realloc) and not mem size, so it might be too much effort if the D AA is highly optimized.

Haha, the current AA implementation is far from highly optimized. There has been a slow trickle of gradual improvements over the years, but if you want maximum performance, you're probably better off writing a specialized hash map that fits your exact use case, or using a different container that's more cache-friendly. (Hashes exhibit poor locality, because they basically ensure random memory access patterns, so your hardware prefetcher's predictions are out the window and it's almost a guaranteed RAM roundtrip per hash lookup.)

T

-- 
If I were two-faced, would I be wearing this one? -- Abraham Lincoln
Oct 15 2015
On Thursday, 15 October 2015 at 21:00:37 UTC, Random D user wrote:
> But I'll probably just hack up some custom map and reuse its memory. Although I'm mostly doing this for perf (realloc) and not mem size, so it might be too much effort if the D AA is highly optimized.

Well, there was some work on alternative implementations of AA:

https://github.com/MartinNowak/druntime/blob/libraryAA/src/core/aa.d
https://github.com/MartinNowak/druntime/blob/newaa-internal-aa/src/core/internal/aa.d
Oct 16 2015
On 10/15/15 12:47 PM, Random D user wrote:
> A) What does assumeSafeAppend actually do?
> A.1) Should I always call it before setting length if I want to have assumeSafeAppend semantics? (e.g. I don't know if it's called just before the function I'm in)

Without more context, I would say no. assumeSafeAppend is an assumption, and therefore unsafe. If you don't know what is passed in, you could potentially clobber data. In addition, assumeSafeAppend is a non-inlineable runtime function that can *potentially* be low-performing. If, for instance, you call it on a non-GC array, or one that is not marked for appending, you will most certainly need to take the GC lock and search through the heap for your block.

The best place to call assumeSafeAppend is when you are sure the array has "shrunk" and you are about to append. If you have not shrunk the array, then the call is a waste; if you are not sure what the array contains, then you are potentially stomping on referenced data. Calling it just after shrinking every time is possible, but could be sub-optimal if you don't intend to append to that array again, or you intend to shrink it again before appending.

> A.2) Or does it mark the array/slice itself as a "safe append" array? And I can call it once.

An array uses a block marked for appending; assumeSafeAppend simply sets how much data is assumed to be valid. Calling assumeSafeAppend on a block not marked for appending will do nothing except burn CPU cycles. So yours is not an accurate description.

> A.3) If A.2 is true, are there any conditions that revert it to the original behavior? (e.g. if I take a new slice of that array)

Any time data is appended, all references *besides* the one that was used to append will now reallocate on appending. Any time data is shrunk (i.e. arr = arr[0..$-1]), that reference will now reallocate on appending.
So when to call it really sort of requires understanding what the runtime does. Note that it is always safe to just never use assumeSafeAppend; it is an optimization. You can always append to anything (even non-GC array slices) and it will work properly.

> I read the array/slice article, but it seems that I still can't use them with confidence that they actually do what I want. I also tried to look into lifetime.d, but there are so many potential entry/exit/branch paths that without case-by-case debugging (and no debug symbols for phobos.lib) it's a bit too much.

I recommend NOT trying to understand lifetime.d; it's very complex, and the entry points are mostly defined by the compiler. I had to use trial and error to understand what happened when.

> What I'm trying to do is a reused buffer which only grows in capacity (and I want to overwrite all data). Preferably I'd manage the current active size of the buffer as array.length. A typical pattern for a buffer is:
>
>     array.length = 100
>     ...
>     array.length = 0
>     ... some appends ...
>     array.length = 50
>     ... etc.

This is an easy call then:

    array.reserve(100); // reserve 100 elements for appending
    array ~= data; // automatically manages array length for you; if length exceeds 100, it just automatically reallocates more data
    array.length = 0; // clear all the data
    array.assumeSafeAppend(); // NOW is the best time to call, because you can't shrink it any more, and you know you will be appending again
    array ~= data; // no reallocation, unless the previous max size was exceeded

> B.1) I have a temporary AA whose lifetime is limited to a known span (it might be a function, or a loop with a couple of functions). Is there a way to tell the runtime to immediately destroy and free the AA?

There isn't. This reminds me, I have a lingering PR to add aa.clear, which destroys all the elements, but I was waiting until object.clear had been removed for the right amount of time. Perhaps it's time to revive that.

-Steve
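The reserve/clear/assumeSafeAppend sequence above can be written as a runnable check (a sketch of my own; the pointer asserts verify that no reallocation happens):

```d
void main()
{
    int[] array;
    array.reserve(100);           // one allocation up front
    auto p = array.ptr;
    foreach (i; 0 .. 100)
        array ~= i;               // appends stay within the reserved block
    assert(array.ptr == p);
    array.length = 0;             // "clear": the block still holds old data
    array.assumeSafeAppend();     // tell the runtime that data is garbage
    array ~= 42;                  // reuses the same block, no reallocation
    assert(array.ptr == p);
}
```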
Oct 15 2015
Ah, I missed your post before replying to H. S. Teoh (I should refresh more often). Thanks for the reply.

On Thursday, 15 October 2015 at 19:50:27 UTC, Steven Schveighoffer wrote:
> Without more context, I would say no. assumeSafeAppend is an assumption, and therefore unsafe. If you don't know what is passed in, you could potentially clobber data. In addition, assumeSafeAppend is a non-inlineable runtime function that can *potentially* be low-performing.

Yeah, I know that I want to overwrite the data, but still, that's probably a lot of calls to assumeSafeAppend. So I agree.

> If, for instance, you call it on a non-GC array, or one that is not marked for appending, you will most certainly need to take the GC lock and search through the heap for your block.

What does marked for appending mean? How does it happen, or how is it marked?

> The best place to call assumeSafeAppend is when you are sure the array has "shrunk" and you are about to append. If you have not shrunk the array, then the call is a waste; if you are not sure what the array contains, then you are potentially stomping on referenced data.

So assumeSafeAppend is only useful when I have an array whose length is set lower than it was originally and I want to grow it back (that is, arr.length += 1 or arr ~= 1).

> An array uses a block marked for appending; assumeSafeAppend simply sets how much data is assumed to be valid. Calling assumeSafeAppend on a block not marked for appending will do nothing except burn CPU cycles. So yours is not an accurate description.

Related to my question above: how do you get a block not marked for appending? A view slice? Perhaps I should re-read the slice article. I believe it had something like capacity == 0 --> always allocates. Is it this?

> A.3) If A.2 is true, are there any conditions that revert it to the original behavior? (e.g. if I take a new slice of that array)
>
> Any time data is appended, all references *besides* the one that was used to append will now reallocate on appending. Any time data is shrunk (i.e. arr = arr[0..$-1]), that reference will now reallocate on appending.

Thanks. IMO this is a very concise description of the allocation behavior. I'll use this as a guide.

> So when to call it really sort of requires understanding what the runtime does. Note that it is always safe to just never use assumeSafeAppend; it is an optimization. You can always append to anything (even non-GC array slices) and it will work properly.

Out of curiosity, how does this work? Does it always just reallocate with the GC if it's allocated with something else?

> This is an easy call then:
>
>     array.reserve(100); // reserve 100 elements for appending
>     array ~= data; // automatically manages array length for you; if length exceeds 100, it just automatically reallocates more data
>     array.length = 0; // clear all the data
>     array.assumeSafeAppend(); // NOW is the best time to call, because you can't shrink it any more, and you know you will be appending again
>     array ~= data; // no reallocation, unless the previous max size was exceeded

Thanks. This will probably cover 90% of cases. Usually I just want to avoid throwing away memory that I already have, which is slow if it's all over your codebase. Like re-reading or recomputing variables that you already have. One doesn't hurt, but a hundred does.

> B.1) I have a temporary AA whose lifetime is limited to a known span (it might be a function, or a loop with a couple of functions). Is there a way to tell the runtime to immediately destroy and free the AA?
>
> There isn't. This reminds me, I have a lingering PR to add aa.clear, which destroys all the elements, but I was waiting until object.clear had been removed for the right amount of time. Perhaps it's time to revive that.
>
> -Steve

Should array have clear() as well? Basically wrapping:

    array.length = 0;
    array.assumeSafeAppend();

At least it would then be symmetric (and more intuitive) with the built-in containers.
Oct 15 2015
On Thursday, October 15, 2015 11:48 PM, Random D user wrote:
> Should array have clear() as well? Basically wrapping array.length = 0; array.assumeSafeAppend(); At least it would then be symmetric (and more intuitive) with the built-in containers.

No. "clear" is too harmless a name for it to involve an unsafe operation like assumeSafeAppend. With containers there is always one container that owns the data. There is no such notion with dynamic arrays.
Oct 15 2015
On Thursday, 15 October 2015 at 21:48:29 UTC, Random D user wrote:
>> An array uses a block marked for appending; assumeSafeAppend simply sets how much data is assumed to be valid. Calling assumeSafeAppend on a block not marked for appending will do nothing except burn CPU cycles. So yours is not an accurate description.
>
> Related to my question above: how do you get a block not marked for appending? A view slice? Perhaps I should re-read the slice article. I believe it had something like capacity == 0 --> always allocates. Is it this?

There are a handful of attributes that can be set on memory allocated by the GC. See the BlkAttr enumeration in core.memory [1]. Under the hood, memory for dynamic arrays (slices) is marked with BlkAttr.APPENDABLE. If an array points to memory not marked as such (whether allocated manually through the GC, through malloc, or from another source), then assumeSafeAppend can't help you.

capacity tells you how many more elements can be appended to a dynamic array (slice) before an allocation will be triggered. So if you get a 0, that means the next append will trigger one. Consider this:

    int[] dynarray = [1, 2, 3, 4, 5];
    auto slice = dynarray[0 .. $-1];

slice points to the same memory as dynarray, but has 4 elements whereas dynarray has 5. Appending a single element to slice without reallocating would overwrite the 5 in that memory block, meaning dynarray would see the new value. For that reason, new slices like this will always have a 0 capacity. Append a new item to slice and a reallocation occurs, copying the existing elements of slice over and adding the new one. This way, dynarray's values are untouched and the two arrays point to different blocks of memory.

assumeSafeAppend changes this behavior such that appending a new item to slice will reuse the same memory block, causing the 5 to be overwritten. Normally, you don't want to use it unless you are sure there are no other slices pointing to the same memory block. So it's not something you should be using in a function that can receive an array from any source. That array might share memory with other slices, the block might not be appendable, you have no idea how the slice is actually used... just a bad idea. When you have complete control over a slice and know exactly how it is used, such as an internal buffer, then it becomes a useful tool.
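Mike's dynarray/slice example can be extended into a runnable demonstration of the stomping behavior (my own sketch; the literal's GC block always has room for the element being re-appended, so the in-place extension is safe to assert):

```d
void main()
{
    int[] dynarray = [1, 2, 3, 4, 5];
    auto slice = dynarray[0 .. $ - 1];
    assert(slice.capacity == 0);  // appending would clobber dynarray[4]
    slice ~= 99;                  // so the runtime reallocates instead
    assert(dynarray[4] == 5);     // original data untouched

    auto s2 = dynarray[0 .. $ - 1];
    s2.assumeSafeAppend();        // declare dynarray[4] dead
    s2 ~= 99;                     // now extends in place...
    assert(dynarray[4] == 99);    // ...overwriting the 5
}
```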
Oct 15 2015
On 10/16/15 1:05 AM, Mike Parker wrote:
> There are a handful of attributes that can be set on memory allocated by the GC. See the BlkAttr enumeration in core.memory [1]. Under the hood, memory for dynamic arrays (slices) is marked with BlkAttr.APPENDABLE. If an array points to memory not marked as such (whether allocated manually through the GC, through malloc, or from another source), then assumeSafeAppend can't help you.

Yes, this flag is ONLY set when new'ing an array, or otherwise allocating via an array operation (appending, extending length, etc.). It should NOT be set any other way, because the runtime is the only place where this flag is understood.

> capacity tells you how many more elements can be appended to a dynamic array (slice) before an allocation will be triggered.

Mostly correct :) capacity includes the current elements, so it doesn't change as you append.

> So if you get a 0, that means the next append will trigger one.

Correct.

-Steve
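Steve's correction (capacity counts current elements, so it stays constant while appends fit in the block) can be verified directly (a small sketch of my own, using reserve so the appends are guaranteed to stay within one block):

```d
void main()
{
    int[] a;
    a.reserve(8);
    auto cap = a.capacity;     // total elements the block can hold
    assert(cap >= 8);
    a ~= 1;
    a ~= 2;
    assert(a.capacity == cap); // unchanged: capacity includes current elements
}
```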
Oct 16 2015
On 10/15/15 5:48 PM, Random D user wrote:
> Ah, I missed your post before replying to H. S. Teoh (I should refresh more often). Thanks for the reply.

You're welcome!

> Yeah, I know that I want to overwrite the data, but still, that's probably a lot of calls to assumeSafeAppend. So I agree.

Even if you know that you want to overwrite the data, you may not want to overwrite the data *after* the data :) A difficult "correct" mechanism is to be handed a buffer to utilize first, and then, if more buffer is needed, to append into the block or beyond. Because the array slice doesn't contain enough information, you need something else: something with both capacity and length. A typical mechanism may be to pass a stack-allocated buffer that is "good enough" for most cases, but that is reallocated onto the GC if you need more. This is not easy to make work automatically (it would be a nice addition to phobos to have something like this). Generally, I end up with some inner functions which handle the details.

> What does marked for appending mean? How does it happen, or how is it marked?

See Mike's post and my reply.

> So assumeSafeAppend is only useful when I have an array whose length is set lower than it was originally and I want to grow it back (that is, arr.length += 1 or arr ~= 1).

Right. An array *by default* supports appending in place when it knows that the data in the block beyond the current array is unused. assumeSafeAppend is saying "yes, I know that appending will clobber data beyond the array if you append in place, but it's safe because I don't care about that data any more."
The runtime doesn't know that the data isn't important until you tell it.

> Related to my question above: how do you get a block not marked for appending? A view slice?

The terms are confusing. A slice is not a block; it's a view of a block. So the block itself is marked as appendable. (Technically, this is because the "used" field of the block is stored within the block. If I didn't have the flag, I might read a garbage field as the "used" size, and potentially make bad decisions.) The only way you should get a block with the APPENDABLE flag set is via an array allocation, which correctly initializes the block.

> Perhaps I should re-read the slice article. I believe it had something like capacity == 0 --> always allocates. Is it this?

capacity == 0 means any append to that slice will allocate. But this is not the flag :) It means either a) the slice does not point at an appendable GC block, or b) it does point at such a block, but there is used data after the slice. Situation b) can be changed by calling assumeSafeAppend (only if you are sure the data after isn't valuable).

> Out of curiosity, how does this work? Does it always just reallocate with the GC if it's allocated with something else?

Yes.

> Usually I just want to avoid throwing away memory that I already have, which is slow if it's all over your codebase. Like re-reading or recomputing variables that you already have. One doesn't hurt, but a hundred does.

Right, this is a very useful mechanism, and can save quite a bit of performance cost.

-Steve
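The "stack buffer first, GC beyond" mechanism Steve describes might look something like the following (my own sketch with hypothetical names, not a phobos API):

```d
// Appender-style helper: uses a fixed in-struct buffer until it overflows,
// then falls back to a GC-allocated array for the excess elements.
struct SmallBuf(T, size_t N)
{
    private T[N] stack;   // "good enough" buffer for most cases
    private T[] heap;     // only touched once the stack part is full
    size_t length;

    void put(T value)
    {
        if (length < N)
            stack[length] = value;
        else
        {
            if (length - N == heap.length)
                heap.length = heap.length ? heap.length * 2 : N; // grow on GC
            heap[length - N] = value;
        }
        ++length;
    }
}
```

Usage would be e.g. `SmallBuf!(char, 128) buf; buf.put('x');`, keeping small results entirely on the stack and only paying for a GC allocation on overflow.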
Oct 16 2015