
digitalmars.D.learn - Allocate N elements

reply "Namespace" <rswhite4 googlemail.com> writes:
Is there no way (besides the ugly malloc or any wrappers) to 
allocate _exactly_ N elements at runtime (no static array)?
I tried
----
int[] arr = new int[sizeOfItems];
----
but if sizeOfItems is e.g. 512, it allocates 1024.
So is there no way to allocate a fixed memory block without the 
usage of C functions?
Jul 15 2013
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 15 July 2013 at 11:46:54 UTC, Namespace wrote:
 int[] arr = new int[sizeOfItems];
Did you also try int[] arr = new int[](sizeOfItems)?
Jul 15 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Monday, 15 July 2013 at 12:23:21 UTC, Adam D. Ruppe wrote:
 On Monday, 15 July 2013 at 11:46:54 UTC, Namespace wrote:
 int[] arr = new int[sizeOfItems];
Did you also try int[] arr = new int[](sizeOfItems)?
Example code:
----
import std.stdio;

void main() {
    int[] arr1 = new int[512];
    writeln(arr1.length, "::", arr1.capacity);
    int[] arr2 = new int[](512);
    writeln(arr2.length, "::", arr2.capacity);
}
----
Output:
512::1019
512::1019
Jul 15 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 void main()
 {
 	int[] arr1 = new int[512];
 	writeln(arr1.length, "::", arr1.capacity);
 	int[] arr2 = new int[](512);
 	writeln(arr2.length, "::", arr2.capacity);
 }
 ----

 Output:
 512::1019
 512::1019
import std.stdio, std.array;

void main() {
    auto a1 = new int[512];
    writeln(a1.length, " ", a1.capacity);
    auto a2 = new int[](512);
    writeln(a2.length, " ", a2.capacity);
    auto a3 = minimallyInitializedArray!(int[])(512);
    writeln(a3.length, " ", a3.capacity);
}

Output:
512 1019
512 1019
512 0

But that (of new arrays) is a bad design, it wastes too much memory, and I think it should be fixed. In Python this doesn't overallocate:

a4 = [0] * 512

An alternative is to introduce a function like std.array.InitializedArray.

Bye,
bearophile
Jul 15 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 15 July 2013 at 13:13:42 UTC, bearophile wrote:
 Output:
 512 1019
 512 1019
 512 0

 But that (of new arrays) is a bad design, it wastes too much 
 memory, and I think it should be fixed. In Python this doesn't 
 overallocate:
So what? The only thing you showed, is that minimallyInitialized doesn't know how much it allocated. If you allocate 513 elements with malloc, you'll way over allocate too. What's your point?

//--------
import std.stdio, core.memory;

void main()
{
    auto u = GC.qalloc(513);
    writeln(u.size); //prints 1024. Oops.
}
//--------

You'll waste memory either way. The only difference is a 1-2 byte difference for arrays smaller than 2K, and 16 bytes for arrays larger than 2K. The comparison is very unfair, and the wasted room is minimal. It's just more visible... yet more exploitable: dynamic GC arrays have a tendency to grow in place and use all their capacity, whereas malloc always eagerly relocates.

To answer the original question:
 Is there no way (besides the ugly malloc or any wrappers) to 
 allocate _exactly_ N elements at runtime (no static array)?
 I tried
No. *EVEN* with an "ugly malloc", you'll still over allocate (see above). Hell, at the end of the day, it's even worse, because you over-allocate, yet you don't know it, nor exploit the over-allocated data (unless you *very* manually use qalloc).
 Another question:
 I have this code:
 [...]
 Is it possible to prevent the slice from being resized,
 to avoid GC allocations?
No. The only way this works is *if* there is a GC to safety-net catch you if relocation must occur.

That said, you can very easily write your own:

//----
//Disclaimer: VERY unsafe.
ref T[] unsafeAppend(T, U)(ref T[] arr, U u)
{
    immutable len = arr.length;
    arr.ptr[len] = u;        //write one past the slice's end, into the block's spare capacity
    arr = arr[0 .. len + 1];
    return arr;              //return the grown slice so calls can be chained
}
//----
import std.stdio, core.memory;

void main()
{
    auto u = GC.qalloc(512);
    int[] arr = (cast(int*)u.base)[0 .. 0];
    arr.unsafeAppend(1).unsafeAppend(2).unsafeAppend(3).writeln();
}
//----

Disclaimer: This will give you 0 overflow protection. Also, do NOT use this with GC arrays: the added elements will not be "seen" by the GC, and will also be clobbered by normal appends.
Jul 15 2013
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 But that (of new arrays) is a bad design, it wastes too much 
 memory, and I think it should be fixed. In Python this doesn't 
 overallocate:
So what? The only thing you showed, is that minimallyInitialized doesn't know how much it allocated. If you allocate 513 elements with malloc, you'll way over allocate too. What's your point? You'll waste memory either way.
I didn't know it, sorry. I forgot. Can we let minimallyInitializedArray know the capacity?

Regarding the small arrays, so to avoid the memory wasting you need to allocate a larger one and slice it.

Bye,
bearophile
Jul 15 2013
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 15 July 2013 at 15:54:57 UTC, bearophile wrote:
 monarch_dodra:

 But that (of new arrays) is a bad design, it wastes too much 
 memory, and I think it should be fixed. In Python this 
 doesn't overallocate:
So what? The only thing you showed, is that minimallyInitialized doesn't know how much it allocated. If you allocate 513 elements with malloc, you'll way over allocate too. What's your point? You'll waste memory either way.
I didn't know it, sorry. I forgot. Can we let minimallyInitializedArray know the capacity?
I'm working on it ;)
 Regarding the small arrays, so to avoid the memory wasting you 
 need to allocate a larger one and slice it.

 Bye,
 bearophile
One of the problems is that when you want an arbitrarily sized allocation, it is usually standard to allocate 2^N in size. While this is optimal for "traditional" malloc, it's actually the *worst* allocation size you could ask for a GC array with appendable info (eg, traditional new) :/
Jul 15 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jul 15, 2013 at 07:32:37PM +0200, monarch_dodra wrote:
 On Monday, 15 July 2013 at 15:54:57 UTC, bearophile wrote:
monarch_dodra:

But that (of new arrays) is a bad design, it wastes too much
memory, and I think it should be fixed. In Python this doesn't
overallocate:
So what? The only thing you showed, is that minimallyInitialized doesn't know how much it allocated. If you allocate 513 elements with malloc, you'll way over allocate too. What's your point? You'll waste memory either way.
I didn't know it, sorry. I forgot. Can we let minimallyInitializedArray know the capacity?
I'm working on it ;)
Regarding the small arrays, so to avoid the memory wasting you
need to allocate a larger one and slice it.

Bye,
bearophile
One of the problems is that when you want an arbitrarily sized allocation, it is usually standard to allocate 2^N in size. While this is optimal for "traditional" malloc, it's actually the *worst* allocation size you could ask for a GC array with appendable info (eg, traditional new) :/
This is wrong. The 2^N size should be applied *after* GC appendable info (or whatever else bookkeeping info you need) has been accounted for, not before. If the GC header/whatever requires, say, 16 bytes, then the ideal array size should be 2^N-16, so that everything fits neatly within a power of 2. Otherwise you end up with 2^N+16 bytes that are actually allocated, and it just goes downhill from there.


T

-- 
What do you call optometrist jokes? Vitreous humor.
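A minimal sketch of the effect described here (assuming druntime's conservative GC with power-of-two size classes and the appendable bookkeeping detailed later in the thread; the exact capacities printed depend on the compiler/runtime version):

----
import std.stdio;

void main()
{
    // Requesting a "round" power of two spills into the next GC size class
    // once the appendable bookkeeping is added on top.
    auto big = new ubyte[](4096);
    writeln(big.length, " requested, capacity ", big.capacity);

    // Leaving room for the bookkeeping keeps the allocation in the smaller block.
    auto snug = new ubyte[](4096 - 32);
    writeln(snug.length, " requested, capacity ", snug.capacity);
}
----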
Jul 15 2013
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 15 July 2013 at 17:39:43 UTC, H. S. Teoh wrote:
 On Mon, Jul 15, 2013 at 07:32:37PM +0200, monarch_dodra wrote:
 On Monday, 15 July 2013 at 15:54:57 UTC, bearophile wrote:
monarch_dodra:

But that (of new arrays) is a bad design, it wastes too much
memory, and I think it should be fixed. In Python this 
doesn't
overallocate:
So what? The only thing you showed, is that minimallyInitialized doesn't know how much it allocated. If you allocate 513 elements with malloc, you'll way over allocate too. What's your point? You'll waste memory either way.
I didn't know it, sorry. I forgot. Can we let minimallyInitializedArray know the capacity?
I'm working on it ;)
Regarding the small arrays, so to avoid the memory wasting you
need to allocate a larger one and slice it.

Bye,
bearophile
One of the problems is that when you want an arbitrarily sized allocation, it is usually standard to allocate 2^N in size. While this is optimal for "traditional" malloc, it's actually the *worst* allocation size you could ask for a GC array with appendable info (eg, traditional new) :/
 This is wrong. The 2^N size should be applied *after* GC appendable
 info (or whatever else bookkeeping info you need) has been accounted
 for, not before. If the GC header/whatever requires, say, 16 bytes,
 then the ideal array size should be 2^N-16, so that everything fits
 neatly within a power of 2. Otherwise you end up with 2^N+16 bytes
 that are actually allocated, and it just goes downhill from there.

 T
Just to be clear, the GC itself has no overhead. Only the APPENDABLE bookkeeping adds any overhead. E.g.:

GC.malloc(1024);

This will allocate a 1024-byte block. No problem here. HOWEVER, if you write:

int[] arr = new int[](1000); //Allocates a 4Ki block
int[] arr = new int[](1024); //Allocates an 8Ki block

The example is a bit extreme, but you should get what I mean.

If you do a malloc with GC.BlkAttr.APPENDABLE, you will also get exactly what you asked for. However, it will be the user's responsibility to take into account that some of those allocated bytes will not actually be usable (the APPENDABLE block).

FYI, the way APPENDABLE works:
Arrays of 1-255 bytes: 1 extra byte is appended for used-capacity info at the end of the array.
Arrays of 256-2046 bytes: 2 extra bytes are appended.
2047+ bytes: the array will *start* with a 16-byte block for appendable info. A dummy 1-byte block will also be appended to the end of the block, for "arr[$ .. $]". (So yes, you need to account for a *17*-byte overhead.)

That said, allocating an array using a straight-up malloc(APPENDABLE) is *very* complicated, and it is all very specific to the current GC anyway: it is very, very error prone, and non-portable.
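A minimal sketch of that raw route (assuming core.memory's GC.qalloc and GC.BlkAttr.APPENDABLE; the block sizes printed are specific to the current GC):

----
import std.stdio;
import core.memory;

void main()
{
    // Plain GC allocation: you get a block at least as large as requested.
    auto plain = GC.qalloc(1024);
    writeln("plain block size: ", plain.size);

    // Same request, marked APPENDABLE: the GC hands back the same kind of
    // block, but it is up to you to reserve the bookkeeping bytes described
    // above before treating the rest as array payload.
    auto appendable = GC.qalloc(1024, GC.BlkAttr.APPENDABLE);
    writeln("appendable block size: ", appendable.size);
}
----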
Jul 16 2013
prev sibling parent reply "Namespace" <rswhite4 googlemail.com> writes:
 No. *EVEN* with an "ugly malloc", you'll still over allocate 
 (see above).
----
int* ptr = cast(int*) malloc(513 * int.sizeof);
int[] arr = ptr[0 .. 513];
writeln(arr.length, "::", arr.capacity);
----
Output:
513::0

Where is the over allocation?
Jul 15 2013
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 15 July 2013 at 18:16:45 UTC, Namespace wrote:
 writeln(arr.length, "::", arr.capacity);
arr.capacity checks the GC block, and since you malloced it, there is no gc block for it to check. So it simply doesn't know if there's any extra capacity there and reports 0 just to be safe.
Jul 15 2013
parent reply "Namespace" <rswhite4 googlemail.com> writes:
On Monday, 15 July 2013 at 18:29:12 UTC, Adam D. Ruppe wrote:
 On Monday, 15 July 2013 at 18:16:45 UTC, Namespace wrote:
 writeln(arr.length, "::", arr.capacity);
arr.capacity checks the GC block, and since you malloced it, there is no gc block for it to check. So it simply doesn't know if there's any extra capacity there and reports 0 just to be safe.
Ah, good to know. But anyway malloc allocates exactly N elements, without ugly overhead.
Would be really good if there were a way to keep the GC from taking over the memory.
Jul 15 2013
next sibling parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 07/15/2013 12:53 PM, Namespace wrote:

 But anyway malloc allocates exactly N elements, without ugly overhead.
I doubt it. If it's allocating from a bucket, then what actually gets used is the size of that bucket.

Ali
Jul 15 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 Ah, good to know. But anyway malloc allocates exactly N elements,
 without ugly overhead.
 Would be really good if there were a way to keep the GC from
 taking over the memory.
That wasted space is what you pay to reduce memory fragmentation with a non-copying GC.

Bye,
bearophile
Jul 15 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jul 15, 2013 at 09:53:32PM +0200, Namespace wrote:
 On Monday, 15 July 2013 at 18:29:12 UTC, Adam D. Ruppe wrote:
On Monday, 15 July 2013 at 18:16:45 UTC, Namespace wrote:
writeln(arr.length, "::", arr.capacity);
arr.capacity checks the GC block, and since you malloced it, there is no gc block for it to check. So it simply doesn't know if there's any extra capacity there and reports 0 just to be safe.
 Ah, good to know. But anyway malloc allocates exactly N elements, without ugly overhead.
[...]

I doubt malloc has no overhead. AFAIK, many malloc implementations store some kind of bookkeeping info in the area of memory just before the pointer that's returned. After all, malloc/free has to somehow know which memory blocks are allocated and which are free. Some implementations also store canary values in that area in order to detect memory corruption.


T

-- 
An old friend is better than two new ones.
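A quick way to see this on platforms whose malloc exposes it (this assumes glibc's malloc_usable_size, declared here by hand; it is not portable and is only meant to illustrate the point):

----
import std.stdio;
import core.stdc.stdlib : malloc, free;

// glibc-specific query for the real size of a malloc'd block
// (assumption: the C library provides it; it is not part of the C standard).
extern (C) size_t malloc_usable_size(void* ptr);

void main()
{
    enum request = 513 * int.sizeof;
    void* p = malloc(request);
    writeln("requested: ", request, ", usable: ", malloc_usable_size(p));
    free(p);
}
----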
Jul 15 2013
prev sibling parent reply "Namespace" <rswhite4 googlemail.com> writes:
Another question:
I have this code:

----
import core.stdc.stdlib : malloc;

void main() {
	int* ptr = cast(int*) malloc(11 * int.sizeof);
	int[] arr = ptr[0 .. 11];

	assert(arr.ptr is ptr);

	arr ~= 42;

	assert(ptr !is arr.ptr);
}
----

Is it possible to prevent the slice from being resized, to avoid GC
allocations?
Jul 15 2013
parent "bearophile" <bearophileHUGS lycos.com> writes:
Namespace:

 Is it possible to prevent the slice from being resized, to avoid
 GC allocations?
For that I think you need to define a struct that disables the append and uses an alias this. But then some of your array function argument signatures need to change, because a lot of D code uses "raw" array signatures like:

void foo(int[] arg) {}

Instead of:

alias MArr = int[];
void foo(MArr arg) {}

Or, using Typedef (untested):

alias MArr = Typedef!(int[]);
void foo(MArr arg) {}

Bye,
bearophile
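A minimal sketch of such a struct (untested in the same spirit as above; the name FixedSlice and the @disable'd operator are just one possible shape of the idea, not a library type):

----
import std.stdio;

struct FixedSlice(T)
{
    T[] data;
    alias data this;   // still usable wherever a T[] is expected

    // Appending would otherwise be forwarded to `data` via alias this and
    // could trigger a GC reallocation, so disable the operator on the wrapper.
    @disable void opOpAssign(string op : "~", U)(U rhs) {}
}

void main()
{
    auto s = FixedSlice!int([1, 2, 3]);
    writeln(s.length, " ", s[0]);  // reading through the wrapper works
    // s ~= 4;                     // intended to be a compile-time error
}
----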
Jul 15 2013