www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 10589] New: GC.malloc(sz, GC.BlkAttr.APPENDABLE) fails after a certain size

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589

           Summary: GC.malloc(sz, GC.BlkAttr.APPENDABLE) fails after a
                    certain size
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: druntime
        AssignedTo: nobody puremagic.com
        ReportedBy: monarchdodra gmail.com



As the title explains, after a certain size (2048 for my 32 bit install on an
64 linux), the appendable data cannot be exploited anymore:

//--------
import std.stdio;
import core.memory;

void main()
{
    for ( size_t i = 4 ; i < 1_000_000 ; i *= 1.4 )
    {
        ubyte[] s;
        s.reserve(i);
        writefln("%6s: s.capacity  is %6s", i, s.capacity);
        assert(s.capacity >= i);
        auto p = s.ptr;
        auto s2 = p[0 .. 0];
        writefln("%6s: s2.capacity is %6s", i, s2.capacity);
        assert(s2.capacity >= i);
    }
    for ( size_t i = 4 ; i < 1_000_000 ; i *= 1.4 )
    {
        ubyte*  p = cast(ubyte*)GC.malloc(i, GC.BlkAttr.APPENDABLE);
        ubyte[] s = p[0 .. 0];
        writefln("%6s: s.capacity  is %6s", i, s.capacity);
        assert(s.capacity + 4 >= i); //This ends up failing.
    }
}
//--------
439521: s.capacity  is 442351
439521: s2.capacity is 442351
615329: s.capacity  is 618479
615329: s2.capacity is 618479
861460: s.capacity  is 864239
861460: s2.capacity is 864239
     4: p.capacity  is     15
     5: p.capacity  is     15
     6: p.capacity  is     15
     8: p.capacity  is     15
...
  1443: p.capacity  is   2046
  2020: p.capacity  is   2046
  2827: p.capacity  is      0
core.exception.AssertError main(24): Assertion failure
//--------

I find this strange, because the first test shows that the allocation size
should not be a barrier.

Is the APPENDABLE data miss-placed or something? I can't really make sense of
why there would be a different behaviour between "s2" or "p" ... ?

This is problematic, as "malloc(APPENDABLE)" is the only way to allocate a
"true" d slice manually. Without this, functions such as array or appender,
will return arrays that will relocate after the very first append :(

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 09 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589


Rainer Schuetze <r.sagitario gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |r.sagitario gmx.de



PDT ---
I think you are mixing two levels of abstractions here:

        ubyte*  p = cast(ubyte*)GC.malloc(i, GC.BlkAttr.APPENDABLE);
        ubyte[] s = p[0 .. 0];
        writefln("%6s: s.capacity  is %6s", i, s.capacity);

GC.malloc requests raw memory from the GC. capacity is a function very specific
to the way arrays are implemented on top of it in rt/lifetime.d. It assumes
that any allocation with bit APPENDABLE set and that is larger than a page of
4kB reserves 16 bytes at the start of the allocation to store the actually used
length of the memory.

So, to create an empty array manually that works with capacity, you'd have to
set s to

   auto start = i <= 2048 ? 0 : 16;
   ubyte[] s = p[start .. start];

and you'd better clean that full memory block first to avoid the length field
containing garbage.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589





 I think you are mixing two levels of abstractions here:
 
         ubyte*  p = cast(ubyte*)GC.malloc(i, GC.BlkAttr.APPENDABLE);
         ubyte[] s = p[0 .. 0];
         writefln("%6s: s.capacity  is %6s", i, s.capacity);
 
 GC.malloc requests raw memory from the GC. capacity is a function very specific
 to the way arrays are implemented on top of it in rt/lifetime.d. It assumes
 that any allocation with bit APPENDABLE set and that is larger than a page of
 4kB reserves 16 bytes at the start of the allocation to store the actually used
 length of the memory.
 So, to create an empty array manually that works with capacity, you'd have to
 set s to
 
    auto start = i <= 2048 ? 0 : 16;
    ubyte[] s = p[start .. start];
Hum. That worked. Is the memory layout for the APPENDABLE data documented somewhere, or are these just reverse-engineered magic numbers?
 and you'd better clean that full memory block first to avoid the length field
 containing garbage.
Nope (I think). Length is carried in the slice, not the block. Block only contains capacity/used info. ------------------------------------------------------------------------ Thank you for your answer. I suppose the behavior isn't going to change. Still, this requires more support. I think APPENDABLE should be better documented. In particular, the magic numbers should be user accessible via manifest constants, or functions, so as to not have to guess/reverse engineer them. This is what I've discovered: 0 - 256 bytes: 1 Byte APPENDABLE info at end; Capacity = memory size - 1; 257 - 2048 bytes: 2 Byte APPENDABLE info at end; Capacity = memory size - 2; 2049 - **** bytes: 16 Byte APPENDABLE info at beginning; 1 Byte unknown info at end; Capacity = memory size - 17; Are these numbers platform depending? What the heck is that 1 byte that leads to 17 byte difference I'm seeing. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589




PDT ---
Is the memory layout for the APPENDABLE data documented somewhere, or are these
just reverse-engineered magic numbers? There is some description in rt/lifetime.d starting around line 200.
Nope (I think). Length is carried in the slice, not the block. 
 Block only contains capacity/used info.
Sorry, I meant the "used" info that must be reset.
In particular, the magic numbers should be user accessible via manifest
 constants, or functions, so as to not have to guess/reverse engineer them. 
The constants are defined privately in lifetime.d as SMALLPAD, MEDPAD and LARGEPAD. I suspect the trailing byte for large arrays is added so that arr[$..$] always points into the same memory block, and not the next one. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 12 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|druntime                    |websites
           Severity|major                       |enhancement




Is the memory layout for the APPENDABLE data documented somewhere, or are these
just reverse-engineered magic numbers? There is some description in rt/lifetime.d starting around line 200.
Nope (I think). Length is carried in the slice, not the block. 
 Block only contains capacity/used info.
Sorry, I meant the "used" info that must be reset.
In particular, the magic numbers should be user accessible via manifest
 constants, or functions, so as to not have to guess/reverse engineer them. 
The constants are defined privately in lifetime.d as SMALLPAD, MEDPAD and LARGEPAD. I suspect the trailing byte for large arrays is added so that arr[$..$] always points into the same memory block, and not the next one.
Ok. I have just some tiny questions left, if you'd care to instruct me: 1. Why does the "Place at beginning" scheme start at 2K bytes, when a page is 4K byte? 2. Why is the padding 16 bytes? You'd think 8 is enough...? Is there a reason, or is it just "future proofing"? I'll write up the info acquired and update the doc. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 14 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589




PDT ---


 Ok. I have just some tiny questions left, if you'd care to instruct me:
 1. Why does the "Place at beginning" scheme start at 2K bytes, when a page is
 4K byte?
Allocation sizes below 4k are rounded up to the next power of 2, so if the allocation is larger than 2048, a full page is used by the GC. The array functions then assume that extending is possible and place information at the start of the buffer. All this is very specific to the currnt GC implementation.
 2. Why is the padding 16 bytes? You'd think 8 is enough...? Is there a reason,
 or is it just "future proofing"?
It is necessary to support alignment attributes in arrays of structs (at least up to 16) or alignment for SIMD vectors.
 
 I'll write up the info acquired and update the doc.
Thanks. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 14 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10589


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID





 Ok. I have just some tiny questions left, if you'd care to instruct me:
 1. Why does the "Place at beginning" scheme start at 2K bytes, when a page is
 4K byte?
It is necessary to support alignment attributes in arrays of structs (at least up to 16) or alignment for SIMD vectors.
That makes sense I guess. By keeping it to a default 16, it avoids the overhead and complexity of calculating alignement.
 2. Why is the padding 16 bytes? You'd think 8 is enough...? Is there a reason,
 or is it just "future proofing"?
Allocation sizes below 4k are rounded up to the next power of 2, so if the allocation is larger than 2048, a full page is used by the GC. The array functions then assume that extending is possible and place information at the start of the buffer.
Derp.
 All this is very specific to the currnt GC implementation.
This. Because it is so specific, I have the feeling that all that has been said here is useless at the end of the day. Only writters of druntime, or users who are writting their own GC, could make use of this. The end user, even a library writer, can't make use of this information.
 I'll write up the info acquired and update the doc.
Thanks.
I think I'll just end up writing that the only thing it does (technically) is set the bit "APPENDABLE" to the memory block. That its exploitation is ultimately implementation dependent. -------- I was able to work around my issue though: By using reserve + assumeSafeAppend, I can get what it was that I wanted to work: Check this out: ushort[] a; a.reserve(20); //GC appendable allocation. a = a.ptr[0 .. 20]; //occupy space assumeSafeAppend(a); //Take ownership of used space assert(a.capcity >= 20); Pretty neat huh? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 14 2013