
digitalmars.D - GC.sizeOf(array.ptr)

reply "Dicebot" <public dicebot.lv> writes:
There is one issue I have encountered during CDGC porting that I 
may need advice with.
Consider this simple snippet:

```
import core.memory;

void main()
{
     ubyte[] result;
     result.length = 4096;
     assert(GC.sizeOf(result.ptr) > 0);
}
```

The assertion passes with the D1/Tango runtime but fails with the 
current D2 runtime. This happens because `result.ptr` is not 
actually the pointer returned by gc_qalloc from the array 
reallocation, but an interior pointer 16 bytes past the start of 
that block. Druntime stores some metadata (length/capacity, I 
presume) at the very beginning.

As a result, GC.sizeOf(array.ptr) returns 0 (array.ptr being an 
interior pointer).

Interesting side effect is that this code:

```
void main()
{
     ubyte[] result;
     result.length = 4096;
     GC.free(result.ptr);
}
```

...does not actually free anything, for the very same reason 
(result.ptr is an interior pointer); it just silently succeeds, 
doing nothing.
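
For reference, the documented way to map an interior pointer back to its block base is `GC.addrOf`. A minimal sketch (against the D2 runtime behaviour described here) showing that the base pointer behaves as expected where the interior pointer does not:

```d
import core.memory;

void main()
{
    ubyte[] result;
    result.length = 4096;

    // GC.addrOf maps any pointer into a GC block back to the block's
    // base address, so it "sees through" the metadata offset:
    void* base = GC.addrOf(result.ptr);
    assert(base !is null);
    assert(GC.sizeOf(base) > 0);   // the base pointer does report a size

    // Freeing via the base address actually releases the block,
    // unlike GC.free(result.ptr):
    GC.free(base);
}
```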

Is such behaviour intended?
Sep 30 2014
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 9/30/14 9:42 AM, Dicebot wrote:
 There is one issue I have encountered during CDGC porting I may need
 advice with.
 Consider this simple snippet:

 ```
 import core.memory;

 void main()
 {
      ubyte[] result;
      result.length = 4096;
      assert(GC.sizeOf(result.ptr) > 0);
 }
 ```

 Assertion passes with D1/Tango runtime but fails with current D2
 runtime. This happens because `result.ptr` is not actually a pointer
 returned by gc_qalloc from array reallocation, but interior pointer 16
 bytes from the start of that block. Druntime stores some metadata
 (length/capacity I presume) in the very beginning.
This is accurate: it stores the "used" size of the array. But that is only the case for arrays, not for general GC.malloc blocks.

An alternative is to use result.capacity, which essentially looks up the same thing (and should be more accurate), but it doesn't cover the same inputs.

Trivial note: you only need to allocate 2047 bytes to get this behavior.
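
A minimal illustration of the `capacity` alternative (a sketch assuming the runtime behaviour described in this thread, where `GC.sizeOf` on the interior pointer yields 0):

```d
import core.memory;

void main()
{
    ubyte[] result;
    result.length = 4096;

    // capacity goes through the array runtime, which understands the
    // metadata layout, so it works where GC.sizeOf(result.ptr) does not:
    assert(result.capacity >= result.length);

    // By contrast, sizeOf treats the interior pointer as a mere slice
    // (on the runtime version discussed in this thread):
    assert(GC.sizeOf(result.ptr) == 0);
}
```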
 As a result GC.sizeOf(array.ptr) results in 0 (being an interior pointer).

 Interesting side effect is that this code:

 ```
 void main()
 {
      ubyte[] result;
      result.length = 4096;
      GC.free(result.ptr);
 }
 ```

 ..does not actually free anything for the very same reason (result.ptr
 is interior pointer), it just silently succeeds doing nothing.
This is a problem. We should provide a definitive way to deallocate an array, if one is not already present (assuming delete goes away).
 Is such behaviour intended?
Depends on what part you are talking about ;) But kidding aside, when I added the array appending update, I didn't intend to affect these functions, nor did I consider the cases above. So it was not intended to break these things, but I think we can repair them, or at least provide alternate means to create the same result.

Interestingly, I think any lookup of block information for an interior pointer costs the same as looking up block info for a base pointer. I think adding a flag that allows performing operations on interior pointers might make this more palatable.

But we absolutely need a way to free an array that does not require knowledge of the inner workings of the runtime.

-Steve
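
In the meantime, something like the following helper can serve (a sketch, not part of druntime; the name `freeArray` is invented here). It leans on `GC.addrOf` to recover the block base before freeing:

```d
import core.memory;

/// Hypothetical helper: free the GC block backing a dynamic array,
/// resolving any metadata offset first. Only safe if no other slice
/// references the same block.
void freeArray(T)(ref T[] arr)
{
    if (arr.ptr is null)
        return;
    // addrOf maps the (possibly interior) pointer back to the block
    // base, so a metadata prefix on large array blocks is handled.
    GC.free(GC.addrOf(arr.ptr));
    arr = null;
}

void main()
{
    ubyte[] result;
    result.length = 4096;
    freeArray(result);
    assert(result is null);
}
```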
Sep 30 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven 
Schveighoffer wrote:
 Assertion passes with D1/Tango runtime but fails with current 
 D2
 runtime. This happens because `result.ptr` is not actually a 
 pointer
 returned by gc_qalloc from array reallocation, but interior 
 pointer 16
 bytes from the start of that block. Druntime stores some 
 metadata
 (length/capacity I presume) in the very beginning.
This is accurate, it stores the "used" size of the array. But it's only the case for arrays, not general GC.malloc blocks. Alternative is to use result.capacity, which essentially looks up the same thing (and should be more accurate). But it doesn't cover the same inputs.
Why is it stored at the beginning and not at the end of the block (like capacity)? I'd like to explore options for removing the interior pointer completely before proceeding with adding more special cases to GC functions.
Sep 30 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 9/30/14 10:24 AM, Dicebot wrote:
 On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven Schveighoffer wrote:
 Assertion passes with D1/Tango runtime but fails with current D2
 runtime. This happens because `result.ptr` is not actually a pointer
 returned by gc_qalloc from array reallocation, but interior pointer 16
 bytes from the start of that block. Druntime stores some metadata
 (length/capacity I presume) in the very beginning.
This is accurate, it stores the "used" size of the array. But it's only the case for arrays, not general GC.malloc blocks. Alternative is to use result.capacity, which essentially looks up the same thing (and should be more accurate). But it doesn't cover the same inputs.
Why is it stored in the beginning and not in the end of the block (like capacity)? I'd like to explore options of removing interior pointer completely before proceeding with adding more special cases to GC functions.
First, it is the capacity. It's just that the capacity lives at the beginning of larger blocks.

The reason is the ability to extend pages. With smaller blocks (2048 bytes or less), the page is divided into equal portions, and those can NEVER be extended. Any attempt to extend results in a realloc into another block. Putting the capacity at the end makes sense for two reasons:

1. One byte is already reserved to prevent cross-block pointers.
2. It doesn't cause alignment issues. We can't very well offset a 16-byte block by 16 bytes.

But importantly, the capacity field does not move.

However, for page size and above (4096+ bytes), the original (D1 and early D2) runtime would attempt to extend into the next page, without moving the data. Thus we save the copy of data into a new block, and just set some bits and we're done.

But this poses a problem when the capacity field is stored at the end -- especially since we are caching the block info. The block info can change with a call to GC.extend (whereas for a fixed-size block, the block info CANNOT change). Depending on what "version" of the block info you have, the "end" can be different, and you may end up corrupting data. This is especially important for shared or immutable array blocks, where multiple threads could be appending at the same time.

So I made the call to put it at the beginning of the block, which obviously doesn't change, and offset everything by 16 bytes to maintain alignment.

It may very well be that we can put it at the end of the block instead, and you can probably do so without much effort in the runtime (everything uses CTFE functions to calculate padding and the location of the capacity). It has been such a long time since I did that, I'm not very sure of all the reasons not to do it. A look through the mailing list archives might be useful.

-Steve
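
The two layouts can be observed from user code. A sketch (assuming the metadata placement described in this thread; the exact offsets are runtime internals, not a stable API):

```d
import core.memory;

void main()
{
    // Small block (<= 2048 bytes): the capacity sits at the END of
    // the block, so the array data starts at the block base.
    ubyte[] small;
    small.length = 100;
    assert(GC.addrOf(small.ptr) is cast(void*) small.ptr);

    // Page-sized block (4096+ bytes): the capacity sits in a prefix
    // at the START, so the data pointer is interior to the block.
    ubyte[] large;
    large.length = 4096;
    auto offset = large.ptr - cast(ubyte*) GC.addrOf(large.ptr);
    assert(offset > 0);   // 16 bytes on the runtime described here
}
```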
Sep 30 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven 
Schveighoffer wrote:
 On 9/30/14 10:24 AM, Dicebot wrote:
 On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven 
 Schveighoffer wrote:
 Assertion passes with D1/Tango runtime but fails with 
 current D2
 runtime. This happens because `result.ptr` is not actually a 
 pointer
 returned by gc_qalloc from array reallocation, but interior 
 pointer 16
 bytes from the start of that block. Druntime stores some 
 metadata
 (length/capacity I presume) in the very beginning.
This is accurate, it stores the "used" size of the array. But it's only the case for arrays, not general GC.malloc blocks. Alternative is to use result.capacity, which essentially looks up the same thing (and should be more accurate). But it doesn't cover the same inputs.
Why is it stored in the beginning and not in the end of the block (like capacity)? I'd like to explore options of removing interior pointer completely before proceeding with adding more special cases to GC functions.
First, it is the capacity. It's just that the capacity lives at the beginning of larger blocks.

The reason is due to the ability to extend pages. With smaller blocks (2048 bytes or less), the page is divided into equal portions, and those can NEVER be extended. Any attempt to extend results in a realloc into another block. Putting the capacity at the end makes sense for 2 reasons: 1. 1 byte is already reserved to prevent cross-block pointers, 2. It doesn't cause alignment issues. We can't very well offset a 16 byte block by 16 bytes. But importantly, the capacity field does not move.

However, for page and above size (4096+ bytes), the original (D1 and early D2) runtime would attempt to extend into the next page, without moving the data. Thus we save the copy of data into a new block, and just set some bits and we're done.
Ah, that must be what confused me - I looked at the small block offset calculation originally and blindly assumed the same logic for the other sizes. Sorry, my fault!
 But this poses a problem for when the capacity field is stored 
 at the end -- especially since we are caching the block info. 
 The block info can change with a call to GC.extend (whereas a 
 fixed-size block, the block info CANNOT change). Depending on 
 what "version" of the block info you have, the "end" can be 
 different, and you may end up corrupting data. This is 
 especially important for shared or immutable array blocks, 
 where multiple threads could be appending at the same time.

 So I made the call to put it at the beginning of the block, 
 which obviously doesn't change, and offset everything by 16 
 bytes to maintain alignment.

 It may very well be that we can put it at the end of the block 
 instead, and you can probably do so without much effort in the 
 runtime (everything uses CTFE functions to calculate padding 
 and location of the capacity). It has been such a long time 
 since I did that, I'm not very sure of all the reasons not to 
 do it. A look through the mailing list archives might be useful.
I think it should be possible. That way the actual block size will simply be considered a bit smaller, and extending will happen before the reserved space is hit. But of course I have only a very vague knowledge of druntime, acquired while porting CDGC, so I may need to think about it a bit more and probably chat with Leandro too :)

I have created a Bugzilla issue for now: https://issues.dlang.org/show_bug.cgi?id=13558
Sep 30 2014
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 9/30/14 12:01 PM, Dicebot wrote:

 I think it should be possible. That way actual block size will be simply
 considered a bit smaller and extending happen before reserved space is
 hit. But of course I have only a very vague knowledge of druntime
 acquired while porting cdgc so may need to think about it a bit more
 and probably chat with Leandro too :)
I think it is possible, and perhaps more correct, to put the array append size into a separate memory block for larger sizes. The placement at the end works very well and has good properties for smaller sizes. This is similar to how the flags work. -Steve
Sep 30 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven
Schveighoffer wrote:
 So I made the call to put it at the beginning of the block, 
 which obviously doesn't change, and offset everything by 16 
 bytes to maintain alignment.

 It may very well be that we can put it at the end of the block 
 instead, and you can probably do so without much effort in the 
 runtime (everything uses CTFE functions to calculate padding 
 and location of the capacity). It has been such a long time 
 since I did that, I'm not very sure of all the reasons not to 
 do it. A look through the mailing list archives might be useful.
Yes, a lot of this is an artifact of the relatively simplistic manner in which the current GC tracks memory. If large blocks had a header, for example, then this could theoretically live there and not cause any problems.

As we move towards supporting precise scanning, the GC will need to be aware of the types of data it holds, and so some portion of the array appendability strategy should probably migrate into the GC. A redefinition of the GC interface is many years overdue. This just needs to be considered when it happens.
Sep 30 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 30 September 2014 at 17:28:02 UTC, Sean Kelly wrote:
 On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven
 Schveighoffer wrote:
 So I made the call to put it at the beginning of the block, 
 which obviously doesn't change, and offset everything by 16 
 bytes to maintain alignment.

 It may very well be that we can put it at the end of the block 
 instead, and you can probably do so without much effort in the 
 runtime (everything uses CTFE functions to calculate padding 
 and location of the capacity). It has been such a long time 
 since I did that, I'm not very sure of all the reasons not to 
 do it. A look through the mailing list archives might be 
 useful.
Yes, a lot of this is an artifact of the relatively simplistic manner that the current GC tracks memory. If large blocks had a header, for example, then this could theoretically live there and not cause any problems. As we move towards supporting precise scanning, the GC will need to be aware of the types of data it holds, and so some portion of the array appendability strategy should probably migrate into the GC. A redefinition of the GC interface is many years overdue. This just needs to be considered when it happens.
I decided to add a similar workaround to CDGC for now and fix the way it is stored in druntime a bit later :)

On a related note: the CDGC D2 port passes the druntime test suite as of today (with only the shared library tests disabled), so the initial upstream PR should happen very soon (just need to clean up all the trace and "omg hax" commits :))
Sep 30 2014
parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Tue, 30 Sep 2014 17:43:04 +0000
Dicebot via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On a related note : CDGC D2 port passes druntime test suite
 starting with today (with only shared library tests disabled), so
 initial upstream PR should happen very soon (just need to clean
 up all trace and "omg hax" commits :))
wow! no, really, what else can i say? ;-)
Sep 30 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 30 September 2014 at 13:42:14 UTC, Dicebot wrote:
 Is such behaviour intended?
Yes. As far as the GC is concerned, asking for the size of an interior pointer is asking for the size of a slice of the data, and as slices are not extendable it will always return 0.

All of the APPENDABLE stuff takes place outside the GC (except for the definition of the APPENDABLE BlkAttr, which really should be defined externally and within the user-reserved range of the bitfield instead of the GC-reserved range, but I digress...) and so it has no way of knowing that someone is using the blocks this way.
Sep 30 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 9/30/14 1:23 PM, Sean Kelly wrote:

 (except for the
 definition of the APPENDABLE BlkAttr, which really should be
 defined externally and within the user-reserved range of the
 bitfield instead of the GC-reserved range, but I digress...)
The APPENDABLE bit was added as a solution to avoid having to reserve that memory for all allocations. Basically, if you try to append to a block that doesn't have the bit, it simply reallocates conservatively.

So it does have to be part of the GC metadata, because the most important effect is on blocks that AREN'T allocated via the array runtime. Otherwise, the append mechanism creeps into all aspects of memory allocation.

-Steve
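
A sketch of how the bit can be observed through the public API, using `GC.getAttr` and `GC.BlkAttr.APPENDABLE` from `core.memory`:

```d
import core.memory;

void main()
{
    // An array grown through the array runtime is marked APPENDABLE:
    ubyte[] arr;
    arr.length = 100;
    assert(GC.getAttr(GC.addrOf(arr.ptr)) & GC.BlkAttr.APPENDABLE);

    // A raw GC.malloc block is not, so appending through a slice of
    // it falls back to a conservative reallocation:
    void* p = GC.malloc(100);
    assert(!(GC.getAttr(p) & GC.BlkAttr.APPENDABLE));
}
```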
Sep 30 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 30 September 2014 at 17:51:18 UTC, Steven
Schveighoffer wrote:
 On 9/30/14 1:23 PM, Sean Kelly wrote:

 (except for the
 definition of the APPENDABLE BlkAttr, which really should be
 defined externally and within the user-reserved range of the
 bitfield instead of the GC-reserved range, but I digress...)
The APPENDABLE bit was added as a solution to avoid having to reserve that memory for all allocations. Basically, if you try to append to a block that doesn't have the bit, it simply reallocates conservatively. So it does have to be part of the GC metadata, because the most important effect is on blocks that AREN'T allocated via the array runtime. Otherwise, the append mechanism creeps into all aspects of memory allocation.
Yeah I know. But when I defined the BlkAttr bitfield I'd reserved one portion of the range for internal GC stuff and another portion for user-defined stuff. APPENDABLE should probably have landed in the user-defined portion. I don't see any of those comments in the current code or I'd point to them. I guess they were deleted at some point.
Sep 30 2014
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 9/30/14 2:19 PM, Sean Kelly wrote:
 On Tuesday, 30 September 2014 at 17:51:18 UTC, Steven
 Schveighoffer wrote:
 On 9/30/14 1:23 PM, Sean Kelly wrote:

 (except for the
 definition of the APPENDABLE BlkAttr, which really should be
 defined externally and within the user-reserved range of the
 bitfield instead of the GC-reserved range, but I digress...)
The APPENDABLE bit was added as a solution to avoid having to reserve that memory for all allocations. Basically, if you try to append to a block that doesn't have the bit, it simply reallocates conservatively. So it does have to be part of the GC metadata, because the most important effect is on blocks that AREN'T allocated via the array runtime. Otherwise, the append mechanism creeps into all aspects of memory allocation.
Yeah I know. But when I defined the BlkAttr bitfield I'd reserved one portion of the range for internal GC stuff and another portion for user-defined stuff. APPENDABLE should probably have landed in the user-defined portion. I don't see any of those comments in the current code or I'd point to them. I guess they were deleted at some point.
Hm... looking at the code, I have no idea how the GC would handle user-defined stuff. It seems to only deal with bits it knows about (i.e. APPENDABLE is specifically handled in gc/gc.d).

I think I just took the next bit available; if there was a warning about GC internal bits when I added APPENDABLE, I either missed it or dismissed it.

-Steve
Sep 30 2014
parent "Sean Kelly" <sean invisibleduck.org> writes:
On Tuesday, 30 September 2014 at 19:19:18 UTC, Steven
Schveighoffer wrote:
 Hm... looked at the code, I have no idea how the GC would 
 handle user-defined stuff. It seems to only deal with bits it 
 knows about (i.e. APPENDABLE is specifically handled in gc/gc.d)
It wouldn't. The GC was just going to provide storage for some number of user-defined bits.
 I think I just took the next bit available, if there was a 
 warning about GC internal bits when I added APPENDABLE, I 
 either missed it or dismissed it.
With BlkAttr defined independently in a bunch of different places at the time, the comment would have been easy to miss. I don't know that it really matters anyway, but it's something I noticed today.
Sep 30 2014