digitalmars.D - Setting array length without initializing/reallocating.
- Jonathan Levi (8/8) Dec 11 2020 Wow, there went several hours of debugging.
- H. S. Teoh (7/18) Dec 11 2020 I highly recommend reading the following article if you work with D
- Kagamin (2/2) Dec 12 2020 Yes, pointers are the only unsafe way to access memory, slices
- Sönke Ludwig (4/15) Dec 12 2020 One way around this is to call `array.assumeSafeAppend();` before
- Bastiaan Veelo (12/20) Dec 12 2020 Hold on -- how does this not corrupt memory? As soon as the
- Mike Parker (4/12) Dec 12 2020 You're setting yourself up for failure with that. What are you
- Jackson22 (3/19) Dec 13 2020 How is avoiding an expensive potentially memory leaking operation
- rikki cattermole (5/13) Dec 13 2020 No bounds checking. That slice can extend into memory that isn't of that...
- Jackson22 (10/25) Dec 13 2020 No *automatic* bounds checking != no bounds checking.
- rikki cattermole (21/46) Dec 13 2020 I have used it in the past where appropriate with 0 issues resulting
- Dukc (12/22) Dec 13 2020 Yes it's possible to without automatic bounds checks. Sometimes
- Dukc (2/3) Dec 13 2020 Meant: Yes it's possible to live without automatic bounds checks.
- Mike Parker (19/35) Dec 13 2020 "avoiding an expensive potentially memory leaking operation" is
- Mike Parker (2/6) Dec 13 2020 There's no pretending here. What the OP is doing *is* dangerous.
- Jackson22 (7/15) Dec 14 2020 If someone writes a wrapper around .ptr which checks. It'd be
- Max Haughton (11/29) Dec 14 2020 Good practice is good practice. If you know what you're doing you
- Steven Schveighoffer (10/27) Dec 14 2020 It's possible you have misinterpreted what the OP is asking for.
- Paul Backus (4/6) Dec 14 2020 Though doing it correctly may be harder than you'd think:
- Mike Parker (15/29) Dec 14 2020 Of course. I'm not arguing otherwise. I don't see that anyone
- Dukc (30/38) Dec 13 2020 There is a big downside in doing that: the array will not check
- Dukc (4/11) Dec 13 2020 Okay, there is a bug in my code that it won't work if `arr` is
- Steven Schveighoffer (36/47) Dec 13 2020 Lots of good responses to a mostly ambiguous message.
- Ola Fosheim Grostad (7/15) Dec 15 2020 D and Go have both messed up the concept of a view of an array
Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer. `array = array.ptr[0..newLength];`
Dec 11 2020
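A minimal sketch of the behaviour described in the post above, assuming a GC-allocated `int[]`. The helper name `grownByLength` is made up for illustration and is not a library function. It shows that assigning to `.length` initializes the new elements and reallocates when growing in place would overwrite data that another slice can still see:

```d
// Hypothetical helper: grows a slice by assigning to its `.length`.
int[] grownByLength(int[] slice, size_t newLength)
{
    slice.length = newLength; // may reallocate; new elements become int.init (0)
    return slice;
}

void main()
{
    int[] whole = [1, 2, 3, 4];
    int[] head = whole[0 .. 2];

    int[] grown = grownByLength(head, 3);

    assert(grown == [1, 2, 0]);      // the new element was initialized
    assert(grown.ptr !is whole.ptr); // growing in place would have stomped whole[2], so the data moved
    assert(whole == [1, 2, 3, 4]);   // the original data is untouched
}
```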
On Sat, Dec 12, 2020 at 12:53:09AM +0000, Jonathan Levi via Digitalmars-d wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

I highly recommend reading the following article if you work with D arrays in any non-trivial way:

https://dlang.org/articles/d-array-article.html

T

--
Fact is stranger than fiction.
Dec 11 2020
Yes, pointers are the only unsafe way to access memory; slices don't allow it.
Dec 12 2020
On 12.12.2020 at 01:53, Jonathan Levi wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

One way around this is to call `array.assumeSafeAppend();` before setting the new length. In this case it will reuse the already allocated block as long as it is large enough and only reallocate if necessary.
Dec 12 2020
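Sönke's suggestion might look roughly like this. It is only a sketch: `regrowInPlace` is a hypothetical helper name, and the example assumes a GC-allocated buffer that was shrunk earlier. Note that the new elements are still initialized; only the reallocation is avoided:

```d
// Hypothetical helper: regrows a previously shrunk slice inside its own block.
int[] regrowInPlace(int[] slice, size_t newLength)
{
    slice.assumeSafeAppend(); // promise: nobody else needs the data past slice's end
    slice.length = newLength; // extends in place if the block is large enough
    return slice;
}

void main()
{
    int[] buf = new int[](8);
    buf[] = 7;

    int[] shrunk = buf[0 .. 4];
    int[] regrown = regrowInPlace(shrunk, 8);

    assert(regrown.ptr is buf.ptr);                   // no reallocation happened
    assert(regrown == [7, 7, 7, 7, 0, 0, 0, 0]);      // the regrown tail was re-initialized
    // buf[4 .. 8] was overwritten too: that is what assumeSafeAppend permits.
}
```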
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

Hold on -- how does this not corrupt memory? As soon as the length exceeds the allocated capacity (the point at which the slice would be reallocated when setting its length) you will have a silent out of bounds violation, identical to overflowing a C array. Am I wrong??

If you do not want the expansion to be initialized, I guess you could allocate a new uninitialized slice and copy contents over explicitly.

--Bastiaan.
Dec 12 2020
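The alternative Bastiaan mentions, allocating an uninitialized slice and copying, could be sketched as follows. This assumes plain `int` elements and uses `std.array.uninitializedArray`; the function name `grownCopyUninitialized` is invented for the example:

```d
import std.array : uninitializedArray;

// Hypothetical helper: returns a new, larger array whose tail is left uninitialized.
int[] grownCopyUninitialized(const(int)[] old, size_t newLength)
{
    assert(newLength >= old.length);
    auto fresh = uninitializedArray!(int[])(newLength); // allocates without initializing
    fresh[0 .. old.length] = old[];                      // copy the existing contents
    return fresh;
}

void main()
{
    int[] old = [1, 2, 3];
    int[] fresh = grownCopyUninitialized(old, 6);

    assert(fresh.length == 6);
    assert(fresh[0 .. 3] == [1, 2, 3]);
    assert(fresh.ptr !is old.ptr); // always a new allocation
    // fresh[3 .. 6] holds unspecified values until it is written.
}
```

This avoids the initialization but not the allocation, so it only helps when the initialization is the cost being worked around.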
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
Dec 12 2020
On Saturday, 12 December 2020 at 14:12:06 UTC, Mike Parker wrote:
> On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi wrote:
>> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
>> `array = array.ptr[0..newLength];`
>
> You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?

How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?
Dec 13 2020
On 14/12/2020 6:01 AM, Jackson22 wrote:
>>> `array = array.ptr[0..newLength];`
>>
>> You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
>
> How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?

No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process.

By avoiding that "expensive" memory operation, you instead create a silent memory corruption in its place which is far worse.
Dec 13 2020
On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole wrote:
> On 14/12/2020 6:01 AM, Jackson22 wrote:
>>>> `array = array.ptr[0..newLength];`
>>>
>>> You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
>>
>> How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?
>
> No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process.

No *automatic* bounds checking != no bounds checking. There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.

> By avoiding that "expensive" memory operation, you instead create a silent memory corruption in its place which is far worse.

Why did you quote expensive? Are you implying it isn't expensive? Are you saying re-allocating 4 GB of memory every 6 ms isn't expensive?
Dec 13 2020
On 14/12/2020 9:03 AM, Jackson22 wrote:
> On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole wrote:
>> On 14/12/2020 6:01 AM, Jackson22 wrote:
>>>>> `array = array.ptr[0..newLength];`
>>>>
>>>> You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
>>>
>>> How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?
>>
>> No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process.
>
> No *automatic* bounds checking != no bounds checking. There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure

I have used it in the past where appropriate with 0 issues resulting from it. I do not believe that this is the case here.

> when there are more successful programming languages that have zero automatic bounds checking.

    int[] a = [1, 2, 3];
    assert(a.ptr is a.ptr[0 .. 4].ptr);

Out of bounds, runs successfully. Doesn't mean that the GC is aware that it now has a length of 4.

    int[] b = a;
    b.length = 4;
    assert(a.ptr !is b.ptr);

This is a case where .length is clearly doing the right thing.

    int[] c = a;
    c.length = 1;
    assert(a.ptr is c.ptr);

>> By avoiding that "expensive" memory operation, you instead create a silent memory corruption in its place which is far worse.
>
> Why did you quote expensive? Are you implying it isn't expensive? Are you saying re-allocating 4 GB of memory every 6 ms isn't expensive?

Allocating memory is always more expensive than using a buffer where life times are known and predictable. You are right about that.

In this case, that isn't what is being described. If length is allocating, then that code was not designed to be used with a buffer.

The most expensive thing in this scenario is not allocating memory, it is silent memory corruption. Once corrupted not only can the process die at any point, but you can't trust its output any longer.
Dec 13 2020
On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
>> No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process.
>
> No *automatic* bounds checking != no bounds checking. There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.

Yes it's possible to without automatic bounds checks. Sometimes one has to, when using those older languages or doing very low-level system programming. And other times it may not be necessary, but still worth it to gain that last bit of performance when optimizing. These are the reasons why `.ptr` exists.

We really don't know whether either of those cases applies to the OP's case, but if the length extension with implicit duplications were even close to the desired performance, it seems unlikely.

> Why did you quote expensive? Are you implying it isn't expensive? Are you saying re-allocating 4 GB of memory every 6 ms isn't expensive?

I think he was comparing to extending the array in-place, but in a bounds-checked way.
Dec 13 2020
On Sunday, 13 December 2020 at 21:01:18 UTC, Dukc wrote:
> Yes it's possible to without automatic bounds checks.

Meant: Yes it's possible to live without automatic bounds checks.
Dec 13 2020
On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
> On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole wrote:
>> On 14/12/2020 6:01 AM, Jackson22 wrote:
>>>>> `array = array.ptr[0..newLength];`
>>>>
>>>> You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
>>>
>>> How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?

"avoiding an expensive potentially memory leaking operation" is not the issue, it's how the OP is going about it. Based on the OP's question and the example, the impression I get is that it's an attempt to arbitrarily increase the length of a slice with no regard to the capacity of its memory store. If `newLength` is greater than the remaining capacity in the memory store, then the new length will go beyond whatever has been allocated. That is what I meant by "setting yourself up for failure", and that is why the lack of bounds checking is an issue here.

Steven's post lays out other potential issues with taking this approach in D.

>> No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process.
>
> No *automatic* bounds checking != no bounds checking.

But even with manual bounds checking, there has to be enough memory allocated somewhere to hold the new array elements. For a dynamically resizable array, there is no escaping the need to allocate memory. The cost can be mitigated by allocating enough up front, or with a tailored reallocation strategy, but it can't be eliminated.
Dec 13 2020
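The "allocate enough up front" mitigation mentioned above can be illustrated with the built-in `reserve`. This is a rough sketch rather than a recommendation for the OP's specific code; `movedWhileAppending` is a hypothetical helper that reports whether the storage moved while appending:

```d
// Hypothetical helper: reserves room for `count` elements (count >= 1), appends them,
// and returns true if the array's storage moved during the appends.
bool movedWhileAppending(size_t count)
{
    int[] arr;
    arr.reserve(count);         // one allocation up front
    assert(arr.capacity >= count);

    arr ~= 0;
    const start = arr.ptr;      // where the reserved block begins
    foreach (i; 1 .. count)
        arr ~= cast(int) i;     // fits in the reserved block

    return arr.ptr !is start;
}

void main()
{
    assert(!movedWhileAppending(1000)); // no reallocation after the reserve
}
```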
On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
> There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.

There's no pretending here. What the OP is doing *is* dangerous.
Dec 13 2020
On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
> On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
>> There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.
>
> There's no pretending here. What the OP is doing *is* dangerous.

If someone writes a wrapper around .ptr which checks, it'd be literally no different than the implementation in druntime. Like I said, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure. Maybe those people just aren't knowledgeable enough to understand, I don't know.
Dec 14 2020
On Monday, 14 December 2020 at 20:53:39 UTC, Jackson22 wrote:
> On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
>> On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
>>> There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.
>>
>> There's no pretending here. What the OP is doing *is* dangerous.
>
> If someone writes a wrapper around .ptr which checks, it'd be literally no different than the implementation in druntime. Like I said, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure. Maybe those people just aren't knowledgeable enough to understand, I don't know.

Good practice is good practice. If you know what you're doing you probably shouldn't need to ask.

What Mike is saying is important to know, because even if you use exactly the same concept as what druntime does in your code, you're still repeating a pattern which will lead to bugs if you get it wrong. Good code is all about compartmentalizing bad code, especially with memory where (thankfully we have sanitizers now) things can often go badly wrong without actually exhibiting any side-effects (i.e. we all know why C code has so many security problems).
Dec 14 2020
On 12/14/20 3:53 PM, Jackson22 wrote:
> On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
>> On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
>>> There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.
>>
>> There's no pretending here. What the OP is doing *is* dangerous.
>
> If someone writes a wrapper around .ptr which checks, it'd be literally no different than the implementation in druntime. Like I said, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure. Maybe those people just aren't knowledgeable enough to understand, I don't know.

It's possible you have misinterpreted what the OP is asking for. Maybe the OP misstated what he is looking to do. Without a clarifying response from him, it's hard to tell how to respond, which means we have to respond with the most pessimistic interpretation of the post possible.

Yes, you can use .ptr to avoid bounds checks, and it's safe if you do it correctly. No you shouldn't use .ptr to create array slices that refer to memory outside the range that exists (and using .ptr slicing as posted in the original can do this).

It's as basic as that.

-Steve
Dec 14 2020
On Monday, 14 December 2020 at 23:55:09 UTC, Steven Schveighoffer wrote:
> Yes, you can use .ptr to avoid bounds checks, and it's safe if you do it correctly.

Though doing it correctly may be harder than you'd think:

https://gist.github.com/pbackus/39b13e8a2c6aea0e090e4b1fe8046df5#example-short-string
Dec 14 2020
On Monday, 14 December 2020 at 20:53:39 UTC, Jackson22 wrote:
> On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
>> On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
>>> There's a reason .ptr exists, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.
>>
>> There's no pretending here. What the OP is doing *is* dangerous.
>
> If someone writes a wrapper around .ptr which checks, it'd be literally no different than the implementation in druntime.

Of course. I'm not arguing otherwise. I don't see that anyone else is either. I'm talking about the specific case raised by the OP, where the issue isn't just a lack of automatic bounds checking, but the lack of any bounds checking at all.

Bounds checking before resizing has one of two possible outcomes: a reallocation, or no resizing occurs. The OP explicitly asked how to resize an array *without* reallocation, which implies that neither outcome of bounds checking is what he's looking for. So yes, arbitrarily slicing a pointer beyond its length in that situation is asking for trouble.

I mean, if there were more to the story, e.g., the array is backed by a block of malloced memory that's large enough for newLength, as manual bounds checking would verify, then the question of how to resize without reallocating is a moot one, no?
Dec 14 2020
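For the hypothetical case Mike describes, a malloc-backed block known to be large enough, a manually checked resize could be sketched like this. The helper `resizedWithinCapacity` and its `capacity` parameter are assumptions made for the example; nothing in the thread says the OP's code looks like this:

```d
import core.stdc.stdlib : free, malloc;
import std.exception : enforce;

// Hypothetical helper: re-slices `arr` to `newLength`, but only if the caller-supplied
// `capacity` (the number of elements actually allocated) allows it.
int[] resizedWithinCapacity(int[] arr, size_t capacity, size_t newLength)
{
    enforce(newLength <= capacity, "new length exceeds the allocated block");
    return arr.ptr[0 .. newLength]; // no allocation, no initialization
}

void main()
{
    enum size_t capacity = 16;
    auto block = cast(int*) malloc(capacity * int.sizeof);
    assert(block !is null);
    scope (exit) free(block);

    int[] arr = block[0 .. 4];
    arr[] = 1;

    arr = resizedWithinCapacity(arr, capacity, 10);
    assert(arr.length == 10);
    assert(arr.ptr is block);
    assert(arr[0 .. 4] == [1, 1, 1, 1]);

    arr[4 .. $] = 2; // the new elements are uninitialized until written
}
```

The check is only as good as the `capacity` value passed in; the slice itself carries no record of how large the underlying block is.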
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

There is a big downside in doing that: the array will not check whether it's still referring to valid memory after the resize. Your way is efficient in machine code, but in most cases it's highly impractical to skimp on memory safety to speed up code like this.

In the general case, this is a better way to resize arrays without reallocating:

```
@safe resizedWithin(T)(T[] arr, T[] within, size_t newSize)
{
    if (newSize == 0) return arr[0 .. 0];
    // Offset of `arr` inside `within`, in elements.
    auto startIndex = &arr[0] - &within[0];
    // Bounds-checked: fails if the result would not lie inside `within`.
    return within[startIndex .. startIndex + newSize];
}

@safe void main()
{
    import std;
    auto containerArray = iota(1000).array;
    auto array = containerArray[50 .. 60];
    array = array.resizedWithin(containerArray, 20);
    // [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
    // 66, 67, 68, 69]
    writeln(array);
}
```

Here, if you accidentally gave too big a new size for `array`, or `array` wasn't within `containerArray` (except if `array.length == 0`), the program would immediately abort instead of making an invalid array.
Dec 13 2020
On Sunday, 13 December 2020 at 13:19:35 UTC, Dukc wrote:
> ```
> @safe resizedWithin(T)(T[] arr, T[] within, size_t newSize)
> {
>     if (newSize == 0) return arr[0 .. 0];
>     auto startIndex = &arr[0] - &within[0];
>     return within[startIndex .. startIndex + newSize];
> }
> ```

Okay, there is a bug in my code: it won't work if `arr` is originally of length 0. May well contain other bugs, use with care :D. But hey, at least no memory corruption, as it's `@safe`.
Dec 13 2020
On 12/11/20 7:53 PM, Jonathan Levi wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

Lots of good responses to a mostly ambiguous message. So let's go over some possibilities:

1. You want to *shrink* the array length. array = array[0 .. newLength] works just fine. No reallocation, no initialization.

2. You want to *grow* the array length. array = array.ptr[0 .. newLength] is incredibly wrong and dangerous. You should not do this.

3. You wish to have no allocation for growing an array beyond its already-allocated block. I only mention this because it could be implied by your message, even though I'm pretty sure you don't mean this. This is fantasy, and you should not do this. Memory corruption is something you don't want to deal with. It's the reason why your chosen solution is incorrect.

4. You wish to have no allocation for growing an array into its ALREADY allocated block. This is possible, and even possible without reinitializing the new elements. In this context, your code is actually OK, though like Sönke mentions, you should call assumeSafeAppend on the array:

    assert(newLength <= array.capacity); // ensure I am not growing beyond the block.
    array = array.ptr[0 .. newLength]; // yay, new data that is uninitialized (mostly).
    array.assumeSafeAppend(); // now the runtime is aware that I have taken over that data for use.

Why is it important to call assumeSafeAppend? A few reasons:

1. The GC will run destructors on elements in an array only if they are known to be used (in the case that your elements have destructors).

2. If you don't call it, appending to the original slice could overwrite your data.

3. If you try to append to the resulting array and there technically would be space to fill inside the current block, the runtime will needlessly reallocate if your array ends outside where it thinks it should end.

Alternative to the assert, you could check for capacity and newLength to be consistent, and if not, reallocate yourself.

-Steve
Dec 13 2020
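Steve's closing suggestion, checking the capacity and reallocating yourself when the block is too small, might be sketched like this. The function name `grownWithoutInit` is hypothetical, the example is restricted to `int` elements for simplicity, and the fallback uses `std.array.uninitializedArray` as one possible way to reallocate without initializing:

```d
import std.array : uninitializedArray;

// Hypothetical helper: grows `array` to `newLength` without initializing the new
// elements. Stays in the current GC block when the runtime reports enough capacity,
// otherwise allocates a fresh uninitialized block and copies.
int[] grownWithoutInit(int[] array, size_t newLength)
{
    assert(newLength >= array.length);

    if (newLength <= array.capacity)
    {
        array = array.ptr[0 .. newLength]; // within the block, so this is in bounds
        array.assumeSafeAppend();          // tell the runtime the data is now in use
        return array;
    }

    auto fresh = uninitializedArray!(int[])(newLength);
    fresh[0 .. array.length] = array[];
    return fresh;
}

void main()
{
    int[] arr = [1, 2, 3];
    arr.reserve(100); // make sure the block has room

    int[] grown = grownWithoutInit(arr, 50);
    assert(grown.ptr is arr.ptr);        // grew in place
    assert(grown[0 .. 3] == [1, 2, 3]);

    // A slice that does not end where the used data ends has capacity 0,
    // so this call takes the reallocating path.
    int[] moved = grownWithoutInit(arr[0 .. 2], 10);
    assert(moved.ptr !is arr.ptr);
    assert(moved[0 .. 2] == [1, 2]);
}
```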
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi wrote:
> Wow, there went several hours of debugging. Increasing the length of a slice, by setting its length, will initialize the new elements and reallocate if necessary. I did not realize length was "smart", I guess I should have guessed. Anyway, to work around this, and probably also be more clear, create a new slice from the same pointer.
> `array = array.ptr[0..newLength];`

D and Go have both messed up the concept of a view of an array and owning the backing store of an array. I suggest using slices like C++ spans, only make them smaller, then create your own dynamic array ADT wrapper for explicit array ownership of the full array.
Dec 15 2020
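One possible reading of that suggestion, sketched with made-up names (`OwnedBuffer`, `view`, `resize`): a small struct owns the whole backing array, and slices handed out are only ever views into the part currently in use.

```d
// Hypothetical wrapper: owns the full backing store and tracks how much is in use.
struct OwnedBuffer
{
    private int[] store; // the whole allocation; only this struct changes its size
    private size_t used; // number of elements currently in use

    this(size_t capacity)
    {
        store = new int[](capacity);
    }

    // A view of the elements in use. Callers should only ever shrink it.
    int[] view()
    {
        return store[0 .. used];
    }

    size_t capacity() const
    {
        return store.length;
    }

    // Changes the element count without allocating.
    // Returns false (and changes nothing) if the request does not fit.
    bool resize(size_t newLength)
    {
        if (newLength > store.length)
            return false;
        used = newLength;
        return true;
    }
}

void main()
{
    auto buf = OwnedBuffer(8);
    assert(buf.resize(5));
    assert(buf.view().length == 5);
    assert(!buf.resize(9));          // would exceed the backing store
    assert(buf.view().length == 5);  // unchanged after the failed resize
}
```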