digitalmars.D - Slices and GC
- BLM (9/9) Apr 05 2012 Recently I've been working on some projects that involve parsing
- Vladimir Panteleev (9/18) Apr 05 2012 The GC can't really know which parts of the array you're using.
- BLM (9/18) Apr 05 2012 I had considered using .dup, but I wanted to minimize overhead. I
- Dmitry Olshansky (12/29) Apr 05 2012 Another idea is to copy out interesting parts of the original chunk to a...
- Jonathan M Davis (3/12) Apr 05 2012 http://dlang.org/d-array-article.html
Recently I've been working on some projects that involve parsing binary files. I've mainly been using std.file.read() to get the whole file as a huge array and then extracting slices. I had initially assumed that the GC would free any chunks of the array that didn't end up being referenced by these slices, but after reading some more, it looks like the whole array is kept in memory even if only a few elements are actually referenced. Is this actually the case? If so, might the language be extended to handle this situation?
Apr 05 2012
On Thursday, 5 April 2012 at 15:00:04 UTC, BLM wrote:
> Recently I've been working on some projects that involve parsing binary files. I've mainly been using std.file.read() to get the whole file as a huge array and then extracting slices. I had initially assumed that the GC would free any chunks of the array that didn't end up being referenced by these slices, but after reading some more, it looks like the whole array is kept in memory even if only a few elements are actually referenced. Is this actually the case? If so, might the language be extended to handle this situation?

The GC can't really know which parts of the array you're using. For example, your only reference to the array might be a pointer, and you might be traversing the array in either direction, only keeping count of the remaining bytes until the array boundary. Consider .dup-ing the slices you're going to need, or using std.mmfile to map the file into memory. In that case, the OS won't load the unnecessary parts of the file into memory in the first place.
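A minimal sketch of the .dup suggestion, assuming the wanted region is known by offset and length. The helper name `extractField` and its parameters are made up for illustration; only `std.file.read` and `.dup` come from the thread.

```d
import std.file : read;

/// Hypothetical helper: returns an independent copy of one region of a file.
ubyte[] extractField(string path, size_t offset, size_t length)
{
    // std.file.read returns void[]; the cast reinterprets it as bytes.
    auto whole = cast(ubyte[]) read(path);

    // A plain slice, whole[offset .. offset + length], would point into the
    // big block and keep all of it reachable. .dup allocates a new, small
    // block and copies only the requested bytes into it.
    return whole[offset .. offset + length].dup;

    // After returning, the copy does not refer to the big block, so the
    // GC may reclaim that block at a later collection (no guarantee when).
}
```

The cost is one extra allocation and copy per kept region, which is the overhead the trade-off in this thread is about.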
Apr 05 2012
On Thursday, 5 April 2012 at 15:30:45 UTC, Vladimir Panteleev wrote:
> The GC can't really know which parts of the array you're using. For example, your only reference to the array might be a pointer, and you might be traversing the array in either direction, only keeping count of the remaining bytes until the array boundary. Consider .dup-ing the slices you're going to need, or using std.mmfile to map the file into memory - in that case, the OS won't load the unnecessary parts of the file into memory in the first place.

I had considered using .dup, but I wanted to minimize overhead. I should probably look into std.mmfile or pull the data out in smaller chunks that the GC can handle individually.

If the GC can distinguish between pointers and slices, it should theoretically be able to prune an array that is only referenced by slices, but I'm not sure how well that would fit into the current GC system.
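One way to "pull the data out in smaller chunks" could look like the sketch below, which seeks to a region and reads only that region into its own small buffer. The helper name `readRegion` is hypothetical, and it assumes the offset and length of each region are known before reading.

```d
import std.stdio : File;

/// Hypothetical helper: reads `length` bytes starting at `offset`.
/// `length` is assumed to be non-zero.
ubyte[] readRegion(string path, long offset, size_t length)
{
    auto f = File(path, "rb");
    f.seek(offset); // position relative to the start of the file

    // Each region lives in its own GC block, so the GC can free
    // regions individually once nothing refers to them.
    auto buffer = new ubyte[](length);

    // rawRead returns the part of the buffer that was actually filled,
    // which is shorter than `buffer` if the file ends early.
    return f.rawRead(buffer);
}
```

Compared with reading the whole file and slicing, this trades one large allocation for several small ones plus extra I/O calls.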
Apr 05 2012
On 05.04.2012 20:35, BLM wrote:
> On Thursday, 5 April 2012 at 15:30:45 UTC, Vladimir Panteleev wrote:
>> The GC can't really know which parts of the array you're using. For example, your only reference to the array might be a pointer, and you might be traversing the array in either direction, only keeping count of the remaining bytes until the array boundary. Consider .dup-ing the slices you're going to need, or using std.mmfile to map the file into memory - in that case, the OS won't load the unnecessary parts of the file into memory in the first place.
>
> I had considered using .dup, but I wanted to minimize overhead. I should probably look into std.mmfile or pull the data out in smaller chunks that the GC can handle individually.

Another idea is to copy out the interesting parts of the original chunk to a separate storage array. This array will contain your sliced-out data, just packed more tightly. If you have an upper bound on the percentage of useful bytes, then you can get away without extra allocations. The tricky part is reallocating this storage array, as that leaves existing slices pointing at the old block (and keeps the GC from deallocating it). A workaround would be to use pure index-based "slices" that work on this block only.

> If the GC can distinguish between pointers and slices, it should theoretically be able to prune an array that is only referenced by slices, but I'm not sure how well that would fit into the current GC system.

-- Dmitry Olshansky
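The index-based "slices" idea might be sketched roughly as follows. `Span` and `PackedStore` are hypothetical names invented for this illustration; the point is only that an offset/length pair stays valid when the storage array is reallocated, whereas a real slice would keep pointing at the old block.

```d
/// Hypothetical index-based "slice": an offset and length into PackedStore.data.
struct Span
{
    size_t offset;
    size_t length;
}

/// Hypothetical storage that packs the interesting parts of a larger buffer.
struct PackedStore
{
    ubyte[] data;

    /// Copies `part` to the end of the storage and returns where it landed.
    Span put(const(ubyte)[] part)
    {
        auto span = Span(data.length, part.length);
        // Appending may move `data` to a new block. Earlier Spans hold no
        // pointer, so they neither go stale nor pin the old block.
        data ~= part;
        return span;
    }

    /// Resolves a Span against the current storage block.
    const(ubyte)[] get(Span s) const
    {
        return data[s.offset .. s.offset + s.length];
    }
}
```

The slice returned by `get` is only meant for immediate use; holding on to it across a later `put` would reintroduce the problem. If an upper bound on the useful bytes is known, calling `reserve` on `data` up front should avoid reallocation altogether.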
Apr 05 2012
On Thursday, April 05, 2012 17:00:03 BLM wrote:
> Recently I've been working on some projects that involve parsing binary files. I've mainly been using std.file.read() to get the whole file as a huge array and then extracting slices. I had initially assumed that the GC would free any chunks of the array that didn't end up being referenced by these slices, but after reading some more, it looks like the whole array is kept in memory even if only a few elements are actually referenced. Is this actually the case? If so, might the language be extended to handle this situation?

http://dlang.org/d-array-article.html

- Jonathan M Davis
Apr 05 2012