digitalmars.D.learn - shared array problem

Charles Hixson via Digitalmars-d-learn writes:
I have a piece of code that looks thus:

/**    Returns an editable file header missing the header length and data
  * length portions.  Those cannot be edited by a routine outside this class.
  * Access to them is available via the lenHead and lenRec functions.
  * Warning:  Do NOT change the size of the header.  If you do the size
  * will be reset to the current value before it is saved, and results are
  * unpredictable.  Also do not replace it.  This class maintains its own
  * pointer to the header, and your replacement will be ignored. */
ubyte[]    header()         @property    {    return fHead[4..$];    }

I want other classes to be able to modify the tail of the array, but not 
to resize it.  Eventually it should be written to a file. This way works 
(well, should work) but looks radically unsafe for the reasons indicated 
in the comments.  I'm sure there must be a better way, but can't think 
of what it would be.  The only alternative I've been able to think of 
is, approx:

bool    saveToHeader (ubyte[] rec)    { ... buf[4..$] = rec[0..$]; ... }

but that would be guaranteed to have an extra copying step, and it's 
bool because if the length of the passed parameter isn't correct 
saveToHeader would fail.  It may still be better since in this case the 
copying would be rather minimal, but the general problem bothers me 
because I don't see a good solution.
Nov 19 2016
Nicolas Gurrola <padresfan11 gmail.com> writes:
On Saturday, 19 November 2016 at 18:51:05 UTC, Charles Hixson 
wrote:

 ubyte[]    header()         @property    {    return fHead[4..$];    }
This method should do what you want. You are only returning a slice of the fHead array, so if the caller modifies the length it will only affect the return value, and not the length of fHead itself.
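As a rough illustration of that point (not code from the original class; `FileRecord`, `storedLength` and `storedByte` are hypothetical names made up for this sketch), element writes through the returned slice reach the owner's buffer, while a length change stays with the caller's copy of the slice:

```d
// Hypothetical stand-in for the class in the question.
class FileRecord
{
    private ubyte[] fHead;

    this(size_t len) { fHead = new ubyte[](len); }

    // Same shape as the accessor in the question: a slice of the tail.
    @property ubyte[] header() { return fHead[4 .. $]; }

    // Helpers used only to observe the owner's state in this sketch.
    @property size_t storedLength() const { return fHead.length; }
    ubyte storedByte(size_t i) const { return fHead[i]; }
}

void main()
{
    auto rec = new FileRecord(8);
    ubyte[] h = rec.header;

    // Element writes go to the shared memory.
    h[0] = 42;
    assert(rec.storedByte(4) == 42);

    // Changing the length only changes the caller's slice.
    h.length = 2;
    assert(h.length == 2);
    assert(rec.storedLength == 8);
    assert(rec.header.length == 4);
}
```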
Nov 19 2016
Charles Hixson via Digitalmars-d-learn writes:
On 11/19/2016 11:10 AM, Nicolas Gurrola via Digitalmars-d-learn wrote:
 On Saturday, 19 November 2016 at 18:51:05 UTC, Charles Hixson wrote:

 ubyte[]    header()         @property {    return fHead[4..$];    }
This method should do what you want. You are only returning a slice of the fHead array, so if the caller modifies the length it will only affect the return value, and not the length of fHead itself.
It's worse than that: if they modify the length, the array may be reallocated in RAM so that the pointers held by the containing class do not point to the changed values. (Read the header comments...it's not nice at all.) More, the class explicitly requires the array to be a particular length as it needs to fit into a spot in a binary file, so I really want to forbid any replacement of the array for any reason.

The first four bytes are managed by the class and returned via alternate routines which do not allow the original values to be altered, and this is necessary. I could make them immutable, but actually what I do is return, e.g., an int of 256 * fHead[0] + fHead[1], which is the length of the header. It's an int to allow negative values to be returned in case of error.

So what I'll probably eventually decide on is some variation of saveToHeader, e.g.:

bool saveToHeader (ubyte[] rec)
{
    if (rec.length + 4 > dheader) return false;
    fHead[4..rec.length + 4] = rec[0..$];
    return true;
}

unless I can think of something better. Actually for this particular case that's not a bad approach, but for the larger problem it's a lousy kludge.
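A minimal, self-contained sketch of that copy-in approach might look like the following. The class name `HeaderStore`, the `raw` accessor, and the use of `fHead.length` as the size limit are assumptions made for the example; the real class presumably tracks the header size differently.

```d
// Hypothetical owner of the header buffer; only saveToHeader can write to it.
class HeaderStore
{
    private ubyte[] fHead;

    this(size_t headerLen) { fHead = new ubyte[](headerLen); }

    // Copies rec into the editable part of the header (everything after
    // the first four bytes). Returns false if rec does not fit.
    bool saveToHeader(const(ubyte)[] rec)
    {
        if (rec.length + 4 > fHead.length)
            return false;
        fHead[4 .. 4 + rec.length] = rec[];
        return true;
    }

    // Read-only view, used here only to inspect the result.
    const(ubyte)[] raw() const { return fHead; }
}

void main()
{
    auto store = new HeaderStore(8);
    ubyte[] payload = [1, 2, 3];

    assert(store.saveToHeader(payload));
    assert(store.raw[4 .. 7] == payload);

    // Five bytes plus the four reserved ones exceed the eight-byte header.
    assert(!store.saveToHeader(new ubyte[](5)));
    assert(store.raw.length == 8);
}
```

The caller never holds a mutable reference to the buffer, so the cost is one copy per save in exchange for a length that cannot drift.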
Nov 19 2016
ag0aep6g <anonymous example.com> writes:
On 11/19/2016 10:26 PM, Charles Hixson via Digitalmars-d-learn wrote:
 It's worse than that, if they modify the length the array may be
 reallocated in RAM so that the pointers held by the containing class do
 not point to the changed values.  (Read the header comments...it's not
 nice at all.)
Arguably, any D programmer must be aware that appending to a dynamic array potentially means making a copy of the data, and that changes to length are not visible to other views of the data. But it's an opportunity to mess up, for sure.

You could return a wrapper around the array that supports editing the data but not changing the length or appending. Looks like std.experimental.typecons.Final [1] is supposed to be that wrapper. But in a little test I can still set the length. Not sure if that's a bug, or if Final has slightly different goals.

[1] https://dlang.org/phobos/std_experimental_typecons.html#.Final
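A small sketch of the "appending may copy" behaviour, using made-up variable names. The slice here deliberately ends before the end of the owner's data, so the runtime cannot grow it in place without overwriting the owner's bytes and has to move it; a slice that ends exactly at the end of the allocation may instead be extended in place.

```d
void main()
{
    ubyte[] owner = new ubyte[](8);
    ubyte[] view = owner[0 .. 4];   // ends in the middle of owner's data

    // Growing in place would overwrite owner[4], so the data is copied
    // to a new allocation and view now refers to that copy.
    view ~= cast(ubyte) 99;
    assert(view.length == 5 && view[4] == 99);
    assert(owner[4] == 0);

    // From here on, writes through view no longer reach owner.
    view[0] = 1;
    assert(owner[0] == 0);
    assert(owner.length == 8);
}
```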
Nov 19 2016
Charles Hixson via Digitalmars-d-learn writes:
On 11/19/2016 01:50 PM, ag0aep6g via Digitalmars-d-learn wrote:
 On 11/19/2016 10:26 PM, Charles Hixson via Digitalmars-d-learn wrote:
 It's worse than that, if they modify the length the array may be
 reallocated in RAM so that the pointers held by the containing class do
 not point to the changed values.  (Read the header comments...it's not
 nice at all.)
Arguably, any D programmer must be aware that appending to a dynamic array potentially means making a copy of the data, and that changes to length are not visible to other views of the data. But it's an opportunity to mess up, for sure.

You could return a wrapper around the array that supports editing the data but not changing the length or appending. Looks like std.experimental.typecons.Final [1] is supposed to be that wrapper. But in a little test I can still set the length. Not sure if that's a bug, or if Final has slightly different goals.

[1] https://dlang.org/phobos/std_experimental_typecons.html#.Final
Yes. I was hoping someone would pop up with some syntax making the array, but not its contents, const or immutable, which I couldn't figure out how to do, and which is what I really hoped would be the answer, but it appears that this isn't part of the syntax. If the array is constant, so are its contents. I really *can't* allow the length to be changed, and if the array is reallocated, it won't get saved. But the contents of the array are intended to be changed by the calling routines. Again, for this particular problem the kludge of copying the values into the array works fine (and is what I've decided to do), but that's not a good general solution to this kind of problem.
Nov 19 2016
ag0aep6g <anonymous example.com> writes:
On 11/20/2016 01:33 AM, Charles Hixson via Digitalmars-d-learn wrote:
 Yes.  I was hoping someone would pop up with some syntax making the
 array, but not its contents, const or immutable, which I couldn't figure
 out how to do, and which is what I really hoped would be the answer, but
 it appears that this isn't part of the syntax.
Yup, head const is not part of the language. You'd have to find a library solution or write something yourself.
 I really *can't* allow the length to be
 changed,
Your emphasis suggests that user could break things for your code. They can't. Any changes to the length will only affect the slice on the user's end. They can only fool themselves. That may be bad enough to warrant a more restricted return type, but for your code it's safe to return a plain dynamic array.
Nov 19 2016
Charles Hixson via Digitalmars-d-learn writes:
On 11/19/2016 05:52 PM, ag0aep6g via Digitalmars-d-learn wrote:
 On 11/20/2016 01:33 AM, Charles Hixson via Digitalmars-d-learn wrote:
 Yes.  I was hoping someone would pop up with some syntax making the
 array, but not its contents, const or immutable, which I couldn't figure
 out how to do, and which is what I really hoped would be the answer, but
 it appears that this isn't part of the syntax.
Yup, head const is not part of the language. You'd have to find a library solution or write something yourself.
 I really *can't* allow the length to be
 changed,
Your emphasis suggests that user could break things for your code. They can't. Any changes to the length will only affect the slice on the user's end. They can only fool themselves. That may be bad enough to warrant a more restricted return type, but for your code it's safe to return a plain dynamic array.
Whether you would call the change "break things for your code" might be dubious. It would be effectively broken, even if technically my code was doing the correct thing. But my code wouldn't be storing the data that needed storing, so effectively it would be broken. "Write something for yourself" is what I'd like to do, given that the language doesn't have that built-in support, but I can't see how to do it. I want to end up with a continuous array of ubytes of a given length with certain parts reserved to only be directly accessible to the defining class, and other parts accessible to the calling class(es). And the length of the array isn't known until run time. So I guess the only safe solution is to do an extra copy...which isn't a problem in this particular application as I only need to do it twice per file opening (once on opening, once on closing), but for other applications would be a real drag.
Nov 19 2016
ag0aep6g <anonymous example.com> writes:
On 11/20/2016 04:34 AM, Charles Hixson via Digitalmars-d-learn wrote:
 Whether you would call the change "break things for your code" might be
 dubious.  It would be effectively broken, even if technically my code
 was doing the correct thing.  But my code wouldn't be storing the data
 that needed storing, so effectively it would be broken.
I don't see how it's dubious. It's an error by the user. When users are given a dynamic array (and not by reference), they cannot expect that your code sees changes to length. That's just not how arrays work. When a user has that wrong expectation, and writes wrong code because of it, then it's arguably their own fault. However, if you want you can hold their hand a bit and make the mistake less likely.
 "Write something
 for yourself" is what I'd like to do, given that the language doesn't
 have that built-in support, but I can't see how to do it.
Wrap the array in a struct that has indexing, but doesn't allow setting the length or appending. Here's a quick prototype:

----
struct ConstLengthArray(E)
{
    private E[] data;
    this(E[] arr) { this.data = arr; }
    ref inout(E) opIndex(size_t i) inout { return data[i]; }
    @property size_t length() const { return data.length; }
}

void main()
{
    auto cla = ConstLengthArray!ubyte([1, 2, 3, 4, 5]);

    /* Mutating elements is allowed: */
    cla[0] = 10;
    assert(cla[0] == 10);

    /* No setting length, no appending: */
    static assert(!__traits(compiles, cla.length = 3));
    static assert(!__traits(compiles, cla ~= 6));
}
----

You might want to add support for slicing, concatenation, etc. Maybe allow implicit conversion to const(E[]), though that would also allow conversion to const(E)[] and that has a settable length again.
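One possible way to add the slicing support mentioned above, sketched for the ubyte-only case. `HeaderView` is a hypothetical name, and this is only one of several ways to spell the operators; it uses the simple one-dimensional `opSlice(lo, hi)` form together with `opDollar` so that `$` works inside the brackets.

```d
// Hypothetical fixed-length view over part of a ubyte buffer.
struct HeaderView
{
    private ubyte[] data;

    this(ubyte[] arr) { data = arr; }

    // Read-only length: there is no setter, so `view.length = n` is rejected.
    @property size_t length() const { return data.length; }

    // Lets callers write view[1 .. $].
    size_t opDollar() const { return data.length; }

    // Element access by reference, so view[i] = x writes to the buffer.
    ref ubyte opIndex(size_t i) { return data[i]; }

    // Slicing yields another fixed-length view, never a raw array.
    HeaderView opSlice(size_t lo, size_t hi)
    {
        return HeaderView(data[lo .. hi]);
    }
}

void main()
{
    ubyte[] buf = new ubyte[](8);
    auto view = HeaderView(buf[4 .. $]);

    view[0] = 9;
    assert(buf[4] == 9);

    auto sub = view[1 .. $];
    sub[0] = 5;
    assert(buf[5] == 5);
    assert(sub.length == 3);

    // Still no way to resize or append through the view.
    static assert(!__traits(compiles, view.length = 2));
    static assert(!__traits(compiles, view ~= cast(ubyte) 1));
}
```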
Nov 20 2016
Charles Hixson via Digitalmars-d-learn writes:
On 11/20/2016 03:42 AM, ag0aep6g via Digitalmars-d-learn wrote:
 On 11/20/2016 04:34 AM, Charles Hixson via Digitalmars-d-learn wrote:
 Whether you would call the change "break things for your code" might be
 dubious.  It would be effectively broken, even if technically my code
 was doing the correct thing.  But my code wouldn't be storing the data
 that needed storing, so effectively it would be broken.
I don't see how it's dubious. It's an error by the user. When users are given a dynamic array (and not by reference), they cannot expect that your code sees changes to length. That's just not how arrays work. When a user has that wrong expectation, and writes wrong code because of it, then it's arguably their own fault. However, if you want you can hold their hand a bit and make the mistake less likely.
 "Write something
 for yourself" is what I'd like to do, given that the language doesn't
 have that built-in support, but I can't see how to do it.
Wrap the array in a struct that has indexing, but doesn't allow setting the length or appending. Here's a quick prototype:

----
struct ConstLengthArray(E)
{
    private E[] data;
    this(E[] arr) { this.data = arr; }
    ref inout(E) opIndex(size_t i) inout { return data[i]; }
    @property size_t length() const { return data.length; }
}

void main()
{
    auto cla = ConstLengthArray!ubyte([1, 2, 3, 4, 5]);

    /* Mutating elements is allowed: */
    cla[0] = 10;
    assert(cla[0] == 10);

    /* No setting length, no appending: */
    static assert(!__traits(compiles, cla.length = 3));
    static assert(!__traits(compiles, cla ~= 6));
}
----

You might want to add support for slicing, concatenation, etc. Maybe allow implicit conversion to const(E[]), though that would also allow conversion to const(E)[] and that has a settable length again.
Well, that precise approach wouldn't work. (The traits aren't a part of the struct, e.g.), but returning a struct (or perhaps a class) rather than an actual array has promise. It could even allow separate callers to have separate views of the data based on some sort of registered key, which they could share on an as-needed basis. That's too much overhead work for this project, but has promise for the more general problem.
Nov 20 2016
ag0aep6g <anonymous example.com> writes:
On 11/20/2016 08:30 PM, Charles Hixson via Digitalmars-d-learn wrote:
 Well, that precise approach wouldn't work.  (The traits aren't a part of
 the struct, e.g.),
What do you mean by "traits"?
Nov 20 2016
Charles Hixson via Digitalmars-d-learn writes:
On 11/20/2016 03:42 AM, ag0aep6g via Digitalmars-d-learn wrote:
 On 11/20/2016 04:34 AM, Charles Hixson via Digitalmars-d-learn wrote:
 Whether you would call the change "break things for your code" might be
 dubious.  It would be effectively broken, even if technically my code
 was doing the correct thing.  But my code wouldn't be storing the data
 that needed storing, so effectively it would be broken.
I don't see how it's dubious. It's an error by the user. When users are given a dynamic array (and not by reference), they cannot expect that your code sees changes to length. That's just not how arrays work. When a user has that wrong expectation, and writes wrong code because of it, then it's arguably their own fault. However, if you want you can hold their hand a bit and make the mistake less likely.
 "Write something
 for yourself" is what I'd like to do, given that the language doesn't
 have that built-in support, but I can't see how to do it.
Wrap the array in a struct that has indexing, but doesn't allow setting the length or appending. Here's a quick prototype:

----
struct ConstLengthArray(E)
{
    private E[] data;
    this(E[] arr) { this.data = arr; }
    ref inout(E) opIndex(size_t i) inout { return data[i]; }
    @property size_t length() const { return data.length; }
}

void main()
{
    auto cla = ConstLengthArray!ubyte([1, 2, 3, 4, 5]);

    /* Mutating elements is allowed: */
    cla[0] = 10;
    assert(cla[0] == 10);

    /* No setting length, no appending: */
    static assert(!__traits(compiles, cla.length = 3));
    static assert(!__traits(compiles, cla ~= 6));
}
----

You might want to add support for slicing, concatenation, etc. Maybe allow implicit conversion to const(E[]), though that would also allow conversion to const(E)[] and that has a settable length again.
Thinking it over a bit more, the item returned would need to be a struct, but the struct wouldn't contain the array, it would just contain a reference to the array and a start and end offset. The array would need to live somewhere else, in the class (or struct...but class is better as you don't want the array evaporating by accident) that created the returned value. This means you are dealing with multiple levels of indirection, so it's costly compared to array access, but cheap compared to lots of large copies. So the returned value would be something like:

struct
{
    private:
    /** this is a reference to the data that lives elsewhere.  It should
     * be a pointer, but I don't like the syntax */
    ubyte[]  data;
    int    start, end;    ///    first and last valid indices into data
    public:
    this (ubyte[] data, int start, int end)
    {    this.data = data; this.start = start; this.end = end;}
    ...
    // various routines to access the data, but to limit the access to
    // the spec'd range, and nothing to change the bounds
}

Which is really the answer you already posted, but just a bit more detail on the construct, and what it meant. (Yeah, I could allow types other than ubyte as the base case, but I don't want to. I'm thinking of this mainly as a means of sharing a buffer between applications where different parts have exclusive access to different parts of the buffer, and where the buffer will be written to a file with a single fwrite, or since the underlying storage will be an array, it could even be rawwrite). I don't want to specify any more than I must about how the methods calling this will format the storage, and this means that those with access to different parts may well use different collections of types, but all types eventually map down to ubytes (or bytes), so ubytes is the common ground. Perhaps I'll need to write inbuffer,outbuffer methods/wrappings, but that's far in the future.

P.S.: The traits that I mentioned previously were those given by:

    static assert(!__traits(compiles, cla.length = 3));
    static assert(!__traits(compiles, cla ~= 6));

in your main routine. I assumed that they were validity tests. I don't understand why they were static. I've never happened to use static asserts, but I would assume that when they ran cla wouldn't be defined.

N.B.: Even this much is just thinking about design, not something I'll actually do at the moment. But this is a problem I keep coming up against, so a bit of thought now seemed a good idea.
Nov 20 2016
ag0aep6g <anonymous example.com> writes:
On 11/20/2016 09:09 PM, Charles Hixson via Digitalmars-d-learn wrote:
 Thinking it over a bit more, the item returned would need to be a
 struct, but the struct wouldn't contain the array, it would just contain
 a reference to the array and a start and end offset.  The array would
 need to live somewhere else, in the class (or struct...but class is
 better as you don't want the array evaporating by accident) that created
 the returned value.  This means you are dealing with multiple levels of
 indirection, so it's costly compared to array access, but cheap compared
 to lots of large copies.  So the returned value would be something like:
 struct
 {
     private:
     /** this is a reference to the data that lives elsewhere.  It should
      * be a pointer, but I don't like the syntax */
     ubyte[]  data;
     int    start, end;    ///    first and last valid indices into data
     public:
     this (ubyte[] data, int start, int end)
     {    this.data = data; this.start = start; this.end = end;}
     ...
     // various routines to access the data, but to limit the access to
     // the spec'd range, and nothing to change the bounds
 }
Instead of extra 'start' and 'end' fields you can slice the array. A dynamic array already is just a reference coupled with a length, i.e. a pointer with restricted indexing. So you can slice the original array with your offsets and create the struct with that slice. I feel like there is a misunderstanding somewhere, but I'm not sure on whose side. As far as I can tell, your understanding of dynamic arrays may be lacking, or maybe I don't understand what you're getting at.
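To make the slicing suggestion concrete, here is a minimal sketch in which the view takes the offsets once, stores only the resulting slice, and lets the slice's own bounds do the range limiting. `Window` is a hypothetical name, not something from the thread.

```d
// Hypothetical view that keeps a slice instead of start/end fields.
struct Window
{
    private ubyte[] data;   // already restricted to the permitted range

    this(ubyte[] whole, size_t start, size_t end)
    {
        data = whole[start .. end];   // no copy: same memory, narrower bounds
    }

    @property size_t length() const { return data.length; }

    // Indices are relative to the window; in builds with bounds checking
    // enabled, an index >= length is rejected by the slice itself.
    ref ubyte opIndex(size_t i) { return data[i]; }
}

void main()
{
    ubyte[] whole = new ubyte[](16);
    auto w = Window(whole, 4, 12);

    assert(w.length == 8);

    w[0] = 1;
    w[7] = 2;
    assert(whole[4] == 1);
    assert(whole[11] == 2);

    // A plain slice is just a pointer into the same block plus a length.
    ubyte[] part = whole[4 .. 12];
    assert(part.ptr == whole.ptr + 4);
}
```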
 Which is really the answer you already posted, but just a bit more
 detail on the construct, and what it meant.  (Yeah, I could allow types
 other than ubyte as the base case, but I don't want to.  I'm thinking of
 this mainly as a means of sharing a buffer between applications where
 different parts have exclusive access to different parts of the buffer,
 and where the buffer will be written to a file with a single fwrite, or
 since the underlying storage will be an array, it could even be
 rawwrite).  I don't want to specify any more than I must about how the
 methods calling this will format the storage, and this means that those
 with access to different parts may well use different collections of
 types, but all types eventually map down to ubytes (or bytes), so ubytes
 is the common ground.  Perhaps I'll need to write inbuffer,outbuffer
 methods/wrappings, but that's far in the future.
Sure, go with a specialized type instead of a template, if that makes more sense for your use case. As far as I see, the concept is independent of the element type, so it seemed natural to make it a template, but a special type is perfectly fine and probably has fewer pitfalls.
 P.S.:  The traits that I mentioned previously were those given by:
     static assert(!__traits(compiles, cla.length = 3));
     static assert(!__traits(compiles, cla ~= 6));
 in your main routine.  I assumed that they were validity tests.  I don't
 understand why they were static.  I've never happened to use static
 asserts, but I would assume that when they ran cla wouldn't be defined.
Those are tests to ensure that cla's length cannot be set and that it cannot be appended to. The asserts check that the code does not compile, i.e. that you cannot do those things. They're static simply because they can be. The code inside is not executed at run-time.
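A tiny sketch of the same idea with a built-in array, to show that the expression inside `__traits(compiles, ...)` is only type-checked and never run (the variable name is arbitrary):

```d
void main()
{
    int[] arr = [1, 2, 3];

    // Checked while compiling: setting the length of a plain array is legal...
    static assert(__traits(compiles, arr.length = 2));

    // ...and assigning a string to an int[] is not.
    static assert(!__traits(compiles, arr = "text"));

    // Neither expression was executed, so arr is untouched at run time.
    assert(arr.length == 3);
}
```

A local variable such as `arr` or `cla` can appear inside the check because the compiler only needs its type, not its run-time value.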
Nov 20 2016
Charles Hixson via Digitalmars-d-learn writes:
On 11/20/2016 12:41 PM, ag0aep6g via Digitalmars-d-learn wrote:
 On 11/20/2016 09:09 PM, Charles Hixson via Digitalmars-d-learn wrote:
 Thinking it over a bit more, the item returned would need to be a
 struct, but the struct wouldn't contain the array, it would just contain
 a reference to the array and a start and end offset.  The array would
 need to live somewhere else, in the class (or struct...but class is
 better as you don't want the array evaporating by accident) that created
 the returned value.  This means you are dealing with multiple levels of
 indirection, so it's costly compared to array access, but cheap compared
 to lots of large copies.  So the returned value would be something like:
 struct
 {
     private:
     /** this is a reference to the data that lives elsewhere.  It should
      * be a pointer, but I don't like the syntax */
     ubyte[]  data;
     int    start, end;    ///    first and last valid indices into data
     public:
     this (ubyte[] data, int start, int end)
     {    this.data = data; this.start = start; this.end = end;}
     ...
     // various routines to access the data, but to limit the access to
     // the spec'd range, and nothing to change the bounds
 }
Instead of extra 'start' and 'end' fields you can slice the array. A dynamic array already is just a reference coupled with a length, i.e. a pointer with restricted indexing. So you can slice the original array with your offsets and create the struct with that slice. I feel like there is a misunderstanding somewhere, but I'm not sure on whose side. As far as I can tell, your understanding of dynamic arrays may be lacking, or maybe I don't understand what you're getting at.
While you are definitely correct about slices, I really feel more comfortable with the start and end fields. I keep being afraid some smart optimizer is going to decide I don't really need the entire array. This is probably unjustified, but I find start and end fields easier to think about. I don't think either of us misunderstands what's going on, we just feel differently about methods that are approximately equivalent. Slices would probably be marginally more efficient (well, certainly so as you'd need two fewer fields), so if this were a public library there would be an excellent argument for doing it your way. I keep trying to think of a reason that start and end fields are better, and failing. All I've got is that I feel more comfortable with them.
Nov 21 2016