digitalmars.D - Keeping references to dynamic arrays
- brad (42/42) Jul 23 2004 Hi guys, I've just started playing with D again. I'm having a little
- Russ Lewis (17/65) Jul 23 2004 You can keep pointers to dynamic arrays just like anything else. I
- Brad Beveridge (14/82) Jul 23 2004 I'd seen that, but got tripped up on the nasty syntax.
- Sean Kelly (4/7) Jul 23 2004 Pretty standard Copy On Write behavior. I haven't decided if I really l...
- Brad Beveridge (56/66) Jul 23 2004 But it isn't actually copy on write behaviour, because you can slice int...
- J Anderson (6/78) Jul 23 2004 I think D arrays should be as efficient as possible. Copy on write
- Brad Beveridge (14/111) Jul 23 2004 I'm not advocating copy on write, I agree that behaviour like that is
- J Anderson (6/21) Jul 23 2004 Right-t-o then. I guess even that could be inefficient because then you...
- Brad Beveridge (38/41) Jul 24 2004 Wrapping a dynamic array with something as simple as
- J Anderson (7/54) Jul 24 2004 Obviously you would make the data private and use operators and methods
- Arcane Jill (16/25) Jul 24 2004 Now here's the part that I don't understand - why would you want to have...
- J Anderson (5/35) Jul 24 2004 You're right and wrong. You're wrong that you're talking rubbish again, the
- Sean Kelly (4/12) Jul 24 2004 I don't know. Perhaps it's my experience with iterators and the STL, bu...
- Derek (39/83) Jul 23 2004 You are thinking correctly. Unfortunately D does subtly change the
- Brad Beveridge (11/16) Jul 23 2004
Hi guys,

I've just started playing with D again. I'm having a little trouble getting my head around dynamic arrays, especially when the GC gets involved. My basic usage scenario is this: I have multiple classes that want to reference the same large dynamic array. From the small test cases I've written, if the array is resized and needs to be realloc'd to a different place, then they don't keep their reference. So...

    int [] a1;
    a1.length = 1;
    int [] a2 = a1;
    a1.length = 300; // or some other value that moves the array
    a2[0] = 5;
    a1[0] = 1;
    // this is the point where I really want a1[0] == a2[0]

Is there a way to do this? It seems to me that this is quite a subtle area, and may possibly introduce bugs. From what I can tell it has the following effects (maybe):

- It is never safe to pass C a dynamic array if the library is going to keep that array around. (Note: are there any restrictions on what the GC can do with arrays, i.e. is it free to move arrays at will for heap compaction?)
- Even if the C library isn't going to keep the array around, multi-threaded D apps may have bugs due to:
  1. int [] a2 = a1[0..4]
  2. -- another thread resizes a1 & causes a move
  3. a2[0] = 5 // i.e. expecting a2 to be an alias of a1
- Does this mean that to have arrays that do allow resizing and move correctly, we need to wrap the array in a class?

It just feels to me like you don't really know where you stand with D dynamic arrays. You can't trust aliasing them with a pointer, or with an array slice. The only real way to know you are writing to element x in the array is to index the original array at position x.

My intuition was that all slices or dynamic array assignments would essentially act like a reference to the actual array, in much the same way as assigning objects is by reference. From my point of view this would be much more intuitive and consistent than the way I think it currently works.

Could I please be enlightened on how all this actually works - rather than how I think it may work? :)

Cheers
Brad
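For readers who want to reproduce the effect described above, here is a minimal sketch. It assumes a current D compiler (D2 syntax, not the 2004 dialect used in the thread), and the helper name `sameStart` is made up for illustration. Whether the resize relocates the data is up to the runtime, so the sketch reports what happened instead of assuming either outcome.

```d
import std.stdio : writeln;

/// True when both slices start at the same memory address.
/// (Hypothetical helper, not part of any library.)
bool sameStart(const int[] x, const int[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] a1;
    a1.length = 1;
    int[] a2 = a1;            // copies the (pointer, length) pair, not the elements
    assert(sameStart(a1, a2));

    a1.length = 300;          // may or may not relocate the data
    a2[0] = 5;
    a1[0] = 1;

    writeln("a2.length = ", a2.length);            // a2 keeps its own length of 1
    writeln("still shared: ", sameStart(a1, a2));  // depends on the runtime
    writeln("a1[0] = ", a1[0], ", a2[0] = ", a2[0]);
}
```

If the two slices still share storage, both elements read 1; if the data moved, a2[0] stays at 5. That is the "copy on resize" behaviour the thread goes on to discuss.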
Jul 23 2004
You can keep pointers to dynamic arrays just like anything else. I always get mixed up with the syntax, so I looked it up on http://digitalmars.com/d/arrays.html. It says that a pointer to a dynamic array looks like this:

    int[]* e;

You have two choices for how to allocate the array. You can declare it once, and then assign pointers to it:

    int[] a;
    int[]* b = &a;
    int[]* c = &a;
    int[]* d = &a;

Or you should be able to allocate it with new. I'm not 100% sure of the syntax, though. I would try this and see if it works:

    int[]* e = new int[];
    e.length = 1;
    e.length = 300;

brad wrote:
> Hi guys, I've just started playing with D again. I'm having a little trouble getting my head around dynamic arrays, especially when the GC gets involved. [...]
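The pointer-to-array approach suggested here can be sketched as follows. This assumes a current D compiler (D2), and `growThrough` is a hypothetical helper name. Only the "declare once, then take the address" variant is shown, since the thread itself is unsure about the `new` form.

```d
import std.stdio : writeln;

/// Grows the array through the pointer, so every holder of the pointer sees it.
/// (Hypothetical helper for illustration.)
void growThrough(int[]* p, size_t newLength)
{
    (*p).length = newLength;
}

void main()
{
    int[] a;
    a.length = 1;

    int[]* b = &a;            // b and c point at the slice variable itself
    int[]* c = &a;

    growThrough(b, 300);      // resize via one pointer
    (*c)[0] = 5;              // write via the other; note the parentheses

    writeln(a.length, " ", a[0], " ", (*b)[0]);  // prints "300 5 5"
}
```

Because `b` and `c` refer to the variable `a` and not to its data, a relocation during the resize does not matter. The catch is that `a` must outlive every pointer to it.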
Jul 23 2004
I'd seen that, but got tripped up on the nasty syntax.

    int [] a1;
    a1.length = 1;
    a1[0] = 5;
    int []* a2;
    a2 = &a1;
    *a2[0] = 0;
    printf("%i %i\n", a1[0], *a2[0]);

This works as expected, but I think the *a2[0] syntax is a bit of a dog.

Also, what about the whole "sometimes a=b is an alias, sometimes (when b is resized), it is a copy?"

Cheers
Brad

Russ Lewis wrote:
> You can keep pointers to dynamic arrays just like anything else. I always get mixed up with the syntax, so I looked it up on http://digitalmars.com/d/arrays.html. [...]
Jul 23 2004
In article <cds4tg$2qrp$1 digitaldaemon.com>, Brad Beveridge says...
> This works as expected, but I think the *a2[0] syntax is a bit of a dog.
> Also, what about the whole "sometimes a=b is an alias, sometimes (when b is resized), it is a copy?"

Pretty standard Copy On Write behavior. I haven't decided if I really like it yet, but it is a good solution for most cases.

Sean
Jul 23 2004
But it isn't actually copy on write behaviour, because you can slice into an array and use that slice as a window into the main array. So what it really is, is copy on resize. The rest of the time it is by reference.

I've got no problems with copy on write; the thing that really irks me about this is the inconsistency of it all - sometimes slices and copies can be used as ways to manipulate the main array, sometimes they can't. And the problem is that you can't necessarily be sure what you are getting.

What I think would make everything consistent is:

1 - Slicing and assignment are always windows into the same array, i.e.

    int [] dyn;
    int [] dyn2;
    dyn2 = dyn;
    dyn.length = 45;
    int [] slice = dyn[0..2];

All of the above point to the same dynamic array (dyn), and do so even if dyn is resized and needs to be moved and reallocated. If dyn is resized to be smaller than one of its slices then accessing that slice causes an out of bounds exception.

2 - To get a copy of an array use the explicit .dup property.

I guess the thing that bugs me the most is that class objects in D behave with reference behaviour all the time, and the GC is free to move them as it likes - which is essentially what is going on here. But dynamic array assignment of the type above behaves in a completely different manner. And the manner is a bit random. Imagine what would happen if you had a thread resizing a dynamic array, and another thread trying to keep track of it?

Cheers
Brad

<code>
void printa(char[] name, int [] a)
{
    printf("%.*s : ", name);
    foreach (int i; a)
    {
        printf ("%i ", i);
    }
    printf("\n");
}

int main(char [][] a)
{
    int [] dyn;
    dyn.length = 4;
    for (int i = 0; i < dyn.length; i++)
        dyn[i] = i;
    printa("dyn", dyn);

    int [] slice;
    slice = dyn[1..3];
    printa("slice", slice);

    slice[0] = 50;
    printa("dyn", dyn);
    printa("slice", slice);
    return 0;
}
</code>

Sean Kelly wrote:
> Pretty standard Copy On Write behavior. I haven't decided if I really like it yet, but it is a good solution for most cases.
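The difference between a slice used as a window and an explicit `.dup` copy can be seen side by side in a short sketch. It assumes a current D compiler (D2); `countUp` is a hypothetical helper, not a library function.

```d
import std.stdio : writeln;

/// Returns a fresh array holding 0, 1, 2, ... n-1. (Hypothetical helper.)
int[] countUp(size_t n)
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int) i;
    return result;
}

void main()
{
    int[] dyn = countUp(4);         // [0, 1, 2, 3]
    int[] window = dyn[1 .. 3];     // shares memory with dyn
    int[] copy = dyn[1 .. 3].dup;   // independent copy

    window[0] = 50;                 // visible through dyn
    copy[1] = 99;                   // not visible through dyn

    writeln(dyn);     // [0, 50, 2, 3]
    writeln(window);  // [50, 2]
    writeln(copy);    // [1, 99]
}
```

No resize happens here, so the window stays attached to `dyn` throughout.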
Jul 23 2004
I think D arrays should be as efficient as possible. Copy on write would make them less so. If you need such high-level functionality it should be part of the standard lib, not D itself.

-- 
-Anderson: http://badmama.com.au/~anderson/

Brad Beveridge wrote:
> But it isn't actually copy on write behaviour, because you can slice into an array and use that slice as a window into the main array. So what it really is, is copy on resize. [...]
Jul 23 2004
I'm not advocating copy on write; I agree that behaviour like that is perhaps too inefficient to be part of the language. All I am saying is that if I do

    int [] orig;
    int [] other;
    other = orig;
    orig.length = 20;

then other and orig should always point to the same array, no matter if the array needs to be moved in order to satisfy the realloc. This already (I think) happens with D object references: the GC is free to move the location of the object memory, but all references are updated. Why not the same with arrays? Who cares where the physical memory is, as long as the handles to the array that I hold are correct.

Brad

J Anderson wrote:
> I think D arrays should be as efficient as possible. Copy on write would make it not so. If you need such high-level functionality it should be part of the standard lib, not D itself.
Jul 23 2004
Brad Beveridge wrote:
> I'm not advocating copy on write; I agree that behaviour like that is perhaps too inefficient to be part of the language. All I am saying is that if I do
>
>     int [] orig;
>     int [] other;
>     other = orig;
>     orig.length = 20;
>
> then other and orig should always point to the same array, no matter if the array needs to be moved in order to satisfy the realloc. This already (I think) happens with D object references: the GC is free to move the location of the object memory, but all references are updated. Why not the same with arrays? Who cares where the physical memory is, as long as the handles to the array that I hold are correct.
>
> Brad

Right-t-o then. I guess even that could be inefficient because then you need to keep track of which arrays are which. If you wrap the array in a class then you'd have no problems.

-- 
-Anderson: http://badmama.com.au/~anderson/
Jul 23 2004
Wrapping a dynamic array with something as simple as

    class Darray(T)
    {
        T [] data;
    }

lets you do

    Darray!(int) a = new Darray!(int);
    Darray!(int) b = a;

which works as expected, i.e. resizing 'a' still leaves you with a.data == b.data.

OK, so there is a workaround. It is debatable if this is more or less ugly than the pointer to array syntax (int [] *a, then using *a[0]).

Can somebody please explain the rationale for the current system? As I see it, the current system has no pros, only cons. I.e. currently:

- Assignment of dynamic arrays is ambiguous: after int [] a = b, a may be a reference to the same data as b, or, if one of them gets resized, a separate copy.
- The behaviour is inconsistent with the rest of D, where classes are always by reference, unless explicitly copied.

I would argue that the following should be true:

    int [] a; // create a reference to a length 0 dynamic array
    int [] b; // as above
    a = b; // point a to the same underlying array as b - i.e. just as object assignment semantics work
    a.length = 100; // resize the array that a (and b) points to
    b[0] = 50; // set element 0 of b to 50, a[0] == b[0]

So a and b reference the same underlying array. This way the dynamic array semantics are identical to the object reference semantics, and we remove the ambiguity that the current system has.

You may argue that having to update all the references to the underlying array may be expensive, however:

1 - This already happens for object references if the GC moves the object, and nobody complains.
2 - Updating the references is probably not overly significant compared to resizing the array.

Thoughts?

Cheers
Brad

J Anderson wrote:
> Right-t-o then. I guess even that could be inefficient because then you need to keep track of which arrays are which. If you wrap the array in a class then you'd have no problems.
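The class wrapper described above can be tried out with a sketch like this one. It assumes a current D compiler (D2), where `Darray!int` is shorthand for the `Darray!(int)` spelling used in the post.

```d
import std.stdio : writeln;

/// Minimal reference wrapper: every variable of this class type
/// refers to the same object, and so to the same slice.
class Darray(T)
{
    T[] data;
}

void main()
{
    auto a = new Darray!int;
    Darray!int b = a;        // copies the reference, not the array

    a.data.length = 100;     // the resize happens inside the shared object
    b.data[0] = 50;

    writeln(a.data[0], " ", a.data.length == b.data.length);  // prints "50 true"
    assert(a.data is b.data);
}
```

The aliasing survives the resize because there is only one slice variable, held inside the object; `a` and `b` are two references to that object.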
Jul 24 2004
Brad Beveridge wrote:
> Wrapping a dynamic array with something as simple as
>
>     class Darray(T)
>     {
>         T [] data;
>     }
>
> lets you do
>
>     Darray!(int) a = new Darray!(int);
>     Darray!(int) b = a;
>
> which works as expected, i.e. resizing 'a' still leaves you with a.data == b.data.

Obviously you would make the data private and use operators and methods to deal with the array.

> OK, so there is a workaround. It is debatable if this is more or less ugly than the pointer to array syntax (int [] *a, then using *a[0]).
>
> Can somebody please explain the rationale for the current system? As I see it, the current system has no pros, only cons. I.e. currently:
>
> - Assignment of dynamic arrays is ambiguous, int [] a = b may be either a reference to "b", or if a gets resized then b may be a copy.
> - The behaviour is inconsistent with the rest of D, where classes are always by reference, unless explicitly copied.
>
> I would argue that the following should be true:
>
>     int [] a; // create a reference to a length 0 dynamic array
>     int [] b; // as above
>     a = b; // point a to the same underlying array as b - i.e. just as object assignment semantics work
>     a.length = 100; // resize the array that a (and b) points to
>     b[0] = 50; // set element 0 of b to 50, a[0] == b[0]
>
> So a and b reference the same underlying array. This way the dynamic array semantics are identical to the object reference semantics, and we remove the ambiguity that the current system has.
>
> You may argue that having to update all the references to the underlying array may be expensive, however:
>
> 1 - This already happens for object references if the GC moves the object, and nobody complains.
> 2 - Updating the references is probably not overly significant compared to resizing the array.
>
> Thoughts?
>
> Cheers
> Brad

If it was done the way you suggest, you would need another level of indirection, which requires more memory.

> J Anderson wrote:
>> Right-t-o then. I guess even that could be inefficient because then you need to keep track of which arrays are which. If you wrap the array in a class then you'd have no problems.

-- 
-Anderson: http://badmama.com.au/~anderson/
Jul 24 2004
In article <cdsvhu$hc9$1 digitaldaemon.com>, Brad Beveridge says...
> I'm not advocating copy on write, I agree that behaviour like that is perhaps too inefficient to be part of the language. All I am saying is that if I do
>
>     int [] orig;
>     int [] other;
>     other = orig;
>     orig.length = 20;
>
> Then other and orig will always point to the same array, no matter if the array needs to be moved in order to satisfy the realloc.

Now here's the part that I don't understand - why would you want to have two variable names (orig and other in the above example) /in the same scope/ to be references to the same thing? If they're in the same scope, why not just refer to it by the same name consistently throughout? If they're in /different/ scopes, then the keyword inout does just fine, doesn't it?

Arcane Jill, probably talking rubbish again.
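The by-reference parameter idea can be sketched as below. One assumption to flag: in current D (D2) the parameter keyword for this is `ref`, which took over the role that `inout` had in the 2004 dialect discussed in the thread. The function name `extend` is hypothetical.

```d
import std.stdio : writeln;

/// Adds `count` zero-initialised elements. `ref` means the caller's own
/// slice variable is updated, so a relocation cannot leave it stale.
void extend(ref int[] arr, size_t count)
{
    arr.length = arr.length + count;
}

void main()
{
    int[] orig = [1, 2, 3];
    extend(orig, 17);
    orig[19] = 7;            // valid only because orig itself was resized
    writeln(orig.length);    // prints "20"
}
```

This covers the "different scopes" case for a single call, but it does not help several long-lived objects that each store their own copy of the slice, which was the original scenario in the thread.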
Jul 24 2004
Arcane Jill wrote:
> Now here's the part that I don't understand - why would you want to have two variable names (orig and other in the above example) /in the same scope/ to be references to the same thing? If they're in the same scope, why not just refer to it by the same name consistently throughout? If they're in /different/ scopes, then the keyword inout does just fine, doesn't it?
>
> Arcane Jill, probably talking rubbish again.

You're right and wrong. You're wrong that you're talking rubbish again; the rest is right. I agree aliasing is a bad thing.

-- 
-Anderson: http://badmama.com.au/~anderson/
Jul 24 2004
In article <cdsrhf$ec6$1 digitaldaemon.com>, Brad Beveridge says...
> But it isn't actually copy on write behaviour, because you can slice into an array and use that slice as a window into the main array. So what it really is, is copy on resize. The rest of the time it is by reference. I've got no problems with copy on write; the thing that really irks me about this is the inconsistency of it all - sometimes slices and copies can be used as ways to manipulate the main array, sometimes they can't. And the problem is that you can't necessarily be sure what you are getting.

I don't know. Perhaps it's my experience with iterators and the STL, but I find the rules pretty straightforward.

Sean
Jul 24 2004
On Fri, 23 Jul 2004 21:40:29 +0000 (UTC), brad wrote:
> Hi guys, I've just started playing with D again. I'm having a little trouble getting my head around dynamic arrays, especially when the GC gets involved. [...]
> Could I please be enlightened on how all this actually works - rather than how I think it may work? :)

You are thinking correctly. Unfortunately D does subtly change the semantics of slicing. Here is some code to prove it.

<code>
void pc(char[] x)
{
    foreach(char c; x)
    {
        if (c != '\0')
            printf("%c", c);
    }
}

void main()
{
    char[] a;
    char[] b;

    // Give it something to work with
    a = "1234567890";

    // Set 'b' to point into a subset of 'a'
    b = a[2..7];
    pc("a='" ~a~"' b='"~b~"'\n");

    // Now prove it by changing 'a' to see if 'b' also changes.
    a[5] = 'a';
    pc("a='" ~a~"' b='"~b~"'\n");

    // Resize 'a' to force it to move.
    a.length=10000;
    pc("a='" ~a~"' b='"~b~"'\n");

    // Change 'a' again to see if 'b' still changes.
    a[4] = 'a';
    pc("a='" ~a~"' b='"~b~"'\n");

    // Ahhh! But 'b' is no longer pointing into a subset of 'a'.
}
</code>

-- 
Derek
Melbourne, Australia
Jul 23 2004
Derek wrote:
<snip>
> You are thinking correctly. Unfortunately D does subtly change the semantics of slicing. Here is some code to prove it.
> <code>
<snip>

OK, well at least I "get" the way it works. However, I'm still not overly pleased that it works this way :)

I am thinking that for the best consistency I should probably roll my own template array (or does DTL have one?) that is always by reference, unless .dup is used - and preserves aliasing when resized. Does this sound possible?

Cheers
Brad
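A roll-your-own template of the kind proposed here might look roughly like the sketch below. It assumes a current D compiler (D2); `RefArray` and its members are hypothetical names chosen for illustration, not an existing DTL or standard library type. The slice is kept private and reached through operators, along the lines suggested earlier in the thread.

```d
import std.stdio : writeln;

/// Hypothetical by-reference array; the name and API are illustrative only.
class RefArray(T)
{
    private T[] data;

    this(size_t n = 0) { data = new T[n]; }

    @property size_t length() const { return data.length; }
    @property void length(size_t n) { data.length = n; }

    /// Element access; bounds checking comes from the underlying slice.
    ref T opIndex(size_t i) { return data[i]; }

    /// Explicit copy, mirroring the built-in .dup property.
    RefArray dup()
    {
        auto c = new RefArray(0);
        c.data = data.dup;
        return c;
    }
}

void main()
{
    auto a = new RefArray!int(1);
    auto b = a;              // alias: same object
    auto c = a.dup;          // independent copy

    a.length = 300;          // any relocation stays inside the shared object
    b[0] = 50;

    writeln(a[0], " ", b.length, " ", c[0], " ", c.length);  // prints "50 300 0 1"
}
```

The cost is the extra level of indirection raised as an objection earlier in the thread: every element access goes through the object to reach the slice.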
Jul 23 2004