digitalmars.D - Re: Memory allocation in D (noob question)
- mandel (11/11) Nov 30 2007 It probably is a noob question,
- Jarrett Billingsley (7/18) Nov 30 2007 What?
- Janice Caron (5/6) Nov 30 2007 Raw pointers are discouraged in modern languages such as D. They are
- Robert Fraser (5/17) Nov 30 2007 The extra space allocated isn't for the length (in fact, it's just a
- mandel (7/25) Dec 01 2007 Thanks, that answers my question.
- Don Clugston (5/32) Dec 01 2007 An observation...
- Steven Schveighoffer (20/44) Dec 03 2007 Think of it this way:
- Sean Kelly (17/68) Dec 03 2007 This is true in D 1.0. However, there has been talk that arrays in D
- mandel (12/21) Dec 03 2007 I see the problem.
- Oskar Linde (19/36) Dec 03 2007 Appending to a (empty or not) array slice starting at the start of an
- Steven Schveighoffer (16/40) Dec 04 2007 Hm... I think you are slightly incorrect. I think the array is appended...
- Oskar Linde (16/61) Dec 04 2007 Try this:
- Steven Schveighoffer (31/44) Dec 04 2007 Outputs "ac"
- Sean Kelly (17/53) Dec 04 2007 Then the spec is wrong. The current behavior is very deliberate,
- Steven Schveighoffer (9/9) Dec 05 2007 BTW, here are a list of tango classes that can corrupt data that was not...
- Sean Kelly (3/13) Dec 05 2007 Thanks, I'll look into these.
- Steven Schveighoffer (23/29) Dec 04 2007 more bugs :)
- Sean Kelly (3/14) Dec 04 2007 This is expected behavior.
- Steven Schveighoffer (16/29) Dec 04 2007 Behavior by design, perhaps. Expected, I should hope not. I would neve...
- Regan Heath (21/49) Dec 05 2007 In this post I'm commenting on the example shown above, not the 2nd one
- Steven Schveighoffer (20/46) Dec 05 2007 The problem is that invariant data is changing. This is a no-no for pur...
- Regan Heath (41/71) Dec 05 2007 That is another issue which I didn't even address.
- Sean Kelly (11/55) Dec 05 2007 I don't know that it's broken so much as potentially misleading. That
- Regan Heath (9/33) Dec 05 2007 True for 'a' but when appending to 'a' it writes over the memory which
- Sean Kelly (5/41) Dec 05 2007 Or perhaps it should always reallocate. I'd originally thought it
- Derek Parnell (12/18) Dec 05 2007 However, this is fine ...
- Steven Schveighoffer (7/21) Dec 05 2007 Yes, I noticed that too. However, it's simply the non-deterministic
- Regan Heath (7/31) Dec 06 2007 To me, all the behaviour is "normal" ;)
- Matti Niemenmaa (13/22) Dec 05 2007 It's probably just a side effect of the fact that string literals are im...
- Regan Heath (6/19) Dec 06 2007 I suspect you're right. I think the reason is that a string literal is
- Steven Schveighoffer (15/22) Dec 05 2007 Look at the example for append:
- Sean Kelly (14/65) Dec 05 2007 One could argue that invariant data is changing because of a programmer
- Janice Caron (13/18) Dec 05 2007 It would, /unless/ we had a vector type (a C++ std::vector, not a math
- Sean Kelly (12/15) Dec 05 2007 The same "copy on write" issue applies to each case, but I agree that
- Regan Heath (19/43) Dec 05 2007 This one worries me.
- Sean Kelly (3/34) Dec 05 2007 Yes :-)
It probably is a noob question, but aren't array lengths just hidden size_t values that are passed around? Why do we need to allocate space for them, too?

    void foo()
    {
        size_t length;
        char* ptr; // allocated memory of 2^n

        // .. the same as..?
        char[] data;
    }
Nov 30 2007
"mandel" <oh no.es> wrote in message news:fiqu9l$18v$1 digitalmars.com...It probably is a noob question, but aren't array lengths just hidden size_t values that are passed around? Why do we need to allocate space for them, too? voif foo() { size_t length; char* ptr; //allocated memory of 2^n //.. the same as..? char[] data; }What? I mean, yes, a size_t and a pointer will be the same size as an array reference, but the point of an array reference is that, well, it's an array reference. And you can do all kinds of things with them that you can't with pointers. What are you getting at?
Nov 30 2007
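For readers following along, here is a minimal sketch of the point made above: a D array reference is a length plus a pointer, and a slice is just another such pair aimed at the same memory. The function name `sliceIsView` is made up for illustration, and the size check assumes the usual two-word layout used by DMD.

```d
import std.stdio;

// Returns true when a slice is only a (length, pointer) view into the
// parent array's memory, with no separate storage of its own.
bool sliceIsView()
{
    char[] data = new char[8];
    char[] tail = data[2 .. 5];

    // The reference itself occupies one size_t plus one pointer.
    static assert((char[]).sizeof == size_t.sizeof + (char*).sizeof);

    // The slice starts two chars into the same block and covers three.
    return tail.ptr == data.ptr + 2 && tail.length == 3;
}

void main()
{
    writeln(sliceIsView());
}
```

The length travels with the reference (on the stack or inside an object), not inside the allocated block, which is the distinction the later replies in this thread rely on.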
On 12/1/07, mandel <oh no.es> wrote:
> Why do we need to allocate space for them, too?

Raw pointers are discouraged in modern languages such as D. They are the source of too many bugs. Use them if you need to do down-and-dirty, under-the-hood stuff, but for general use, forget pointers. Use arrays. They're safer.
Nov 30 2007
mandel wrote:
> It probably is a noob question, but aren't array lengths just hidden size_t values that are passed around? Why do we need to allocate space for them, too?
>
>     void foo()
>     {
>         size_t length;
>         char* ptr; // allocated memory of 2^n
>
>         // .. the same as..?
>         char[] data;
>     }

The extra space allocated isn't for the length (in fact, it's just a byte, I think); it's to make checking for array bounds errors possible (since there's a byte of space that, if accessed, indicates an overflow). It might be used for something else, too.
Nov 30 2007
Robert Fraser Wrote:
> The extra space allocated isn't for the length (in fact, it's just a byte, I think); it's to make checking for array bounds errors possible (since there's a byte of space that, if accessed, indicates an overflow). It might be used for something else, too.

Thanks, that answers my question. But I can't think how it could be used for array bounds error checking right now. Well, I guess there is some newsgroup post about this somewhere. But the page allocation overhead looks ugly for a language like D. Anyway, good to have D arrays. Working with pointers in C often held surprises whenever attention lapsed. :>
Dec 01 2007
mandel wrote:
> Anyway, good to have D arrays. Working with pointers in C often held surprises whenever attention lapsed. :>

An observation... In my experience, most pointer bugs are actually uninitialised variables. An uninitialised pointer is a truly horrible thing. But since D initialises variables, pointers in D aren't nearly as bad as in C.
Dec 01 2007
"mandel" wrote in messageRobert Fraser Wrote:Think of it this way: int[] array1 = new int[5]; int[] array2 = new int[5]; imagine that array 1 and array 2 are now sequential in memory *AND* there is no extra byte separating them. Now I create the valid array slices: int[] array3 = array1[$..$]; int[] array4 = array2[0..0]; Note that both of these arrays are bit-for-bit identical (both have 0 length and the same ptr value). Which one points to which piece of memory? How is the GC to decide which memory gets collected? These are the types of problems that the extra byte helps with. I personally think there exists a way to fix this efficiently without adding the extra byte, but I can't think of one :) Oh, and also, the size_t length is not stored in the allocated memory. It's stored in the array structure, usually on the stack or inside a class instance. I hope this helps your understanding of the issue. -Stevemandel wrote:Thanks, that answers my question. But I can't think how it could be used for array bounds errors checking right now. Well, I guess there some ng post about this, somewhere. But the page allocation overhead looks ugly for a language like D.It probably is a noob question, but aren't array lengths just hidden size_t values that are passed around? Why do we need to allocate space for them, too? voif foo() { size_t length; char* ptr; //allocated memory of 2^n //.. the same as..? char[] data; }The extra space allocated isn't for the length (in fact, it's just a byte I think); it's to make checking for array bounds errors possible (since there's a byte of space that, if accessed, indicates an overflow). I tmight be used for something else, too.
Dec 03 2007
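The two empty slices in the post above can be inspected directly. The sketch below only looks at one allocation; whether a second `new int[5]` really lands immediately after the first depends on the allocator, so that part of the scenario is not tested here. The function name `emptySlicesDiffer` is invented for this illustration.

```d
import std.stdio;

// Checks where the two empty slices of a single allocation point.
bool emptySlicesDiffer()
{
    int[] array1 = new int[5];
    int[] atStart = array1[0 .. 0];
    int[] atEnd = array1[$ .. $];

    // Both are empty, so they compare equal as arrays...
    assert(atStart.length == 0 && atEnd.length == 0);
    assert(atStart == atEnd);

    // ...but atEnd points one element past the last int of array1,
    // which is exactly where an adjacent block would begin if the
    // allocator added no padding.
    return atStart.ptr is array1.ptr
        && atEnd.ptr is array1.ptr + array1.length;
}

void main()
{
    writeln(emptySlicesDiffer());
}
```

A pointer-scanning collector sees only the address held by `atEnd`, which is why a one-past-the-end pointer is ambiguous unless something separates consecutive blocks.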
Steven Schveighoffer wrote:
> Now I create the valid array slices:
>
>     int[] array3 = array1[$..$];
>     int[] array4 = array2[0..0];
>
> Note that both of these arrays are bit-for-bit identical (both have 0 length and the same ptr value). Which one points to which piece of memory? How is the GC to decide which memory gets collected?
>
> These are the types of problems that the extra byte helps with. I personally think there exists a way to fix this efficiently without adding the extra byte, but I can't think of one :)
>
> Oh, and also, the size_t length is not stored in the allocated memory. It's stored in the array structure, usually on the stack or inside a class instance.

This is true in D 1.0. However, there has been talk that arrays in D 2.0 would change from:

    struct Array
    {
        size_t length;
        byte* ptr;
    }

to:

    struct Array
    {
        byte* ptr;
        byte* end;
    }

Which would make every array reference always point to itself and to the block immediately following it in memory, if no padding is done.

Sean
Dec 03 2007
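To make the consequence of the two-pointer layout concrete, here is a small sketch. The struct names `LengthPtr` and `PtrEnd` and the helper `toPtrEnd` are illustrative only; they are not the runtime's actual definitions, and the two-pointer form was only a proposal at the time of the thread.

```d
import std.stdio;

// The length-plus-pointer layout described for D 1.0 (illustrative name).
struct LengthPtr
{
    size_t length;
    byte* ptr;
}

// The proposed two-pointer layout (illustrative name).
struct PtrEnd
{
    byte* ptr;
    byte* end;

    // The length has to be recomputed from the two pointers.
    size_t length() const { return cast(size_t)(end - ptr); }
}

// Converts one representation into the other.
PtrEnd toPtrEnd(LengthPtr a)
{
    return PtrEnd(a.ptr, a.ptr + a.length);
}

void main()
{
    byte[] storage = new byte[5];
    auto a = LengthPtr(storage.length, storage.ptr);
    auto b = toPtrEnd(a);

    writeln(b.length());
    // `end` is a one-past-the-end address even for a full, non-empty array.
    writeln(b.end is storage.ptr + storage.length);
}
```

With this layout every array reference, not just an empty `[$..$]` slice, carries an address just beyond its own block, so the question of padding between blocks would apply to all of them.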
Steven Schveighoffer wrote:
[..]
> Now I create the valid array slices:
>
>     int[] array3 = array1[$..$];
>     int[] array4 = array2[0..0];
>
> Note that both of these arrays are bit-for-bit identical (both have 0 length and the same ptr value). Which one points to which piece of memory? How is the GC to decide which memory gets collected?

I see the problem. The first possible solution that comes to my mind seeing this is to make array1[0..0] and array1[$..$] equal. array1[$..$] could point to the beginning of the array. Since the slice length is zero, it shouldn't matter, would it?

Second thought, why not ignore empty slices altogether by telling the GC that the pointers don't hold any data?

Anyway, I guess there are some things I missed. ;-)

[..]
> I hope this helps your understanding of the issue.

Yes, it does. :)
Dec 03 2007
mandel wrote:
> I see the problem. The first possible solution that comes to my mind seeing this is to make array1[0..0] and array1[$..$] equal. array1[$..$] could point to the beginning of the array. Since the slice length is zero, it shouldn't matter, would it?

Appending to an array slice (empty or not) that starts at the start of an allocated block appends in place rather than allocating a new array. This is the reason

    while(x) a ~= b;

can be reasonably efficient.

So appending to the [$..$] array would (without padding) mean that you corrupt the following array.

The upcoming D2 T[new] (hopefully T[*] :) ) array type will probably make that a non-issue though.

> Second thought, why not ignore empty slices altogether by telling the GC that the pointers don't hold any data?

Except for the fact that having an empty slice at the start of an allocated block is needed for appending to a preallocated block in current D, the reason is that the current GC doesn't have that fine-grained information. It currently only knows "this block might contain pointers" and "this block doesn't contain pointers", and in the former case, treats everything properly aligned as potential pointers.

-- Oskar
Dec 03 2007
"Oskar Linde" wrotemandel wrote:Hm... I think you are slightly incorrect. I think the array is appended to in place ONLY if the data after the slice is unallocated. In this case, it would be allocated, so the array would be re-allocated elsewhere.Steven Schveighoffer wrote: [..]Appending to a (empty or not) array slice starting at the start of an allocated block appends in-place rather than allocate a new array. This is the reason while(x) a ~= b; can be reasonably efficient.Now I create the valid array slices: int[] array3 = array1[$..$]; int[] array4 = array2[0..0]; Note that both of these arrays are bit-for-bit identical (both have 0 length and the same ptr value). Which one points to which piece of memory? How is the GC to decide which memory gets collected?I see the problem. The first possible solution that comes to my mind seeing this is to make array1[0..0] and array1[$..$] equal. array1[$..$] could point to the begin of the array. Since the slice length is null, it shouldn't matter - would it?So appending to the [$..$] array would (without padding) mean that you corrupt the following array.I think this is incorrect for the reasons I stated above. An allocated block should never be re-assigned to another array. Maybe I am wrong, but I think mandel might have a possible solution to this problem. If you slice an empty array (or even allocate an empty array), set the pointer to null. No reason to allocate an empty array, and no reason you need to keep memory around for it. If you append to it, it's going to be like appending to an init array anyways. That would also make null comparisons more consistent like: int[] array1 = array2[0..0]; if(array1 is null) // evaluates to true! ... -Steve
Dec 04 2007
Steven Schveighoffer wrote:
> Hm... I think you are slightly incorrect. I think the array is appended to in place ONLY if the data after the slice is unallocated. In this case, it would be allocated, so the array would be re-allocated elsewhere.

Try this:

    char[] ab = "ab".dup;
    char[] a = ab[0..1];
    a ~= "c";
    writefln("ab = ",ab);

> I think this is incorrect for the reasons I stated above. An allocated block should never be re-assigned to another array.

See my example above. The allocated block is deduced from the slice .ptr. If the pointer points at the start of another array, DMD would have no way of knowing it isn't a slice of that other array.

> Maybe I am wrong, but I think mandel might have a possible solution to this problem. If you slice an empty array (or even allocate an empty array), set the pointer to null. No reason to allocate an empty array, and no reason you need to keep memory around for it. If you append to it, it's going to be like appending to an init array anyways.
>
> That would also make null comparisons more consistent, like:
>
>     int[] array1 = array2[0..0];
>     if(array1 is null) // evaluates to true!
>         ...

There are several cases where it is useful to retain the pointer when the array is of zero length, for example a zero-length regexp sub-expression match. It used to be the case that setting a slice length to 0 (via the length property) made the pointer null as well. That changed a while ago.

-- Oskar
Dec 04 2007
"Oskar Linde" wroteTry this: char[] ab = "ab".dup; char[] a = ab[0..1]; a ~= "c"; writefln("ab = ",ab);Outputs "ac" So this appears to be a bug then. Because from the spec: "Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array" and then for the append operator example: "a ~= b; // a becomes the concatenation of a and b" Changing the line to a = a ~ "c" changes the output of the program to "ab". Another reason why this is seems to be a bug and NOT a feature: string ab = "ab".idup; string a = ab[0..1]; a ~= "c"; writefln("ab = ",ab); // also outputs "ac" This changes an invariant string without compiler complaint! Note that Tango has this problem too.I think your example exposes a bug, and does not agree with what the spec says. There seems to be a silent agreement among everyone that D should behave that way, but I can't find anything in the spec that states it should. Is this something that is planned to be fixed or at least described correctly in the spec? If someone desires this behavior, I would say that it's possible to keep a reference to the entire array and use the copy operator. i.e.: ab[1..$] = "c"; perhaps there could be another way to extend the slice if more buffer space exists? With the caveat that you know that if you have other references to that data, they could be changed too? I can see the usefulness of using an array as a buffer which keeps its allocated space as it shrinks, but this is not worth having x ~= y not mean the same thing as x = x ~ y. The current meaning is too error prone in my opinion. -SteveSee my example above. The allocated block is deduced from the slice .ptr. If the pointer points at the start of another array, DMD would have no way of knowing it isn't a slice of that other array.So appending to the [$..$] array would (without padding) mean that you corrupt the following array.I think this is incorrect for the reasons I stated above. 
An allocated block should never be re-assigned to another array.
Dec 04 2007
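The experiment discussed above can be rerun with the two forms side by side. This is a sketch, not a statement of what any particular compiler does: the thread reports "ac" with the 2007 DMD runtime, while later D2 runtimes, as far as I know, reworked appending so that a slice ending before the block's used length gets reallocated. The result of the `~=` case is therefore printed rather than asserted. The function names are invented for this example.

```d
import std.stdio;

// True if `a ~= 'c'` wrote into the block that `ab` still refers to.
// The answer depends on the runtime's append strategy.
bool appendStomps()
{
    char[] ab = "ab".dup;
    char[] a = ab[0 .. 1];
    a ~= 'c';                 // may extend in place or reallocate
    return ab[1] == 'c';
}

// Binary ~ builds a new array, so the original is left alone.
bool concatCopies()
{
    char[] ab = "ab".dup;
    char[] a = ab[0 .. 1];
    a = a ~ 'c';
    return ab == "ab" && a == "ac" && a.ptr !is ab.ptr;
}

void main()
{
    writeln("append overwrote ab: ", appendStomps());
    writeln("concat left ab intact: ", concatCopies());
}
```

Only the second function has a result guaranteed by the spec text quoted in the thread; the first is the behaviour under dispute.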
Steven Schveighoffer wrote:
> Outputs "ac"
>
> So this appears to be a bug then. Because from the spec:
>
> "Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array"

Then the spec is wrong. The current behavior is very deliberate, insofar as the code is concerned. Look at internal/gc/gc.d. It is also deliberate for interior slices to always reallocate on an append. But the runtime has no way to know whether something pointing to the head of a block is a slice or is the original array. I've never actually found this to be a problem in practice, and I'll admit to having used the slice terminology from time to time, because it's more succinct than resizing using the length property.

> I think your example exposes a bug, and does not agree with what the spec says.

The spec also says that D 1.0 has inheritable contracts, and maybe we will one day, but it's not even on the radar at the moment. For better or worse, I've learned not to put much stock in what the spec says about some things.

> If someone desires this behavior, I would say that it's possible to keep a reference to the entire array and use the copy operator, i.e.:
>
>     ab[1..$] = "c";
>
> Perhaps there could be another way to extend the slice if more buffer space exists?

It would be easy to allow all slices to be extended in place, even the interior ones. But going the other direction would be difficult. The proposed T[new] syntax might help in that direction, but I hate it.

Sean
Dec 04 2007
BTW, here is a list of tango classes that can corrupt data that was not passed to them. These should probably be fixed:

    tango.net.ftp.FtpClient
    tango.net.cluster.NetworkCall
    tango.util.log.Hierarchy
    tango.stdc.stringz

I haven't looked through phobos, but I'm sure there are instances which have this problem as well.

-Steve
Dec 05 2007
Steven Schveighoffer wrote:
> BTW, here is a list of tango classes that can corrupt data that was not passed to them. These should probably be fixed:
>
>     tango.net.ftp.FtpClient
>     tango.net.cluster.NetworkCall
>     tango.util.log.Hierarchy
>     tango.stdc.stringz
>
> I haven't looked through phobos, but I'm sure there are instances which have this problem as well.

Thanks, I'll look into these.

Sean
Dec 05 2007
"Steven Schveighoffer" wroteAnother reason why this is seems to be a bug and NOT a feature: string ab = "ab".idup; string a = ab[0..1]; a ~= "c"; writefln("ab = ",ab); // also outputs "ac" This changes an invariant string without compiler complaint!more bugs :) import std.stdio; struct X { char[5] myArray; int x; } void main() { X[] x = new X[2]; x[0].myArray[] = "hello"; char[] myslice = x[0].myArray[0..3]; writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]); myslice ~= "hithere"; writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]); writefln("%s %d", x[0].myArray, x[0].x); } output: 868FE8 868FE0 868FE0 868FE8 868FE0 868FE0 helhi 25970 -Steve
Dec 04 2007
Steven Schveighoffer wrote:
> "Steven Schveighoffer" wrote
>> Another reason why this seems to be a bug and NOT a feature:
>>
>>     string ab = "ab".idup;
>>     string a = ab[0..1];
>>     a ~= "c";
>>     writefln("ab = ",ab); // also outputs "ac"
>>
>> This changes an invariant string without compiler complaint!
>
> more bugs :)

This is expected behavior.

Sean
Dec 04 2007
"Sean Kelly" wroteSteven Schveighoffer wrote:Behavior by design, perhaps. Expected, I should hope not. I would never expect to be able to have one variable overwrite another without obvious casting. And why should it be 'expected behavior' for the GC to assume that because an array is at the beginning of a memory block, it is free to use any memory in that block? I think I've proven that there are cases where it should not assume that. I'm not saying there is a bug in the compiler implementation, or that the docs need to be changed to reflect the compiler behavior. I'm saying the design here is flat out wrong, and needs to be reflected in the compiler. My recommendation would be to make the ~= behave exactly as the spec says, that it always makes a copy of it's arguments. If you need buffer-like behavior for performance, write a new type. Isn't one of Walter's goal to prevent silent runtime errors? IMO, memory corruption errors are the worst kind of silent errors. -Steve"Steven Schveighoffer" wroteThis is expected behavior.Another reason why this is seems to be a bug and NOT a feature: string ab = "ab".idup; string a = ab[0..1]; a ~= "c"; writefln("ab = ",ab); // also outputs "ac" This changes an invariant string without compiler complaint!more bugs :)
Dec 04 2007
Steven Schveighoffer wrote:
> "Sean Kelly" wrote
>> Steven Schveighoffer wrote:
>>> Another reason why this seems to be a bug and NOT a feature:
>>>
>>>     string ab = "ab".idup;
>>>     string a = ab[0..1];
>>>     a ~= "c";
>>>     writefln("ab = ",ab); // also outputs "ac"
>>>
>>> This changes an invariant string without compiler complaint!
>>
>> This is expected behavior.

In this post I'm commenting on the example shown above, not the 2nd one (which to be honest is much more worrying). I am a bit confused as to which example Sean was saying was "expected behaviour".

> Behavior by design, perhaps. Expected, I should hope not. I would never expect to be able to have one variable overwrite another without obvious casting.

Both variables above are references to the same data. You're using one variable to change that data, therefore the other variable, which still refers to the same data, sees the changes.

If the concatenation operation had to reallocate the memory it would produce a copy, and you wouldn't see the changes. So, this behaviour is non-deterministic, however...

> And why should it be 'expected behavior' for the GC to assume that because an array is at the beginning of a memory block, it is free to use any memory in that block? I think I've proven that there are cases where it should not assume that.

The assumption fits with D's (semi-)official "copy on write" policy. If you want to write to memory and you cannot be sure you are the only reference then you should copy the data before writing. Following this guideline makes the behaviour deterministic, and...

> I'm not saying there is a bug in the compiler implementation, or that the docs need to be changed to reflect the compiler behavior. I'm saying the design here is flat out wrong, and needs to be reflected in the compiler. My recommendation would be to make the ~= behave exactly as the spec says, that it always makes a copy of its arguments. If you need buffer-like behavior for performance, write a new type.

The current behaviour allows you to skip the copy step if you _know_ you hold the only reference to the data; it's putting the choice/power in the programmer's hands. As always, power can be a dangerous thing if misused :)

> Isn't one of Walter's goals to prevent silent runtime errors? IMO, memory corruption errors are the worst kind of silent errors.

The example shown above is not corrupting any memory. The 2nd one (not shown above) seems to be, and it worries me much more.

Regan
Dec 05 2007
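The "copy on write" guideline described above amounts to duplicating before appending whenever other references to the data might exist. A minimal sketch follows; `appendCopy` is a hypothetical helper written for this illustration, not a library function.

```d
import std.stdio;

// Hypothetical helper: appends to a private copy, so the caller's data
// is never touched regardless of how the runtime implements ~=.
char[] appendCopy(const(char)[] src, char c)
{
    char[] copy = src.dup;   // fresh block owned only by `copy`
    copy ~= c;               // growing it cannot affect `src`
    return copy;
}

void main()
{
    char[] ab = "ab".dup;
    char[] a = appendCopy(ab[0 .. 1], 'c');
    writeln(ab, " ", a);
}
```

The cost is one allocation per call, which is the trade-off the thread is arguing about: in-place append avoids that allocation but is only safe when the caller holds the sole reference.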
"Regan Heath" wroteSteven Schveighoffer wrote:The problem is that invariant data is changing. This is a no-no for pure functions which Walter has planned. If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed. I think the design is what is flawed. I have several problems with this concat operator issue. First, that x ~= y does not effect the same behavior as x = x ~ y. This is a fundamental flaw in the language in my opinion. any operator of the op= form is supposed to mean the same as x = x op y. This is consistent throughout all of D, except in this case. Second, there is the issue of the spec. The spec clearly states that concatenation should result in a copy of both sides. Obviously, this isn't true in all cases. The spec should be changed for both D 1.x and 2.x IMMEDIATELY to prevent unsuspecting coders from using ~= when what they really want is just ~. Third, I have not seen this T[new] operator described anywhere, but I am concerned that D 1.0 will not be updated. This leaves all coders who are not ready to switch to D 2 at risk. But from the inferred behavior of T[new], I'm expecting that this will probably fix the problem. -Steve"Sean Kelly" wroteIn this post I'm commenting on the example shown above, not the 2nd one (which to be honest is much more worrying). I am a bit confused as to which example Sean was saying was "expected behaviour".Steven Schveighoffer wrote:"Steven Schveighoffer" wroteThis is expected behavior.Another reason why this is seems to be a bug and NOT a feature: string ab = "ab".idup; string a = ab[0..1]; a ~= "c"; writefln("ab = ",ab); // also outputs "ac" This changes an invariant string without compiler complaint!more bugs :)Behavior by design, perhaps. Expected, I should hope not. I would never expect to be able to have one variable overwrite another without obvious casting.Both variables above are references to the same data. 
You're using one variable to change that data, therefore the other variable which still refers to the same data, sees the changes. If the concatenation operation had to reallocate the memory it would produce a copy, and you wouldn't see the changes. So, this behaviour is non deterministic, however...
Dec 05 2007
Steven Schveighoffer wrote:
> The problem is that invariant data is changing. This is a no-no for pure functions which Walter has planned. If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed. I think the design is what is flawed.

That is another issue which I didn't even address.

Assuming 'string' means 'invariant(char)' and assuming that means the char values cannot change (I say assuming because I haven't had the chance to really internalise the new const yet) then I reckon the implementation of invariant is simply broken/buggy.

> I have several problems with this concat operator issue.
>
> First, that x ~= y does not effect the same behavior as x = x ~ y. This is a fundamental flaw in the language in my opinion. Any operator of the op= form is supposed to mean the same as x = x op y. This is consistent throughout all of D, except in this case.

The problem stems from the fact that x ~= y always assigns the result to x, whereas x ~ y can potentially be assigned to something else. This means the latter must create a new/temporary object to store the result.

In the case of arrays this effectively means that x ~ y always creates a new array which is a copy of the old ones. But x ~= y need not create a new array as it can append to the existing one.

The ~= form therefore allows an optimisation which is beneficial. Not allowing people to have both methods at their disposal would likely cause an outcry.

> Second, there is the issue of the spec. The spec clearly states that concatenation should result in a copy of both sides.

1. The website cannot be trusted completely and is often behind the compiler when it comes to the spec.

2. It could be argued that "concatenation" is the x ~ y form and not the ~= form, which is called "append". From the website spec:

"The binary operator ~ is the cat operator. It is used to concatenate arrays"

"Similarly, the ~= operator means append"

"Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array"

I'm probably splitting hairs here and I doubt there is much point arguing it - I just wanted to point out another way of reading the spec.

> Obviously, this isn't true in all cases. The spec should be changed for both D 1.x and 2.x IMMEDIATELY to prevent unsuspecting coders from using ~= when what they really want is just ~.
>
> Third, I have not seen this T[new] operator described anywhere, but I am concerned that D 1.0 will not be updated. This leaves all coders who are not ready to switch to D 2 at risk. But from the inferred behavior of T[new], I'm expecting that this will probably fix the problem.

Aside from the apparent invariant bug, the only case which causes me a slight worry is the case involving a struct.

The only solution I can imagine would be to somehow determine the memory was originally allocated to a 'struct' and therefore reallocation for an 'array' must cause a copy. I'm not sure what information the GC keeps on allocated blocks. I believe there is a pointers/nopointers flag and that could form the basis of a fairly crude test perhaps (as struct contains pointers and char[] does not).

Even if nothing can be done to detect this case I'm not sure it's a huge issue; after all, it only affects people using static arrays as the first member of a struct which they take a slice of and then modify (concatenate) without performing "copy on write" - which is a no-no in D anyway.

Regan
Dec 05 2007
Regan Heath wrote:
> Steven Schveighoffer wrote:
>> "Regan Heath" wrote
>>> Both variables above are references to the same data. You're using one variable to change that data, therefore the other variable which still refers to the same data, sees the changes. If the concatenation operation had to reallocate the memory it would produce a copy, and you wouldn't see the changes. So, this behaviour is non deterministic, however...
>>
>> The problem is that invariant data is changing. This is a no-no for pure functions which Walter has planned. If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed. I think the design is what is flawed.
>
> That is another issue which I didn't even address. Assuming 'string' means 'invariant(char)' and assuming that means the char values cannot change (I say assuming because I haven't had the chance to really internalise the new const yet) then I reckon the implementation of invariant is simply broken/buggy.

I don't know that it's broken so much as potentially misleading. That example never actually changed any of the data in the string, it simply appended additional data to the string. Thus invariance of the data was preserved.

>>>> Third, I have not seen this T[new] operator described anywhere, but I am concerned that D 1.0 will not be updated. This leaves all coders who are not ready to switch to D 2 at risk. But from the inferred behavior of T[new], I'm expecting that this will probably fix the problem.
>
> Aside from the apparent invariant bug the only case which causes me a slight worry is the case involving a struct. The only solution I can imagine would be to somehow determine the memory was originally allocated to a 'struct' and therefore reallocation for an 'array' must cause a copy. I'm not sure what information the GC keeps on allocated blocks, I believe there is a pointers/nopointers flag and that could form the basis of a fairly crude test perhaps (as struct contains pointers and char[] does not);

That's pretty much it. And if the GC were to retain additional type info it would be tailored to finding pointers for collection purposes rather than determining whether one section of a block is a static array or an int.

> Even if nothing can be done to detect this case I'm not sure it's a huge issue, after all it only affects people using static arrays as the first member of a struct which they take a slice of and then modify (concatenate) without performing "copy on write" - which is a no no in D anyway.

Right.

Sean
Dec 05 2007
Sean Kelly wrote:
> Regan Heath wrote:
>> Steven Schveighoffer wrote:
>>> The problem is that invariant data is changing. This is a no-no for pure functions which Walter has planned. If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed. I think the design is what is flawed.
>>
>> That is another issue which I didn't even address. Assuming 'string' means 'invariant(char)' and assuming that means the char values cannot change (I say assuming because I haven't had the chance to really internalise the new const yet) then I reckon the implementation of invariant is simply broken/buggy.
>
> I don't know that it's broken so much as potentially misleading. That example never actually changed any of the data in the string, it simply appended additional data to the string. Thus invariance of the data was preserved.

[example pasted again for clarity]

> string ab = "ab".idup;
> string a = ab[0..1];
> a ~= "c";
> writefln("ab = ",ab); // also outputs "ac"

True for 'a' but when appending to 'a' it writes over the memory which 'ab' guarantees is invariant.

So, in the general case any slice of invariant data which is shorter than the original invariant data can be used to overwrite the original invariant data after the slice.

Perhaps ~= should be disabled for invariant arrays.

Regan
Dec 05 2007
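For readers following along, here is a minimal sketch of the "copy on write" convention the thread keeps referring to: take an explicit copy of the slice before appending, so the append can only ever grow memory owned by that copy. The helper name `appendToCopy` is hypothetical and used purely for illustration; `.idup` and `~=` are the same operations that appear in the example above.

```d
// Hypothetical helper: append `tail` to a private copy of `slice`, so the
// array that `slice` was taken from can never be written to by the append.
string appendToCopy(string slice, string tail)
{
    string copy = slice.idup; // fresh allocation referenced only by `copy`
    copy ~= tail;             // any in-place growth stays inside that allocation
    return copy;
}

unittest
{
    string ab = "ab".idup;
    string a = appendToCopy(ab[0 .. 1], "c");
    assert(a == "ac");
    assert(ab == "ab"); // the original data is left untouched
}
```

The cost is one extra allocation per call, which is the trade-off against the in-place behaviour of `~=` that is being debated here.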
Regan Heath wrote:
> Sean Kelly wrote:
>> Regan Heath wrote:
>>> Steven Schveighoffer wrote:
>>>> The problem is that invariant data is changing. This is a no-no for pure functions which Walter has planned. If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed. I think the design is what is flawed.
>>>
>>> That is another issue which I didn't even address. Assuming 'string' means 'invariant(char)' and assuming that means the char values cannot change (I say assuming because I haven't had the chance to really internalise the new const yet) then I reckon the implementation of invariant is simply broken/buggy.
>>
>> I don't know that it's broken so much as potentially misleading. That example never actually changed any of the data in the string, it simply appended additional data to the string. Thus invariance of the data was preserved.
>
> [example pasted again for clarity]
>
> > string ab = "ab".idup;
> > string a = ab[0..1];
> > a ~= "c";
> > writefln("ab = ",ab); // also outputs "ac"
>
> True for 'a' but when appending to 'a' it writes over the memory which 'ab' guarantees is invariant.

Oops, you're right.

> So, in the general case any slice of invariant data which is shorter than the original invariant data can be used to overwrite the original invariant data after the slice.
>
> Perhaps ~= should be disabled for invariant arrays.

Or perhaps it should always reallocate. I'd originally thought it actually did this based on something Walter said, but I misunderstood.

Sean
Dec 05 2007
On Wed, 05 Dec 2007 16:18:46 +0000, Regan Heath wrote:

> [example pasted again for clarity]
>
> > string ab = "ab".idup;
> > string a = ab[0..1];
> > a ~= "c";
> > writefln("ab = ",ab); // also outputs "ac"

However, this is fine ...

    string ab = "ab";
    string a = ab[0..1];
    a ~= "c";
    writefln("ab = ",ab); // outputs "ab"
    writefln("a = ",a); // outputs "ac"

So it seems that the '.idup' property is affecting things.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Dec 05 2007
"Derek Parnell" wrote
> On Wed, 05 Dec 2007 16:18:46 +0000, Regan Heath wrote:
>> [example pasted again for clarity]
>>
>> > string ab = "ab".idup;
>> > string a = ab[0..1];
>> > a ~= "c";
>> > writefln("ab = ",ab); // also outputs "ac"
>
> However, this is fine ...
>
>     string ab = "ab";
>     string a = ab[0..1];
>     a ~= "c";
>     writefln("ab = ",ab); // outputs "ab"
>     writefln("a = ",a); // outputs "ac"
>
> So it seems that the '.idup' property is affecting things.

Yes, I noticed that too. However, it's simply the non-deterministic behavior of the ~= operator that is causing this. For literal strings, I suspect they are not allocated by the GC, and so the GC can't extend them, so the normal behavior kicks in.

But idup is supposed to give me an invariant array. The code is still changing invariant data...

-Steve
Dec 05 2007
Steven Schveighoffer wrote:
> "Derek Parnell" wrote
>> On Wed, 05 Dec 2007 16:18:46 +0000, Regan Heath wrote:
>>> [example pasted again for clarity]
>>>
>>> > string ab = "ab".idup;
>>> > string a = ab[0..1];
>>> > a ~= "c";
>>> > writefln("ab = ",ab); // also outputs "ac"
>>
>> However, this is fine ...
>>
>>     string ab = "ab";
>>     string a = ab[0..1];
>>     a ~= "c";
>>     writefln("ab = ",ab); // outputs "ab"
>>     writefln("a = ",a); // outputs "ac"
>>
>> So it seems that the '.idup' property is affecting things.
>
> Yes, I noticed that too. However, it's simply the non-deterministic behavior of the ~= operator that is causing this. For literal strings, I suspect they are not allocated by the GC, and so the GC can't extend them, so the normal behavior kicks in.
>
> But idup is supposed to give me an invariant array. The code is still changing invariant data...

To me, all the behaviour is "normal" ;)

I think you're right about the reason it copies in this case. I wonder if the solution is for the GC to keep a separate list of memory blocks which are invariant... then on reallocate it simply ignores this list - resulting in the same behavior as the string literal case.

Regan
Dec 06 2007
Derek Parnell wrote:
> However, this is fine ...
>
>     string ab = "ab";
>     string a = ab[0..1];
>     a ~= "c";
>     writefln("ab = ",ab); // outputs "ab"
>     writefln("a = ",a); // outputs "ac"
>
> So it seems that the '.idup' property is affecting things.

It's probably just a side effect of the fact that string literals are immutable. The compiler knows that it has to reallocate when appending to it, I guess?

    import std.stdio;

    void main()
    {
        int[] ab = [0, 1];
        int[] a = ab[0..1];
        a ~= 2;
        writefln(ab);
        writefln(a);
    }

--
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Dec 05 2007
Matti Niemenmaa wrote:
> Derek Parnell wrote:
>> However, this is fine ...
>>
>>     string ab = "ab";
>>     string a = ab[0..1];
>>     a ~= "c";
>>     writefln("ab = ",ab); // outputs "ab"
>>     writefln("a = ",a); // outputs "ac"
>>
>> So it seems that the '.idup' property is affecting things.
>
> It's probably just a side effect of the fact that string literals are immutable. The compiler knows that it has to reallocate when appending to it, I guess?

I suspect you're right. I think the reason is that a string literal is not allocated in the same way as the result from idup and maybe does not appear in the GC's list of memory blocks. So, if the GC doesn't know about the block, the append cannot extend it in place and instead copies to a new one.

Regan
Dec 06 2007
"Regan Heath" wrote
> 2. It could be argued that "concatenation" is the x ~ y form and not the ~= form, which is called "append". From the website spec:
>
> "The binary operator ~ is the cat operator. It is used to concatenate arrays"
> "Similarly, the ~= operator means append"
> "Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array"

Look at the example for append:

"a ~= b; // a becomes the concatenation of a and b"

This is the only explanation of what "append" does.

Yes, we are splitting hairs, but they are important hairs to split :) Having the spec be accurate is important not only for compiler implementors (which right now doesn't matter much but might in the future) but also for developers using D. Just a simple explanation of:

append may or may not re-use the memory that the original array uses. Therefore you should not use the append operator unless you know the array to be appended to is a dynamic array and not a slice of a dynamic array. If this isn't the case, memory corruption can occur:

(paste Oskar's example here)

-Steve
Dec 05 2007
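To make the distinction in the quoted spec text concrete, here is a small sketch of the spelled-out form `x = x ~ y`. It relies only on the rule quoted above, that binary `~` always creates a copy of its operands; the function name `concatCopies` is hypothetical and exists only for this illustration.

```d
// Hypothetical check: does `x ~ y` hand back storage distinct from `x`?
// Both operands are assumed to be non-empty.
bool concatCopies(int[] x, int[] y)
{
    int[] z = x ~ y;        // binary ~ builds a new array from both operands
    return z.ptr !is x.ptr; // compare the start of the underlying storage
}

unittest
{
    int[] whole = [0, 1, 2];
    int[] head = whole[0 .. 1];

    // Spelled-out form: the result lives in new storage, so `whole`
    // cannot be overwritten through `joined`.
    int[] joined = head ~ [9];
    assert(joined == [0, 9]);
    assert(whole == [0, 1, 2]);
    assert(concatCopies(head, [9]));
}
```

Whether `head ~= 9` would instead write into the storage of `whole` is exactly the implementation-dependent behaviour under discussion, so the sketch deliberately makes no claim about it.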
Steven Schveighoffer wrote:
> "Regan Heath" wrote
>> Steven Schveighoffer wrote:
>>> "Sean Kelly" wrote
>>>> Steven Schveighoffer wrote:
>>>>> "Steven Schveighoffer" wrote
>>>>>> Another reason why this seems to be a bug and NOT a feature:
>>>>>>
>>>>>> string ab = "ab".idup;
>>>>>> string a = ab[0..1];
>>>>>> a ~= "c";
>>>>>> writefln("ab = ",ab); // also outputs "ac"
>>>>>>
>>>>>> This changes an invariant string without compiler complaint!
>>>>>
>>>>> more bugs :)
>>>>
>>>> This is expected behavior.
>>>
>>> Behavior by design, perhaps. Expected, I should hope not. I would never expect to be able to have one variable overwrite another without obvious casting.
>>
>> In this post I'm commenting on the example shown above, not the 2nd one (which to be honest is much more worrying). I am a bit confused as to which example Sean was saying was "expected behaviour".
>>
>> Both variables above are references to the same data. You're using one variable to change that data, therefore the other variable which still refers to the same data, sees the changes. If the concatenation operation had to reallocate the memory it would produce a copy, and you wouldn't see the changes. So, this behaviour is non deterministic, however...
>
> The problem is that invariant data is changing. This is a no-no for pure functions which Walter has planned. If invariant data can change without violating the rules of the spec, then the compiler implementation or design is flawed. I think the design is what is flawed.

One could argue that invariant data is changing because of a programmer error, but you have a point.

>>> I have several problems with this concat operator issue. First, that x ~= y does not effect the same behavior as x = x ~ y. This is a fundamental flaw in the language in my opinion. any operator of the op= form is supposed to mean the same as x = x op y. This is consistent throughout all of D, except in this case.
>>>
>>> Second, there is the issue of the spec. The spec clearly states that concatenation should result in a copy of both sides. Obviously, this isn't true in all cases. The spec should be changed for both D 1.x and 2.x IMMEDIATELY to prevent unsuspecting coders from using ~= when what they really want is just ~.

Or the runtime could be changed to always copy. However, it would absolutely murder application performance for something like this:

    char[] buf;
    for( int i = 0; i < 1_000_000; ++i )
        buf ~= 'a';

And looping on an append is a pretty typical use case, in my experience.

>>> Third, I have not seen this T[new] operator described anywhere, but I am concerned that D 1.0 will not be updated. This leaves all coders who are not ready to switch to D 2 at risk. But from the inferred behavior of T[new], I'm expecting that this will probably fix the problem.

The T[new] syntax basically said that resizable arrays would be declared as T[new] and non-resizable slices would be declared as T[]. My major problem with this is that it would change the way normal arrays are declared, and break tons of code in the process.

Sean
Dec 05 2007
On 12/5/07, Sean Kelly <sean f4.ca> wrote:
> Or the runtime could be changed to always copy. However, it would absolutely murder application performance for something like this:
>
>     char[] buf;
>     for( int i = 0; i < 1_000_000; ++i )
>         buf ~= 'a';

It would, /unless/ we had a vector type (a C++ std::vector, not a math vector). Java has something similar, I believe - immutable strings, but also a StringBuffer type. (I could be wrong about the details). Anyway, the point is, you'd just rewrite the above loop as:

    Vector!(char) buf;
    for( int i = 0; i < 1_000_000; ++i )
        buf ~= 'a';
    //
    // and when you're done
    //
    return buf.toArray();

That sort of thing.
Dec 05 2007
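As a rough illustration of the vector idea proposed above, here is one way such a type could look. This is only a sketch under stated assumptions: `Vector` and `toArray` are the hypothetical names from the post, not an existing library type, the growth policy (doubling from 16 elements) is an arbitrary choice, and `opOpAssign` is the operator-overloading hook of present-day D2, which postdates this thread.

```d
// Hypothetical growable buffer, sketched after the Vector!(char) idea above.
struct Vector(T)
{
    private T[] store;   // backing storage, referenced only by this struct
    private size_t used; // number of elements actually in use

    // Supports `buf ~= value`.
    void opOpAssign(string op)(T value) if (op == "~")
    {
        if (used == store.length)
            store.length = store.length ? store.length * 2 : 16; // amortised growth
        store[used++] = value;
    }

    // Hand out a copy, so callers never alias the backing storage.
    T[] toArray()
    {
        return store[0 .. used].dup;
    }
}

unittest
{
    Vector!(char) buf;
    for (int i = 0; i < 1_000_000; ++i)
        buf ~= 'a';
    char[] result = buf.toArray();
    assert(result.length == 1_000_000);
    assert(result[0] == 'a' && result[$ - 1] == 'a');
}
```

Because the buffer owns its storage outright and only ever returns copies, appending in place is safe here regardless of what the built-in `~=` does with slices.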
Regan Heath wrote:
> The example shown above is not corrupting any memory. The 2nd one (not shown above) seems to be and it worries me much more.

The same "copy on write" issue applies to each case, but I agree that the behavior of the second is certainly less appealing. And you're right that it can and will corrupt memory if used in this manner. A pointer to the head of the struct is equal to a pointer to the head of the array, so right now the runtime is assuming that the entire block belongs to the array, which is wrong. Unfortunately, there is little that can be done about this mechanically. When a slice of the struct array is taken, type information is lost, so the compiler doesn't even know there's a struct involved, and so would be unable to supply additional type information to the runtime.

Sean
Dec 05 2007
Steven Schveighoffer wrote:
> import std.stdio;
>
> struct X
> {
>     char[5] myArray;
>     int x;
> }
>
> void main()
> {
>     X[] x = new X[2];
>     x[0].myArray[] = "hello";
>     char[] myslice = x[0].myArray[0..3];
>     writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
>     myslice ~= "hithere";
>     writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
>     writefln("%s %d", x[0].myArray, x[0].x);
> }
>
> output:
> 868FE8 868FE0 868FE0
> 868FE8 868FE0 868FE0
> helhi 25970

This one worries me. I believe the problem is caused by the memory address of myArray[0] being the same as the memory address of the struct. Is this what you realised Sean... I may be a bit slow on the uptake here :)

When the slice needs to reallocate the GC checks this address and finds enough space following the struct (or perhaps it has allocated on a power of two boundary and already has enough) and it allows the concatenation to write to that memory.

The problem is that it doesn't realise the memory was allocated to a struct, and is being reallocated by an array slice. So, the array concatenation overwrites the memory occupied by the int 'x'. Ick.

I would have expected a static array to be un-reallocatable, so any concatenation performed on a slice of one to cause a copy to be made. But of course all that information is lost at the place where the reallocation is done, it's simply a memory address with a certain amount of memory associated with it.

R
Dec 05 2007
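The value 25970 in the output above can be checked by hand. The sketch below models one X as 12 raw bytes and replays the append on that model instead of relying on the runtime behaviour itself. It assumes the layout implied by the printed addresses (myArray at offset 0, x at offset 8) and a little-endian int; the function name `overwrittenInt` is hypothetical.

```d
// Models one X as 12 raw bytes: char[5], 3 bytes of padding, then the int
// at offset 8 (assumed from the addresses ...E0 and ...E8 printed above).
uint overwrittenInt(string kept, string appended)
{
    ubyte[12] block;
    const(ubyte)[] bytes = cast(const(ubyte)[]) (kept ~ appended);
    assert(bytes.length <= block.length);
    block[0 .. bytes.length] = bytes[]; // what the in-place append wrote

    // Read offsets 8..12 as a little-endian 32-bit integer.
    return block[8] | (block[9] << 8) | (block[10] << 16) | (block[11] << 24);
}

unittest
{
    // "hel" ~ "hithere" puts 'r' (0x72) and 'e' (0x65) at offsets 8 and 9,
    // and 0x6572 is 25970.
    assert(overwrittenInt("hel", "hithere") == 25970);
}
```

The first five bytes of the model are "helhi", which matches the string printed for myArray, and the trailing "re" of "hithere" is what ends up in x.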
Regan Heath wrote:
> Steven Schveighoffer wrote:
>> import std.stdio;
>>
>> struct X
>> {
>>     char[5] myArray;
>>     int x;
>> }
>>
>> void main()
>> {
>>     X[] x = new X[2];
>>     x[0].myArray[] = "hello";
>>     char[] myslice = x[0].myArray[0..3];
>>     writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
>>     myslice ~= "hithere";
>>     writefln("%x %x %x", &x[0].x, &x[0].myArray[0], &myslice[0]);
>>     writefln("%s %d", x[0].myArray, x[0].x);
>> }
>>
>> output:
>> 868FE8 868FE0 868FE0
>> 868FE8 868FE0 868FE0
>> helhi 25970
>
> This one worries me. I believe the problem is caused by the memory address of myArray[0] being the same as the memory address of the struct. Is this what you realised Sean... I may be a bit slow on the uptake here :)

Yes :-)

Sean
Dec 05 2007