digitalmars.D - D array expansion and non-deterministic re-allocation
- Bartosz Milewski (21/21) Nov 15 2009 I read Andrei's chapter on arrays and there's one thing that concerns me...
- Tim Matthews (3/30) Nov 15 2009 Isn't the new kind of arrays created to fix this already? See T[new].
- Leandro Lucarella (8/9) Nov 15 2009 T[new] is gone.
- Walter Bright (7/13) Nov 15 2009 It is not non-deterministic. Try it - you'll get the same results every
- Nick Sabalausky (29/42) Nov 15 2009 Definition used by Implementation "Foo":
- Walter Bright (11/14) Nov 15 2009 It's deterministic in the sense that if you run the program again with
- Denis Koroskin (5/11) Nov 15 2009 It is *non*-deterministic. The decision to reallocate depends (or will
- Walter Bright (2/5) Nov 15 2009 The LRU is thread local.
- Denis Koroskin (3/9) Nov 17 2009 It will then prevent a moving GC from being possible in D.
- Sean Kelly (2/14) Nov 17 2009 It won't work with the existing GC either. If a page is completely empt...
- Walter Bright (2/20) Nov 17 2009 In both cases, the LRU can be adjusted rather than wiped to account for ...
- grauzone (4/9) Nov 15 2009 Even if it's memory safe, it could cause a bug. Look at Bartosz example
- Walter Bright (5/8) Nov 16 2009 Yes, it could cause a bug. But it isn't undefined-behavior, it isn't
- Rainer Deyke (10/16) Nov 16 2009 On the same platform, with the same compiler, compiler settings, and
- Walter Bright (4/19) Nov 16 2009 That is still determinate. Indeterminate means you get different results...
- Nick Sabalausky (6/16) Nov 16 2009 Even if it is technically determinate if you run it on the same machine ...
- Walter Bright (10/14) Nov 16 2009 It's not a security hole in any more serious manner than any other
- BCS (2/17) Nov 16 2009 would you agree that it is not something the programer can predict in ad...
- Walter Bright (3/5) Nov 16 2009 He can, but it is not reasonable to expect him to. But it's still
- Bartosz Milewski (3/9) Nov 17 2009 I've been following this discussion about determinism and I think it add...
- Bill Baxter (9/18) Nov 17 2009 esses the problem from the wrong point of view, that of a specific imple...
- Andrei Alexandrescu (6/17) Nov 17 2009 It would not be difficult to specify in the language definition (and
- Bartosz Milewski (4/10) Nov 18 2009 Actually this wouldn't fix the problem. Although this would make the pro...
- Andrei Alexandrescu (8/29) Nov 18 2009 Then you must restart your argument which was centered around
- Ali Cehreli (9/12) Nov 16 2009 I don't see it being any different than C++'s std::vector invalidating i...
- Andrei Alexandrescu (3/13) Nov 16 2009 It is different because invalid STL iterators could do anything.
- dsimcha (17/38) Nov 16 2009 slice is extended, the decision to re-allocate, and therefore to cut its
- Bartosz Milewski (3/7) Nov 17 2009 In the long run (D3), I proposed using the "unique" type modifier. If an...
- dsimcha (13/20) Nov 17 2009 is unique, the compiler knows that there are no slices to worry about, a...
- Andrei Alexandrescu (21/44) Nov 17 2009 We will have collections and all those good things, but I don't see how
- Leandro Lucarella (16/25) Nov 18 2009 I didn't say anything (until now) because this was discussed already.
- Andrei Alexandrescu (6/23) Nov 18 2009 In which ways do you think arrays horribly broken? Same as Bartosz menti...
- Bartosz Milewski (9/31) Nov 18 2009 What Leandro points out is that using D arrays is not straightforward an...
- Steven Schveighoffer (33/74) Nov 19 2009 a[1] would be 1 if an MRU cache was introduced. This is exactly the sor...
- Denis Koroskin (5/84) Nov 19 2009 Slices are ranges. And ranges are *shrink-only*. Are you going to append...
- Steven Schveighoffer (8/11) Nov 19 2009 Arrays are ranges too. Just because the range interface does not define...
- Bartosz Milewski (2/23) Nov 19 2009 Notice that you are using particular implementation detail (MRU cache) t...
- dsimcha (14/37) Nov 19 2009 explain the semantics of D arrays. There is a very important distinction...
- Leandro Lucarella (18/24) Nov 19 2009 Yes, but I think this is a place where you are doing early optimization ...
- Andrei Alexandrescu (9/26) Nov 19 2009 I think one aspect that tends to be forgotten is that built-in arrays
- Leandro Lucarella (14/42) Nov 19 2009 I don't think so, slices (a view over somebody elses memory, no appendin...
- Andrei Alexandrescu (26/62) Nov 19 2009 Well it also makes arrays/slices quite capable.
- dsimcha (14/76) Nov 19 2009 without
- Leandro Lucarella (13/17) Nov 20 2009 I think this is a really bad idea, it's so obvious dynamic arrays are
- Leandro Lucarella (41/107) Nov 20 2009 Nobody is judging the expressiveness of arrays, the bug-prone semantics
- Denis Koroskin (17/50) Nov 19 2009 Not so true. T[] appends are slow, *very* slow. Whenever I need anything...
- Andrei Alexandrescu (7/66) Nov 19 2009 I was speaking somewhat anticipatively; the MRU is slated to accelerate
- Bartosz Milewski (6/10) Nov 20 2009 Let's look at your proposal from a slightly different perspective. First...
- Steven Schveighoffer (14/46) Nov 23 2009 I haven't yet read all the other posts, so someone else may already have...
- Leandro Lucarella (16/40) Nov 23 2009 The thing is, with realloc() is less likely that you forget that the dat...
- Steven Schveighoffer (13/43) Nov 23 2009 realloc is worse. If I have multiple aliases to the same data, then I
- Leandro Lucarella (33/80) Nov 23 2009 Well, you are comparing GC vs no-GC, not realloc() vs. slices. I have no
- Steven Schveighoffer (81/138) Nov 24 2009 I was originally saying that realloc is similar to appending because it ...
- Andrei Alexandrescu (13/17) Nov 24 2009 [snip]
- Steven Schveighoffer (19/35) Nov 24 2009 Have you considered the performance impact on allocating non-array types...
- Andrei Alexandrescu (7/49) Nov 24 2009 Having access to the requested length is important at larger lengths, so...
- Steven Schveighoffer (13/59) Nov 24 2009 So in other words, appending any array under 128 bytes which does not
- Steven Schveighoffer (9/16) Nov 24 2009 If all this is true, I had another thought for an MRU cache. It could
- Andrei Alexandrescu (5/25) Nov 24 2009 That's a great idea. I keep on dreaming someone will implement all this
- dsimcha (3/28) Nov 24 2009 But if the full semantics of shared are not implemented for D2 (will the...
- Steven Schveighoffer (5/28) Nov 24 2009 What about immutable arrays? Can't those be shared without being marked...
- Steven Schveighoffer (7/25) Nov 24 2009 Note this also solves the problem of having to flush the MRU cache on a ...
- Steven Schveighoffer (7/13) Nov 24 2009 This doesn't compute. The heap is shared, you need to store the allocat...
- Bartosz Milewski (33/33) Nov 24 2009 It's hard to argue whether something is or isn't bug prone and how sever...
- Steven Schveighoffer (13/61) Nov 24 2009 Yes, but there is a very fuzzy line between preventing all bugs and
- Bartosz Milewski (4/52) Nov 24 2009
- Steven Schveighoffer (13/70) Nov 24 2009 Then he forgot the const decoration for the parameter. No duping is
- Bartosz Milewski (12/88) Nov 24 2009 Well, I was wondering if it could be rewritten as:
- Steven Schveighoffer (26/122) Nov 24 2009 I think this warrants more code reviews of this particular developer's
- Bartosz Milewski (10/24) Nov 24 2009
- Steven Schveighoffer (10/44) Nov 25 2009 C++ const is kind of a joke :)
- Simen kjaeraas (7/13) Nov 25 2009 As long as the compiler is aware of arrays' existence, and knows how
- Steven Schveighoffer (10/24) Nov 25 2009 As long as the spec says changing length may expand the array to hold en...
- Bartosz Milewski (5/6) Nov 25 2009 This is all true for user-defined data types, but array is totally under...
- Andrei Alexandrescu (14/24) Nov 25 2009 Yes, but this gets into optimizing sequences of complex operations, so
- Bartosz Milewski (3/33) Nov 25 2009 I'm trying to nail down the semantics. Here is the case where there prov...
- Andrei Alexandrescu (16/63) Nov 25 2009 The answer is very simple, and it is present in the book. I have also
- Steven Schveighoffer (9/19) Nov 25 2009 In fact, adding length modifies the contents.
- Walter Bright (5/15) Nov 25 2009 The optimizer is free to change around implementation-defined-behavior
- Bartosz Milewski (2/20) Nov 26 2009 So the real question is, is extension defined in terms of re-allocation ...
- Bartosz Milewski (4/31) Nov 25 2009 The problem here is that the guy is sort of right. Indeed quote() expand...
- Andrei Alexandrescu (13/46) Nov 25 2009 I agree that the underlying array sharing can be often a hassle. This is...
- Bartosz Milewski (4/13) Nov 25 2009 I'm afraid this would further muddle the message: "If you want safe arra...
- Andrei Alexandrescu (10/40) Nov 25 2009 I think the message is "If you want values, use Value. If you want
- Bartosz Milewski (5/51) Nov 26 2009 I know you don't like analogies, but for me it sounds like an advice to ...
- Andrei Alexandrescu (9/60) Nov 26 2009 Some would use that, some would use T[]. For example std.algorithm could...
- Steven Schveighoffer (7/10) Nov 25 2009 Actually, the requirement for not reallocating is that the slice ends at...
- Bartosz Milewski (4/22) Nov 26 2009 Steve, I don't know about you, but this exchange clarified some things f...
- Steven Schveighoffer (42/56) Dec 01 2009 I was defending the semantics by using an example implementation. I was...
- Bartosz Milewski (2/8) Dec 01 2009 This might just be it. Experienced D programmers will avoid using ~= . I...
- Bill Baxter (25/75) Nov 24 2009 in
- Andrei Alexandrescu (4/55) Nov 24 2009 The cache does not work with shared arrays, and threads cannot share
- Leandro Lucarella (21/45) Nov 18 2009 Yes, I think dynamic arrays are grouping 2 separate things: a proper
- Bartosz Milewski (3/11) Nov 18 2009 Actually I was thinking of deep unique, but understandably I neither des...
- Steven Schveighoffer (8/27) Nov 19 2009 I wonder what your expert opinion is: Is something like unique possible...
- Bartosz Milewski (3/8) Nov 19 2009 The most complete description of "unique" that I've seen is in Haller/Od...
- Bartosz Milewski (9/15) Nov 21 2009 Can this question be answered without the recourse to implementation, an...
- Ali Cehreli (5/23) Nov 21 2009 It is natural that a language like D has dynamic arrays (as core feature...
- Bartosz Milewski (2/3) Nov 21 2009 So is your answer different from that of dsimcha? I'm sitting on a fence...
- dsimcha (6/9) Nov 21 2009 the other is a slice... In D, slices have "discretionary sharing semanti...
- Andrei Alexandrescu (14/24) Nov 22 2009 I'm not sure I figure what the issue is, and I'd appreciate a rehash of
- Bartosz Milewski (2/4) Nov 22 2009 Some of the points are in my message from Sat, 05:19 pm. Especially this...
- Steven Schveighoffer (27/34) Nov 23 2009 My view is that a conforming implementation could reallocate on every
- Andrei Alexandrescu (13/23) Nov 23 2009 I don't think a conforming implementation is allowed to do that. The
- Bartosz Milewski (2/19) Nov 23 2009 I wish we had some performance numbers to support this optimization. The...
- Andrei Alexandrescu (5/26) Nov 23 2009 Very true. It would be great if you or someone else interested could
- dsimcha (19/45) Nov 23 2009 always a break-even point between O(1) and O(N) and I wonder at what N i...
- Steven Schveighoffer (9/33) Nov 23 2009 It's hard to say that your example is a good fit for the arrays of D.
- Bill Baxter (16/49) Nov 23 2009 on
- Andrei Alexandrescu (11/52) Nov 23 2009 I agree. Overall, our eyes should be at this point focused on principle
- Bartosz Milewski (2/6) Nov 24 2009 I've been thinking the same. Full uniqueness might be hard to implement,...
- Michel Fortin (22/27) Nov 22 2009 I think it's the "may terminate" part Bartosz (and some others) don't
- dsimcha (20/34) Nov 21 2009 expanded part of the array is never implicitly shared with any other arr...
- Bartosz Milewski (2/8) Nov 21 2009 This might be true at the experimental phase of language design. But I e...
- Ali Cehreli (8/31) Nov 21 2009 Consider there is also c:
- dsimcha (14/45) Nov 22 2009 array, and in this case b ~= 4 operation might not relocate b, and b[0] ...
- Steven Schveighoffer (16/46) Nov 23 2009 I believe that's because Andrei is hoping this problem will be fixed
- gzp (34/43) Nov 19 2009 It seems as we cannot persuade Andrei and Walter, that it is a
- Denis Koroskin (31/81) Nov 19 2009 f =
- grauzone (11/25) Nov 19 2009 I think Walter and Andrei just felt like many people do when they have
- Andrei Alexandrescu (4/33) Nov 19 2009 However flattering the attention to Walter's and my psychology could be,...
I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic. How does that influence program testing and can it be exploited to attack a buggy system?

Here's a piece of code I came up with--it's not realistic but it describes a certain class of scenarios:

    void f(byte[] code) {
        doSomeChecks(code);
        execute(code);
    }

    void doSomeChecks(byte[] a) {
        // to avoid special-casing empty arrays
        a.length += 1; // extension
        // get byte b (or a whole bunch) from somewhere
        a[0] = b; // write
        // check what happens when a[0] = b
    }

Of course this code is incorrect because doSomeChecks assumes that it holds a unique array, and the caller failed to make a copy before sending it to doSomeChecks. Bad bad programmer!

However imagine this scenario: Under testing conditions, the array extension just happens to always cause reallocation, so the write is NOT visible in the calling routine. All tests pass with flying colors. But since the re-allocation is, by language definition, non-deterministic, it might happen that, after the code has been deployed, the re-allocation stopped happening, say, on Sundays. If a hacker learns about that, he may be able to inject malicious code or data into the program.

In this case the non-determinism built into the language helped camouflage a serious bug. Should we, or shouldn't we, worry about it in practice?
Nov 15 2009
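A runnable reduction of the scenario above (the value 42 and the three-byte buffer are arbitrary stand-ins for Bartosz's b and code; which output you get depends entirely on whether the runtime extends the block in place or reallocates):

    import std.stdio;

    // Stand-in for Bartosz's doSomeChecks: extend the slice, then write.
    void doSomeChecks(byte[] a)
    {
        a.length += 1; // extension: may or may not reallocate
        a[0] = 42;     // write through the (possibly still shared) slice
    }

    void main()
    {
        byte[] code = [1, 2, 3];
        doSomeChecks(code);
        // Prints 42 if the extension happened in place (the callee's write
        // aliased the caller's buffer), or 1 if it forced a reallocation.
        writeln(code[0]);
    }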
Bartosz Milewski wrote:I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic. How does that influence program testing and can it be exploited to attack a buggy system? Here's a piece of code I came up with--it's not realistic but it describes a certain class of scenarios: void f(byte[] code) { doSomeChecks(code); execute(code); } void doSomeChecks(byte[] a) { // to avoid special-casing empty arrays a.length += 1; // extension // get byte b (or a whole bunch) from somewhere a[0] = b; // write // check what happens when a[0] = b } Of course this code is incorrect because doSomeChecks assumes that it holds a unique array, and the caller failed to make a copy before sending it to doSomeChecks. Bad bad programmer! However imagine this scenario: Under testing conditions, the array extension just happens to always cause reallocation, so the write is NOT visible in the calling routine. All tests pass with flying colors. But since the re-allocation is, by language definition, non-deterministic, it might happen that, after the code has been deployed, the re-allocation stopped happening, say, on Sundays. If a hacker learns about that, he may be able to inject malicious code or data into the program. In this case the non-determinism built into the language helped camouflage a serious bug. Should we, or shouldn't we, worry about it in practice?Isn't the new kind of arrays created to fix this already? See T[new]. PS: Are you still working on D's concurrency?
Nov 15 2009
Tim Matthews, el 16 de noviembre a las 15:05 me escribiste:
> Isn't the new kind of arrays created to fix this already? See T[new].

T[new] is gone.

-- Leandro Lucarella (AKA luca) http://llucax.com.ar/
Nov 15 2009
Bartosz Milewski wrote:I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.It is not non-deterministic. Try it - you'll get the same results every time. It is implementation-defined behavior.How does that influence program testing and can it be exploited to attack a buggy system?Exploitations rely on undefined-behavior, such as buffer overflows, not implementation-defined behavior. This issue is no more conducive to an "exploit" than any other garden variety programming issue. It's entirely different than a buffer overflow attack.
Nov 15 2009
"Walter Bright" <newshound1 digitalmars.com> wrote in message news:hdqea8$2mte$1 digitalmars.com...Bartosz Milewski wrote:Definition used by Implementation "Foo": "Resize if there's room, otherwise reallocate" (sounds fairly realistic to me) Code in program using Implementation "Foo": myArrayOnHeap ~= getUserInput(); Resizes or Reallocates?: Deterministically dependant on "Is there room?" which is deterministically dependant on two things: 1. The size of the user input (non-deterministic). 2. The amount of freespace after the current end of myArrayOnHeap (dependent on the specific actions of the GC, which in turn, is dependent on other behaviors of the program, which, for almost any useful program, are dependent on non-deterministic runtime conditions, such as other user input). If 'A' ("resize or reallocate?") is deterministically determined by 'B', and 'B' ("is there room?") is non-deterministic ("dependent on things like user input"), then 'A' is, in effect, non-deterministic. Simplified example of the same basic principle in action: void main(char[][] args) { if(args.length < 10) showPrettyPicture(); else blowUpEarth(); } Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic. Safe? Fuck no.I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.It is not non-deterministic. Try it - you'll get the same results every time. It is implementation-defined behavior.How does that influence program testing and can it be exploited to attack a buggy system?Exploitations rely on undefined-behavior, such as buffer overflows, not implementation-defined behavior. This issue is no more conducive to an "exploit" than any other garden variety programming issue. It's entirely different than a buffer overflow attack.
Nov 15 2009
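The "resize or reallocate?" decision both sides are describing can be observed directly by comparing the slice's .ptr across an append. A small probe (not from the thread): on a given platform and runtime it prints the same answer on every run, which is the sense in which Walter calls the behavior implementation-defined rather than random:

    import std.stdio;

    void main()
    {
        auto a = new int[3];
        auto before = a.ptr;
        a ~= 42; // resize in place, or move to a fresh block?
        writeln(a.ptr == before ? "extended in place" : "reallocated");
    }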
Nick Sabalausky wrote:Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic.It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging. A non-deterministic problem will give you a different result every time you run it. Threading problems are an example of this, as are any dependencies on uninitialized data. This particular issue is implementation-defined.Safe? Fuck no.It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.
Nov 15 2009
On Mon, 16 Nov 2009 09:24:13 +0300, Walter Bright <newshound1 digitalmars.com> wrote:

> Nick Sabalausky wrote:
>> Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic.
> It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.

It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
Nov 15 2009
Denis Koroskin wrote:It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).The LRU is thread local.
Nov 15 2009
Walter Bright Wrote:

> Denis Koroskin wrote:
>> It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
> The LRU is thread local.

It will then prevent a moving GC from being possible in D. Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.
Nov 17 2009
Denis Koroskin Wrote:

> Walter Bright Wrote:
>> Denis Koroskin wrote:
>>> [...]
>> The LRU is thread local.
> It will then prevent a moving GC from being possible in D. Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.

It won't work with the existing GC either. If a page is completely emptied then it can be re-used for different sized allocations. This is why the current cache is wiped during a collection.
Nov 17 2009
Sean Kelly wrote:

> Denis Koroskin Wrote:
>> Walter Bright Wrote:
>>> Denis Koroskin wrote:
>>>> [...]
>>> The LRU is thread local.
>> It will then prevent a moving GC from being possible in D. Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.
> It won't work with the existing GC either. If a page is completely emptied then it can be re-used for different sized allocations. This is why the current cache is wiped during a collection.

In both cases, the LRU can be adjusted rather than wiped to account for it.
Nov 17 2009
Walter Bright wrote:

>> Safe? Fuck no.
> It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.

Even if it's memory safe, it could cause a bug. Look at Bartosz example again; if the memory block gets copied really depends from the input data length.
Nov 15 2009
grauzone wrote:Even if it's memory safe, it could cause a bug. Look at Bartosz example again; if the memory block gets copied really depends from the input data length.Yes, it could cause a bug. But it isn't undefined-behavior, it isn't memory corruption, and it isn't non-deterministic. i = ++i; can also cause a bug.
Nov 16 2009
Walter Bright wrote:It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.On the same platform, with the same compiler, compiler settings, and standard library implementation. That makes it harder to test, not easier. You now have to test with multiple compilers.It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.The current behavior is unsafe in that you can accidentally have two variables pointing at the same buffer. Let's say one buffer holds network input and the other holds some bytecode to execute. Boom - a bug that can be exploited to execute malicious (byte-)code. -- Rainer Deyke - rainerd eldwood.com
Nov 16 2009
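Rainer's two-buffers hazard reduces to a few lines. Under the pre-MRU behavior being debated here, the append below may land inside memory still viewed by the longer slice (stomping); with a reallocation it would not - and which of the two happens is exactly the point of contention:

    import std.stdio;

    void main()
    {
        int[] whole = [1, 2, 3, 4];
        int[] head = whole[0 .. 2];
        head ~= 99; // may write into whole[2] if extended in place
        writeln(whole); // [1, 2, 99, 4] if stomped; [1, 2, 3, 4] if reallocated
    }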
Rainer Deyke wrote:

> Walter Bright wrote:
>> It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.
> On the same platform, with the same compiler, compiler settings, and standard library implementation. That makes it harder to test, not easier. You now have to test with multiple compilers.

That is still determinate. Indeterminate means you get different results if you run it again on the same machine.

>> It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.
> The current behavior is unsafe in that you can accidentally have two variables pointing at the same buffer. Let's say one buffer holds network input and the other holds some bytecode to execute. Boom - a bug that can be exploited to execute malicious (byte-)code.

It is not two random arrays randomly sharing data.
Nov 16 2009
"Walter Bright" <newshound1 digitalmars.com> wrote in message news:hdr6mm$1bf0$1 digitalmars.com...Rainer Deyke wrote:Even if it is technically determinate if you run it on the same machine with the same inputs, that still does nothing to address Bartosz's claim that it's a potential security hole - Apps don't always get run on the same machine with the same inputs.Walter Bright wrote:That is still determinate. Indeterminate means you get different results if your run it again on the same machine.It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.On the same platform, with the same compiler, compiler settings, and standard library implementation. That makes it harder to test, not easier. You now have to test with multiple compilers.
Nov 16 2009
Nick Sabalausky wrote:Even if it is technically determinate if you run it on the same machine with the same inputs, that still does nothing to address Bartosz's claim that it's a potential security hole - Apps don't always get run on the same machine with the same inputs.It's not a security hole in any more serious manner than any other routine programming bug would be. Very few ordinary programming bugs are exploitable. A buffer overflow, however, is much more of a security hole because they are nearly always exploitable, because it allows arbitrary user data to be executed. This is not the case with the array resizing issue. That's why I drew a distinction between undefined-behavior and implementation-defined behavior. The former is a couple more orders of magnitude more serious.
Nov 16 2009
Hello Walter,

> Nick Sabalausky wrote:
>> Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic.
> It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging. A non-deterministic problem will give you a different result every time you run it. Threading problems are an example of this, as are any dependencies on uninitialized data. This particular issue is implementation-defined.

would you agree that it is not something the programmer can predict in advance?
Nov 16 2009
BCS wrote:
> would you agree that it is not something the programmer can predict in advance?

He can, but it is not reasonable to expect him to. But it's still deterministic.
Nov 16 2009
Walter Bright Wrote:

> BCS wrote:
>> would you agree that it is not something the programmer can predict in advance?
> He can, but it is not reasonable to expect him to. But it's still deterministic.

I've been following this discussion about determinism and I think it addresses the problem from the wrong point of view, that of a specific implementation. My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays.
Nov 17 2009
On Tue, Nov 17, 2009 at 4:38 PM, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:

> Walter Bright Wrote:
>> BCS wrote:
>>> would you agree that it is not something the programmer can predict in advance?
>> He can, but it is not reasonable to expect him to. But it's still deterministic.
> I've been following this discussion about determinism and I think it addresses the problem from the wrong point of view, that of a specific implementation. My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays.

Then why not just fix the spec to say the behavior has to be deterministic?

--bb
Nov 17 2009
Bartosz Milewski wrote:

> Walter Bright Wrote:
>> BCS wrote:
>>> would you agree that it is not something the programmer can predict in advance?
>> He can, but it is not reasonable to expect him to. But it's still deterministic.
> I've been following this discussion about determinism and I think it addresses the problem from the wrong point of view, that of a specific implementation. My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays.

It would not be difficult to specify in the language definition (and TDPL) that behavior is deterministic for a given platform. I think this has some impact on the freedom of the memory allocator, but probably not major.

Andrei
Nov 17 2009
Andrei Alexandrescu Wrote:My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays. It would not be difficult to specify in the language definition (and TDPL) that behavior is deterministic for a given platform. I think this has some impact on the freedom of the memory allocator, but probably not major.Actually this wouldn't fix the problem. Although this would make the program deterministic, it would still exhibit chaotic behavior (and chaos is a pretty good simulator of non-determinism--see random number generators). An input string that is one character longer than in the previous run in one part of the program could cause change in allocation in a completely different part of the program (arbitrary long distance coupling). Theoretically, the heap is deterministic, but in practice no program should depend on all pointers having exactly the same values from run to run. For all intents and purposes the heap should be treated as non-deterministic. This is why no language bothers to impose determinism on the heap. Neither should D.
Nov 18 2009
Bartosz Milewski wrote:

> Andrei Alexandrescu Wrote:
>>> My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays.
>> It would not be difficult to specify in the language definition (and TDPL) that behavior is deterministic for a given platform. I think this has some impact on the freedom of the memory allocator, but probably not major.
> Actually this wouldn't fix the problem. Although this would make the program deterministic, it would still exhibit chaotic behavior (and chaos is a pretty good simulator of non-determinism--see random number generators).

I am glad you have also reached that conclusion.

> An input string that is one character longer than in the previous run in one part of the program could cause change in allocation in a completely different part of the program (arbitrary long distance coupling).

Then you must restart your argument which was centered around non-determinism. It starts like that: "I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic."

Andrei
Nov 18 2009
Bartosz Milewski Wrote:

> I read Andrei's chapter on arrays and there's one thing that concerns me.

I tried to voice the same concern earlier, but my voice is not loud yet. ;)

> When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.

I don't see it being any different than C++'s std::vector invalidating iterators when the vector is expanded. That has the same "non-determinism", but we've been living with it. The reason we've been living happily with it is, because it's been spelled out: "may invalidate all iterators." In D2's slices, what we're missing is the correct language to describe the semantics. As long as we set the semantics right, there will be no problem. (Earlier, I proposed "discretionary sharing semantics" for D2's slice semantics.)

The first code examples that are used to describe slices better say "slice" on the left hand side:

    int[] slice = new int[2]; // GOOD for the semantics
    int[] array = new int[2]; // BAD for the semantics

Ali
Nov 16 2009
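The analogy carries over almost mechanically: a raw pointer into a D slice plays the role of the C++ iterator, and ~= plays the role of push_back. A sketch (whether the pointer ends up stale depends on whether the append reallocated):

    import std.stdio;

    void main()
    {
        int[] a = [1, 2, 3];
        int* p = &a[0]; // like taking an iterator into a vector
        a ~= 4;         // like push_back: may reallocate
        // If a.ptr moved, p now points into the old block - the moral
        // equivalent of an invalidated iterator.
        writeln(p == a.ptr ? "p still valid" : "p is stale");
    }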
Ali Cehreli wrote:

> Bartosz Milewski Wrote:
>> I read Andrei's chapter on arrays and there's one thing that concerns me.
> I tried to voice the same concern earlier, but my voice is not loud yet. ;)
>> When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.
> I don't see it being any different than C++'s std::vector invalidating iterators when the vector is expanded. That has the same "non-determinism", but we've been living with it.

It is different because invalid STL iterators could do anything.

Andrei
Nov 16 2009
== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s article

> I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic. How does that influence program testing and can it be exploited to attack a buggy system?
> Here's a piece of code I came up with--it's not realistic but it describes a certain class of scenarios:
>
>     void f(byte[] code) {
>         doSomeChecks(code);
>         execute(code);
>     }
>
>     void doSomeChecks(byte[] a) {
>         // to avoid special-casing empty arrays
>         a.length += 1; // extension
>         // get byte b (or a whole bunch) from somewhere
>         a[0] = b; // write
>         // check what happens when a[0] = b
>     }
>
> Of course this code is incorrect because doSomeChecks assumes that it holds a unique array, and the caller failed to make a copy before sending it to doSomeChecks. Bad bad programmer!
> However imagine this scenario: Under testing conditions, the array extension just happens to always cause reallocation, so the write is NOT visible in the calling routine. All tests pass with flying colors. But since the re-allocation is, by language definition, non-deterministic, it might happen that, after the code has been deployed, the re-allocation stopped happening, say, on Sundays. If a hacker learns about that, he may be able to inject malicious code or data into the program.
> In this case the non-determinism built into the language helped camouflage a serious bug. Should we, or shouldn't we, worry about it in practice?

The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation? How else could you **efficiently** implement dynamic arrays?
Nov 16 2009
dsimcha Wrote:The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation? How else could you **efficiently** implement dynamic arrays?In the long run (D3), I proposed using the "unique" type modifier. If an array is unique, the compiler knows that there are no slices to worry about, and it can use in-place reallocation to its heart content. That pretty much solves the performance problem. In the short run (D2), I would suggest sticking to "reallocate on every extension" semantics (especially in SafeD) and provide a library solution (a la C++ std::vector) where the performance of appending is an issue.
Nov 17 2009
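The library solution Bartosz alludes to could be as small as the sketch below (the name Vector and its methods are invented for illustration; nothing like this shipped in Phobos at the time). Because the type owns its buffer and never exposes it while growing, it can reuse spare capacity with no stomping risk:

    // Sketch of a std::vector-style appender that owns its storage.
    struct Vector(T)
    {
        private T[] buf;      // backing allocation, never leaked while growing
        private size_t used;  // number of elements in use

        void append(T x)
        {
            if (used == buf.length)
                buf.length = buf.length ? buf.length * 2 : 4; // amortized growth
            buf[used++] = x;
        }

        T opIndex(size_t i) { return buf[i]; }
        size_t length() { return used; }
        T[] release() { return buf[0 .. used]; } // hand out a slice at the end
    }

Appending n elements this way does O(log n) reallocations in total, the same amortized behavior ~= aims for, but without any aliasing question because no outside slice can point into buf until release() is called.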
== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s article

> dsimcha Wrote:
>> The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation? How else could you **efficiently** implement dynamic arrays?
> In the long run (D3), I proposed using the "unique" type modifier. If an array is unique, the compiler knows that there are no slices to worry about, and it can use in-place reallocation to its heart content. That pretty much solves the performance problem.
> In the short run (D2), I would suggest sticking to "reallocate on every extension" semantics (especially in SafeD) and provide a library solution (a la C++ std::vector) where the performance of appending is an issue.

Probably not a bad idea, since:

1. I've concluded that appending to slices simply can't be made both efficient and safe.
2. We'll probably get a std.collections anyhow.
3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.
Nov 17 2009
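dsimcha's point 3 in code form (n is an arbitrary placeholder): a pre-sized array costs one allocation, while the append loop may reallocate and copy several times on the way up:

    import std.stdio;

    void main()
    {
        enum n = 1000;

        auto a = new int[n];  // one allocation, length known up front
        foreach (i; 0 .. n)
            a[i] = i;

        int[] b;              // grows as it goes
        foreach (i; 0 .. n)
            b ~= i;           // may reallocate and copy repeatedly

        writeln(a == b);      // same contents either way
    }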
dsimcha wrote:

> == Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s article
>> dsimcha Wrote:
>>> The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation? How else could you **efficiently** implement dynamic arrays?
>> In the long run (D3), I proposed using the "unique" type modifier. If an array is unique, the compiler knows that there are no slices to worry about, and it can use in-place reallocation to its heart content. That pretty much solves the performance problem.
>> In the short run (D2), I would suggest sticking to "reallocate on every extension" semantics (especially in SafeD) and provide a library solution (a la C++ std::vector) where the performance of appending is an issue.
> Probably not a bad idea, since:
> 1. I've concluded that appending to slices simply can't be made both efficient and safe.
> 2. We'll probably get a std.collections anyhow.
> 3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.

We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake. (Add to that the fact that today's arrays are even _more_ potentially puzzling because, without the MRU cache, expanding slices may trample over other slices.)

In contrast, slowness of ~= comes time and again to haunt this newsgroup. Want it or not, people will want to use T[] and ~= to manage arrays. We could remove the ~= operator and require use of a library type for appends, but the aforementioned facts suggest that that might not be the best decision.

One thing that Bartosz didn't mention is that "this unique" is different than "the other unique", which I think reveals the half-bakedness of the proposal. "The other unique" that was intensively discussed before is transitive, meaning that the root reference points to a graph of objects having no other connection to the outer world. "This unique" is very different because it's shallow - only the array chunk is unaliased, but the members of the array don't need to be. This is a very important distinction.

Andrei
Nov 17 2009
Andrei Alexandrescu, el 17 de noviembre a las 18:45 me escribiste:

>> 3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.
> We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake.

I didn't say anything (until now) because this was discussed already. Dynamic arrays/slices appending is horribly broken (I know it's well defined, and deterministic, but you are just condemned to make mistakes, it just doesn't work as one would expect, even when you know how it works you have to keep fighting your intuition all the time). But it's a little pointless to keep making riots, when we were so close to finally fix this with T[new]/whatever and it all went back.

-- Leandro Lucarella (AKA luca) http://llucax.com.ar/
Nov 18 2009
Leandro Lucarella wrote:

> Andrei Alexandrescu, el 17 de noviembre a las 18:45 me escribiste:
>>> 3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.
>> We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake.
> I didn't say anything (until now) because this was discussed already. Dynamic arrays/slices appending is horribly broken (I know it's well defined, and deterministic, but you are just condemned to make mistakes, it just doesn't work as one would expect, even when you know how it works you have to keep fighting your intuition all the time).

In which ways do you think arrays horribly broken? Same as Bartosz mentions? One question is whether you actually have had bugs and problems coding, or if arrays stopped you from getting work done.

> But it's a little pointless to keep making riots, when we were so close to finally fix this with T[new]/whatever and it all went back.

Hmmmm... resignation is not a trait of this group :o).

Andrei
Nov 18 2009
Andrei Alexandrescu Wrote:

> Leandro Lucarella wrote:
>> Andrei Alexandrescu, el 17 de noviembre a las 18:45 me escribiste:
>>>> 3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.
>>> We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake.
>> I didn't say anything (until now) because this was discussed already. Dynamic arrays/slices appending is horribly broken (I know it's well defined, and deterministic, but you are just condemned to make mistakes, it just doesn't work as one would expect, even when you know how it works you have to keep fighting your intuition all the time).
> In which ways do you think arrays horribly broken? Same as Bartosz mentions? One question is whether you actually have had bugs and problems coding, or if arrays stopped you from getting work done.

What Leandro points out is that using D arrays is not straightforward and may present a steep learning curve. In particular, even the beginner would be required to understand section 4.1.9 of TDPL (Expanding). For many people arrays' behavior might be counter-intuitive and a source of mistakes. In fact, even after reading 4.1.9 I don't know what to expect in some cases. Here's an example:

    int[] a = [0];
    auto b = a;
    a ~= 1;
    b ~= 2;

What is a[1]? Is this considered "stomping" and requiring a re-allocation?
Nov 18 2009
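Bartosz's four-liner can simply be run rather than puzzled over; the answer hinges on whether b's append reuses the block that a still occupies. The ptr comparison below is just instrumentation added around his example:

    import std.stdio;

    void main()
    {
        int[] a = [0];
        auto b = a;
        a ~= 1; // a is now [0, 1]
        b ~= 2; // does this write land inside a's block?
        writeln("a[1] = ", a[1]); // 1 if b's append reallocated, 2 if it stomped
        writeln("same block: ", a.ptr == b.ptr);
    }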
On Wed, 18 Nov 2009 13:21:49 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:

> Andrei Alexandrescu Wrote:
>> Leandro Lucarella wrote:
>>> Andrei Alexandrescu, el 17 de noviembre a las 18:45 me escribiste:
>>>>> 3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.
>>>> We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake.
>>> I didn't say anything (until now) because this was discussed already. Dynamic arrays/slices appending is horribly broken (I know it's well defined, and deterministic, but you are just condemned to make mistakes, it just doesn't work as one would expect, even when you know how it works you have to keep fighting your intuition all the time).
>> In which ways do you think arrays horribly broken? Same as Bartosz mentions? One question is whether you actually have had bugs and problems coding, or if arrays stopped you from getting work done.
> What Leandro points out is that using D arrays is not straightforward and may present a steep learning curve. In particular, even the beginner would be required to understand section 4.1.9 of TDPL (Expanding). For many people arrays' behavior might be counter-intuitive and a source of mistakes. In fact, even after reading 4.1.9 I don't know what to expect in some cases. Here's an example:
>
>     int[] a = [0];
>     auto b = a;
>     a ~= 1;
>     b ~= 2;
>
> What is a[1]? Is this considered "stomping" and requiring a re-allocation?

a[1] would be 1 if an MRU cache was introduced. This is exactly the sort of problem that the MRU cache is supposed to solve. What should happen is: a ~= 1 should allocate in place because its parameters were in the MRU cache. b ~= 1 should reallocate because the a~=1 should have removed its parameters from the MRU cache.

I agree that arrays behave differently than what people naively expect. However, their usage is so much better than anything from other languages I've used. One of the biggest "problems" in understanding arrays I've found from having to explain them to other people is that they are half value type and half reference type. I'd say most of the confusion comes from that, but it's that feature that makes them so great IMO. I'm unsure whether introducing a different usage would leave them as great, but I hope we can find a way to make them intuitive *and* keep them as useful as they are now.

The implementation-defined determinism is probably another issue that bites people, but I think the cases are so much rarer. Generally when you are appending *and* manipulating the original data, you have a unique array. Otherwise your usage falls into either read-only + append usage, or buffer usage where you don't usually append.

Data-stomping is a memory issue (and a const/invariant issue), so I think that needs the most immediate attention. If it can be solved without changing the syntax or system, then I think it's a win-win. We can always decide to use a new concept later if we want to fix the deterministic behavior. In fact, how hard would it be to introduce a deterministic construct *without* changing current behavior? I mean, if the end result of your proposal is you add something like std::vector, why not still allow appending to slices *and* add something like std::vector where people can use that if they want better behavior?

-Steve
Nov 19 2009
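Steven's description translates into a small data structure. The toy model below (invented names, fixed capacity, no eviction policy or GC integration - not what druntime would actually ship) encodes the rule he states: an append may extend in place only if the appended slice ends exactly where the block's in-use region ends:

    // Toy model of an append cache for slices.
    struct AppendCache
    {
        struct Entry { void* base; size_t inUse; } // block and its used length
        Entry[8] entries;
        size_t count;

        // A slice (ptr, len) may grow in place only if it ends at a
        // remembered block's in-use end.
        bool canExtendInPlace(void* ptr, size_t len)
        {
            foreach (i; 0 .. count)
                if (entries[i].base == ptr && entries[i].inUse == len)
                    return true;
            return false;
        }

        // After an in-place append, record the new length; any other slice
        // of the same block no longer matches and must reallocate.
        void update(void* ptr, size_t newLen)
        {
            foreach (i; 0 .. count)
                if (entries[i].base == ptr) { entries[i].inUse = newLen; return; }
            if (count < entries.length)
                entries[count++] = Entry(ptr, newLen);
            // A real cache would evict the least recently used entry here.
        }
    }

Tracing Bartosz's example through this model: the cache remembers (block, 1); a ~= 1 matches and grows in place, updating the entry to 2; b ~= 2 ends at length 1, no longer matches, so it reallocates and a[1] stays 1.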
On Thu, 19 Nov 2009 14:56:40 +0300, Steven Schveighoffer <schveiguy yahoo.com> wrote:

> On Wed, 18 Nov 2009 13:21:49 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:
>> Andrei Alexandrescu Wrote:
>>> Leandro Lucarella wrote:
>>>> Andrei Alexandrescu, el 17 de noviembre a las 18:45 me escribiste:
>>>>>> 3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.
>>>>> We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake.
>>>> I didn't say anything (until now) because this was discussed already. Dynamic arrays/slices appending is horribly broken (I know it's well defined, and deterministic, but you are just condemned to make mistakes, it just doesn't work as one would expect, even when you know how it works you have to keep fighting your intuition all the time).
>>> In which ways do you think arrays horribly broken? Same as Bartosz mentions? One question is whether you actually have had bugs and problems coding, or if arrays stopped you from getting work done.
>> What Leandro points out is that using D arrays is not straightforward and may present a steep learning curve. In particular, even the beginner would be required to understand section 4.1.9 of TDPL (Expanding). For many people arrays' behavior might be counter-intuitive and a source of mistakes. In fact, even after reading 4.1.9 I don't know what to expect in some cases. Here's an example: int[] a = [0]; auto b = a; a ~= 1; b ~= 2; What is a[1]? Is this considered "stomping" and requiring a re-allocation?
> a[1] would be 1 if an MRU cache was introduced. This is exactly the sort of problem that the MRU cache is supposed to solve. What should happen is: a ~= 1 should allocate in place because its parameters were in the MRU cache. b ~= 1 should reallocate because the a~=1 should have removed its parameters from the MRU cache.
> I agree that arrays behave differently than what people naively expect. However, their usage is so much better than anything from other languages I've used. One of the biggest "problems" in understanding arrays I've found from having to explain them to other people is that they are half value type and half reference type. I'd say most of the confusion comes from that, but it's that feature that makes them so great IMO. I'm unsure whether introducing a different usage would leave them as great, but I hope we can find a way to make them intuitive *and* keep them as useful as they are now.
> The implementation-defined determinism is probably another issue that bites people, but I think the cases are so much rarer. Generally when you are appending *and* manipulating the original data, you have a unique array. Otherwise your usage falls into either read-only + append usage, or buffer usage where you don't usually append.
> Data-stomping is a memory issue (and a const/invariant issue), so I think that needs the most immediate attention. If it can be solved without changing the syntax or system, then I think it's a win-win. We can always decide to use a new concept later if we want to fix the deterministic behavior. In fact, how hard would it be to introduce a deterministic construct *without* changing current behavior? I mean, if the end result of your proposal is you add something like std::vector, why not still allow appending to slices *and* add something like std::vector where people can use that if they want better behavior?
> -Steve

Slices are ranges. And ranges are *shrink-only*. Are you going to append to your data structure? Use an Array instead. Slices are not suitable for that purpose!
Nov 19 2009
On Thu, 19 Nov 2009 07:32:03 -0500, Denis Koroskin <2korden gmail.com> wrote:Slices are ranges. And ranges are *shrink-only*. Are you going to append to your data structure? Use an Array instead. Slices are not suitable for that purpose!Arrays are ranges too. Just because the range interface does not define operations that allow growth doesn't mean that something that implements the interface *can't* add growth methods. I see no problem with slices and arrays being the same type, as long as the stomping problem is solved. -Steve
Nov 19 2009
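Steven's point can be made concrete: nothing prevents a type from exposing the input-range primitives and a growth operation side by side, which is precisely the combination T[] offers. An illustrative type, not a proposal:

    // A type that is a usable input range *and* growable.
    struct GrowableRange(T)
    {
        T[] data;

        // Input-range primitives, the same ones slices provide.
        bool empty() { return data.length == 0; }
        T front() { return data[0]; }
        void popFront() { data = data[1 .. $]; }

        // Growth layered on top of the same interface.
        void put(T x) { data ~= x; }
    }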
Steven Schveighoffer Wrote:

>> In fact, even after reading 4.1.9 I don't know what to expect in some cases. Here's an example:
>>
>>     int[] a = [0];
>>     auto b = a;
>>     a ~= 1;
>>     b ~= 2;
>>
>> What is a[1]? Is this considered "stomping" and requiring a re-allocation?
> a[1] would be 1 if an MRU cache was introduced. This is exactly the sort of problem that the MRU cache is supposed to solve. What should happen is: a ~= 1 should allocate in place because its parameters were in the MRU cache. b ~= 1 should reallocate because the a~=1 should have removed its parameters from the MRU cache.

Notice that you are using particular implementation detail (MRU cache) to explain the semantics of D arrays. There is a very important distinction between language specification and compiler implementation. Andrei already had to go pretty deep into implementation to describe arrays and slices: You can't define D arrays without talking about shared buffers and memory allocation. I don't think including the description of the MRU cache in the language specification is the right solution. But I'm afraid an abstract definition of "stomping" might turn out to be quite non-trivial.
Nov 19 2009
== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s article

> Steven Schveighoffer Wrote:
> Notice that you are using particular implementation detail (MRU cache) to explain the semantics of D arrays. There is a very important distinction between language specification and compiler implementation. Andrei already had to go pretty deep into implementation to describe arrays and slices: You can't define D arrays without talking about shared buffers and memory allocation. I don't think including the description of the MRU cache in the language specification is the right solution. But I'm afraid an abstract definition of "stomping" might turn out to be quite non-trivial.

Face it, D is a close to the metal language. Any attempt to define the spec at a purely abstract level without reference to any implementation details, and without leaving certain details implementation defined is absurd. I think people need to face the fact that you can't have close-to-the-metal flexibility and performance and at the same time far-from-the-metal strict separation of specification from implementation details.
Nov 19 2009
dsimcha, el 19 de noviembre a las 23:21 me escribiste:

> Face it, D is a close to the metal language. Any attempt to define the spec at a purely abstract level without reference to any implementation details, and without leaving certain details implementation defined is absurd. I think people need to face the fact that you can't have close-to-the-metal flexibility and performance and at the same time far-from-the-metal strict separation of specification from implementation details.

Yes, but I think this is a place where you are doing early optimization at the expense of usability. You can always do the slice-thinguie yourself if you are that needed for speed that you can't afford an extra indirection when using an array (which will be the only overhead if T[new] were implemented as a pure reference type). I think the cost here in terms of usability if far higher than the little performance you loose, specially when you can overcome the issue for those weird cases where you need it.

-- Leandro Lucarella (AKA luca) http://llucax.com.ar/
Nov 19 2009
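What Leandro sketches - T[new] as a pure reference type, paying one extra indirection - can be mocked up with a class today (RefArray is an invented name, not the abandoned T[new] design itself). Every copy of the reference observes every append, so the question of which alias got cut loose never arises:

    // Reference-semantics array: all copies observe the same growth.
    class RefArray(T)
    {
        private T[] data;

        void append(T x) { data ~= x; }
        T opIndex(size_t i) { return data[i]; }
        size_t length() { return data.length; }
    }

    void demo()
    {
        auto a = new RefArray!int;
        auto b = a;             // copies the reference, not the array
        a.append(1);
        assert(b.length == 1);  // b sees a's append
    }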
Leandro Lucarella wrote:

> dsimcha, el 19 de noviembre a las 23:21 me escribiste:
>> Face it, D is a close to the metal language. Any attempt to define the spec at a purely abstract level without reference to any implementation details, and without leaving certain details implementation defined is absurd. I think people need to face the fact that you can't have close-to-the-metal flexibility and performance and at the same time far-from-the-metal strict separation of specification from implementation details.
> Yes, but I think this is a place where you are doing early optimization at the expense of usability. You can always do the slice-thinguie yourself if you are that needed for speed that you can't afford an extra indirection when using an array (which will be the only overhead if T[new] were implemented as a pure reference type). I think the cost here in terms of usability if far higher than the little performance you loose, specially when you can overcome the issue for those weird cases where you need it.

I think one aspect that tends to be forgotten is that built-in arrays are the lowest abstraction - really just one small step above pointers. Efficiency lost there is lost forever. That's why I consider the current design adequate and the proposed designs less so.

In other words: you can build a more convenient facility on top of T[], but you couldn't build a more efficient facility on top of a convenient one. I think D made the right choice with built-in arrays.

Andrei
Nov 19 2009
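A minimal sketch of "a more convenient facility on top of T[]": a hypothetical library array that owns its storage and keeps an explicit used length, so appends never write into memory that other slices can see. (The names and growth policy are illustrative, not a real Phobos API.)

    struct Array(T)
    {
        private T[] data;     // owned buffer; data.length is the capacity
        private size_t used;  // how much of the buffer is filled

        void append(T value)
        {
            if (used == data.length)
            {
                auto bigger = new T[data.length ? data.length * 2 : 4];
                bigger[0 .. used] = data[0 .. used]; // copy, never stomp
                data = bigger;
            }
            data[used++] = value;
        }

        T[] slice() { return data[0 .. used]; } // view of the used part
    }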
Andrei Alexandrescu, on 19 November at 17:13, you wrote:
> [snip]
> I think one aspect that tends to be forgotten is that built-in arrays are the lowest abstraction - really just one small step above pointers. Efficiency lost there is lost forever. That's why I consider the current design adequate and the proposed designs less so.

I don't think so; slices (a view over somebody else's memory, no appending) should be the lowest-level abstraction.

> In other words: you can build a more convenient facility on top of T[], but you couldn't build a more efficient facility on top of a convenient one. I think D made the right choice with built-in arrays.

I don't. Current built-in arrays are both dynamic arrays and slices, which is what messes everything up. But I'll drop the subject, once again, for the sake of... well, everybody ;)

--
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Nov 19 2009
Leandro Lucarella wrote:
> [snip]
> I don't think so; slices (a view over somebody else's memory, no appending) should be the lowest-level abstraction.

Well, it also makes arrays/slices quite capable. Right now they only have one problem (which different people ascribe a different gravity to) and are otherwise very expressive. Now think of this: the MRU solves a grave issue with slices. We haven't thought of that for years, and all of a sudden stomping is not an issue anymore. I wouldn't be in the least surprised if someone posts in this group tomorrow that they found a way to make ~= more predictable.

> I don't. Current built-in arrays are both dynamic arrays and slices, which is what messes everything up. But I'll drop the subject, once again, for the sake of... well, everybody ;)

I encourage you to further your point (while understanding that your existing arguments and proposals have been understood). This is the best time to make a contribution to D's design. We may as well undefine "~" or "~=" or both for arrays if there are enough arguments and evidence that it's a problem to keep them.

When making proposals, just keep in mind one important thing: with the context that you (all) have accumulated regarding this discussion, a lot of things seem very easy to understand which are really weird when just starting with the language.

One other thing is that it's all tradeoffs; things that are "good" in all dimensions are in very short supply. With arrays, we may be in the position of choosing a solution that won't please everyone. The question right now is "which 5% of users will you disappoint".

As for reviving T[new], I believe that, unless some new insight comes forward, that ship has practically sailed. That doesn't preclude defining a library type that is a more encapsulated array. Fortunately, that can be implemented using T[] because T[] is expressive enough :o).

Andrei
Nov 19 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
> [snip]
> Now think of this: the MRU solves a grave issue with slices. We haven't thought of that for years, and all of a sudden stomping is not an issue anymore. I wouldn't be in the least surprised if someone posts in this group tomorrow that they found a way to make ~= more predictable.

I'd be inclined to support the MRU (and even implement it) iff:

1. We still get a good library array type that pulls out all the stops to make appending efficient. (I'm putting my effort where my mouth is and working on writing one pursuant to my ideas about combining it with uniqueness right now, actually.)

2. We state in the documentation that ~= on slices is a convenience that isn't terribly efficient and recommend that people who care more about performance than simplicity use the library type for their serious array building, especially for large arrays.
Nov 19 2009
dsimcha, on 20 November at 03:35, you wrote:
> 2. We state in the documentation that ~= on slices is a convenience that isn't terribly efficient and recommend that people who care more about performance than simplicity use the library type for their serious array building, especially for large arrays.

I think this is a really bad idea; it's so obvious dynamic arrays are broken when you have to say in the documentation "you have this operation, but don't use it because it's broken".

--
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Nov 20 2009
Andrei Alexandrescu, on 19 November at 18:25, you wrote:
> Well, it also makes arrays/slices quite capable.

Yes, capable of introducing bugs, especially =)

> Right now they only have one problem (which different people ascribe a different gravity to) and are otherwise very expressive.

Nobody is judging the expressiveness of arrays; the bug-prone semantics is the problem.

> Now think of this: the MRU solves a grave issue with slices.

You are using arrays and slices interchangeably on purpose, right? I hate you! =)

> We haven't thought of that for years, and all of a sudden stomping is not an issue anymore.

And suddenly you are tying the array implementation to the GC. You are making things worse, not better. This whole issue reminds me of this blog: http://thereifixedit.com/ =P

> I wouldn't be in the least surprised if someone posts in this group tomorrow that they found a way to make ~= more predictable.

Split slices and arrays, add a capacity field to arrays, remove appending from slices. There.

> I encourage you to further your point (while understanding that your existing arguments and proposals have been understood). This is the best time to make a contribution to D's design.

I keep replying just because I can't control myself, but I really think it's pointless. I have said (as many others have) several times what the problems are and what would be a nice solution. You just find the solution unsuitable and think the cache is good enough.

> We may as well undefine "~" or "~=" or both for arrays if there are enough arguments and evidence that it's a problem to keep them.

I think that would be very bad. For one, I thought "~" was out of the discussion, because it always creates a new temporary (like any other binary operator), so I can't see a problem with it. And I hope you are only considering the possibility of removing "~=" from slices, making it possible for a library array type to define it. I could totally live with that. If the array type were imported in object.d it would be even better.

> When making proposals, just keep in mind one important thing: with the context that you (all) have accumulated regarding this discussion, a lot of things seem very easy to understand which are really weird when just starting with the language.

Please, name them! And BTW, will understanding the MRU cache (and trying to predict it) be any easier?

> One other thing is that it's all tradeoffs; things that are "good" in all dimensions are in very short supply.

I know that! This discussion is just because some think one tradeoff is better than the other.

> With arrays, we may be in the position of choosing a solution that won't please everyone. The question right now is "which 5% of users will you disappoint".

I really think more than 5% of the users will hit the weird half-value, half-reference semantics of slices, but, of course, I'm making these figures up, just like you did ;)

> As for reviving T[new], I believe that, unless some new insight comes forward, that ship has practically sailed. That doesn't preclude defining a library type that is a more encapsulated array.

I don't care about T[new]; I'm fine with a library-defined type. Just remove appending from slices (and make .length read-only).

> Fortunately, that can be implemented using T[] because T[] is expressive enough :o).

...

--
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Nov 20 2009
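For concreteness, here is a sketch of the "no appending from slices" half of that proposal: a view type whose ptr and length cannot be reassigned and which defines no ~= at all. (Hypothetical type, for illustration only.)

    struct View(T)
    {
        private T* p;
        private size_t n;

        this(T[] a) { p = a.ptr; n = a.length; }

        T opIndex(size_t i)
        {
            assert(i < n);  // bounds stay fixed for the view's lifetime
            return p[i];
        }

        size_t length() { return n; }  // read-only: no setter, no append
    }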
On Fri, 20 Nov 2009 04:13:42 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> [snip]
> In other words: you can build a more convenient facility on top of T[], but you couldn't build a more efficient facility on top of a convenient one. I think D made the right choice with built-in arrays.

Not so true. T[] appends are slow, *very* slow. Whenever I need anything efficient, I always use an additional "size" field (and arr.length is used as capacity). This is tricky, error-prone and not too intuitive to newcomers. As such, it contradicts all your statements above.

And there is also a hack to make it work a bit faster:

    T[] arr;
    arr.length = capacity;
    arr.length = 0;

Much to my disgust, having no higher-level abstraction in the language core only encourages writing code like this. TBH, I used to write code like the above before, too, but not now. I slowly eliminate everything like this (as well as all the ~= operations) from the code I wrote. It's unpredictable, slow, and leads to hard-to-fix bugs.

In other words, I disagree with your opinion that D made the right choice with built-in pseudo-arrays.
Nov 19 2009
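Spelled out, the "size field" pattern Denis describes looks like this: arr.length is treated as capacity and a separate counter tracks the used portion, so growth is explicit and predictable. (A sketch of user code with made-up sizes, not a library API.)

    void demo()
    {
        int[] arr;
        size_t size = 0;

        arr.length = 1024;  // reserve: length plays the role of capacity
        foreach (v; 0 .. 2000)
        {
            if (size == arr.length)
                arr.length = arr.length * 2;  // grow deliberately
            arr[size++] = v;
        }
        auto result = arr[0 .. size];  // the logically filled part
    }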
Denis Koroskin wrote:
> [snip]
> Not so true. T[] appends are slow, *very* slow. Whenever I need anything efficient, I always use an additional "size" field (and arr.length is used as capacity). This is tricky, error-prone and not too intuitive to newcomers. As such, it contradicts all your statements above.

I was speaking somewhat anticipatively; the MRU is slated to accelerate append speed while also improving its safety. I have tried in vain to convince someone to look into implementing it.

> And there is also a hack to make it work a bit faster:
>
>     T[] arr;
>     arr.length = capacity;
>     arr.length = 0;
>
> Much to my disgust, having no higher-level abstraction in the language core only encourages writing code like this.

I hope that idiom will be put to rest soon.

> TBH, I used to write code like the above before, too, but not now. I slowly eliminate everything like this (as well as all the ~= operations) from the code I wrote. It's unpredictable, slow, and leads to hard-to-fix bugs. In other words, I disagree with your opinion that D made the right choice with built-in pseudo-arrays.

I really meant to say "will have made".

Andrei
Nov 19 2009
Andrei Alexandrescu Wrote:I think one aspect that tends to be forgotten is that built-in arrays are the lowest abstraction - really just one small step above pointers. Efficiency lost there is lost forever. That's why I consider the current design adequate and the proposed designs less so.Let's look at your proposal from a slightly different perspective. First of all, a conforming implementation is free to do the re-allocation on every append. There are no complexity requirements specified. The controversy stems from the fact that the spec deliberately leaves a window for certain optimizations that might result in the array being write-shared after the append. In that case the spec dictates certain restrictions that are expressed informally as no-stomping. BTW, we can't currently evaluate the proposal until no-stomping is formally defined. Because of that, a program written for and tested on one conforming implementation may not work correctly on another, depending on what optimization strategy is picked. That will probably lead to implementations that use a compile-time switch to disable such optimizations. The bugs resulting from not understanding when write sharing is possible (and I'm afraid a lot of programmers will have problems following the spec in this area) will be subtle. They will also be "soft" (as opposed to "hard" bugs), and those are much harder to discover and track down. All these things have to be carefully weighed. We are in uncharted territory.
Nov 20 2009
On Thu, 19 Nov 2009 17:58:15 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:
> Notice that you are using a particular implementation detail (the MRU cache) to explain the semantics of D arrays. There is a very important distinction between language specification and compiler implementation. Andrei already had to go pretty deep into implementation to describe arrays and slices: you can't define D arrays without talking about shared buffers and memory allocation. I don't think including the description of the MRU cache in the language specification is the right solution. But I'm afraid an abstract definition of "stomping" might turn out to be quite non-trivial.

I haven't yet read all the other posts, so someone else may already have pointed this out but... Having an MRU cache makes it so you *don't* have to explain its semantics (or stomping). Currently there is a paragraph in the spec (complete with example) talking about how stomping can occur, so you can just remove that part. There is no need to talk about the MRU cache when talking about arrays; that's an implementation detail. I was pointing it out because you are already used to the current bad implementation :)

I wouldn't even bring up the MRU cache in the book or the spec. You just say that you can append to an array and it may make a copy of the data if necessary. It's just like realloc, except safer.

-Steve
Nov 23 2009
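The realloc analogy, made concrete: even when an append (or resize) moves the data, the old view still points at valid, GC-owned memory, whereas a stale pointer after realloc may point at freed memory. A small sketch:

    void main()
    {
        int[] a = new int[4];
        auto old = a;          // a second view of the same buffer
        a.length = 1_000_000;  // almost certainly moves the data
        old[0] = 7;            // still memory-safe: nothing was freed
        // the two views may have diverged, but neither dangles
        assert(a.ptr !is old.ptr || a[0] == 7);
    }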
Steven Schveighoffer, on 23 November at 07:34, you wrote:
> [snip]
> I wouldn't even bring up the MRU cache in the book or the spec. You just say that you can append to an array and it may make a copy of the data if necessary. It's just like realloc, except safer.

The thing is, with realloc() it's less likely that you forget that the data can be copied, because it returns the new pointer (which can be the same as the original pointer). And even in this case, I saw a lot of bugs related to realloc() misuse (and I made a couple myself). With slices it's much worse.

--
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Nov 23 2009
On Mon, 23 Nov 2009 11:10:48 -0500, Leandro Lucarella <llucax gmail.com> wrote:
> The thing is, with realloc() it's less likely that you forget that the data can be copied, because it returns the new pointer (which can be the same as the original pointer). And even in this case, I saw a lot of bugs related to realloc() misuse (and I made a couple myself). With slices it's much worse.

realloc is worse. If I have multiple aliases to the same data and I realloc one of those, the others all can become dangling pointers if the runtime decides to move the data. You also cannot realloc data that's not malloc'd, but you can append to a slice of non-heap data without issue.

No matter what you do with slice appending in D, you cannot access dangling pointers unless you access the slice's ptr field. Yes, you can run into trouble if you append to a slice and then change the original data in the slice, but that is a rare event, and it won't result in corrupting memory you didn't already have access to (at least, with the MRU cache).

-Steve
Nov 23 2009
Steven Schveighoffer, on 23 November at 15:18, you wrote:
> realloc is worse. If I have multiple aliases to the same data and I realloc one of those, the others all can become dangling pointers if the runtime decides to move the data.

Well, you are comparing GC vs. no-GC, not realloc() vs. slices. I have no intention of following that discussion; I'm just saying that realloc() is less error-prone than slices (and realloc() is error-prone already).

> You also cannot realloc data that's not malloc'd, but you can append to a slice of non-heap data without issue.

How is that? AFAIK slices use gc_realloc() to do the actual realloc; if that's done on a piece of malloc'ed data it will fail. And even if it were true, I don't really see this as a big source of bugs. I really never had a bug because I tried to realloc() a non-heap piece of memory, or from appending to a slice of non-GC-heap memory either.

> No matter what you do with slice appending in D, you cannot access dangling pointers unless you access the slice's ptr field.

Again, that's only because D is GCed, not because slices are great.

> Yes, you can run into trouble if you append to a slice and then change the original data in the slice, but that is a rare event, and it won't result in corrupting memory you didn't already have access to (at least, with the MRU cache).

I'm a little lost now; I don't know which hypothetical D you are talking about. I can't see how the MRU cache can provide any safety. The cache is finite, and not all the slices will fit in it, so for those slices that are not cached, I don't see how the cache can provide any safety. Safety can be provided if a ~= b is defined to be semantically the same as a = a ~ b, leaving the MRU cache as an optimization.

In that case we agree slices are predictable and a little safer (because stomping is not possible). But they are still error-prone if you expect them to be a full reference type. Being the only entity in D with such semantics is something one can forget very easily, introducing subtle bugs. In this case, I really think providing ~= is a bad idea; it's just too error-prone and doesn't give you anything.

I still think the best option is to just make slices immutable value types (you can mutate the data they point to, you just can't modify the slices themselves: ptr and length), and provide a proper dynamic array type in the library or something.

--
Leandro Lucarella (AKA luca)
http://llucax.com.ar/
Nov 23 2009
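What Leandro's rule buys, sketched: if a ~= b is defined to mean exactly a = a ~ b (with any in-place growth a purely unobservable optimization), then Bartosz's earlier example has a single defined answer on every conforming implementation:

    void main()
    {
        int[] a = [0];
        auto b = a;
        a = a ~ 1;  // what a ~= 1 would be required to mean
        b = b ~ 2;  // ditto for b ~= 2
        assert(a[1] == 1 && b[1] == 2);  // no stomping, by construction
    }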
On Mon, 23 Nov 2009 18:34:48 -0500, Leandro Lucarella <llucax gmail.com> wrote:
> Well, you are comparing GC vs. no-GC, not realloc() vs. slices. I'm just saying that realloc() is less error-prone than slices (and realloc() is error-prone already).

I was originally saying that realloc is similar to appending because it may or may not choose to move the data. The only difference from realloc in slices is that your other aliases to the old data don't become invalidated, so there are no dangling pointers. The GC is partly the cause for that, but also the fact that slices don't deallocate the original data (that would be possible, even in a GC'd environment). Your assertion that realloc is less error-prone doesn't make any sense to me. realloc is as error-prone as appending slices *and* it can also leave dangling pointers.

> How is that? AFAIK slices use gc_realloc() to do the actual realloc; if that's done on a piece of malloc'ed data it will fail.

I believe step 1 of the append process is to find the GC heap block that contains the slice. If it doesn't find one, it simply copies the data to a new block. If the original data is stack-allocated or allocated outside the GC, no errors occur AFAIK.

> And even if it were true, I don't really see this as a big source of bugs. I really never had a bug because I tried to realloc() a non-heap piece of memory, or from appending to a slice of non-GC-heap memory either.

You probably didn't have such bugs because you probably didn't think of doing this in C:

    void foo(char *arg)
    {
        realloc(arg, 10000);
    }

Oops, what if arg isn't malloc'd? But in D this is safe code, and works as you expect, no matter what size arg was to begin with. In fact, tango uses this all the time for buffer optimization. Basically, you pass in a stack buffer; if it's big enough, you save the penalty of heap allocation. If it's not, no big deal, just resize the buffer and the GC takes care of the rest.

    void foo(char[] arg)
    {
        arg.length = 10000;
    }

> Again, that's only because D is GCed, not because slices are great.

We are comparing realloc to slices. You assert slices are worse and I gave you a reason why they are not. You can't just ignore that, because comparing slices to realloc requires taking the GC into account.

> I'm a little lost now; I don't know which hypothetical D you are talking about.

Here's the deal. You have two very common operations on an array: appending and modification. In practice you either append data or you modify data. Rarely do you append data *and* modify the original data, and the only case I've seen for that is the buffer optimization trick above. This is the only case where slices might get you "non-deterministic" behavior (and that's only if you still want to use the original array before appending). I have no proof of this except for my experience reading tango code.

The two major use cases for appending are building an array of items using append, and using an array as a buffer. In the first case, you start out with an empty or fully owned array, so no harm. In the second case, there are actually two types of functions that accept buffers: ones that return a slice of the buffer, and ones which don't. With the ones that return a slice, you use the slice, not the original buffer. With the ones which don't return a slice, you use the original buffer if you expect important data to be there. I don't think there's ever a case where you see a function that takes an array, appends to the array, then modifies the original part of the array *without* returning the result.

> I can't see how the MRU cache can provide any safety. The cache is finite, and not all the slices will fit in it, so for those slices that are not cached, I don't see how the cache can provide any safety.

In those cases, the slice is reallocated because the runtime isn't sure whether it will result in stomping. The MRU cache is an optimization which defaults to reallocation for safety. Andrei has mentioned that he thinks we can store the allocated length in the GC block, which I think would also work. You also wouldn't need an MRU cache in that case, but he says it's in *addition* to the MRU cache, so I'm not sure what he means.

> Safety can be provided if a ~= b is defined to be semantically the same as a = a ~ b, leaving the MRU cache as an optimization.

Yes, exactly. That is what the MRU cache does. It's no good if the current behavior becomes the fallback; the fallback must be a reallocation.

> In that case we agree slices are predictable and a little safer (because stomping is not possible). But they are still error-prone if you expect them to be a full reference type. Being the only entity in D with such semantics is something one can forget very easily, introducing subtle bugs. In this case, I really think providing ~= is a bad idea; it's just too error-prone and doesn't give you anything.

Providing the simple syntax for simple cases is what this does. I think it's worth having in the language. Let's all face it: slices and arrays being the same thing is a new concept that most of us have only seen in D (I don't know many languages, but probably it's been done before). And the hybrid value/reference nature of a slice is definitely new to me. I think it provides great power and flexibility that makes code just work the way you want it to *in most cases*. If you read a lot of the tango code, you see how elegant things can be with slices. But it's always going to be confusing to newcomers. It's like explaining pointers for the first time to someone.

> I still think the best option is to just make slices immutable value types (you can mutate the data they point to, you just can't modify the slices themselves: ptr and length), and provide a proper dynamic array type in the library or something.

I think you mean slices should only be *shrinkable*; immutable makes no sense, you need to be able to rebind a slice. I don't see the harm in allowing appending as long as it's safe. Having a separate type in the library that optimizes appending and has complete reference semantics is fine with me; just leave slices and arrays alone (well, except the fix for stomping). They *can* live together in the same language.

-Steve
Nov 24 2009
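The Tango-style buffer optimization Steve refers to, sketched as a standalone example (the function and sizes are hypothetical): the caller lends a stack buffer, the callee grows it only when needed, and everyone uses the returned slice.

    char[] fill(char[] buf, size_t needed)
    {
        if (buf.length < needed)
            buf.length = needed;  // reallocates onto the GC heap;
                                  // the caller's stack array is untouched
        buf[0 .. needed] = 'x';   // stand-in for producing real output
        return buf[0 .. needed];  // callers use the returned slice
    }

    void main()
    {
        char[64] stack;
        auto small = fill(stack[], 10);    // fits: no heap allocation
        auto large = fill(stack[], 4096);  // transparently reallocated
    }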
Steven Schveighoffer wrote: [snip]Andrei has mentioned that he thinks we can store the allocated length in the GC block, which I think would also work. You also wouldn't need an MRU cache in that case, but he says it's in *addition* to the MRU cache, so I'm not sure what he means.[snip] Reaching the GC block is relatively expensive, so the MRU still helps. In essence it's like this. When appending: a) look up the cache, if there, you have O(1) amortized append that's really fast b) if not in the cache, talk to the GC, still O(1) amortized append that's not as fast Both help providing an important performance guarantee. I was a bit worried about guaranteeing "O(1) amortized append for up to 8 arrays at a time." Andrei
Nov 24 2009
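The two-tier lookup, reduced to a self-contained sketch. mruCache and gcAllocatedLength are stand-ins for the hypothetical runtime internals; the point is the fast unlocked path followed by the slower GC-side query:

    size_t[void*] mruCache;  // thread-local in the real design: no locks

    size_t gcAllocatedLength(void* p)
    {
        return 16;  // stub for dissecting the pointer's GC block
    }

    size_t lookupCapacity(void* p)
    {
        if (auto hit = p in mruCache)
            return *hit;                  // a) cache hit: really fast
        auto cap = gcAllocatedLength(p);  // b) miss: talk to the GC
        mruCache[p] = cap;                // prime the cache for next time
        return cap;
    }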
On Tue, 24 Nov 2009 11:01:10 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Reaching the GC block is relatively expensive, so the MRU still helps. In essence it's like this. When appending: a) look up the cache; if there, you have O(1) amortized append that's really fast; b) if not in the cache, talk to the GC, still O(1) amortized append that's not as fast.

Have you considered the performance impact on allocating non-array types? That is, are you intending all allocations to store the allocated length, even class or struct allocations which will likely never be appended to? Or will there be a "non-appendable" flag?

Also, the part without the MRU cache was my original proposal from last year. I had some ideas on how the length could be stored. For example, in a page of up to 128-byte blocks, you only need 8 bits to store the length (alas, you cannot manage with 4 bits for 16-byte blocks because you need to cover both 0 and 16). This reduces the overhead for those blocks. For 256-byte to 1-page blocks, 16 bits is acceptable. For multi-page blocks, the cost of storing a 32-bit integer is negligible.

It is true the lookup of the MRU cache will not involve dissecting the address of the block to find its container block, but you will still need a lock, or are you planning on doing something clever? I think the locking will be the bottleneck, and if you don't make it the same as the global lock, add the cost of both locks when you actually need to append.

-Steve
Nov 24 2009
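Steve's encoding as arithmetic: the bookkeeping cost scales with the block size. (A sketch of the idea, not real GC code; the 4096-byte page size is an assumption.)

    size_t lengthFieldBytes(size_t blockSize)
    {
        if (blockSize <= 128)  return 1;  // 8 bits cover lengths 0 .. 128
        if (blockSize <= 4096) return 2;  // 16 bits up to one page
        return 4;                         // 32 bits for multi-page blocks
    }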
Steven Schveighoffer wrote:
> [snip]
> Have you considered the performance impact on allocating non-array types? That is, are you intending all allocations to store the allocated length, even class or struct allocations which will likely never be appended to? Or will there be a "non-appendable" flag?

Having access to the requested length is important at larger lengths, so probably we could be fine by not actually storing it for allocations up to 128 bytes.

> It is true the lookup of the MRU cache will not involve dissecting the address of the block to find its container block, but you will still need a lock, or are you planning on doing something clever? I think the locking will be the bottleneck, and if you don't make it the same as the global lock, add the cost of both locks when you actually need to append.

The cache is a thread-local map from pointers to size_t. Using it does not require any locking, I think.

Andrei
Nov 24 2009
On Tue, 24 Nov 2009 11:26:11 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Having access to the requested length is important at larger lengths, so probably we could be fine by not actually storing it for allocations up to 128 bytes.

So in other words, appending to any array under 128 bytes which does not still sit in the MRU cache causes a reallocation? I'm not sure if it's the right move, but it is an optimization, so maybe it works well. It also handily fixes the problem of requiring allocated lengths for non-array allocations, since most would be under 128 bytes.

> The cache is a thread-local map from pointers to size_t. Using it does not require any locking, I think.

When reallocating, do you not also need to update the allocated length in the heap, even if the allocation fits into the same block, to prevent other threads from stomping? Doesn't that require a lock? Somewhere you have to share the allocated length between threads sharing the same array. I assumed it was going to be in the MRU cache...

-Steve
Nov 24 2009
On Tue, 24 Nov 2009 11:41:17 -0500, Steven Schveighoffer <schveiguy yahoo.com> wrote:
>> The cache is a thread-local map from pointers to size_t. Using it does not require any locking, I think.
> When reallocating, do you not also need to update the allocated length in the heap, even if the allocation fits into the same block, to prevent other threads from stomping? Doesn't that require a lock? Somewhere you have to share the allocated length between threads sharing the same array. I assumed it was going to be in the MRU cache...

If all this is true, I had another thought for an MRU cache. It could simply be a map of pointers to the right blocks in the GC that store the lengths (to avoid the block search with the global mutex locked). Lookup should be atomic without locking (I think, simply an integer load, right?), but you'd have to lock to actually do an append. I don't think we solve the lock problem without having multiple heaps...

-Steve
Nov 24 2009
Steven Schveighoffer wrote:
> [snip]
> If all this is true, I had another thought for an MRU cache. It could simply be a map of pointers to the right blocks in the GC that store the lengths (to avoid the block search with the global mutex locked).

That's a great idea. I keep on dreaming someone will implement all this stuff we talk about :o).

> Lookup should be atomic without locking (I think, simply an integer load, right?), but you'd have to lock to actually do an append. I don't think we solve the lock problem without having multiple heaps...

I don't think we need to worry about optimizing growth of shared arrays.

Andrei
Nov 24 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
> [snip]
> I don't think we need to worry about optimizing growth of shared arrays.

But if the full semantics of shared are not implemented for D2 (will they be?), how do we even detect shared arrays?
Nov 24 2009
On Tue, 24 Nov 2009 12:46:37 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> [snip]
> I don't think we need to worry about optimizing growth of shared arrays.

What about immutable arrays? Can't those be shared without being marked as shared?

-Steve
Nov 24 2009
On Tue, 24 Nov 2009 12:46:37 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> [snip]
> That's a great idea. I keep on dreaming someone will implement all this stuff we talk about :o).

Note this also solves the problem of having to flush the MRU cache on a collection -- the same allocated-length property should remain attached to that block. :)

-Steve
Nov 24 2009
On Tue, 24 Nov 2009 12:46:37 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Lookup should be atomic without locking (I think, simply an integer load, right?), but you'd have to lock to actually do an append. I don't think we solve the lock problem without having multiple heaps...
> I don't think we need to worry about optimizing growth of shared arrays.

This doesn't compute. The heap is shared; you need to store the allocated length in the heap, so you need to lock on every allocation, even for non-shared arrays. What am I missing? Can you update *parts* of the heap without locking? If so, why do we need an MRU cache?

-Steve
Nov 24 2009
It's hard to argue whether something is or isn't bug-prone and how severe the bugs could be. An experienced programmer knows all the gotchas, so he or she will never introduce such bugs. A more bug-prone person will be afraid to speak up in public so as not to appear stupid or be bullied away. It's only when the language goes into production that the problems surface. Then an experienced programmer becomes part of a group of not-so-experienced programmers, and he or she starts seeing how much the rest of the team is prone to certain bugs and how hard it is to prevent or even detect them. That's why the best languages attempt to turn as many errors as possible into hard bugs.

Non-deterministic behavior is a major source of soft bugs. A buggy program might work correctly 99% of the time, introduce minor errors 0.9% of the time, and 0.1% of the time lead to a disaster. I will try to come up with some scenarios that would qualify for the "100 Gotchas with D arrays" to provide more food for thought. Consider these little language puzzles.

Imagine you're reviewing this code written by a relative newbie:

    char[] quote(char[] word)
    {
        word.length += 2;
        word[$ - 1] = '"';
        for (int i = word.length - 2; i > 0; --i)
            word[i] = word[i - 1];
        word[0] = '"';
        return word;
    }

    void storeFoos(char[] buffer, string pattern, Container c)
    {
        char[] res = find(buffer, pattern);
        c.store(quote(res[0 .. pattern.length]));
    }

Is this buggy? If so, how and when will the bug manifest itself?

Or another story. An enthusiastic programmer comes to you with a very clever rewrite of a function. This is the original:

    T[] duplicate(T)(T[] src)
    {
        static if (is(T == invariant))
            return src.idup;
        else
            return src.dup;
    }

This is his improved version:

    T[] duplicate(T)(T[] src)
    {
        T tmp = src[$ - 1];
        src.length -= 1;
        src ~= tmp;
        return src;
    }

No need for static if! The extension is bound to re-allocate because of the stomping rules. Will this always work, or is there a gotcha?
Nov 24 2009
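Spelling out the boundary case hiding in the first puzzle, under the no-stomping rule discussed in this thread (a sketch; actual behavior also depends on the runtime's block capacities):

    import std.stdio;

    void main()
    {
        char[] buffer = "foo bar".dup;
        char[] res = buffer[0 .. 3];   // pattern matched at the front
        res.length += 2;               // growing in place would stomp " ba",
                                       // so the no-stomp rule forces a copy
        res[0] = '#';                  // provably does not touch buffer

        char[] all = buffer[];         // now the match is the whole buffer
        all.length += 2;               // may grow in place if the block has
                                       // spare capacity -- or may copy
        all[0] = '#';                  // buffer[0] may or may not change
        writeln(buffer);               // which one you see is the gotcha
    }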
On Tue, 24 Nov 2009 15:00:11 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:It's hard to argue whether something is or isn't bug prone and how severe the bugs could be. An experienced programmer knows all the gotchas so he or she will never introduce such bugs. A more bug-prone person will be afraid to speak up in public so as not to appear stupid or be bullied away. It's only when the language goes into production that the problems surface. Then an experienced programmer becomes part of a group or not so experienced programmers and he or she starts seeing how much the rests of the team is prone to certain bugs and how hard it is to prevent or even detect them. That's why the best languages attempt to turn as many errors as possible into hard bugs.Yes, but there is a very fuzzy line between preventing all bugs and preventing useful code. The safest most secure system is one that has no inputs or outputs.Non-deterministic behavior is a major source of soft bugs. A buggy program might work correctly 99% of the time, introduce minor errors 0.9% of the time, and in 0.1% of the time lead to a disaster. I will try to come up with some scenarios that would qualify for the "100 Gotchas with D arrays" to provide more food for though. Consider these little language puzzles. Imagine you're reviewing this code written by a relative newbee: char[] quote(char[] word) { word.length += 2; word[length-1] = '"'; for(int i = word.length-2; i > 0; --i) word[i] = word[i - 1]; word[0] = '"'; return word; } void storeFoos(char[] buffer, string pattern, Container c) { char[] res = find(buffer, pattern); c.store(quote(res[0..pattern.len]); } Is this buggy? If so, how and when will the bug manifest itself?In my mind it is not buggy. If quote or storeFoos is not meant to be able to molest the data, it should require a const(char)[]. If this was D1 code, I'd say it depends on the purpose of storeFoos, if storeFoos is expecting to own the input buffer, it's not buggy. If it's not, then storeFoos is buggy (in all cases). But we aren't talking about D1.Or another story. An enthusiastic programmer comes to you with a very clever rewrite of a function. This is the original: T[] duplicate(T)(T[] src) { static if (is(T == invariant)) return src.idup; else return src.dup; } This is his improved version: T[] duplicate(T)(T[] src) { T tmp = src[$-1]; src.length -= 1; src ~= tmp; return src; } No need for static if! The extension is bound to re-allocate because of stomping rules. Will this always work, or is there a gotcha?He forgot to check for length 0. Also, you are a bit generous on your use of 'clever' :) -Steve
Nov 24 2009
Steven Schveighoffer Wrote:
>> Imagine you're reviewing this code written by a relative newbie:
>> [snip: the quote/storeFoos example above]
>> Is this buggy? If so, how and when will the bug manifest itself?
> In my mind it is not buggy. If quote or storeFoos is not meant to be able to molest the data, it should require a const(char)[]. If this was D1 code, I'd say it depends on the purpose of storeFoos; if storeFoos is expecting to own the input buffer, it's not buggy. If it's not, then storeFoos is buggy (in all cases). But we aren't talking about D1.

Yes, a lot depends on what's expected of storeFoos. Suppose that the surrounding code expects the buffer not to be modified. The newbie didn't want to idup the buffer because in most buffers the pattern is not found.

>> This is his improved version:
>> [snip: the duplicate rewrite above]
>> No need for static if! The extension is bound to re-allocate because of stomping rules. Will this always work, or is there a gotcha?
> He forgot to check for length 0.

That he did. So now he goes back to the drawing board and adds an early return for length 0. Is his code correct now?
Nov 24 2009
On Tue, 24 Nov 2009 16:52:52 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:Steven Schveighoffer Wrote:Then he forgot the const decoration for the parameter. No duping is necessary: void storeFoos(const(char)[] buffer, string pattern, Container c) { ... // will result in compiler errors for given quote function, fix that. }Yes, a lot depends on what's expected of storeFoos. Suppose that the surrounding code expects the buffer not to be modified. The newbee didn't want to idup the buffer because in most buffers the pattern is not found.Imagine you're reviewing this code written by a relative newbee: char[] quote(char[] word) { word.length += 2; word[length-1] = '"'; for(int i = word.length-2; i > 0; --i) word[i] = word[i - 1]; word[0] = '"'; return word; } void storeFoos(char[] buffer, string pattern, Container c) { char[] res = find(buffer, pattern); c.store(quote(res[0..pattern.len]); } Is this buggy? If so, how and when will the bug manifest itself?In my mind it is not buggy. If quote or storeFoos is not meant to be able to molest the data, it should require a const(char)[]. If this was D1 code, I'd say it depends on the purpose of storeFoos, if storeFoos is expecting to own the input buffer, it's not buggy. If it's not, then storeFoos is buggy (in all cases). But we aren't talking about D1.Yes. I have a feeling you have some trick up your sleeve that you think makes it incorrect :) -SteveThat he did. So now he goes back to the drawing board and adds an early return for length 0. Is his code correct now?Or another story. An enthusiastic programmer comes to you with a very clever rewrite of a function. This is the original: T[] duplicate(T)(T[] src) { static if (is(T == invariant)) return src.idup; else return src.dup; } This is his improved version: T[] duplicate(T)(T[] src) { T tmp = src[$-1]; src.length -= 1; src ~= tmp; return src; } No need for static if! The extension is bound to re-allocate becauseofstomping rules. Will this always work, or is there a gotcha?He forgot to check for length 0.
Nov 24 2009
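One possible shape for the const-correct pair Steven is suggesting; this rewrite of quote is illustrative, and it assumes Container.store accepts the freshly built array:

    char[] quote(const(char)[] word)
    {
        auto res = new char[word.length + 2];
        res[0] = '"';
        res[1 .. $ - 1] = word[];   // copies; never aliases the input
        res[$ - 1] = '"';
        return res;
    }

With this signature the compiler enforces that the caller's buffer is never written, and the allocation is part of the contract instead of an accident of the append rules.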
Steven Schveighoffer Wrote:
>> Yes, a lot depends on what's expected of storeFoos. Suppose that the surrounding code expects the buffer not to be modified. The newbie didn't want to idup the buffer because in most buffers the pattern is not found.
> Then he forgot the const decoration for the parameter. No duping is necessary:
>
>     void storeFoos(const(char)[] buffer, string pattern, Container c)
>     {
>         ... // will result in compiler errors for given quote function, fix that.
>     }

Well, he tried that, but then his quote function wouldn't work and he needs it in other places as well. You know how they are ;-). Besides he thinks "const" is for wimps. He won't stop arguing until you prove to him that there is an actual bug in his code.

>> That he did. So now he goes back to the drawing board and adds an early return for length 0. Is his code correct now?
> Yes. I have a feeling you have some trick up your sleeve that you think makes it incorrect :)

Well, I was wondering if it could be rewritten as:

    T[] duplicate(T)(T[] src)
    {
        assert(src.length != 0);
        T tmp = src[$ - 1];
        src.length -= 1;
        src.length += 1;
        src[$ - 1] = tmp;
        return src;
    }

and whether it's okay for the optimizer to do some simple arithmetic optimization.
Nov 24 2009
On Tue, 24 Nov 2009 17:35:43 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:
> Well, he tried that, but then his quote function wouldn't work and he needs it in other places as well. You know how they are ;-). Besides he thinks "const" is for wimps. He won't stop arguing until you prove to him that there is an actual bug in his code.

I think this warrants more code reviews of this particular developer's code :) Bottom line: if a function isn't supposed to change the buffer, the signature should be const for that parameter. It's one of the principles of const, and why it's in D2 in the first place. I'd explain to the coder that he is wrong to expect that modifying a buffer in a function that isn't supposed to modify a buffer is acceptable (and hopefully he gets it, or else I don't have time to deal with people who insist on being right when they are not). BTW, in my experience, the newbie expectation of ~= is usually that it modifies the original even when it doesn't, not the other way around.

> Well, I was wondering if it could be rewritten as:
> [snip: the duplicate rewrite above]
> and whether it's okay for the optimizer to do some simple arithmetic optimization.

First, this code isn't equivalent to the original, because it doesn't work on const and immutable arrays. Second, the optimizer cannot do simple math optimization on src.length -= 1 and src.length += 1, because length is a property, not a field. Setting the length calls a function, which cannot be optimized out.

I also thought of some problems with the original code with static if's. Assuming that version, why must this code fail?:

    class C {}
    const(C)[] arr;
    const(C)[] arr2 = arr.duplicate();

Oops, just tried it and no problems! I'll file a bug if it still happens on the latest compiler.

-Steve
Nov 24 2009
Steven Schveighoffer Wrote:
> Bottom line: if a function isn't supposed to change the buffer, the signature should be const for that parameter. It's one of the principles of const, and why it's in D2 in the first place. I'd explain to the coder that he is wrong to expect that modifying a buffer in a function that isn't supposed to modify a buffer is acceptable (and hopefully he gets it, or else I don't have time to deal with people who insist on being right when they are not). BTW, in my experience, the newbie expectation of ~= is usually that it modifies the original even when it doesn't, not the other way around.

The guy insists that the reallocation happens in the quote function (otherwise there would be stomping), so no sharing happens and the original is not modified. He tested it on a variety of inputs! I'm not totally making it up--I had this kind of argument with C++ programmers. You tell them that it's safer to use const and they laugh at you. "My code works, why should I change it!"

> Second, the optimizer cannot do simple math optimization on src.length -= 1 and src.length += 1 because length is a property, not a field. Setting the length calls a function, which cannot be optimized out.

You are again resorting to implementation. I guess in the current implementation it's true that the compiler will indeed insert a call to its internal function. But the language spec does not prescribe that. Would you also argue that this optimization is incorrect? Changing:

    a.length += 2;
    a.length += 3;

to:

    a.length += 5;

I'm not saying that this is an unsolvable problem. I'm just showing how hard it is to reason about D arrays.
Nov 24 2009
Bartosz Milewski Wrote:
>> Bottom line: if a function isn't supposed to change the buffer, the signature should be const for that parameter. [snip] BTW, in my experience, the newbie expectation of ~= is usually that it modifies the original even when it doesn't, not the other way around.
> The guy insists that the reallocation happens in the quote function (otherwise there would be stomping), so no sharing happens and the original is not modified. He tested it on a variety of inputs! I'm not totally making it up--I had this kind of argument with C++ programmers. You tell them that it's safer to use const and they laugh at you. "My code works, why should I change it!"

C++ const is kind of a joke :) But in any case, it's trivial to come up with a case that causes buffer issues. If he insists on a test case, then I'd give him one. Programmers like this don't last long at companies. I had a coworker once who insisted that a bug he was working on was going to be impossible to fix, so there was no point in working on it. My boss said that if he (the boss) could fix it, the coworker would be let go. My boss fixed it in one day. And there are *tons* of logic errors that don't cause bugs for most inputs, I don't see why we are focusing on this one case.

>> Second, the optimizer cannot do simple math optimization on src.length -= 1 and src.length += 1 because length is a property, not a field. Setting the length calls a function, which cannot be optimized out.
> You are again resorting to implementation. I guess in the current implementation it's true that the compiler will indeed insert a call to its internal function. But the language spec does not prescribe that.

Sure it does. It says that setting the length may reallocate the array to hold enough space. How do you do that without a function call?

> Would you also argue that this optimization is incorrect? Changing:
>     a.length += 2;
>     a.length += 3;
> to:
>     a.length += 5;

To the developer? no. To the compiler? yes. The compiler cannot know what effect it will have by making this optimization; it does not know what the underlying function will do or the semantic meaning of those statements. What if adding to a.length outputs its argument? Then the optimizer would have changed the output from 23 to 5. But the developer who knows the semantic meaning can make the optimization change.

> I'm not saying that this is an unsolvable problem. I'm just showing how hard it is to reason about D arrays.

I will agree that D arrays are hard to explain, I've had my share of training sessions on IRC. It's not an easy concept to get right off, but the benefits of doing arrays this way are huge.

-Steve
Nov 25 2009
Steven Schveighoffer <schveiguy yahoo.com> wrote:As long as the compiler is aware of arrays' existence, and knows how those functions work, there might indeed be an optimization that turns length+=1;length-=1; into a nop. There isn't at the moment, and there might never be, but it's a conceivable optimization. -- SimenYou are again resorting to implementation. I guess in the current implementation it's true that the compiler will indeed insert a call to its internal function. But the language spec does not proscribe that.Sure it does. It says that setting the length may reallocate the array to hold enough space. How do you do that without a function call?
Nov 25 2009
Simen kjaeraas Wrote:Steven Schveighoffer <schveiguy yahoo.com> wrote:As long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case. Now, I could see a future optimizer changing a.length += 2; a.length += 3; into a.length += 5; But I don't see a huge gain. Why not just write a.length += 5 in the first place? This point brought up is a non-issue, an optimizer that changes the outcome/side effects of a function is a broken optimizer, end of story. -SteveAs long as the compiler is aware of arrays' existence, and knows how those functions work, there might indeed be an optimization that turns length+=1;length-=1; into a nop. There isn't at the moment, and there might never be, but it's a conceivable optimization.You are again resorting to implementation. I guess in the current implementation it's true that the compiler will indeed insert a call to its internal function. But the language spec does not proscribe that.Sure it does. It says that setting the length may reallocate the array to hold enough space. How do you do that without a function call?
Nov 25 2009
Steven Schveighoffer Wrote:As long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.This is all true for user-defined data types, but array is totally under compiler's control. What optimizations are acceptable depends only on the interpretation of the language spec. In particular the sequence a.length -= 1; a.length += 1; as a whole does not result in stomping, so why would it be wrong to optimize it away? I claim it's perfectly legitimate.
Nov 25 2009
Bartosz Milewski wrote:Steven Schveighoffer Wrote:Yes, but this gets into optimizing sequences of complex operations, so the question is one of expectancy. What if I insert something there: a.length -= 1; writeln("Why oh why???"); a.length += 1; After all people don't ordinarily expect C++ compilers to optimize away a.resize(a.size() - 1); a.resize(a.size() + 1); Each function has some semantics and effects, and I don't know why we'd want to get down the route that effects depend on the flow in which the function gets called. What is the example trying to accomplish? AndreiAs long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.This is all true for user-defined data types, but array is totally under compiler's control. What optimizations are acceptable depends only on the interpretation of the language spec. In particular the sequence a.length -= 1; a.length += 1; as a whole does not result in stomping, so why would it be wrong to optimize it away? I claim it's perfectly legitimate.
Nov 25 2009
Andrei Alexandrescu Wrote:Bartosz Milewski wrote:I'm trying to nail down the semantics. Here is the case where there provably is no stomping. If this can be optimized away then the language guarantee has to be formulated accordingly. Right now I don't know what to expect from the code I wrote. Is it a good substitute for duplicate or not? I get worried when the semantics of a very important and essential primitive--the array--is hard to explain and comprehend.Steven Schveighoffer Wrote:Yes, but this gets into optimizing sequences of complex operations, so the question is one of expectancy. What if I insert something there: a.length -= 1; writeln("Why oh why???"); a.length += 1; After all people don't ordinarily expect C++ compilers to optimize away a.resize(a.size() - 1); a.resize(a.size() + 1); Each function has some semantics and effects, and I don't know why we'd want to get down the route that effects depend on the flow in which the function gets called. What is the example trying to accomplish?As long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.This is all true for user-defined data types, but array is totally under compiler's control. What optimizations are acceptable depends only on the interpretation of the language spec. In particular the sequence a.length -= 1; a.length += 1; as a whole does not result in stomping, so why would it be wrong to optimize it away? I claim it's perfectly legitimate.
Nov 25 2009
Bartosz Milewski wrote:Andrei Alexandrescu Wrote:The answer is very simple, and it is present in the book. I have also repeated it here. Appending to an array may or may not terminate an existing sharing relationship, and it never initiates a sharing relationship. Assigning a larger length is equivalent to appending a number of default-initialized values. There is nothing difficult here, although convoluted examples can be created that compound "if"s in complex interactions.Bartosz Milewski wrote:I'm trying to nail down the semantics. Here is the case where there provably is no stomping. If this can be optimized away then the language guarantee has to be formulated accordingly. Right now I don't know what to expect from the code I wrote. Is it a good substitute for duplicate or not?Steven Schveighoffer Wrote:Yes, but this gets into optimizing sequences of complex operations, so the question is one of expectancy. What if I insert something there: a.length -= 1; writeln("Why oh why???"); a.length += 1; After all people don't ordinarily expect C++ compilers to optimize away a.resize(a.size() - 1); a.resize(a.size() + 1); Each function has some semantics and effects, and I don't know why we'd want to get down the route that effects depend on the flow in which the function gets called. What is the example trying to accomplish?As long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.This is all true for user-defined data types, but array is totally under compiler's control. What optimizations are acceptable depends only on the interpretation of the language spec. In particular the sequence a.length -= 1; a.length += 1; as a whole does not result in stomping, so why would it be wrong to optimize it away? I claim it's perfectly legitimate.I get worried when the semantics of a very important and essential primitive--the array--is hard to explain and comprehend.We have a couple of dozens of other things to worry about. There are deadlines and many other more important and more urgent things to work on. The primitive is not important and not essential. It's a primitive, and like all primitives, it can be used to create better abstractions. UDP is a primitive protocol with few guarantees, but TCP is implemented on top of it to provide the guarantees. Simple, efficient primitives are a defensible choice. Andrei
Nov 25 2009
Bartosz Milewski Wrote:Steven Schveighoffer Wrote:In fact, adding length modifies the contents. For example: auto str = "hello"; str.length -= 1; str.length += 1; assert(str == "hell\xFF"); If the length modifications are optimized out, the assert fails. Looks like an invalid optimization to me. -SteveAs long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.This is all true for user-defined data types, but array is totally under compiler's control. What optimizations are acceptable depends only on the interpretation of the language spec. In particular the sequence a.length -= 1; a.length += 1; as a whole does not result in stomping, so why would it be wrong to optimize it away? I claim it's perfectly legitimate.
Nov 25 2009
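For reference, the '\xFF' in Steven's example is just char's default initializer: enlarging an array fills the new slots with the element type's .init value.

    static assert(char.init == '\xFF');  // new char elements are '\xFF'
    static assert(int.init == 0);        // an int[] would gain zeros instead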
Steven Schveighoffer wrote:As long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.The optimizer is free to change around implementation-defined-behavior and undefined-behavior. For defined-behavior, it can change things around as long as the observer cannot observe a change.This point brought up is a non-issue, an optimizer that changes the outcome/side effects of a function is a broken optimizer, end of story.It can within the constraints of defined-behavior.
Nov 25 2009
Walter Bright Wrote:Steven Schveighoffer wrote:So the real question is, is extension defined in terms of re-allocation or in terms of stomping. In the former case, the compiler should not optimize, because re-allocation is the expected side effect. In the latter case, it can eliminate the code, because what counts is the absence of stomping.As long as the spec says changing length may expand the array to hold enough space, the optimizer can't, because the optimization would change the side effects of the function. An optimizer should not change the outcome or side effects of a function. It's not unheard of for an optimizer to eliminate important operations that it thinks are invalid, but in that case, it's a broken optimizer, I don't see why we would add this case.The optimizer is free to change around implementation-defined-behavior and undefined-behavior. For defined-behavior, it can change things around as long as the observer cannot observe a change.This point brought up is a non-issue, an optimizer that changes the outcome/side effects of a function is a broken optimizer, end of story.It can within the constraints of defined-behavior.
Nov 26 2009
Steven Schveighoffer Wrote:Bartosz Milewski Wrote:The problem here is that the guy is sort of right. Indeed quote() expands the slice before overwriting it and that requires re-allocation to avoid stomping over the original buffer. There is however a boundary case, when the whole buffer matches the pattern. Then the expansion is done in place and the buffer is modified. This would never happen if expansion were guaranteed to re-allocate. What this example was supposed to illustrate is that it's easy to forget to explicitly duplicate an array, especially if it's not clear who's responsible--the caller or the callee. This is somewhat reminiscent of the situation in C++ and the responsibility to delete an object. Interestingly, in C++ programs you can use unique_ptr's to have deterministic deletion. In D you could also use uniqueness to have deterministic reallocation (or rather avoid it when it's not observable).Steven Schveighoffer Wrote:C++ const is kind of a joke :) But in any case, it's trivial to come up with a case that causes buffer issues. If he insists on a test case, then I'd give him one. Programmers like this don't last long at companies. I had a coworker once who insisted that a bug he was working on was going to be impossible to fix, so there was no point in working on it. My boss said that if he (the boss) could fix it, the coworker would be let go. My boss fixed it in one day. And there are *tons* of logic errors that don't cause bugs for most inputs, I don't see why we are focusing on this one case.Bottom line: if a function isn't supposed to change the buffer, the signature should be const for that parameter. It's one of the principles of const, and why it's in D2 in the first place. I'd explain to the coder that he is wrong to expect that modifying a buffer in a function that isn't supposed to modify a buffer is acceptable (and hopefully he gets it, or else I don't have time to deal with people who insist on being right when they are not). BTW, in my experience, the newbie expectaction of ~= is usually that it modifies the original even when it doesn't, not the other way around.The guy insists that the reallocation happens in the quote function (otherwise there would be stomping), so no sharing happens and the original is not modified. He tested it on a variety of inputs! I'm not totally making it up--I had this kind of arguments with C++ programmers. You tell them that it's safer to use const and they laugh at you. "My code works, why should I change it!"
Nov 25 2009
Bartosz Milewski wrote:
> The problem here is that the guy is sort of right. Indeed quote() expands the slice before overwriting it, and that requires re-allocation to avoid stomping over the original buffer. There is however a boundary case, when the whole buffer matches the pattern. Then the expansion is done in place and the buffer is modified. This would never happen if expansion were guaranteed to re-allocate.
>
> What this example was supposed to illustrate is that it's easy to forget to explicitly duplicate an array, especially if it's not clear who's responsible--the caller or the callee. This is somewhat reminiscent of the situation in C++ and the responsibility to delete an object. Interestingly, in C++ programs you can use unique_ptr's to have deterministic deletion. In D you could also use uniqueness to have deterministic reallocation (or rather avoid it when it's not observable).

I agree that the underlying array sharing can often be a hassle. This is a problem that goes beyond just expansion - it's all the implied aliasing that goes on whenever you pass arrays around.

How about creating a struct Value!T that transforms T (be it an array or a class) into a value type? Then if you use Value!(int[]), you're effectively dealing with values throughout (even though internally they might be COWed). Sometimes I also see a need for DeepValue!T which would e.g. duplicate transitively arrays of arrays and full object trees. For the latter we need some more introspection though. But we have everything in the language to make Value and DeepValue work with arrays.

What do you think?

Andrei
Nov 25 2009
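A minimal sketch of what a Value!T for arrays could look like, using eager duplication instead of COW for brevity; the name and layout are illustrative only, not a proposed Phobos API:

    struct Value(A : E[], E)
    {
        private E[] payload;

        this(E[] src)  { payload = src.dup; }      // own a private copy
        this(this)     { payload = payload.dup; }  // postblit: copies copy
        ref E opIndex(size_t i)          { return payload[i]; }
        @property size_t length() const { return payload.length; }
    }

    unittest
    {
        auto a = Value!(int[])([1, 2, 3]);
        auto b = a;          // postblit runs: b owns fresh storage
        b[0] = 99;
        assert(a[0] == 1);   // no sharing, by construction
    }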
Andrei Alexandrescu Wrote:How about creating a struct Value!T that transforms T (be it an array or a class) into a value type? Then if you use Value!(int[]), you're effectively dealing with values throughout (even though internally they might be COWed). Sometimes I also see a need for DeepValue!T which would e.g. duplicate transitively arrays of arrays and full object trees. For the latter we need some more introspection though. But we have everything in the laguage to make Value and DeepValue work with arrays. What do you think?I'm afraid this would further muddle the message: "If you want safe arrays, use the Value device; if you want to live dangerously, use the built in type." I'd rather see the reverse: D arrays are safe to use. They have the usual reference semantics of static arrays. But if you expand them, the sharing goes away and you get a unique reference to a copy. This is the "no gotcha" semantics, totally predictable, easy to reason about. How the compiler supports that semantics while performing clever optimizations is another story. It's fine if this part is hard. The language can even impose complexity requirements, if you are sure that they are possible to implement (it's enough to have one compliant implementation to prove this point). By the way, what are the library algorithms that rely on O(1) behavior of array append?
Nov 25 2009
Bartosz Milewski wrote:Andrei Alexandrescu Wrote:I think the message is "If you want values, use Value. If you want slices, use slices." To me that's rather a clear message.How about creating a struct Value!T that transforms T (be it an array or a class) into a value type? Then if you use Value!(int[]), you're effectively dealing with values throughout (even though internally they might be COWed). Sometimes I also see a need for DeepValue!T which would e.g. duplicate transitively arrays of arrays and full object trees. For the latter we need some more introspection though. But we have everything in the laguage to make Value and DeepValue work with arrays. What do you think?I'm afraid this would further muddle the message: "If you want safe arrays, use the Value device; if you want to live dangerously, use the built in type."I'd rather see the reverse: D arrays are safe to use.But that's backwards. You can do Value with slices. You can't do slices with values. The logical primitive to pick is the slice.They have the usual reference semantics of static arrays. But if you expand them, the sharing goes away and you get a unique reference to a copy. This is the "no gotcha" semantics, totally predictable, easy to reason about. How the compiler supports that semantics while performing clever optimizations is another story. It's fine if this part is hard. The language can even impose complexity requirements, if you are sure that they are possible to implement (it's enough to have one compliant implementation to prove this point).Well the problem is we don't have that. It's more difficult to "be fine" if the onus of implementation is on you.By the way, what are the library algorithms that rely on O(1) behavior of array append?I don't know, but there are plenty of algorithms out there that rely on that. Andrei
Nov 25 2009
Andrei Alexandrescu Wrote:Bartosz Milewski wrote:I know you don't like analogies, but for me it sounds like an advice to a C++ programmer: if you want to control destruction, use unique_ptr (or shared_ptr). This, BTW, is how I program in C++ because I can't live with memory leaks and double-deletions. What would your advice be to somebody who is learning D as her first language. Don't use arrays directly, use Value!(T[]) ? Should libraries be written in terms of Value!(T[])?Andrei Alexandrescu Wrote:I think the message is "If you want values, use Value. If you want slices, use slices." To me that's rather a clear message.How about creating a struct Value!T that transforms T (be it an array or a class) into a value type? Then if you use Value!(int[]), you're effectively dealing with values throughout (even though internally they might be COWed). Sometimes I also see a need for DeepValue!T which would e.g. duplicate transitively arrays of arrays and full object trees. For the latter we need some more introspection though. But we have everything in the laguage to make Value and DeepValue work with arrays. What do you think?I'm afraid this would further muddle the message: "If you want safe arrays, use the Value device; if you want to live dangerously, use the built in type."I'd rather see the reverse: D arrays are safe to use.But that's backwards. You can do Value with slices. You can't do slices with values. The logical primitive to pick is the slice.I thought you had concrete examples.They have the usual reference semantics of static arrays. But if you expand them, the sharing goes away and you get a unique reference to a copy. This is the "no gotcha" semantics, totally predictable, easy to reason about. How the compiler supports that semantics while performing clever optimizations is another story. It's fine if this part is hard. The language can even impose complexity requirements, if you are sure that they are possible to implement (it's enough to have one compliant implementation to prove this point).Well the problem is we don't have that. It's more difficult to "be fine" if the onus of implementation is on you.By the way, what are the library algorithms that rely on O(1) behavior of array append?I don't know, but there are plenty of algorithms out there that rely on that.
Nov 26 2009
Bartosz Milewski wrote:
> Andrei Alexandrescu Wrote:
>> I think the message is "If you want values, use Value. If you want slices, use slices." To me that's rather a clear message.
> I know you don't like analogies, but for me it sounds like advice to a C++ programmer: if you want to control destruction, use unique_ptr (or shared_ptr). This, BTW, is how I program in C++ because I can't live with memory leaks and double-deletions. What would your advice be to somebody who is learning D as her first language? Don't use arrays directly, use Value!(T[])? Should libraries be written in terms of Value!(T[])?

Some would use that, some would use T[]. For example std.algorithm could derive only little, yet nonzero, use from Value!(T[]).

>>> By the way, what are the library algorithms that rely on O(1) behavior of array append?
>> I don't know, but there are plenty of algorithms out there that rely on that.
> I thought you had concrete examples.

I do (one famous example is in Windows 3), but I wouldn't think there's a need to argue that point. If we're facing the prospect of an append that always reallocates, we'd better not provide it at all. People here have been complaining about ~= being slow because it acquires a lock, even though its complexity is amortized O(1).

Andrei
Nov 26 2009
Bartosz Milewski Wrote:The problem here is that the guy is sort of right. Indeed quote() expands the slice before overwriting it and that requires re-allocation to avoid stomping over the original buffer. There is however a boundary case, when the whole buffer matches the pattern. Then the expansion is done in place and the buffer is modified. This would never happen if expansion were guaranteed to re-allocate.Actually, the requirement for not reallocating is that the slice ends at the end of an allocated block, so for example if the pattern matched at the end of the buffer, you may have problems. The larger problem in the code is that the developer expects reallocation when it is not guaranteed. That is, he is *depending* on reallocation. In that case, you should explicitly reallocate. This doesn't necessarily mean using .dup, the following works just fine: return '"' ~ word ~ '"'; And it actually performs much better than the example. In all honesty, I was sort of trying to play your game for this example. In real life, I would have mentioned that the above line replaces the quote function, and most likely the coder would have accepted it not because it deterministically allocates but because it's cleaner and shorter :)What this example was supposed to illustrate is that it's easy to forget to explicitly duplicate an array, especially if it's not clear who's responsible--the caller or the callee.I don't think the example illustrates that. It illustrates that there are convoluted ways to try and explicitly use appending to duplicate arrays, and why one shouldn't do that. -Steve
Nov 25 2009
Steve, I don't know about you, but this exchange clarified some things for me. The major one is that it's dangerous to define the semantics of a language construct in terms of implementation. You were defending some points using implementation arguments rather than sticking to defined semantics. We have found out that one should never rely on the array being re-allocated on expansion, even if it seems like there's no other way. The only correct statement is that the freshly expanded part of the array is guaranteed not to be write-shared with any other array. However, this discussion veered away from a more important point. I don't believe that programmers will consciously make assumptions about re-allocation breaking sharing. The danger is that it's easy to miss accidental sharing and it's very hard to test for it. That was my original point and I haven't been convinced otherwise.Bartosz Milewski Wrote:The problem here is that the guy is sort of right. Indeed quote() expands the slice before overwriting it and that requires re-allocation to avoid stomping over the original buffer. There is however a boundary case, when the whole buffer matches the pattern. Then the expansion is done in place and the buffer is modified. This would never happen if expansion were guaranteed to re-allocate.Actually, the requirement for not reallocating is that the slice ends at the end of an allocated block, so for example if the pattern matched at the end of the buffer, you may have problems. The larger problem in the code is that the developer expects reallocation when it is not guaranteed. That is, he is *depending* on reallocation. In that case, you should explicitly reallocate. This doesn't necessarily mean using .dup, the following works just fine: return '"' ~ word ~ '"'; And it actually performs much better than the example. In all honesty, I was sort of trying to play your game for this example. In real life, I would have mentioned that the above line replaces the quote function, and most likely the coder would have accepted it not because it deterministically allocates but because it's cleaner and shorter :)What this example was supposed to illustrate is that it's easy to forget to explicitly duplicate an array, especially if it's not clear who's responsible--the caller or the callee.I don't think the example illustrates that. It illustrates that there are convoluted ways to try and explicitly use appending to duplicate arrays, and why one shouldn't do that. -Steve
Nov 26 2009
On Thu, 26 Nov 2009 17:45:30 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:
> Steve, I don't know about you, but this exchange clarified some things for me. The major one is that it's dangerous to define the semantics of a language construct in terms of implementation. You were defending some points using implementation arguments rather than sticking to defined semantics.

I was defending the semantics by using an example implementation. I was not defining the semantics in terms of implementation. The semantics are defined by the spec, and do not indicate when an array is reallocated and when it is not. That detail is implementation defined. My examples use dmd's implementation to show how the assumption can break. You said the guy needs me to show him that it is broken, and all his tests pass; why can't I use my knowledge of the implementation to come up with an example? I could rewrite my statements as: "You should not rely on the array being reallocated via append, because D does not guarantee such reallocation. Using the reference implementation of dmd, it is possible to come up with an example of where this fails: ..."

> We have found out that one should never rely on the array being re-allocated on expansion, even if it seems like there's no other way. The only correct statement is that the freshly expanded part of the array is guaranteed not to be write-shared with any other array.

I agree with this (except for "even if it seems like there's no other way": the spec says an allocation always occurs when you do a ~ b, so you can always rewrite a ~= b as a = a ~ b). In fact, at one point to avoid stomping I went through Tango and found all places where append could result in stomping, and changed the code this way. There were probably fewer than 5 instances. Append is not a very common operation when you didn't create the array to begin with.

> However, this discussion veered away from a more important point. I don't believe that programmers will consciously make assumptions about re-allocation breaking sharing.

For the most part, this is ok -- rarely do you see someone append to an array they didn't create *and* modify the original data. My belief is that people will more often expect that appending to an array *doesn't* reallocate. If you have experience in programming, the language you are used to either treats arrays as value types or as reference types. I don't think I've ever seen a language besides D that uses the hybrid type for arrays. So you are going to come to D expecting value or reference. If you expect value, you should quickly learn that's not the case, because 99% of the time arrays look like reference types. It is natural then to expect appending to an array to affect all other aliases of that array; after all, it is a reference type. I just think your examples don't ring true in practice because there are simpler ways to guarantee allocation. You have to go out of your way to write bad code that doesn't work correctly.

Finally, it's easy to turn an array into a reference type when passing as a parameter, just use the ref decorator. All we need is a way to turn it into a value type, and I think Andrei's idea of Value!(arr) would be great for that.

> The danger is that it's easy to miss accidental sharing and it's very hard to test for it.

I think this danger is rare, and it's easy to search for (just search for ~= in your code, I did it with Tango). I think it can be very well defined in a tutorial or book chapter.

-Steve
Dec 01 2009
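The hybrid value/reference behavior in one screenful, plus the spec-level idiom for a guaranteed break in sharing; whether the marked line affects `a` is exactly the non-determinism this thread is about:

    void main()
    {
        int[] a = [1, 2, 3];
        int[] b = a;    // reference-style: b aliases a's storage
        b[0] = 99;      // visible through a
        b ~= 4;         // may extend in place or move to fresh storage
        b[1] = 77;      // (*) may or may not be visible through a

        // When the break in sharing must be guaranteed, use the allocating
        // concatenation: the spec promises a ~ b always allocates.
        b = b ~ 5;      // b now certainly shares nothing with a
    }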
Steven Schveighoffer Wrote:This might just be it. Experienced D programmers will avoid using ~= . It's the newbies who will be stumbling into these problems.The danger is that it's easy to miss accidental sharing and it's very hard to test for it.I think this danger is rare, and it's easy to search for (just search for ~= in your code, I did it with Tango). I think it can be very well defined in a tutorial or book chapter.
Dec 01 2009
On Tue, Nov 24, 2009 at 8:26 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Steven Schveighoffer wrote:
>> Andrei has mentioned that he thinks we can store the allocated length in the GC block, which I think would also work. You also wouldn't need an MRU cache in that case, but he says it's in *addition* to the MRU cache, so I'm not sure what he means.
> [snip] Reaching the GC block is relatively expensive, so the MRU still helps. In essence it's like this. When appending: a) look up the cache; if there, you have O(1) amortized append that's really fast; b) if not in the cache, talk to the GC, still O(1) amortized append that's not as fast. Both help providing an important performance guarantee. I was a bit worried about guaranteeing "O(1) amortized append for up to 8 arrays at a time."
>> Have you considered the performance impact on allocating non-array types? That is, are you intending on all allocations storing the allocated length, even class or struct allocations who will likely never append? Or will there be a "non-appendable" flag? Also, the part without the MRU cache was my original proposal from last year; I had some ideas on how length could be stored. For example, in a page of up to 128 byte blocks, you only need 8 bits to store the length (alas, you cannot store with 4 bits for 16-byte blocks because you need to cover both 0 and 16). This reduces the overhead for those blocks. For 256 byte to 1-page blocks, 16 bits is acceptable; for multi-page blocks, the cost of storing a 32-bit integer is negligible.
> Having access to the requested length is important at larger lengths, so probably we could be fine by not actually storing it for allocations up to 128 bytes.
>> It is true the lookup of the MRU cache will not involve dissecting the address of the block to find its container block, but you still will need a lock, or are you planning on doing something clever? I think the locking will be the bottleneck, and if you don't make it the same as the global lock, add the cost of both locks when you actually need to append.
> The cache is a thread-local map from pointers to size_t. Using it does not require any locking I think.

Wait, what happens if you're sharing an array between two different threads? The MRU in my thread might be out of date. So some kind of cache flushing or snooping seems required. And that means a lock. It's the same cache consistency problem faced by multiple CPU cores sharing one system memory.

--bb
Nov 24 2009
Bill Baxter wrote:On Tue, Nov 24, 2009 at 8:26 AM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:The cache does not work with shared arrays, and threads cannot share non-shared arrays. AndreiSteven Schveighoffer wrote:Wait, what happens if you're sharing an array between two different threads?On Tue, 24 Nov 2009 11:01:10 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Having access to the requested length is important at larger lengths, so probably we could be fine by not actually storing it for allocations up to 128 bytes.Steven Schveighoffer wrote: [snip]Have you considered the performance impact on allocating non-array types? That is, are you intending on all allocations storing the allocated length, even class or struct allocations who will likely never append? Or will there be a "non-appendable" flag? Also, the part without the MRU cache was my original proposal from last year, I had some ideas on how length could be stored. For example, in a page of up to 128 byte blocks, you only need 8 bits to store the length (alas, you cannot store with 4 bits for 16-byte blocks because you need to cover both 0 and 16). This reduces the overhead for those blocks. For 256 byte to 1-page blocks, 16 bits is acceptable multi-page blocks, the cost of storing a 32-bit integer is negligible.Andrei has mentioned that he thinks we can store the allocated length in the GC block, which I think would also work. You also wouldn't need an MRU cache in that case, but he says it's in *addition* to the MRU cache, so I'm not sure what he means.[snip] Reaching the GC block is relatively expensive, so the MRU still helps. In essence it's like this. When appending: a) look up the cache, if there, you have O(1) amortized append that's really fast b) if not in the cache, talk to the GC, still O(1) amortized append that's not as fast Both help providing an important performance guarantee. I was a bit worried about guaranteeing "O(1) amortized append for up to 8 arrays at a time."It is true the lookup of the MRU cache will not involve dissecting the address of the block to find it's container block, but you still will need a lock, or are you planning on doing something clever? I think the locking will be the bottleneck, and if you don't make it the same as the global lock, add the cost of both locks when you actually need to append.The cache is a thread-local map from pointers to size_t. Using it does not require any locking I think.
Nov 24 2009
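To make the two append paths concrete, here is a minimal sketch in D of the logic described above, assuming invented names throughout (Entry, mruCache, cacheLookup, append); it stands in for, and does not reproduce, druntime's actual code:

    // Sketch of the two-path append; capacities are in bytes.
    import core.memory : GC;

    struct Entry { void* ptr; size_t byteLen; size_t capacity; }

    Entry[8] mruCache;   // module-level variables are thread-local in D2,
    size_t mruCount;     // matching "a thread-local map from pointers to size_t"

    size_t cacheLookup(void* p, size_t byteLen)
    {
        foreach (ref e; mruCache[0 .. mruCount])
            if (e.ptr is p && e.byteLen == byteLen)
                return e.capacity;               // path a): fast, no GC involved
        return 0;                                // miss
    }

    void append(T)(ref T[] a, T value)
    {
        auto cap = cacheLookup(a.ptr, a.length * T.sizeof);
        if (cap == 0)
            cap = GC.sizeOf(cast(void*) a.ptr);  // path b): ask the GC; slower
        if ((a.length + 1) * T.sizeof <= cap)
        {
            a = a.ptr[0 .. a.length + 1];        // room in the block: grow in place
            a[$ - 1] = value;
        }
        else
            a = a ~ value;                       // no room: reallocate and copy
        // updating the cache entry is elided
    }

The thread-locality comes for free here: in D2, module-level variables are thread-local unless marked shared, which is exactly why Bill's question about arrays shared across threads is the interesting one.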
Andrei Alexandrescu, on November 18 at 07:22, wrote to me:Leandro Lucarella wrote:Andrei Alexandrescu, on November 17 at 18:45, wrote to me:In which ways do you think arrays are horribly broken? Same as Bartosz mentions?I didn't say anything (until now) because this was discussed already. Dynamic array/slice appending is horribly broken (I know it's well defined, and deterministic, but you are just condemned to make mistakes; it just doesn't work as one would expect, and even when you know how it works you have to keep fighting your intuition all the time).Yes, I think dynamic arrays are grouping 2 separate things: a proper dynamic array able to efficiently append stuff (without hacks like the size cache) and a slice (a read-only view of a piece of memory); a "random range" if you want to put it in terms of D. A dynamic array should be a reference type, a slice a value type. I explained my view before. I won't explain it in much detail again (I hope you remember the thread).3. If you **really** care about performance, you should only append when you don't know the length in advance. If you know the length, you should always pre-allocate.We will have collections and all those good things, but I don't see how the proposal follows from the feedback. My perception is that this is a group of people who use D. Bartosz' concern didn't cause quite a riot, so as far as I can tell there is no big issue at stake.One question is whether you actually have had bugs and problems coding, or if arrays stopped you from getting work done.I had many bugs because of the reallocation when appending.But it's a little pointless to keep making riots, when we were so close to finally fixing this with T[new]/whatever and it all went back.Hmmmm... resignation is not a trait of this group :o).It is after a good fight :) From time to time these kinds of issues (which have been present in the group for as long as I can remember, about 4 years maybe) keep popping up in the NG. I get tired of repeating myself over and over again (like many others, I guess). -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Hey, it's George. I got nothing to say. -- George Constanza, leaving a message on Jerry's answering machine
Nov 18 2009
Andrei Alexandrescu Wrote:One thing that Bartosz didn't mention is that "this unique" is different than "the other unique", which I think reveals the half-bakedness of the proposal. "The other unique" that was intensively discussed before is transitive, meaning that the root reference points to a graph of objects having no other connection to the outer world. "This unique" is very different because it's shallow - only the array chunk is unaliased, but the members of the array don't need to be. This is a very important distinction.Actually I was thinking of deep unique, but understandably I neither described full details of my proposal, nor have I completely "baked" it, which would require some serious work and a lot more discussion. One of these days I will blog about uniqueness in more detail. The problem of transitivity with respect to arrays has been discussed before in the context of const and immutable. We realized that shallow constness/immutability would be useful in some scenarios but, because of the added complexity, we punted it. My feeling is that, similarly, in most cases the restriction that a unique array may only hold unique objects is not unreasonable.
Nov 18 2009
On Wed, 18 Nov 2009 14:03:11 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:Andrei Alexandrescu Wrote:I wonder what your expert opinion is: Is something like unique possible without introducing all the other type modifiers (like 'lent', etc.)? If not, what are the minimum type modifiers you'd need to get unique to work? I really like the unique concept, and think it is really one of the biggest holes missing from the const system (aside from bug 1961). -SteveOne thing that Bartosz didn't mention is that "this unique" is different than "the other unique", which I think reveals the half-bakedness of the proposal. "The other unique" that was intensively discussed before is transitive, meaning that the root reference points to a graph of objects having no other connection to the outer world. "This unique" is very different because it's shallow - only the array chunk is unaliased, but the members of the array don't need to be. This is a very important distinction.Actually I was thinking of deep unique, but understandably I neither described full details of my proposal, nor have I completely "baked" it, which would require some serious work and a lot more discussion. One of these days I will blog about uniqueness in more detail. The problem of transitivity with respect to arrays has been discussed before in the context of const and immutable. We realized that shallow constness/immutability would be useful in some scenarios but, because of the added complexity, we punted it. My feeling is that, similarly, in most cases the restriction that a unique array may only hold unique objects is not unreasonable.
Nov 19 2009
Steven Schveighoffer Wrote:I wonder what your expert opinion is: Is something like unique possible without introducing all the other type modifiers (like 'lent', etc.)? If not, what are the minimum type modifiers you'd need to get unique to work? I really like the unique concept, and think it is really one of the biggest holes missing from the const system (aside from bug 1961).The most complete description of "unique" that I've seen is in the Haller/Odersky Scala paper "Capabilities for External Uniqueness." It's not an easy paper to read, but I'm planning on writing a blog post in which I'll try to make it more palatable. They use annotations to extend the type system, something that was pioneered in Java. The recent introduction of annotations (like @property) in D makes this approach quite attractive. They used the following annotations: @unique, @transient, and @exposed (@transient loosely corresponds to "lent"). Don't get me wrong, the full specification of uniqueness is non-trivial and I wouldn't propose it for D just to solve the array expansion problem. The real power of "unique" shows in multithreaded programming, where you can pass unique objects between threads and safely operate on them without any synchronization.
Nov 19 2009
int[] a = [0]; auto b = a; a ~= 1; b ~= 2; What is a[1]? Is this considered "stomping" and requiring a re-allocation?Can this question be answered without the recourse to implementation, and the MRU cache in particular? I thought about an abstract definition of "stomping".Something like that: The expanded part of the array is never implicitly shared with any other array. But I'm not sure about this case: int[] a = [1, 2, 3]; int[] b = a[0..1]; b ~= 4; what is a[1]? By my definition, this would be considered stomping, but I couldn't find this example in TDPL. I'm also not sure what to make of this statement: "client code is guaranteed to benefit of good average performance over a large number of appends to the same array". Since "good" is open to interpretation, such a guarantee is meaningless. If it is meant as "amortized O(1)", it's impossible to fulfill, unless the MRU cache is infinite and a lookup is O(1). Then there is the problem of multithreading. A global MRU cache would require potentially expensive synchronization. A thread-local cache, on the other hand, makes thread creation even more heavy-weight than it is right now. My prediction is that D will be evolving towards user-level threads and fine granularity concurrency, which requires very small thread footprint.
Nov 21 2009
Bartosz Milewski Wrote:It is natural that a language like D has dynamic arrays (as core features or as library implementations). I think that's why we keep using the word "array" to describe slices. (No, I don't have proof for this theory.) In *current* D, the user code does not handle dynamic arrays. Both 'a' and 'b' below are slices.int[] a = [0]; auto b = a; a ~= 1; b ~= 2; What is a[1]? Is this considered "stomping" and requiring a re-allocation?Can this question be answered without the recourse to implementation, and the MRU cache in particular? I thought about an abstract definition of "stomping".Something like that: The expanded part of the array is never implicitly shared with any other array. But I'm not sure about this case:int[] a = [1, 2, 3]; int[] b = a[0..1]; b ~= 4; what is a[1]? By my definition, this would be considered stomping, but I couldn't find this example in TDPL.Neither 'a' nor 'b' own the elements; in that sense, if one is a slice then the other is a slice... In D, slices have "discretionary sharing semantics." Any slice may leave the sharing contract as it sees unfit. Ali
Nov 21 2009
Ali Cehreli Wrote:Neither 'a' nor 'b' own the elements; in that sense, if one is a slice then the other is a slice... In D, slices have "discretionary sharing semantics." Any slice may leave the sharing contract as it sees unfit.So is your answer different from that of dsimcha? I'm sitting on a fence about it. Andrei, could you settle this issue (and a few others I raised before)?
Nov 21 2009
== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s articleAli Cehreli Wrote:Neither 'a' nor 'b' own the elements; in that sense, if one is a slice then the other is a slice... In D, slices have "discretionary sharing semantics." Any slice may leave the sharing contract as it sees unfit.So is your answer different from that of dsimcha? I'm sitting on a fence about it. Andrei, could you settle this issue (and a few others I raised before)? I think the confusion stems from Ali Cehreli describing things as currently implemented and me describing things as proposed under the MRU cache scheme.
Nov 21 2009
Bartosz Milewski wrote:Ali Cehreli Wrote:Neither 'a' nor 'b' own the elements; in that sense, if one is a slice then the other is a slice... In D, slices have "discretionary sharing semantics." Any slice may leave the sharing contract as it sees unfit.So is your answer different from that of dsimcha? I'm sitting on a fence about it. Andrei, could you settle this issue (and a few others I raised before)?I'm not sure I figure what the issue is, and I'd appreciate a rehash of the few other issues brought up. (I thought they had been responded to.) David's reply is valid but may be a bit misleading because it explains things in terms of the cache, which could reinforce the impression that the cache is part of the definition. Ali's point is also correct in the sense that arrays are not owned. Garbage collection makes that possible. It's very simple, really. Appending to an array never results in overwriting elements of an existing array. Appending may terminate a sharing relationship. This is simple, easy to understand, and requires no understanding of the optimization paraphernalia, which I hope to be able to improve. Andrei
Nov 22 2009
Andrei Alexandrescu Wrote:I'm not sure I figure what the issue is, and I'd appreciate a rehash of the few other issues brought up. (I thought they had been responded to.)Some of the points are in my message from Sat, 05:19 pm. Especially this point: Can a conforming implementation simply re-allocate on every expansion? You make it sound like that would violate some performance guarantees but there are no specifics.
Nov 22 2009
On Sun, 22 Nov 2009 21:19:15 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:Andrei Alexandrescu Wrote:I'm not sure I figure what the issue is, and I'd appreciate a rehash of the few other issues brought up. (I thought they had been responded to.)Some of the points are in my message from Sat, 05:19 pm. Especially this point: Can a conforming implementation simply re-allocate on every expansion? You make it sound like that would violate some performance guarantees but there are no specifics.My view is that a conforming implementation could reallocate on every expansion. That translates to an MRU cache size of zero. But the MRU cache is just one way to solve the problem, and does not need to be in the language spec. It's an implementation detail. BTW, a more disturbing case than those being discussed, one which violates more than just your expectations, is: string x = "hello".idup; auto y = x[0..1]; y ~= "appy"; We must have a solution to this problem, to keep the promise of immutability intact. The other problem of sometimes sharing data is not as important IMO. Basically, we have 2 options for the spec: A. Once you append to an array, you cannot change the data through that array for any other alias to the original array, so x ~= y is equivalent to x = x ~ y. B. Appending to an array may reallocate, so appending to x may allow x to share its prefix bytes with another array. Whatever implementation satisfies the choice is correct, however optimized you want to make it. For instance, you could use an MRU cache for immutable data only, and still satisfy rule A. I also agree that there should be an array building mechanism, because these rules may not satisfy all usages. -Steve
Nov 23 2009
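A minimal sketch of what rule A promises on the example above; the asserts encode the guaranteed semantics, and whether a given runtime actually upholds them is precisely the bug under discussion:

    void main()
    {
        string x = "hello".idup;  // heap-allocated immutable data
        auto y = x[0 .. 1];       // y aliases x's first character
        y ~= "appy";              // rule A: behaves exactly like y = y ~ "appy"
        assert(x == "hello");     // x's characters are never overwritten
        assert(y == "happy");     // y now refers to a fresh buffer
    }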
Bartosz Milewski wrote:Andrei Alexandrescu Wrote:I don't think a conforming implementation is allowed to do that. The functioning of many algorithms assumes constant-time append. If D can't guarantee that, those algorithms won't work. I've been deliberately vague in the book because I wasn't sure we can offer a guarantee if the cache is filled. Now I know we can: if a memory block stores the requested size in addition to today's effective capacity, then the allocator can overallocate growing arrays and then detect and correctly handle in-place reallocation requests, even when the MRU cache capacity is exceeded. There would be a larger cost for that (acquiring a lock), but it's still a constant amortized per-element cost. AndreiI'm not sure I figure what the issue is, and I'd appreciate a rehash of the few other issues brought up. (I thought they had been responded to.)Some of the points are in my message from Sat, 05:19 pm. Especially this point: Can a conforming implementation simply re-allocate on every expansion? You make it sound like that would violate some performance guarantees but there are no specifics.
Nov 23 2009
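One way to picture "storing the requested size in the block" is the following sketch, with an invented BlockInfo layout and helper name; a real allocator would run this under the GC lock on an MRU-cache miss:

    // Invented per-block metadata; not the actual GC block layout.
    struct BlockInfo
    {
        size_t capacity;   // usable bytes in the block
        size_t requested;  // bytes handed out so far (the requested size)
    }

    // The append may proceed in place only if the slice ends exactly at the
    // block's requested size -- i.e. no other slice has already grown past it.
    bool tryExtendInPlace(ref BlockInfo b, void* blockStart,
                          void* slicePtr, size_t sliceBytes, size_t extraBytes)
    {
        auto sliceEnd = (cast(size_t) slicePtr + sliceBytes) - cast(size_t) blockStart;
        if (sliceEnd != b.requested)
            return false;                 // stale tail: caller must reallocate
        if (b.requested + extraBytes > b.capacity)
            return false;                 // block full: caller must reallocate
        b.requested += extraBytes;        // claim the tail of the block
        return true;
    }

The sliceEnd == requested check is what singles out the one slice that owns the tail of the block, which is how the scheme stays correct even after the MRU cache overflows.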
Andrei Alexandrescu Wrote:[snip]I don't think a conforming implementation is allowed to do that. The functioning of many algorithms assumes constant-time append. If D can't guarantee that, those algorithms won't work. I've been deliberately vague in the book because I wasn't sure we can offer a guarantee if the cache is filled. Now I know we can: if a memory block stores the requested size in addition to today's effective capacity, then the allocator can overallocate growing arrays and then detect and correctly handle in-place reallocation requests, even when the MRU cache capacity is exceeded. There would be a larger cost for that (acquiring a lock), but it's still a constant amortized per-element cost.I wish we had some performance numbers to support this optimization. There's always a break-even point between O(1) and O(N) and I wonder at what N it happens. Also, how often does it happen that the cache is full--whether it's normal state or an exception. Hitting the cache and the heap lock on every extension will also break locality. There are so many factors...
Nov 23 2009
Bartosz Milewski wrote:[snip]I wish we had some performance numbers to support this optimization. There's always a break-even point between O(1) and O(N) and I wonder at what N it happens. Also, how often does it happen that the cache is full--whether it's normal state or an exception. Hitting the cache and the heap lock on every extension will also break locality. There are so many factors...Very true. It would be great if you or someone else interested could find the time to run a few tests. I am committed way over the top and simply cannot look into this. Andrei
Nov 23 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article[snip]Very true. It would be great if you or someone else interested could find the time to run a few tests. I am committed way over the top and simply cannot look into this. AndreiBut one of my biggest gripes about the current appending implementation, far more important from a practical perspective than any theoretical safety guarantees, is that it doesn't scale to multiple threads. I've experienced this in a real-world program that I translated from someone else's C++ code that used vector::push_back() quite liberally inside core algorithms. (The reason is that it was written more for simplicity/readability than performance, so the person who wrote it never thought to optimize this kind of stuff.) When I did a literal translation of this to D, everything was completely serialized. It didn't even scale to two cores. I had to rewrite significant portions of the code in a way that did not use appending to get it to scale at all. D is supposed to be a language that is geared toward multi-threading. If something as simple as array appending is a threading bottleneck (and it was for me in a real-world program), I don't care if it is O(1) because effectively it has a constant the size of Jupiter.
Nov 23 2009
On Mon, 23 Nov 2009 16:33:42 -0500, dsimcha <dsimcha yahoo.com> wrote:But one of my biggest gripes about the current appending implementation, far more important from a practical perspective than any theoretical safety guarantees, is that it doesn't scale to multiple threads. I've experienced this in a real-world program that I translated from someone else's C++ code that used vector::push_back() quite liberally inside core algorithms. (The reason is that it was written more for simplicity/readability than performance, so the person who wrote it never thought to optimize this kind of stuff.) When I did a literal translation of this to D, everything was completely serialized. It didn't even scale to two cores. I had to rewrite significant portions of the code in a way that did not use appending to get it to scale at all. D is supposed to be a language that is geared toward multi-threading. If something as simple as array appending is a threading bottleneck (and it was for me in a real-world program), I don't care if it is O(1) because effectively it has a constant the size of Jupiter.It's hard to say that your example is a good fit for the arrays of D. What you need is a library type that does not require a global lock for appending. Such is not the case for every example of appending in D, and the existence of the current D arrays does not prevent you from adding more specialized types. There are other costs to using such types besides the global lock, such as storing the capacity, and if you are slicing, locally storing the slice extents. -Steve
Nov 23 2009
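A bare-bones sketch of such a library type, under invented names (Phobos's std.array.Appender plays a similar role): the capacity bookkeeping lives in the struct itself, so the hot path consults neither the GC nor any global lock:

    struct Builder(T)
    {
        private T[] buf;      // owned storage
        private size_t used;  // elements filled so far

        void put(T value)
        {
            if (used == buf.length)
                buf.length = buf.length ? buf.length * 2 : 4; // rare: allocates
            buf[used++] = value;  // common path: plain store, no lock
        }

        T[] data() { return buf[0 .. used]; }
    }

    // usage: appends stay thread-local and amortized O(1)
    // Builder!int b;
    // foreach (i; 0 .. 1_000) b.put(i);
    // auto result = b.data();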
On Mon, Nov 23, 2009 at 2:09 PM, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Mon, 23 Nov 2009 16:33:42 -0500, dsimcha <dsimcha yahoo.com> wrote:But one of my biggest gripes about the current appending implementation, far more important from a practical perspective than any theoretical safety guarantees, is that it doesn't scale to multiple threads. I've experienced this in a real-world program that I translated from someone else's C++ code that used vector::push_back() quite liberally inside core algorithms. (The reason is that it was written more for simplicity/readability than performance, so the person who wrote it never thought to optimize this kind of stuff.) When I did a literal translation of this to D, everything was completely serialized. It didn't even scale to two cores. I had to rewrite significant portions of the code in a way that did not use appending to get it to scale at all. D is supposed to be a language that is geared toward multi-threading. If something as simple as array appending is a threading bottleneck (and it was for me in a real-world program), I don't care if it is O(1) because effectively it has a constant the size of Jupiter.It's hard to say that your example is a good fit for the arrays of D. What you need is a library type that does not require a global lock for appending. Such is not the case for every example of appending in D, and the existence of the current D arrays does not prevent you from adding more specialized types. There are other costs to using such types besides the global lock, such as storing the capacity, and if you are slicing, locally storing the slice extents.That's likely to be far too big a limitation for the upcoming world of parallelism everywhere. But the single-threaded GC is the real culprit. Fix that and arrays shouldn't be a problem either. --bb
Nov 23 2009
Bill Baxter wrote:On Mon, Nov 23, 2009 at 2:09 PM, Steven Schveighoffer <schveiguy yahoo.com> wrote:[snip]That's likely to be far too big a limitation for the upcoming world of parallelism everywhere. But the single-threaded GC is the real culprit. Fix that and arrays shouldn't be a problem either.I agree. Overall, our eyes should be at this point focused on matters of principle rather than matters of optimization that can be done but haven't been done yet. Without being 100% sure, I conjecture that array primitives can be implemented efficiently if we give the implementation the freedom to end sharing for ~=. Also I am not ruling out the possibility that we can guarantee a ~= that never keeps sharing with existing arrays. (One idea I discussed with Walter was encoding uniqueness of arrays in a bit stored with the array limits.) Andrei
Nov 23 2009
Andrei Alexandrescu Wrote:I conjecture that array primitives can be implemented efficiently if we give the implementation the freedom to end sharing for ~=. Also I am not ruling out the possibility that we can guarantee a ~= that never keeps sharing with existing arrays. (One idea I discussed with Walter was encoding uniqueness of arrays in a bit stored with the array limits.)I've been thinking the same. Full uniqueness might be hard to implement, but if you are only interested in "opportunistic uniqueness" as an optimization tool, that might be easier. In full uniqueness you worry about lending and deep aliases. Here you might just conservatively decide that passing an array to a function, or storing it in an object, destroys its uniqueness (on top of explicit slicing). And you don't have to care about deep aliases as long as your only interest is in the uniqueness of the array buffer and not the objects it might be storing. Uniqueness one way or another is the key.
Nov 24 2009
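The opportunistic-uniqueness idea can be sketched with an invented wrapper that carries the bit next to the array; the comments mark what the bit would justify, not what plain D guarantees:

    struct UArray(T)
    {
        private T[] data;
        private bool unique = true;  // true right after a fresh allocation

        // Handing out a slice creates an alias, so conservatively drop
        // uniqueness -- the same rule Bartosz sketches for passing the
        // array to a function or storing it in an object.
        T[] slice(size_t lo, size_t hi)
        {
            unique = false;
            return data[lo .. hi];
        }

        void append(T v)
        {
            if (unique)
            {
                data ~= v;           // the bit is what would justify growing
            }                        // in place here with no further checks
            else
            {
                data = data ~ v;     // aliased: force a fresh copy...
                unique = true;       // ...which we solely own again
            }
        }
    }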
On 2009-11-22 19:58:37 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:It's very simple, really. Appending to an array never results in overwriting elements of an existing array. Appending may terminate a sharing relationship. This is simple, easy to understand, and requires no understanding of the optimization paraphernalia, which I hope to be able to improve.I think it's the "may terminate" part Bartosz (and some others) don't like, since it's hard to predict and may cause bugs that appear in an hard to understand pattern when misused. I'm not exactly sure where to stand on this issue, but I might have a solution that looks like a compromise: use the MRU only for arrays of immutable elements, like strings. Since it's not possible to modify immutable data, it doesn't change the semantics if two slices are aliasing each other or not[^1]. Other arrays could get the more predictable behavior of always reallocating, but an optimizer could still avoid that when static analysis can prove that only one slice points to the data. [^1]: It changes things somewhat if you take the address of elements and start comparing them, but I find it reasonable that taking address of things could reveal implementation details. It could change things if the element type has a postblit function, or a destructor, since you will avoid some copying, but I think that's fine. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 22 2009
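Michel's compromise can be sketched as a compile-time dispatch on element mutability; append below is an invented helper, not a runtime primitive:

    void append(T)(ref T[] arr, T value)
    {
        static if (is(T == immutable))
            arr ~= value;        // cache-backed, possibly in place: aliasing
                                 // the old tail is harmless, since nobody can
                                 // write through an immutable slice
        else
            arr = arr ~ value;   // mutable elements: always a fresh buffer,
                                 // so stomping is impossible by construction
    }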
== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s articleint[] a = [0]; auto b = a; a ~= 1; b ~= 2; What is a[1]? Is this considered "stomping" and requiring a re-allocation?Can this question be answered without the recourse to implementation, and the MRU cache in particular? I thought about an abstract definition of "stomping". Something like that: The expanded part of the array is never implicitly shared with any other array. But I'm not sure about this case:int[] a = [1, 2, 3]; int[] b = a[0..1]; b ~= 4; what is a[1]? By my definition, this would be considered stomping, but I couldn't find this example in TDPL.a[1] == 2. The MRU cache stores both the pointer and the length. When you attempt to append to b, the MRU is searched. No array that starts at b.ptr and has length b.length is found. b is reallocated. This is easy to specify: "The efficiency of the ~= operator for dynamic arrays is implementation defined but guaranteed to be no worse than O(N). An implementation may optimize this operation to avoid a reallocation on every append. The current reference implementation does so. However, the observable behavior of performing the operation a ~= b is guaranteed to be equivalent to the observable behavior of performing the operation a = a ~ b." Furthermore, the purpose of language design is to create a usable language first and foremost, and then figure out how to specify it formally and abstractly, not to create a perfectly bulletproof specification first and foremost and then hope that results in a usable language. Again, not saying I particularly like the MRU cache idea (it has its issues at an implementation level) but I think that "perfectly specified" has to take a back seat to "works well in practice".
Nov 21 2009
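The (pointer, length) rule is easy to trace with an ordinary associative array standing in for the MRU cache; Key and cache are invented for the trace:

    struct Key { const(void)* ptr; size_t len; }

    void main()
    {
        size_t[Key] cache;
        int[] a = [1, 2, 3];
        cache[Key(a.ptr, a.length)] = 3;         // a registered as appendable

        int[] b = a[0 .. 1];                     // b.ptr == a.ptr, b.length == 1
        assert(Key(b.ptr, b.length) !in cache);  // miss: the lengths differ
        // so b ~= 4 must reallocate b, and a[1] remains 2
    }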
dsimcha Wrote:Furthermore, the purpose of language design is to create a usable language first and foremost, and then figure out how to specify it formally and abstractly, not to create a perfectly bulletproof specification first and foremost and then hope that results in a usable language. Again, not saying I particularly like the MRU cache idea (it has its issues at an implementation level) but I think that "perfectly specified" has to take a back seat to "works well in practice".This might be true at the experimental phase of language design. But I expect more from TDPL, if it's going to be the official definition of the language for the foreseeable future. I'm not expecting it to be as scrupulous as the C++ Standard, but a bit more solid than, "If in doubt, look how it's implemented in DMD--MRU cache and all."
Nov 21 2009
dsimcha Wrote:== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s article[snip]a[1] == 2. The MRU cache stores both the pointer and the length. When you attempt to append to b, the MRU is searched. No array that starts at b.ptr and has length b.length is found. b is reallocated.Consider there is also c: int[] c = b[0..1]; According to the above definition, after b ~= 4, 'b' would be relocated and b[0] would be disjoint from c[0]. Now consider that a's definition is changed to just [ 1 ]. If I understand this correctly, then b.ptr and b.length would match the original array, and in this case the b ~= 4 operation might not relocate b, and b[0] and c[0] would refer to the same element. A change in 'a' would be defining the relation between b and c. Ali
Nov 21 2009
== Quote from Ali Cehreli (acehreli yahoo.com)'s article[snip]Consider there is also c: int[] c = b[0..1]; According to the above definition, after b ~= 4, 'b' would be relocated and b[0] would be disjoint from c[0]. Now consider that a's definition is changed to just [ 1 ]. If I understand this correctly, then b.ptr and b.length would match the original array, and in this case the b ~= 4 operation might not relocate b, and b[0] and c[0] would refer to the same element. A change in 'a' would be defining the relation between b and c. AliYou're right. Indexing, unlike appending, provides no guarantees about stomping, except that, if none of the arrays involved are resized, then they will point to the same memory block. I don't see this as a problem in practice, because it does not break immutability guarantees, etc. I've been using D for a year and a half and never had a bug caused by that. All you need to do is follow these rules: 1. If you want a slice to point to the same memory as the original array, make sure none of the arrays are resized. 2. If you don't want a slice to point to the same memory as the original array, use .dup. 3. If you don't care either way, then do neither.
Nov 22 2009
On Sat, 21 Nov 2009 17:19:08 -0500, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:int[] a = [0]; auto b = a; a ~= 1; b ~= 2; What is a[1]? Is this considered "stomping" and requiring a re-allocation?Can this question be answered without the recourse to implementation, and the MRU cache in particular? I thought about an abstract definition of "stomping". Something like that: The expanded part of the array is never implicitly shared with any other array. But I'm not sure about this case: int[] a = [1, 2, 3]; int[] b = a[0..1]; b ~= 4; what is a[1]? By my definition, this would be considered stomping, but I couldn't find this example in TDPL.I believe that's because Andrei is hoping this problem will be fixed before TDPL comes out...I'm also not sure what to make of this statement: "client code is guaranteed to benefit of good average performance over a large number of appends to the same array". Since "good" is open to interpretation, such a guarantee is meaningless. If it is meant as "amortized O(1)", it's impossible to fulfill, unless the MRU cache is infinite and a lookup is O(1).If a reasonably small size limit is put on the MRU cache (defined by Andrei as 8), then MRU cache lookup is O(1). You still need a lock, but that's not any different than today's requirement, and it certainly wouldn't be any worse than re-allocating for every append. I agree there needs to be a better solution if you want the highest performance, but the simplest and most convenient solution should also perform reasonably well, and shouldn't corrupt data.Then there is the problem of multithreading. A global MRU cache would require potentially expensive synchronization. A thread-local cache, on the other hand, makes thread creation even more heavy-weight than it is right now. My prediction is that D will be evolving towards user-level threads and fine granularity concurrency, which requires a very small thread footprint.You mean more heavyweight than allocating a new heap for each thread? The only construct which makes sense to tie individual MRU caches to is a heap, and I think adding the 64 bytes required for an MRU cache to a heap is not that big a deal. -Steve
Nov 23 2009
Bartosz Milewski wrote:dsimcha Wrote:The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation? How else could you **efficiently** implement dynamic arrays?In the long run (D3), I proposed using the "unique" type modifier. If an array is unique, the compiler knows that there are no slices to worry about, and it can use in-place reallocation to its heart's content. That pretty much solves the performance problem. In the short run (D2), I would suggest sticking to "reallocate on every extension" semantics (especially in SafeD) and provide a library solution (a la C++ std::vector) where the performance of appending is an issue.It seems we cannot persuade Andrei and Walter that this is a really, really bad and dangerous implementation. Everybody knows what is going on when an int[] "array" is resized. It doesn't really matter what we call it: indeterminism, undefined behaviour, implementation-dependent behaviour. But we all know it's a source of a hard-to-detect, dangerous bug. As I've said before, it's worse than a buffer overrun, as those can be detected using memory patterns and memory-guarding techniques. But this bug cannot be detected; it just occurs once in a while (like parallel programs written without the proper critical sections). It's hard enough to create good (parallel) programs as it is, so don't make the work harder by placing such a trap. So please take a look at the language with a different eye. I know you've worked a lot on it, but I'm afraid such a mistake could ruin the whole language. Sorry. Or at least just think about why there are so many posts concerning this feature. int[] is not a range, it is a range + random_copy_on_resize (replace the word random with any word of your choice; it does not matter). I think we can agree that passing "array" int[] as a (non-const) value type is highly avoidable. Then why do we allow it ??? With std::vector + iterator you always know when it is invalid (when the vector is resized). But here you don't have an object whose modification (surely) invalidates the ranges. I miss having a reference to the actual array. The exact semantics is not so important for me (now), but distinguish the two objects: 1. an array that contains the elements + structure and can be resized: RandomAccessArray!(int) 2. slices (ranges) of an array, where the structure cannot be altered, only the elements: Range!(int) 3. decide whether int[] is shorthand for RandomAccessArray!(int) or for Range!(int), but DO NOT give it a mixed meaning. Gzp
Nov 19 2009
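Roughly, gzp's split could look like the following, reusing the names from his post; the point is the division of labor, not the exact API:

    class RandomAccessArray(T)  // reference type: owns and resizes storage
    {
        private T[] data;
        void append(T v) { data ~= v; }
        Range!T all() { return Range!T(data); }
    }

    struct Range(T)             // value type: a view; the structure is fixed
    {
        private T[] view;
        ref T opIndex(size_t i) { return view[i]; }
        size_t length() const { return view.length; }
        // no append: growing a view is impossible by construction
    }

Resizing the container invalidates outstanding ranges, much as std::vector invalidates iterators, but the owning object makes the invalidation point explicit.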
On Thu, 19 Nov 2009 18:33:22 +0300, gzp <galap freemail.hu> wrote:[snip]Same here. FWIW, I strongly believe the demise of T[new] was a step in the wrong direction, and I feel highly frustrated about it. It was one of the most anticipated features, requested for *years*! And it's gone when it was so close to being implemented... The only more or less reasonable answer why T[new] "sucked" was: On Mon, 19 Oct 2009 01:55:28 +0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Returning a distinct type from .dup and ~ makes slices not closed over these operations, a source of complication, confusion, and bloating.I see no problem returning T[] when a slice is dup'ed or concatenated. That's what always happened anyway.
Nov 19 2009
Denis Koroskin wrote:Same here. FWIW, I strongly believe the demise of T[new] was a step in the wrong direction, and I feel highly frustrated about it. It was one of the most anticipated features, requested for *years*! And it's gone when it was so close to being implemented... The only more or less reasonable answer why T[new] "sucked" was: On Mon, 19 Oct 2009 01:55:28 +0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Returning a distinct type from .dup and ~ makes slices not closed over these operations, a source of complication, confusion, and bloating.I see no problem returning T[] when a slice is dup'ed or concatenated. That's what always happened anyway.I think Walter and Andrei just felt like many people do when they have to adapt to a new environment or some other big change: the new thing "sucks", you feel bad, you think "the past was better", you want to go back. Also, you don't see the advantages; you only get riled up by the disadvantages (because _every_ change has disadvantages; that's only natural). So they just threw it away. Also, I don't quite understand why they find the new semantics caused by the append-caching simpler. Even Bartosz got worried about the non-deterministic behavior of this.
Nov 19 2009
grauzone wrote:Denis Koroskin wrote:[snip]I think Walter and Andrei just felt like many people do when they have to adapt to a new environment or some other big change: the new thing "sucks", you feel bad, you think "the past was better", you want to go back. Also, you don't see the advantages; you only get riled up by the disadvantages (because _every_ change has disadvantages; that's only natural). So they just threw it away. Also, I don't quite understand why they find the new semantics caused by the append-caching simpler. Even Bartosz got worried about the non-deterministic behavior of this.However flattering the attention to Walter's and my psychology could be, it may be difficult to find supporting evidence for the theory above. Andrei
Nov 19 2009