digitalmars.D - Change representation of dynamic arrays?
- Walter Bright (38/38) Oct 19 2007 Currently, arrays are represented under the hood as:
- Daniel Keep (14/15) Oct 19 2007 I'm hesitant to break something like this which is basically invisible
- 0ffh (15/19) Oct 19 2007 I guess we go from changing one pointer and a size_t to changing two
- Daniel Keep (20/43) Oct 20 2007 Well, that's definitely more work to compute end, although in most
- Janice Caron (2/3) Oct 20 2007 tst.end = str.end - e*wchar.sizeof;
- Janice Caron (5/8) Oct 20 2007 See, the $ wouldn't expand to (str.end-str.start) as you implied, it
- Daniel Keep (7/16) Oct 20 2007 Ah yes, silly me for quoting something Walter wrote without thinking
- David Brown (6/10) Oct 19 2007 I wouldn't expect the performance to change in this case. You're doing ...
- Janice Caron (17/19) Oct 20 2007 While this may be optimised for simple arrays, it poses much more of a
- Walter Bright (4/26) Oct 20 2007 If you have a user-defined struct, you get to decide how it is
- Bill Baxter (14/40) Oct 20 2007 '$' doesn't work in user defined types. So the rest of your comment is
- Janice Caron (25/32) Oct 20 2007 I guess what I was thinking is... suppose I have an arbitrary,
- BCS (3/30) Oct 20 2007 if the return of .length is left unrestricted, then you can play games w...
- Oskar Linde (14/16) Oct 21 2007 Yes, it doesn't have to be more complicated than so. That would handle
- Janice Caron (24/25) Oct 21 2007 I think those are /excellent/ ideas.
- 0ffh (9/22) Oct 19 2007 Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar))
- Walter Bright (3/11) Oct 19 2007 printf("my string is %*.s\n", str.length, str.ptr);
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (6/15) Oct 20 2007 When I try this in C, it gives a warning about the "length" argument:
-
Walter Bright
(2/19)
Oct 20 2007
I think I'm not going to think about it and just deprecate it
. - Chris Miller (8/22) Oct 19 2007 It sounds interesting...
- Walter Bright (2/7) Oct 19 2007 I think you answered it!
- Chris Miller (16/23) Oct 20 2007 I'm starting to think the change might not be such a good idea. Imagine ...
- Janice Caron (9/12) Oct 20 2007 It would be very counterintuitive if ("world"++ == "orld"), so I don't
- Janice Caron (8/14) Oct 20 2007 To add to that, iterators in general are not range checked, because
- Janice Caron (17/20) Oct 19 2007 This will break some of my old code, but none of my new code.
- Oskar Linde (5/21) Oct 19 2007 All array allocations in D allocate an extra byte just to prevent this.
- Walter Bright (3/35) Oct 20 2007 No prob, t=null already sets both members to 0.
- Janice Caron (5/15) Oct 20 2007 Both members of t, yes, but it won't stop s.end from pointing to t's
- Walter Bright (2/4) Oct 20 2007 Yes, an extra byte is allocated for just this reason.
- Jascha Wetzel (5/10) Oct 20 2007 couldn't the end pointer just point to the last element?
- Janice Caron (7/14) Oct 20 2007 Oooh yuk. How horrible. I'd rather that didn't happen. For one thing,
- Jascha Wetzel (11/26) Oct 20 2007 one can argue, that this way, empty arrays have an unambiguous
- Janice Caron (15/19) Oct 20 2007 I will argue below that this is not a good thing.
- Bill Baxter (6/30) Oct 20 2007 Not quite true. if(str){} tests only the .ptr part and ignores the
- Janice Caron (9/11) Oct 20 2007 Ooh, that's nasty. I've never tried that.
- BCS (3/42) Oct 20 2007 Ohh. That's another feature for an IDE: Find this pattern in the syntax/...
- Janice Caron (11/14) Oct 20 2007 Ooh - just noticed this.
- Janice Caron (14/14) Oct 20 2007 Walter, I think we might want a new property, isEmpty, for arrays (and
- David Brown (10/14) Oct 20 2007 The compiler should never generate an actual division for a compile-time
- Janice Caron (22/28) Oct 20 2007 I hadn't thought about that before, but yes, you're right. The rule
- David Brown (4/5) Oct 20 2007 The gcc backend certainly does. I don't know about dmd. I don't have i...
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (4/11) Oct 20 2007 Or maybe just move it from object.d and hello.d, over to std.c.stdio ?
- Oskar Linde (21/40) Oct 19 2007 This has been nagging me too. Doing a = a[1..$]; has never felt very
- Janice Caron (19/23) Oct 20 2007 I'd be happy for it to break if the break could be detected at compile
- Daniel Keep (15/35) Oct 20 2007 You're calling a function that completely bypasses the type system, and
- Janice Caron (7/16) Oct 20 2007 I would expect it to fail to compile, and I would also expect to be
- Oskar Linde (10/29) Oct 20 2007 I guess there could be a warning (or error) passing a D array as a C
- Walter Bright (2/15) Oct 20 2007 Yes.
- =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= (7/18) Oct 20 2007 I think you meant to write printf("my string is %.*s\n", str);
- Walter Bright (3/28) Oct 20 2007 LOL!
- Janice Caron (5/7) Oct 20 2007 That's pointer arithmetic, of course, so looked at in pure assembler
- Walter Bright (6/16) Oct 20 2007 Divides are expensive. It's mitigated by:
- David Brown (5/10) Oct 20 2007 At least with the gcc backend, divides by constants almost always turn i...
- Walter Bright (6/16) Oct 20 2007 Divides are expensive. It's mitigated by:
- Janice Caron (11/13) Oct 20 2007 It also breaks binary compatibility with object files already
- Vladimir Panteleev (6/16) Oct 20 2007 I guess that could be fixed by changing the name mangling for dynamic ar...
- Jascha Wetzel (5/6) Oct 20 2007 Since this is a different ABI, the debug symbols need to indicate which
- Witold Baryluk (11/31) Oct 20 2007 What about garbage collector? And pointer that doesn't points to
- Sean Kelly (16/66) Oct 20 2007 This isn't a compelling reason to keep the current implementation. As
- Knud Soerensen (5/8) Oct 20 2007 I say go for it!
- Derek Parnell (23/24) Oct 21 2007 My first concern was for the speed of accessing the length, either throu...
- Don Clugston (13/30) Oct 21 2007 This seems to be a more powerful construction than .ptr + .length. If yo...
- Graham St Jack (11/11) Oct 21 2007 If we are considering changing the internal representation of dynamic
- David Brown (14/18) Oct 21 2007 D already makes use of this information in the GC. In fact, allocations
- Graham St Jack (2/25) Oct 21 2007 Cool - thanks.
- Kris (7/45) Oct 21 2007 Can anyone explain why the compiler/iterators/foreach would insist on us...
- Steven Schveighoffer (25/31) Oct 22 2007 Yes, but checking the length (which is something I do often in my code) ...
- Kris (5/40) Oct 22 2007 Yeah, I'm with Steve on this. There's more than one way to skin this cat...
Currently, arrays are represented under the hood as: size_t lengthOfArray; void* ptrToStartOfArray; Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: decrement the length increment the pointer This puts a brick in any fast implementation of iterators. To fix that, we can change the representation to: void* ptrToStartOfArray; void* ptrPastEndOfArray; Then there's just one increment. Some tests show this can improve loop performance by up to 70%. So, what does this not break? 1) Doesn't break array.ptr, this will still work. 2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start). 3) Doesn't break array.length as an lvalue, as that is handled by the runtime library anyway. 4) Won't break anything on D 1.0, as it wouldn't get this change. 5) Won't break array slices, or any of that stuff we love about D arrays. What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str); which relied on the under-the-hood representation. This doesn't work on some architectures anyway, and is thoroughly obsolete. One could quickly fix such code by writing it as: printf("my string is %*.s\n", str.length, str.ptr); 2) It breaks the internal library support code, but that's my problem. 3) It breaks binary compatibility with libraries already compiled. But we expect to break binary compatibility with D 2.0. 4) It breaks things like cast(ulong)str, if one was crazy enough to do that anyway. 5) It breaks anything that tries to look at the underlying representation of dynamic arrays - but such code should be rewritten to use .ptr and .length anyway, or slice notation. So, what do you think?
Oct 19 2007
Walter Bright wrote:[stuff]I'm hesitant to break something like this which is basically invisible to the end user. Then again, a 70% speed increase is not something to turn one's nose up at! And, as you said, you D 2.0 was forked so that changes like this could be made. You say you've measured the performance of loops using this, but what about slicing code? I know that I use a lot of [lower..$-something] style code; how much slower does this become? Once I'm not so busy, I might try to work out the proportion of foreach to $ slicing in my code. :P But, if the gains are there, and there's nothing critical that it breaks that you haven't thought of, I can't see any reason not to do it. -- Daniel P.S. I assume that array.ptr = array.start, and we'll have access to array.end, yes?
Oct 19 2007
Daniel Keep wrote:You say you've measured the performance of loops using this, but what about slicing code? I know that I use a lot of [lower..$-something] style code; how much slower does this become? Once I'm not so busy, I might try to work out the proportion of foreach to $ slicing in my code. :PI guess we go from changing one pointer and a size_t to changing two pointers, I can't see a lot of space for that being slower than now. Consider: wchar[] str="this is a test"; int s=10; // slice start int e=14; // slice end tst=s[s..e]; Old array: tst.ptr=str.ptr+s*wchar.sizeof; tst.len=e-s; New array: tst.start=str.start+s*wchar.sizeof; tst.end=str.start+e*wchar.sizeof; Regards, Frank
Oct 19 2007
0ffh wrote:Daniel Keep wrote:Well, that's definitely more work to compute end, although in most cases, the multiply could be reduced to a bit shift. I was talking about cases where we slice using the length of the array, since now we have to actually go and compute the length instead of simply looking it up. e = 2; tst = str[s .. $-e]; Old array: tst.ptr = str.ptr + s*wchar.sizeof; tst.len = (str.length-e) - s; New array: tst.start = str.start + s*wchar.sizeof; //tst.end = str.start + (str.length-e)*wchar.sizeof; tst.end = str.start + ((str.end-str.start)-e)*wchar.sizeof; So we've gone from two subtractions to an addition, two subtractions and a multiply; that's double the number of operations, one of which is an expensive multiply. If the primary justification for this change is eliminating a decrement operation for iteration, I think it's very relevant to look at how it's going to affect other code! -- DanielYou say you've measured the performance of loops using this, but what about slicing code? I know that I use a lot of [lower..$-something] style code; how much slower does this become? Once I'm not so busy, I might try to work out the proportion of foreach to $ slicing in my code. :PI guess we go from changing one pointer and a size_t to changing two pointers, I can't see a lot of space for that being slower than now. Consider: wchar[] str="this is a test"; int s=10; // slice start int e=14; // slice end tst=s[s..e]; Old array: tst.ptr=str.ptr+s*wchar.sizeof; tst.len=e-s; New array: tst.start=str.start+s*wchar.sizeof; tst.end=str.start+e*wchar.sizeof; Regards, Frank
Oct 20 2007
On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:tst.end = str.start + ((str.end-str.start)-e)*wchar.sizeof;tst.end = str.end - e*wchar.sizeof;
Oct 20 2007
On 10/20/07, Janice Caron <caron800 googlemail.com> wrote:On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:See, the $ wouldn't expand to (str.end-str.start) as you implied, it would expand to ((str.end-str.start)/wchar.sizeof). Obviously the two str.starts cancel each other out, so you can adjust the end pointer (relative to $) without looking at the start pointer at all.tst.end = str.start + ((str.end-str.start)-e)*wchar.sizeof;tst.end = str.end - e*wchar.sizeof;
Oct 20 2007
Janice Caron wrote:On 10/20/07, Janice Caron <caron800 googlemail.com> wrote:Ah yes, silly me for quoting something Walter wrote without thinking about it first. :POn 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:See, the $ wouldn't expand to (str.end-str.start) as you implied, ittst.end = str.start + ((str.end-str.start)-e)*wchar.sizeof;tst.end = str.end - e*wchar.sizeof;would expand to ((str.end-str.start)/wchar.sizeof). Obviously the two str.starts cancel each other out, so you can adjust the end pointer (relative to $) without looking at the start pointer at all.Assuming the compiler is smart enough to do algebraic elimination. I think I'll also have to take a look at what code the compiler generates in these cases. -- Daniel
Oct 20 2007
On Sat, Oct 20, 2007 at 01:11:44PM +1000, Daniel Keep wrote:You say you've measured the performance of loops using this, but what about slicing code? I know that I use a lot of [lower..$-something] style code; how much slower does this become? Once I'm not so busy, I might try to work out the proportion of foreach to $ slicing in my code. :PI wouldn't expect the performance to change in this case. You're doing an addition to adjust the start pointer, and a subtraction to adjust the length. The new representation would do the same addition and subtraction. In fact, it looks like it would even generate the same code. David
Oct 19 2007
On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:but what about slicing code? I know that I use a lot of [lower..$-something]While this may be optimised for simple arrays, it poses much more of a problem for classes and structs which override opIndex() and opSlice(). The problem is that indeces in those functions are always specified as being relative to the beginning - there is no way to specify that an index should be considered relative to the end - so when you do a[0..$-1] in such a struct, the compiler has to convert $-1 into (a.length()-1). Then, inside opSlice(), to turn it back to a pointer you'd have to do (start+index). I know adds and subtracts are not that expensive, but given that this change is being talked about /because/ we want to get rid of a subtract, it is worth thinking about this. Presumably this will all sort itself out if we get iterators? When that happens, slice expressions could be rewritten by the compiler as iterator operations, which in turn would call opDereference() or opRange(), taking iterator parameters. Is that likely to happen, or will user-defined types always take the slow route?
Oct 20 2007
Janice Caron wrote:On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:If you have a user-defined struct, you get to decide how it is implemented. How the built-in arrays are done under the hood shouldn't be relevant.but what about slicing code? I know that I use a lot of [lower..$-something]While this may be optimised for simple arrays, it poses much more of a problem for classes and structs which override opIndex() and opSlice(). The problem is that indeces in those functions are always specified as being relative to the beginning - there is no way to specify that an index should be considered relative to the end - so when you do a[0..$-1] in such a struct, the compiler has to convert $-1 into (a.length()-1). Then, inside opSlice(), to turn it back to a pointer you'd have to do (start+index). I know adds and subtracts are not that expensive, but given that this change is being talked about /because/ we want to get rid of a subtract, it is worth thinking about this. Presumably this will all sort itself out if we get iterators? When that happens, slice expressions could be rewritten by the compiler as iterator operations, which in turn would call opDereference() or opRange(), taking iterator parameters. Is that likely to happen, or will user-defined types always take the slow route?
Oct 20 2007
Walter Bright wrote:Janice Caron wrote:'$' doesn't work in user defined types. So the rest of your comment is only relevant to some hypothetical future D where $ is allowed for user types. Andrei strongly favored making it mean .length everywhere if I recall correctly, so if anything happens with it that's probably what it will be, and your comment will be relevant.On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:but what about slicing code? I know that I use a lot of [lower..$-something]While this may be optimised for simple arrays, it poses much more of a problem for classes and structs which override opIndex() and opSlice(). The problem is that indeces in those functions are always specified as being relative to the beginning - there is no way to specify that an index should be considered relative to the end - so when you do a[0..$-1] in such a struct, the compiler has to convert $-1 into (a.length()-1). Then, inside opSlice(), to turn it back to a pointer you'd have to do (start+index).It becomes relevant if $ becomes a synonym for .length as was previously proposed. In that case you don't get to decide how $ is implemented. It has to compute the length, even if that's O(N) and not really needed for computing the $-1'th node. Or if all you're trying to do is make a light wrapper around a built-in array (to, say, add an extra data member to it), then it means that your opIndex could be significantly less efficient than the built-in one. --bbI know adds and subtracts are not that expensive, but given that this change is being talked about /because/ we want to get rid of a subtract, it is worth thinking about this. Presumably this will all sort itself out if we get iterators? When that happens, slice expressions could be rewritten by the compiler as iterator operations, which in turn would call opDereference() or opRange(), taking iterator parameters. Is that likely to happen, or will user-defined types always take the slow route?If you have a user-defined struct, you get to decide how it is implemented. How the built-in arrays are done under the hood shouldn't be relevant.
Oct 20 2007
On 10/20/07, Bill Baxter <dnewsgroup billbaxter.com> wrote:It becomes relevant if $ becomes a synonym for .length as was previously proposed. In that case you don't get to decide how $ is implemented. It has to compute the length, even if that's O(N) and not really needed for computing the $-1'th node. Or if all you're trying to do is make a light wrapper around a built-in array (to, say, add an extra data member to it), then it means that your opIndex could be significantly less efficient than the built-in one.I guess what I was thinking is... suppose I have an arbitrary, user-defined collection class: MyCollection!(int) c; and I want to dereference the last element. The source code might be: int n = c[$-1]; Now, currently, (or if $ gets defined as length), that would get turned into: int n = c.opIndex(c.length()-1); which /might/ be computationally intensive, depending on what MyCollection does with length() and opIndex(). The thing is, Walter seemed to imply that iterators were on their way, so what I was thinking was, if int n = c[$-1]; could instead be translated into int n = *(c.end()-1); then the computational overhead would vanish. I think we'd need something like opDereference() to overload the unary *. But, see, you'd never actually have to /write/ the *, because you'd still write "c[$-1]" - it's just what's going on under the hood. Similarly, I'm thinking the expression c[a,$-b] could turn into opRange(c.begin()+a, c.end()-b), instead of c.opIndex(a, c.length()-1). But it's all just pie in the sky right now, because I've got absolutely no idea what, if anything, Walter plans to do with collections and iterators.
Oct 20 2007
Reply to Janice,On 10/20/07, Bill Baxter <dnewsgroup billbaxter.com> wrote:if the return of .length is left unrestricted, then you can play games with having it return a proxy type that another opIndex then uses.It becomes relevant if $ becomes a synonym for .length as was previously proposed. In that case you don't get to decide how $ is implemented. It has to compute the length, even if that's O(N) and not really needed for computing the $-1'th node. Or if all you're trying to do is make a light wrapper around a built-in array (to, say, add an extra data member to it), then it means that your opIndex could be significantly less efficient than the built-in one.I guess what I was thinking is... suppose I have an arbitrary, user-defined collection class: MyCollection!(int) c; and I want to dereference the last element. The source code might be: int n = c[$-1]; Now, currently, (or if $ gets defined as length), that would get turned into: int n = c.opIndex(c.length()-1);
Oct 20 2007
BCS wrote:if the return of .length is left unrestricted, then you can play games with having it return a proxy type that another opIndex then uses.Yes, it doesn't have to be more complicated than so. That would handle the multidimensional case as well. Possibly let $ -> opDollar() instead of length(). Then, all we need is to make a..b turn into a special range type and depreciate opSlice. The remaining problem would be that while .length always explicitly refers to an object. $ only does that implicitly within index operations. A global _dollar() would solve that and allow: auto a = 1 .. $-1; auto b = $-5 .. $; auto c = myObj[a,b]; But that might be too much to ask for. -- Oskar
Oct 21 2007
On 10/21/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:But that might be too much to ask for.I think those are /excellent/ ideas. So I assume opDollar would return some proxy type which represents "offset from end", and looks something like struct Dollar { int offset = 0; static Dollar opCall(int n) { Dollar d; d.offset=n; return d; } Dollar opAdd(int n) { return Dollar(offset+n); } Dollar opAdd_r(int n) { return Dollar(offset+n); } Dollar opSub(int n) { return Dollar(offset-n); } } and x..y returns a Slice!(typeof(x),typeof(y)) where struct Slice(T,U) { T x; U y; } (no need to implement opDotDot - that can happen automatically), and then opIndex knows to expect either an integer, a Dollar, or Slice whose elements are integers or Dollars. (More exotic uses of course can be imagined, but that would be enough to emulate current behavior). I like it!
Oct 21 2007
Walter Bright wrote:[see news://news.digitalmars.com:119/ffbr56$r40$1 digitalmars.com] What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar)) anyway, never tried what you're doing there. Did I miss something?2) It breaks the internal library support code, but that's my problem.Si.3) It breaks binary compatibility with libraries already compiled. But we expect to break binary compatibility with D 2.0.No sweat.4) It breaks things like cast(ulong)str, if one was crazy enough to do that anyway.Never needed it... =)5) It breaks anything that tries to look at the underlying representation of dynamic arrays - but such code should be rewritten to use .ptr and .length anyway, or slice notation.Been a good boy and used .ptr and .length, so I'm fine with it... grin!So, what do you think?I'm all for it! Regards, Frank
Oct 19 2007
0ffh wrote:Walter Bright wrote:printf("my string is %*.s\n", str.length, str.ptr); will be a little faster, because it doesn't need to allocate memory.[see news://news.digitalmars.com:119/ffbr56$r40$1 digitalmars.com] What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar)) anyway, never tried what you're doing there. Did I miss something?
Oct 19 2007
Walter Bright wrote:When I try this in C, it gives a warning about the "length" argument: field precision should have type 'int', but argument 2 has type 'size_t' Wouldn't this warning apply to C too, when using the printf function ? I'm thinking it might need a cast(int) to work properly on 64-bit... ? --andersprintf("my string is %*.s\n", str.length, str.ptr); will be a little faster, because it doesn't need to allocate memory.1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar)) anyway, never tried what you're doing there. Did I miss something?
Oct 20 2007
Anders F Björklund wrote:Walter Bright wrote:I think I'm not going to think about it and just deprecate it <g>.When I try this in C, it gives a warning about the "length" argument: field precision should have type 'int', but argument 2 has type 'size_t' Wouldn't this warning apply to C too, when using the printf function ? I'm thinking it might need a cast(int) to work properly on 64-bit... ?printf("my string is %*.s\n", str.length, str.ptr); will be a little faster, because it doesn't need to allocate memory.1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);Hmmm, currently I use printf("foo %s\n",std.string.toStringz(bar)) anyway, never tried what you're doing there. Did I miss something?
Oct 20 2007
On Fri, 19 Oct 2007 23:03:02 -0400, Walter Bright <newshound1 digitalmars.com> wrote:Currently, arrays are represented under the hood as: size_t lengthOfArray; void* ptrToStartOfArray; Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: decrement the length increment the pointer This puts a brick in any fast implementation of iterators. To fix that, we can change the representation to: void* ptrToStartOfArray; void* ptrPastEndOfArray; Then there's just one increment. Some tests show this can improve loop performance by up to 70%.It sounds interesting... I wonder what kind of iterators you have in mind. Why couldn't the iterators just hold start and end pointers? It would only need to be calculated when creating an iterator from a non-iterator. Or perhaps you want a slice to be an iterator itself, so you could just do slice++ and it slices out the first element.
Oct 19 2007
Chris Miller wrote:I wonder what kind of iterators you have in mind. Why couldn't the iterators just hold start and end pointers? It would only need to be calculated when creating an iterator from a non-iterator. Or perhaps you want a slice to be an iterator itself, so you could just do slice++ and it slices out the first element.I think you answered it!
Oct 19 2007
On Sat, 20 Oct 2007 02:27:29 -0400, Walter Bright <newshound1 digitalmars.com> wrote:Chris Miller wrote:I'm starting to think the change might not be such a good idea. Imagine all the existing code extensively using length. This change is for one thing, that could probably just as easily be improved by changing how that one thing is implemented, without affecting all this other code. It's not even really about supporting legacy usage; the other methods are probably mostly quite valid still, and iterators won't solve all those cases. Also, doing slice++ doesn't allow slice-- safely (not that all forward iterators are backwards iterators, but an iterator from a slice should be bidirectional). Slice iterators should probably have 3 pointers: beginning, end, and current position. Just build an iterator from a slice, which is a couple extra cycles up front. I really want to hear more about what's going on, instead of guessing how grand things are going to be, or not going to be. How about another newsgroup for possible future plans, like digitalmars.D.alphaI wonder what kind of iterators you have in mind. Why couldn't the iterators just hold start and end pointers? It would only need to be calculated when creating an iterator from a non-iterator. Or perhaps you want a slice to be an iterator itself, so you could just do slice++ and it slices out the first element.I think you answered it!
Oct 20 2007
On 10/20/07, Chris Miller <chris dprogramming.com> wrote:Or perhaps you want a slice to be an iterator itself, so you could just do slice++ and it slices out the first element.It would be very counterintuitive if ("world"++ == "orld"), so I don't think I'd like that. In C++ terms, an interator is often just a pointer (plus some operator overloads), and an array of [ begin, end ] iterators is called a range. Looked at like that, an array slice in the new scheme would be an array of two iterators (i.e. a range), rather than a single iterator. Or so it seems to me.
Oct 20 2007
On 10/20/07, Janice Caron <caron800 googlemail.com> wrote:On 10/20/07, Chris Miller <chris dprogramming.com> wrote:To add to that, iterators in general are not range checked, because they are generalised pointers, not generalised indeces. You could argue that perhaps they /should/ be range checked (at least, in debug builds in D), but to achieve that, you'd need a struct containing not two, but /three/ pointers - one to the start of the slice, one to the end of the slice, and one to the thing pointed at. Otherwise you'd never be able to do --p.Or perhaps you want a slice to be an iterator itself, so you could just do slice++ and it slices out the first element.It would be very counterintuitive if ("world"++ == "orld"), so I don't think I'd like that.
Oct 20 2007
On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);This will break some of my old code, but none of my new code. What you could do to fix that is to break it even more (ironically). Make "printf" a reserved word, or an alias for writef, or /something/ which will cause any use of it to become a compile error.So, what do you think?I had always assumed that, from the start, you designed it as { ptr, length } deliberately in order to help the garbage collector. If you change it, then you will be maintaining a pointer (the end pointer) to something beyond the end of the array. For example, suppose I do: int[] s = new int[4]; int[] t = new int[1000]; now t.begin might end up being equal to s.end. But now when we do t = null; or t goes out of scope or we otherwise assign t, suddenly that array can no longer be freed, because the end pointer of s points to it. (Not that I particularly care. Just pointing it out. Go for it).
Oct 19 2007
Janice Caron wrote:I had always assumed that, from the start, you designed it as { ptr, length } deliberately in order to help the garbage collector. If you change it, then you will be maintaining a pointer (the end pointer) to something beyond the end of the array. For example, suppose I do: int[] s = new int[4]; int[] t = new int[1000]; now t.begin might end up being equal to s.end.All array allocations in D allocate an extra byte just to prevent this. But now when we dot = null; or t goes out of scope or we otherwise assign t, suddenly that array can no longer be freed, because the end pointer of s points to it.-- Oskar
Oct 19 2007
Janice Caron wrote:On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:Just add 'deprecate' <g>.1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);This will break some of my old code, but none of my new code. What you could do to fix that is to break it even more (ironically). Make "printf" a reserved word, or an alias for writef, or /something/ which will cause any use of it to become a compile error.No prob, t=null already sets both members to 0.So, what do you think?I had always assumed that, from the start, you designed it as { ptr, length } deliberately in order to help the garbage collector. If you change it, then you will be maintaining a pointer (the end pointer) to something beyond the end of the array. For example, suppose I do: int[] s = new int[4]; int[] t = new int[1000]; now t.begin might end up being equal to s.end. But now when we do t = null; or t goes out of scope or we otherwise assign t, suddenly that array can no longer be freed, because the end pointer of s points to it.(Not that I particularly care. Just pointing it out. Go for it).
Oct 20 2007
On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:Both members of t, yes, but it won't stop s.end from pointing to t's allocated memory. That said, someone posted that you always allocate an extra byte. If that's true, problem solved.int[] s = new int[4]; int[] t = new int[1000]; now t.begin might end up being equal to s.end. But now when we do t = null; or t goes out of scope or we otherwise assign t, suddenly that array can no longer be freed, because the end pointer of s points to it.No prob, t=null already sets both members to 0.
Oct 20 2007
Janice Caron wrote:That said, someone posted that you always allocate an extra byte. If that's true, problem solved.Yes, an extra byte is allocated for just this reason.
Oct 20 2007
Walter Bright wrote:Janice Caron wrote:couldn't the end pointer just point to the last element? iteration could use <= the length would be end-start+1 etc.That said, someone posted that you always allocate an extra byte. If that's true, problem solved.Yes, an extra byte is allocated for just this reason.
Oct 20 2007
On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:Walter Bright wrote:Oooh yuk. How horrible. I'd rather that didn't happen. For one thing, it means that empty arrays (and empty collection ranges generally) would have to have a different representation. No, I like the "standard" way of doing things, whereby the end iterator refences the last+1 element (and therefore cannot be dereferenced).Janice Caron wrote:couldn't the end pointer just point to the last element?That said, someone posted that you always allocate an extra byte. If that's true, problem solved.Yes, an extra byte is allocated for just this reason.
Oct 20 2007
Janice Caron wrote:On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:one can argue, that this way, empty arrays have an unambiguous representation (they are always null). there is enough code like if ( str is null || str == "" ) which wouldn't be necessary. the current semantics deal with null arrays in the same way as with empty arrays (length == 0, appending works alike). therefore, the additional expressiveness one would usually expect (empty array vs. invalid array) isn't quite there. but i still wonder, whether there would be performance issues.Walter Bright wrote:Oooh yuk. How horrible. I'd rather that didn't happen. For one thing, it means that empty arrays (and empty collection ranges generally) would have to have a different representation. No, I like the "standard" way of doing things, whereby the end iterator refences the last+1 element (and therefore cannot be dereferenced).Janice Caron wrote:couldn't the end pointer just point to the last element?That said, someone posted that you always allocate an extra byte. If that's true, problem solved.Yes, an extra byte is allocated for just this reason.
Oct 20 2007
On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:one can argue, that this way, empty arrays have an unambiguous representation (they are always null).I will argue below that this is not a good thing. Suppose I write a parser function which (say) parses an integer. It would have a signature like bool parseInt(ref string s, out int n) The idea is that if the function returns true, then the parsed int is returned in n, and s's pointer is advanced past the bytes consumed. However, I don't want s.ptr to be set to null, even if we consume all the bytes. For example, I might want to do a pointer subtraction later on. Or I might want to be able to extract the pointer because s was merely a slice of a larger string. For all sorts of reasons, allowing empty strings to have an address does make sense.there is enough code like if ( str is null || str == "" )I always write if (str.length == 0) to test for empty strings (and arrays generally). That works just fine, always.
Oct 20 2007
Jascha Wetzel wrote:Janice Caron wrote:Not quite true. if(str){} tests only the .ptr part and ignores the length (as I quite painfully discovered recently... Sure wish there were an easy way to find all the places in my code where I used that expression instead of .length==0.). --bbOn 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:one can argue, that this way, empty arrays have an unambiguous representation (they are always null). there is enough code like if ( str is null || str == "" ) which wouldn't be necessary. the current semantics deal with null arrays in the same way as with empty arrays (length == 0, appending works alike).Walter Bright wrote:Oooh yuk. How horrible. I'd rather that didn't happen. For one thing, it means that empty arrays (and empty collection ranges generally) would have to have a different representation. No, I like the "standard" way of doing things, whereby the end iterator refences the last+1 element (and therefore cannot be dereferenced).Janice Caron wrote:couldn't the end pointer just point to the last element?That said, someone posted that you always allocate an extra byte. If that's true, problem solved.Yes, an extra byte is allocated for just this reason.
Oct 20 2007
On 10/20/07, Bill Baxter <dnewsgroup billbaxter.com> wrote:Not quite true. if(str){} tests only the .ptr part and ignores the lengthOoh, that's nasty. I've never tried that. Seems to me that either (a) if (array) shouldn't work - ie. arrays shouldn't implicitly cast to bool, or else (b) if (array) should be true iff the array is non-empty (which is not the same thing as non-null). I guess currently it tests whether or not the array is null - that is, unassigned or explicitly assigned with null. I'm not sure why that's useful.
Oct 20 2007
Reply to Bill,Jascha Wetzel wrote:Ohh. That's another feature for an IDE: Find this pattern in the syntax/semantic tree.Janice Caron wrote:Not quite true. if(str){} tests only the .ptr part and ignores the length (as I quite painfully discovered recently... Sure wish there were an easy way to find all the places in my code where I used that expression instead of .length==0.). --bbOn 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:one can argue, that this way, empty arrays have an unambiguous representation (they are always null). there is enough code like if ( str is null || str == "" ) which wouldn't be necessary. the current semantics deal with null arrays in the same way as with empty arrays (length == 0, appending works alike).Walter Bright wrote:Oooh yuk. How horrible. I'd rather that didn't happen. For one thing, it means that empty arrays (and empty collection ranges generally) would have to have a different representation. No, I like the "standard" way of doing things, whereby the end iterator refences the last+1 element (and therefore cannot be dereferenced).Janice Caron wrote:couldn't the end pointer just point to the last element?That said, someone posted that you always allocate an extra byte. If that's true, problem solved.Yes, an extra byte is allocated for just this reason.
Oct 20 2007
On 10/20/07, Jascha Wetzel <firstname mainia.de> wrote:Ooh - just noticed this. The test is normally (p != end). You can't use <= with iterators in general because, if you did, the iterators would have to implement opCmp(), which would be an O(N) operation in the case of linked lists. != is O(1). With plain old arrays, for (p=start; p<=end; ++p) would fail with empty arrays if the empty array was represented as [ null, null ], because (null <= null) is true, so the for body would get executed once with p == null. (!) So there are /lots/ of reasons why end has to point /beyond/ the last byte.Yes, an extra byte is allocated for just this reason.couldn't the end pointer just point to the last element? iteration could use <=
Oct 20 2007
Walter, I think we might want a new property, isEmpty, for arrays (and collections in general). See, we can't test for (array == null), because as you know, a non-null array can still be empty. The test (array.length == 0) will still work, but since calculating .length might involve a divide, it's no longer guaranteed to be optimal. (Also, I can imagine collection in which .length might be O(N)). We could test for (array.start == array.end). In fact, that's probably the best approach, given what you've told us so far. But is there a guarantee that that would necessarily be the most efficient test for collections in general? I can imagine collections in which .end might involve a non-trivial calculation. .isEmpty would be clean, simple, and obvious.
Oct 20 2007
On Sat, Oct 20, 2007 at 04:44:06PM +0100, Janice Caron wrote:The test (array.length == 0) will still work, but since calculating .length might involve a divide, it's no longer guaranteed to be optimal. (Also, I can imagine collection in which .length might be O(N)).The compiler should never generate an actual division for a compile-time constant. For any constant division, there is an equivalent multiplication that will give the same result. Most modern architectures do multiply quickly. It is more work than strictly necessary to determine if the length is zero. However, (array.length == 0) is a real easy thing for an optimizer to see. It certainly doesn't justify adding a special construct to the language for it. It would be easier to just fix the optimizer if it misses it. David
Oct 20 2007
On 10/20/07, David Brown <dlang davidb.org> wrote:On Sat, Oct 20, 2007 at 04:44:06PM +0100, Janice Caron wrote: For any constant division, there is an equivalent multiplication that will give the same result.I hadn't thought about that before, but yes, you're right. The rule is, if two numbers a and b are relatively prime (that is to say: gcd(a,b)=1) there is exactly one number c=inv(a) mod b so that c*a mod b=1. So for what you suggest, b would have to be 2^32 (that is, uint.max+1) and a the number you want to divide by. In order to be coprime with b, a may /not/ be even, but that's not a problem because we can just do shifts until it isn't. So, for example, to divide by 7, you can just multiply by invmod(7,0x100000000). To divide by 14, you'd right-shift first, and then divide by 7. Cool! I wonder if D actually does things this way?However, (array.length == 0) is a real easy thing for an optimizer to see. It certainly doesn't justify adding a special construct to the language for it.I was thinking ahead. I was assuming we'd want the same semantics for all collections, not just arrays. C++ STL has an empty() function as well as a size() function for exactly that reason. For some collections, calculating .length() will be an O(N) operation, so for those collections, an isEmpty() functions would make sense. I agree it's less necessary for a plain old array, but it would help template metaprogramming enormously if something like isEmpty was available, so you wouldn't need to know what kind of collection you were dealing with. That said, it's easy enough to add as a standalone function if it's not built in.
Oct 20 2007
On Sat, Oct 20, 2007 at 06:12:07PM +0100, Janice Caron wrote:I wonder if D actually does things this way?The gcc backend certainly does. I don't know about dmd. I don't have it handy to check, but it wouldn't be hard to disassemble the output and look. David
Oct 20 2007
Walter Bright wrote:Or maybe just move it from object.d and hello.d, over to std.c.stdio ? :-P --andersThis will break some of my old code, but none of my new code. What you could do to fix that is to break it even more (ironically). Make "printf" a reserved word, or an alias for writef, or /something/ which will cause any use of it to become a compile error.Just add 'deprecate' <g>.
Oct 20 2007
Walter Bright wrote:Currently, arrays are represented under the hood as: size_t lengthOfArray; void* ptrToStartOfArray; Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: decrement the length increment the pointerThis has been nagging me too. Doing a = a[1..$]; has never felt very efficient. Will there be a different syntax for such idioms?What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);I'm glad that breaks. That code was never portable anyway, and has caused headaches porting D code to other platforms.4) It breaks things like cast(ulong)str, if one was crazy enough to do that anyway.This is something 64-bit D also breaks.So, what do you think?It really shows the power of the property syntax that one can transparently change the underlying representation like this. I am all for it (as far as I can see now). Two related things that have been holding me off D2 are waiting for some announced code breaking changes that will have quite some impact on how code is written (not mentioning const): 1. T[new] - Hopefully T[*] (hint ;) ) - Hopefully a reference type 2. Static arrays becoming value types - Being able to return them from functions - No longer having to litter the code with incompatible wrapper structs Are the ETA for those still far away? -- Oskar
Oct 19 2007
On 10/20/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:I'd be happy for it to break if the break could be detected at compile time (or even at runtime with an assert). I'm not terrifically happy that source code that previously generated code that ran just fine would now generate code that will simply crash with no explanation. You have to remember that there was once a time when writef() and family did not exist, and that printf("%.*s") was the /recommended/ way of doing things, documented as such on the Digital Mars site. So you cannot blame anyone for using that idiom, say, three or four years ago. As for not portable, I'm surprised at that. I thought the C printf() function was fairly standardised? Ah well, it doesn't matter. I'm sure everyone here agrees that D folk should not use printf anyway, now that we've got better things. I'd be all in favor making any use of printf a compile error - say by making a global function which takes no parameters (forcing coders to explicity qualify the function name if they wanted to C printf). Note that this reasoning also applies to vprintf, fprintf, vfprintf, sprintf, snprintf, etc., etc...1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);I'm glad that breaks. That code was never portable anyway
Oct 20 2007
Janice Caron wrote:On 10/20/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:You're calling a function that completely bypasses the type system, and you want compile-time checking on it? :S If you're going to play with fire (type-unsafe variadic C functions), you shouldn't be surprised when you get third-degree burns. ;)I'd be happy for it to break if the break could be detected at compile time (or even at runtime with an assert). I'm not terrifically happy that source code that previously generated code that ran just fine would now generate code that will simply crash with no explanation.1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);I'm glad that breaks. That code was never portable anywayYou have to remember that there was once a time when writef() and family did not exist, and that printf("%.*s") was the /recommended/ way of doing things, documented as such on the Digital Mars site. So you cannot blame anyone for using that idiom, say, three or four years ago.This is true, but remember that this is for D 2.0. If someone honestly expects to be able to take code written three years ago when the language was still in flux (and which, mind you, likely won't even compile with the current 1.0 compiler) and compile it on 2.0 without issues is, in my opinion, delusional.I'd be all in favor making any use of printf a compile error - say by making a global function which takes no parameters (forcing coders to explicity qualify the function name if they wanted to C printf). Note that this reasoning also applies to vprintf, fprintf, vfprintf, sprintf, snprintf, etc., etc...I think that removing it from object so that it can't be used without explicitly importing it would be sufficient. Maybe sticking a deprecated attribute on the printf declaration would also help. Maybe with a message saying "No, you want writef. Seriously." -- Daniel
Oct 20 2007
On 10/20/07, Daniel Keep <daniel.keep.lists gmail.com> wrote:This is true, but remember that this is for D 2.0. If someone honestly expects to be able to take code written three years ago when the language was still in flux (and which, mind you, likely won't even compile with the current 1.0 compiler) and compile it on 2.0 without issues is, in my opinion, delusional.I would expect it to fail to compile, and I would also expect to be able to use the compiler errors (or in extreme cases, runtime asserts) to tell me what needs to be changed. What I don't expect is no complaints at all, just a runtime, hard to debug, crash. (That's especially important if you're using code you didn't write yourself).think that removing it from object so that it can't be used without explicitly importing it would be sufficient. Maybe sticking a deprecated attribute on the printf declaration would also help. Maybe with a message saying "No, you want writef. Seriously."Both are excellent ideas.
Oct 20 2007
Janice Caron wrote:On 10/20/07, Oskar Linde <oskar.lindeREM ovegmail.com> wrote:I guess there could be a warning (or error) passing a D array as a C vararg argument.I'd be happy for it to break if the break could be detected at compile time (or even at runtime with an assert). I'm not terrifically happy that source code that previously generated code that ran just fine would now generate code that will simply crash with no explanation.1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);I'm glad that breaks. That code was never portable anywayYou have to remember that there was once a time when writef() and family did not exist, and that printf("%.*s") was the /recommended/ way of doing things, documented as such on the Digital Mars site. So you cannot blame anyone for using that idiom, say, three or four years ago.I am well aware of that, and I dont blame anyone for using the idiom. I merely blame the recommendation. ;)As for not portable, I'm surprised at that. I thought the C printf() function was fairly standardised? Ah well, it doesn't matter. I'm sure everyone here agrees that D folk should not use printf anyway, now that we've got better things.Printf is standardized. That is not the problem. The problem is that the above idiom is a hack relying on a specific stack layout when passing vararg arguments to C functions. -- Oskar
Oct 20 2007
Oskar Linde wrote:Two related things that have been holding me off D2 are waiting for some announced code breaking changes that will have quite some impact on how code is written (not mentioning const): 1. T[new] - Hopefully T[*] (hint ;) ) - Hopefully a reference type 2. Static arrays becoming value types - Being able to return them from functions - No longer having to litter the code with incompatible wrapper structs Are the ETA for those still far away?Yes.
Oct 20 2007
Walter Bright wrote:What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);I think you meant to write printf("my string is %.*s\n", str); Which is something of an argument against printf in itself :-)which relied on the under-the-hood representation. This doesn't work on some architectures anyway, and is thoroughly obsolete. One could quickly fix such code by writing it as: printf("my string is %*.s\n", str.length, str.ptr);Yes, but that change had to be done sooner or later anyway... Same as with the cast from char[] to ulong, dirty hacks both.So, what do you think?Good riddance. --anders
Oct 20 2007
Anders F Björklund wrote:Walter Bright wrote:Yes.What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);I think you meant to write printf("my string is %.*s\n", str); Which is something of an argument against printf in itself :-)LOL!which relied on the under-the-hood representation. This doesn't work on some architectures anyway, and is thoroughly obsolete. One could quickly fix such code by writing it as: printf("my string is %*.s\n", str.length, str.ptr);Yes, but that change had to be done sooner or later anyway... Same as with the cast from char[] to ulong, dirty hacks both.So, what do you think?Good riddance.
Oct 20 2007
On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start).That's pointer arithmetic, of course, so looked at in pure assembler terms, we're talking ((array.end - array.start) / element.sizeof) Is there any performance cost to the divide, or is optimised to be a right shift (at least where it can be)?
Oct 20 2007
Janice Caron wrote:On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:Divides are expensive. It's mitigated by: 1) most of the time, it's 1 2) most of the rest of the time, it's a power of 2 and optimized to shift 3) some of the rest of the time, it can be skipped as it divides then multiples by the same amount.2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start).That's pointer arithmetic, of course, so looked at in pure assembler terms, we're talking ((array.end - array.start) / element.sizeof) Is there any performance cost to the divide, or is optimised to be a right shift (at least where it can be)?
Oct 20 2007
On Sat, Oct 20, 2007 at 01:16:54AM -0700, Walter Bright wrote:Divides are expensive. It's mitigated by: 1) most of the time, it's 1 2) most of the rest of the time, it's a power of 2 and optimized to shift 3) some of the rest of the time, it can be skipped as it divides then multiples by the same amount.At least with the gcc backend, divides by constants almost always turn into multiplies of strange constants, or weird tricks with shifts and subtracts/adds, depending on what the compiler things will be better. David
Oct 20 2007
Janice Caron wrote:On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:Divides are expensive. It's mitigated by: 1) most of the time, it's 1 2) most of the rest of the time, it's a power of 2 and optimized to shift 3) some of the rest of the time, it can be skipped as it divides then multiples by the same amount.2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start).That's pointer arithmetic, of course, so looked at in pure assembler terms, we're talking ((array.end - array.start) / element.sizeof) Is there any performance cost to the divide, or is optimised to be a right shift (at least where it can be)?
Oct 20 2007
On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:What does this break? 3) It breaks binary compatibility with libraries already compiled.It also breaks binary compatibility with object files already compiled. I don't use dsss do I don't know how that will deal, but anyone using a makefile or some other build system needs to make sure that .obj files are dependent, not only on the correstponding .d file, but also on dmd.exe (or platform equivalent), so that the rebuild automatically updates the .obj file when dmd changes. That's not Walter's problem of course - it's just something of which developers need to be aware. Even if your build script does not have this dependency built in, a simple one-time "clean" build will fix all.
Oct 20 2007
On Sat, 20 Oct 2007 12:15:14 +0300, Janice Caron <caron800 googlemail.com> wrote:On 10/20/07, Walter Bright <newshound1 digitalmars.com> wrote:I guess that could be fixed by changing the name mangling for dynamic arrays. Then the users will be forced to rebuild their apps (unless they pass dynamic arrays hidden in void pointers or arrays). -- Best regards, Vladimir mailto:thecybershadow gmail.comWhat does this break? 3) It breaks binary compatibility with libraries already compiled.It also breaks binary compatibility with object files already compiled. I don't use dsss do I don't know how that will deal, but anyone using a makefile or some other build system needs to make sure that .obj files are dependent, not only on the correstponding .d file, but also on dmd.exe (or platform equivalent), so that the rebuild automatically updates the .obj file when dmd changes.
Oct 20 2007
Walter Bright wrote:What does this break?Since this is a different ABI, the debug symbols need to indicate which version is used. For CodeView i suggest using the language and/or version fields in the S_COMPILE symbol, which aren't used by DMD, atm.
Oct 20 2007
Dnia Fri, 19 Oct 2007 20:03:02 -0700 Walter Bright <newshound1 digitalmars.com> napisa=B3/a:Currently, arrays are represented under the hood as: =20 size_t lengthOfArray; void* ptrToStartOfArray; =20 Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: =20 decrement the length increment the pointer =20 This puts a brick in any fast implementation of iterators. To fix that, we can change the representation to: =20 void* ptrToStartOfArray; void* ptrPastEndOfArray; =20 Then there's just one increment. Some tests show this can improve loop performance by up to 70%.What about garbage collector? And pointer that doesn't points to real data? If this will be fixed somehow and there will be strategy for moving collector to change this obscure second pointer to be proper, for new location of data, then it can be good choice. (especially if this change of iterator really will increse iteration, and wouldn't give regressions). --=20 Witold Baryluk, aleph0
Oct 20 2007
Walter Bright wrote:Currently, arrays are represented under the hood as: size_t lengthOfArray; void* ptrToStartOfArray; Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: decrement the length increment the pointer This puts a brick in any fast implementation of iterators. To fix that, we can change the representation to: void* ptrToStartOfArray; void* ptrPastEndOfArray; Then there's just one increment. Some tests show this can improve loop performance by up to 70%. So, what does this not break? 1) Doesn't break array.ptr, this will still work. 2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start). 3) Doesn't break array.length as an lvalue, as that is handled by the runtime library anyway. 4) Won't break anything on D 1.0, as it wouldn't get this change. 5) Won't break array slices, or any of that stuff we love about D arrays. What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str); which relied on the under-the-hood representation. This doesn't work on some architectures anyway, and is thoroughly obsolete. One could quickly fix such code by writing it as: printf("my string is %*.s\n", str.length, str.ptr);This isn't a compelling reason to keep the current implementation. As Anders pointed out, it doesn't work on 64-bit anyway.2) It breaks the internal library support code, but that's my problem.Yeah, no big deal here either.3) It breaks binary compatibility with libraries already compiled. But we expect to break binary compatibility with D 2.0.Right.4) It breaks things like cast(ulong)str, if one was crazy enough to do that anyway.Though I know the runtime does this, I've never seen any other code that does :-) And the runtime changes are easily fixed anyway. Heck, GDC already has this implemented.5) It breaks anything that tries to look at the underlying representation of dynamic arrays - but such code should be rewritten to use .ptr and .length anyway, or slice notation.Not a compelling reason to keep the old representation, as this is really an implementation detail. My only potential concern would be the impact on slice operations and such, but addition and subtraction are so fast that this change shouldn't affect anything. Also, such operations are rarely, if ever, performed in tight loops, unlike iterator operations. vote++ Sean
Oct 20 2007
Walter Bright wrote:Then there's just one increment. Some tests show this can improve loop performance by up to 70%.I say go for it! I think it is important to break code as soon as possible. Waiting to break code will only result in more code being broken at a later time.
Oct 20 2007
On Fri, 19 Oct 2007 20:03:02 -0700, Walter Bright wrote:So, what do you think?My first concern was for the speed of accessing the length, either through .length property or the $ concept. I've examined my usage of .length and about 80% of the time I'm comparing it to zero ... if (X.length == 0) if (X.length > 0) About another 10% is comparing the length of one array to another array ... if (X.length == Y.length) Most of the rest is extracting right hand slices (substrings typically) ... result = X[$ - a .. $ - b] Only occasionally do I set the .length Most of my array iterations use the foreach() construct. So it seems that it would be beneficial change for me. It certainly won't break any of my code because I don't try to access the ABI directly, only through .ptr and .length. My test for an empty array is to compare the length. My test for an unassigned array is to test the .ptr value to null. So I think that a good built-in property would be an .isEmpty concept. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Oct 21 2007
Walter Bright wrote:Currently, arrays are represented under the hood as: size_t lengthOfArray; void* ptrToStartOfArray; Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: decrement the length increment the pointer This puts a brick in any fast implementation of iterators. To fix that, we can change the representation to: void* ptrToStartOfArray; void* ptrPastEndOfArray;This seems to be a more powerful construction than .ptr + .length. If you know the length, it's usually trivial to calculate the end, but the converse is not necessarily true. My first reaction is, does it generalise to multi-dimensional arrays? (I'm hoping we'll still get them eventually, though not necessarily as built-in types). Haven't yet thought through it properly, but I think it's probably _better_. It means that a 'strided slice' (eg, every 4th element) could be implemented without mucking around with the length element. It's easy to test if two such arrays overlap. Many operations can be done entirely at compile time, simply using a cast. votes++; I'm less sure about the use of slices as iterators though.
Oct 21 2007
If we are considering changing the internal representation of dynamic arrays, maybe we should also consider adding a pointer to "end of allocation", making it three pointers in total. The extra pointer could be used by the concatenation operators to dramatically reduce the number of reallocs that occur when doing ~= appends to a dynamic array - by only realloc-ing when capacity runs out, and allocating more capacity than immediately needed. I currently have to use a more heavy-weight container class (like a Vector) if I don't want to pay the performance cost of repeated reallocs, and it seems to me that cranking up the "out of the box" performance of dynamic arrays when concatenating would be good.
Oct 21 2007
On Mon, Oct 22, 2007 at 09:55:12AM +0930, Graham St Jack wrote:The extra pointer could be used by the concatenation operators to dramatically reduce the number of reallocs that occur when doing ~= appends to a dynamic array - by only realloc-ing when capacity runs out, and allocating more capacity than immediately needed.D already makes use of this information in the GC. In fact, allocations are always powers of 2, up until the page size, and then in full pages. Try the following: import std.stdio; import std.gc; void main () { char[] text; for (int i = 0; i < 256; i++) { text ~= 'a'; writefln ("len = %d, alen = %d", text.length, capacity (text.ptr)); } }
Oct 21 2007
David Brown wrote:On Mon, Oct 22, 2007 at 09:55:12AM +0930, Graham St Jack wrote:Cool - thanks.The extra pointer could be used by the concatenation operators to dramatically reduce the number of reallocs that occur when doing ~= appends to a dynamic array - by only realloc-ing when capacity runs out, and allocating more capacity than immediately needed.D already makes use of this information in the GC. In fact, allocations are always powers of 2, up until the page size, and then in full pages. Try the following: import std.stdio; import std.gc; void main () { char[] text; for (int i = 0; i < 256; i++) { text ~= 'a'; writefln ("len = %d, alen = %d", text.length, capacity (text.ptr)); } }
Oct 21 2007
Can anyone explain why the compiler/iterators/foreach would insist on using the two operations Walter notes below (decrement the length, increment the pointer) instead of first converting to a pointer pair? Seem like this is a concern that has more than one way to approach it? - kris "Walter Bright" <newshound1 digitalmars.com> wrote in message news:ffbr56$r40$1 digitalmars.com...Currently, arrays are represented under the hood as: size_t lengthOfArray; void* ptrToStartOfArray; Which works out reasonably well. The problem is if you want to use array types as the basis of iterators, and you want to step through the array. There's no escaping it being two operations: decrement the length increment the pointer This puts a brick in any fast implementation of iterators. To fix that, we can change the representation to: void* ptrToStartOfArray; void* ptrPastEndOfArray; Then there's just one increment. Some tests show this can improve loop performance by up to 70%. So, what does this not break? 1) Doesn't break array.ptr, this will still work. 2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start). 3) Doesn't break array.length as an lvalue, as that is handled by the runtime library anyway. 4) Won't break anything on D 1.0, as it wouldn't get this change. 5) Won't break array slices, or any of that stuff we love about D arrays. What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str); which relied on the under-the-hood representation. This doesn't work on some architectures anyway, and is thoroughly obsolete. One could quickly fix such code by writing it as: printf("my string is %*.s\n", str.length, str.ptr); 2) It breaks the internal library support code, but that's my problem. 3) It breaks binary compatibility with libraries already compiled. But we expect to break binary compatibility with D 2.0. 4) It breaks things like cast(ulong)str, if one was crazy enough to do that anyway. 5) It breaks anything that tries to look at the underlying representation of dynamic arrays - but such code should be rewritten to use .ptr and .length anyway, or slice notation. So, what do you think?
Oct 21 2007
"Walter Bright" wrote2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start).Yes, but checking the length (which is something I do often in my code) now becomes an operation rather than a memory load. So you are swapping a performance hit in one place with one in another.What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);Absolutely could not care less :)So, what do you think?I think this change is unnecessary for iterators, why do we need to change the array representation? An iterator is a pointer in a sequence of values. It does not need to know where in the sequence of values it is, so why would you represent an iterator with 2 pointers? If you make the case that it's for bounds checking, then you really need 3 pointers to check for reverse iterating past the beginning. I'll also point out that you are now adding the cost of copying two values when passing an iterator around instead of one pointer, so using pointers will be faster anyways. Why not have an iterator type, defined as the property of an array, i.e.: char[].iterator is a type, which is aliased to char *. Then you could have properties like: char[].iterator begin() { return ptr} char[].iterator end() { return ptr + length} This is very similar to C++, and makes the most sense to me, and provides the most speed out of anything. You could do something similar with AA's, but the type would have to be a struct probably. It wouldn't even pollute the namespace/require keywords because they are properties of the arrays themselves. -Steve
Oct 22 2007
Yeah, I'm with Steve on this. There's more than one way to skin this cat, and have solid performance all round instead of potentially trading one for another. "Steven Schveighoffer" <schveiguy yahoo.com> wrote in message news:ffij0u$290f$1 digitalmars.com..."Walter Bright" wrote2) Doesn't break array.length as rvalue, as this is rewritten by the compiler as (array.end - array.start).Yes, but checking the length (which is something I do often in my code) now becomes an operation rather than a memory load. So you are swapping a performance hit in one place with one in another.What does this break? 1) Passing dynamic arrays to printf as in: printf("my string is %*.s\n", str);Absolutely could not care less :)So, what do you think?I think this change is unnecessary for iterators, why do we need to change the array representation? An iterator is a pointer in a sequence of values. It does not need to know where in the sequence of values it is, so why would you represent an iterator with 2 pointers? If you make the case that it's for bounds checking, then you really need 3 pointers to check for reverse iterating past the beginning. I'll also point out that you are now adding the cost of copying two values when passing an iterator around instead of one pointer, so using pointers will be faster anyways. Why not have an iterator type, defined as the property of an array, i.e.: char[].iterator is a type, which is aliased to char *. Then you could have properties like: char[].iterator begin() { return ptr} char[].iterator end() { return ptr + length} This is very similar to C++, and makes the most sense to me, and provides the most speed out of anything. You could do something similar with AA's, but the type would have to be a struct probably. It wouldn't even pollute the namespace/require keywords because they are properties of the arrays themselves. -Steve
Oct 22 2007