digitalmars.D - string concatenation idea.
- Regan Heath (26/26) Jun 14 2004 The thread "string performance issues" by "Daniel Horn" got me thinking ...
- Matthew (19/45) Jun 14 2004 It means adding another member to slices, and would complicate it in oth...
- Regan Heath (10/81) Jun 14 2004 Why do you need to add another member to slices?
-
Stewart Gordon
(31/43)
Jun 15 2004
- Hauke Duden (12/15) Jun 15 2004 I don't think it means that. It has already been mentioned that
- Regan Heath (13/31) Jun 15 2004 Or rather simply do the allocation the above steps would cause.
- Arcane Jill (21/35) Jun 16 2004 I'm not convinced, though, that this is defined behavior. It seems possi...
- Hauke Duden (11/40) Jun 16 2004 I agree that it should be that way. But Walter himself has recommended
- Arcane Jill (21/23) Jun 16 2004 I haven't heard Walter say that myself, but if he did, then he should kn...
- J Anderson (5/32) Jun 16 2004 Good points. That is why I said reserve should go into phobos so
- Ben Hinkle (9/67) Jun 16 2004 The length-juggling trick also has limitations: setting the length to 0 ...
- J Anderson (9/34) Jun 14 2004 I think this would work:
- Derek Parnell (10/59) Jun 14 2004 How about ...
- J Anderson (4/41) Jun 14 2004 I didn't think of that, neat.
- Regan Heath (7/45) Jun 14 2004 Yes, but not quite as neat as it could be, see the real problem in the
- Regan Heath (6/60) Jun 14 2004 This works, but see the other thread "string performance issues" for an
- J Anderson (25/88) Jun 14 2004 Personally I'd use block allocation for a problem like that. That's
- Ivan Senji (7/100) Jun 14 2004 length
- J Anderson (4/139) Jun 14 2004 --
- Regan Heath (28/121) Jun 14 2004 Yes, the poster was simply stating that it should be easier than that in...
- J Anderson (32/83) Jun 15 2004 You can make your own class to do that. The reason against that
- J Anderson (8/11) Jun 15 2004 I do think it should be in phobos somewhere (where you don't need to
- Ivan Senji (4/16) Jun 15 2004 I agree. D lets us add "properties" to arrays so why not show how this
- Regan Heath (7/17) Jun 15 2004 FYI Walter mentioned that std.outbuffer has a struct/class for doing wha...
- Regan Heath (28/124) Jun 15 2004 I know I can, I have.
- J Anderson (6/109) Jun 16 2004 I already mentioned that way of doing it without overhead. After some
The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts? -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view. I prefer a language based approach, similar to that for Java, whereby the elements in a single concatenation statement are actually appended to a StringBuffer. This boosts string concatenation performance enormously. I showed how to achieve this in a similar vein, and with similarly significant performance benefits, for C++ in my recent (June's CUJ) article "Fast, Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au fait with the technique. I suggest something similar can be done for D, by transcribing ~ sequences into calls to an underlying implementation class/API. It wouldn't be hard to do, and the restriction would just be the same for Java and C++ (using my fast_string_concatenator<>), in that it would only work for a single statement. Not that that's a particularly onerous restriction, of course. The alternative is just to have a StringBuilder class, which doesn't seem too much of a burden either. "Regan Heath" <regan netwin.co.nz> wrote in message news:opr9l56hws5a2sq9 digitalmars.com...The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts? -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
On Tue, 15 Jun 2004 12:31:48 +1000, Matthew <admin stlsoft.dot.dot.dot.dot.org> wrote:It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view.Why do you need to add another member to slices? Don't we already have to distinguish between slices that own and slices that view?I prefer a language based approach, similar to that for Java, whereby the elements in a single concatenation statement are actually appended to a StringBuffer. This boosts string concatenation performance enormously. I showed how to achieve this in a similar vein, and with similarly significant performance benefits, for C++ in my recent (June's CUJ) article "Fast, Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au fait with the technique. I suggest something similar can be done for D, by transcribing ~ sequences into calls to an underlying implementation class/API. It wouldn't be hard to do, and the restriction would just be the same for Java and C++ (using my fast_string_concatenator<>), in that it would only work for a single statement. Not that that's a particularly onerous restriction, of course. The alternative is just to have a StringBuilder class, which doesn't seem too much of a burden either.I thought the point of adding strings (i.e. char[]) to D was to avoid having other String classes? Regan"Regan Heath" <regan netwin.co.nz> wrote in message news:opr9l56hws5a2sq9 digitalmars.com...-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts? -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
Regan Heath wrote:On Tue, 15 Jun 2004 12:31:48 +1000, Matthew <admin stlsoft.dot.dot.dot.dot.org> wrote:<snip> My experiments show that arrays are allocated in powers of 2 or multiples of 2K. (At least 1D arrays of an atomic type, otherwise I haven't experimented.) There obviously is a reserve property, though looking at the ABI it must be of the allocated block rather than of the array reference itself. My experiments with slicing showed that a slice becomes a newly allocated array if the length is changed, even if it doesn't exceed the originally sliced length. But there are two cases I haven't tested yet: int[] qwert = new int[10]; int[] yuiop = qwert; qwert.length = 14; // remaining within allocated block // ... put some data in qwert yuiop.length = 13; It would follow that qwert and yuiop still have the same start address, and setting the length of yuiop would overwrite qwert[10..13]. int[] qwert = new int[10]; int[] yuiop = qwert[0..5]; yuiop.length = 8; If the distinction between an owner and a viewer is simply a matter of whether the start address is the start of the allocated block, then likewise here. Hang on ... better check if length setting and concatenation really have the same effect.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view.Why do you need to add another member to slices? Don't we already have to distinguish between slices that own and slices that view?
Jun 15 2004
Matthew wrote:It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view.I don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it. Note that the idea of a global template function is not quite equivalent, since the function is not in the "namespace" of the char type, but rather in global scope. Which means that multiple such functions will only overload if they are aliased. Hauke
Jun 15 2004
On Tue, 15 Jun 2004 10:24:21 +0200, Hauke Duden <H.NS.Duden gmx.net> wrote:Matthew wrote:Or rather simply do the allocation the above steps would cause. The only worry I have is that if you do this..It means adding another member to slices, and would complicate it in other ways also, since one would have to distinguish between slices that own and slices that view.I don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it.s.length=x; s.length=oldLength;when, if ever does it free the memory you are no longer using? Because if it does it sometime in the middle of a few thousand concat operations then were almost back at square one.Note that the idea of a global template function is not quite equivalent, since the function is not in the "namespace" of the char type, but rather in global scope. Which means that multiple such functions will only overload if they are aliased.While it's a nice workaround I cannot see why we cant have a reserve property, IMO it makes sense for all array types, it's the obvious way to do it (so people looking for a way will find it), it's dead simple to change, it does not cause any problems with existing code... Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 15 2004
In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...I don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it.I think it's more like:if (x > s.length) { uint t = s.length; s.length = x; s.length = t; }I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec. Also, it seems to me that a really supercharged optimizer might decide that the above code leaves everything unchanged, and so could be removed altogether. (Optimizers don't always have the same notion as we coders of what is intended and what is merely a side-effect). Finally, note that you can do:s = s[0..0];instead of:s = null;To set a string's length to zero without freeing the allocated memory. But again, not only is this not documented, but the documentation actually implies that this SHOULDN'T work. There is debate over whether this is a feature or a bug. Arcane Jill PS. It occurs to me that the statement: char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h; would run a lot faster if it were rewritten as: ((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h); (Sorry about the brackets - couldn't remember the associativity order of ~=).
Jun 16 2004
Arcane Jill wrote:In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...I agree that it should be that way. But Walter himself has recommended the length-juggling technique in this newsgroup. I don't like it either. That's why I think there should be a reserve property. The compiler is the best choice for deciding how it should be implemented. If arrays can never shrink fine, it can be implemented by changing the length property. Otherwise the compiler has to figure out the best way, depending on the implementation. If there is no reserve property then people will simply use the undocumented behaviour because it's the simplest way. HaukeI don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it.I think it's more like:if (x > s.length) { uint t = s.length; s.length = x; s.length = t; }I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec.
Jun 16 2004
In article <caovi5$23qt$1 digitaldaemon.com>, Hauke Duden says...But Walter himself has recommended the length-juggling technique in this newsgroup.I haven't heard Walter say that myself, but if he did, then he should know better. Nobody should EVER rely on undocumented behavior, and it would be extremely bad practice to encourage others so to do. What Walter says on this newsgroup does not constitute the D standard. Only the published manual may make that claim. Every time the DMD compiler deviates from the documentation, it ceases to be standards-compliant. (Of course, Walter has the power to change the standard as well as the implementation, but he still has to actually DO it, not just talk about doing it). There is nothing to stop anyone from writing a second D compiler tomorrow, and I do not wish to go back to the bad old days of BASIC, when every implementation of BASIC was entirely non-portable to any other because of various stupid little quirks and features unique to each. If it were officially documented that decreasing the length of a array is guaranteed not to change the address of the string, and is further guaranteed not to release any of the memory within the array's former bounds, then, and only then, could this trick be expected to work on all present and future standards-compliant D compilers. Arcane Jill (PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come across that way sometimes by accident).
Jun 16 2004
Arcane Jill wrote:In article <caovi5$23qt$1 digitaldaemon.com>, Hauke Duden says...Good points. That is why I said reserve should go into phobos so vendors can optimise there own and users can also write their own versions. -- -Anderson: http://badmama.com.au/~anderson/But Walter himself has recommended the length-juggling technique in this newsgroup.I haven't heard Walter say that myself, but if he did, then he should know better. Nobody should EVER rely on undocumented behavior, and it would be extremely bad practice to encourage others so to do. What Walter says on this newsgroup does not constitute the D standard. Only the published manual may make that claim. Every time the DMD compiler deviates from the documentation, it ceases to be standards-compliant. (Of course, Walter has the power to change the standard as well as the implementation, but he still has to actually DO it, not just talk about doing it). There is nothing to stop anyone from writing a second D compiler tomorrow, and I do not wish to go back to the bad old days of BASIC, when every implementation of BASIC was entirely non-portable to any other because of various stupid little quirks and features unique to each. If it were officially documented that decreasing the length of a array is guaranteed not to change the address of the string, and is further guaranteed not to release any of the memory within the array's former bounds, then, and only then, could this trick be expected to work on all present and future standards-compliant D compilers. Arcane Jill (PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come across that way sometimes by accident).
Jun 16 2004
Arcane Jill wrote:In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...The length-juggling trick also has limitations: setting the length to 0 or from 0 will wipe out the data ptr. Look in src/internal/gc/gc.d the function _d_arraysetlength. For example s.length = 10; // reserve some space s.length = 0; // sets data ptr to null s.length = 10; // reserve some more space s = s[0..0]; // preserves the data ptr s.length = 5; // allocates new data ptrI don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it.I think it's more like:if (x > s.length) { uint t = s.length; s.length = x; s.length = t; }I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec. Also, it seems to me that a really supercharged optimizer might decide that the above code leaves everything unchanged, and so could be removed altogether. (Optimizers don't always have the same notion as we coders of what is intended and what is merely a side-effect). Finally, note that you can do:s = s[0..0];instead of:s = null;To set a string's length to zero without freeing the allocated memory. But again, not only is this not documented, but the documentation actually implies that this SHOULDN'T work. There is debate over whether this is a feature or a bug.Arcane Jill PS. It occurs to me that the statement: char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h; would run a lot faster if it were rewritten as: ((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h); (Sorry about the brackets - couldn't remember the associativity order of ~=).
Jun 16 2004
Regan Heath wrote:The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away. -- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:Regan Heath wrote:How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred"; -- Derek Melbourne, Australia 15/Jun/04 1:04:56 PMThe thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.
Jun 14 2004
Derek Parnell wrote:On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:I didn't think of that, neat. -- -Anderson: http://badmama.com.au/~anderson/Regan Heath wrote:How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.
Jun 14 2004
On Tue, 15 Jun 2004 11:15:16 +0800, J Anderson <REMOVEanderson badmama.com.au> wrote:Derek Parnell wrote:Yes, but not quite as neat as it could be, see the real problem in the thread "string performance issues" by "Daniel Horn". Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:I didn't think of that, neat.Regan Heath wrote:How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.
Jun 14 2004
On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:This works, but see the other thread "string performance issues" for an example of the real problem. Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/Regan Heath wrote:How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.
Jun 14 2004
Regan Heath wrote:On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:Personally I'd use block allocation for a problem like that. That's what you'd do in C. Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array. Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course. Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length) { int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10); -- -Anderson: http://badmama.com.au/~anderson/On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:This works, but see the other thread "string performance issues" for an example of the real problem. ReganRegan Heath wrote:How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.
Jun 14 2004
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:calte9$fm0$1 digitaldaemon.com...Regan Heath wrote:lengthOn Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:Regan Heath wrote:The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by settingDid you mean to write: void reserve(inout T [] array, uint length) :)Personally I'd use block allocation for a problem like that. That's what you'd do in C. Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array. Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course. Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length)This works, but see the other thread "string performance issues" for an example of the real problem. ReganHow about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.{ int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10);Nice :)-- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
Ivan Senji wrote:"J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:calte9$fm0$1 digitaldaemon.com...Good catch.Regan Heath wrote:lengthOn Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:Regan Heath wrote:The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by settingDid you mean to write: void reserve(inout T [] array, uint length) :)Personally I'd use block allocation for a problem like that. That's what you'd do in C. Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array. Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course. Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length)This works, but see the other thread "string performance issues" for an example of the real problem. ReganHow about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.-- -Anderson: http://badmama.com.au/~anderson/{ int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10);Nice :)-- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
On Tue, 15 Jun 2004 12:20:31 +0800, J Anderson <REMOVEanderson badmama.com.au> wrote:Regan Heath wrote:Yes, the poster was simply stating that it should be easier than that in D, after all that is the point of having a concatenation operator.On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:Personally I'd use block allocation for a problem like that. That's what you'd do in C.On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:This works, but see the other thread "string performance issues" for an example of the real problem. ReganRegan Heath wrote:How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";The thread "string performance issues" by "Daniel Horn" got me thinking of an idea for a change to arrays. Bascially, concatenation can be slow, as it causes reallocations of the array. If you could pre-allocate the array then it wouldn't be as slow. I tried this: char[] p = "regan" p.length = 10; p ~= "fred"; and ended up with a string containing 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd' which was not what I was after :) I remembered a thread on arrays requesting renaming the 'length' property to 'reserve' or something like that, and the idea for the addition of a reserve property that simply allocated memory to the array without changing it's length came to me. If we could go: char[] p = "regan"; p.reserve = 10; p ~= "fred"; and end up with a string containing 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0' and a length of 9. Then we could do fast concatenation. Otherwise we're left writing a String class that achieves this by setting length and using memcpy. I thought a design goal of D was to avoid this. Thoughts?I think this would work: char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array.Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course.Agreed.Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length) { int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10);This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify the different to the length property which specifies the number of elements in the array. Example.. char[] p; p.length = 10; the above adds 10 elements to the array, initializing them to the default value. This is the current behaviour. char[] p; p.reserve = 10; the above grabs enough memory for 10 elements, but does not add/initialize any. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
Regan Heath wrote:Yes, the poster was simply stating that it should be easier than that in D, after all that is the point of having a concatenation operator.You can make your own class to do that. The reason against that approach is that it requires more overhead (requiring both memory and CPU). Having the most primitive array type is essential because then we can do things like you suggest. If this overhead was included in the primitive type then we would have to lug it around even when we don't need the overhead. Most of the time you can write your program so that you don't even need reserve tricks for efficiency by determining the amount of memory nessary. An array with a reserve would look like: struct array { uint length; uint reserve; T * pointer } compare that to: struct array { uint length; T * pointer } Not to mention all the extra operations to maintain the reserved size. Its not to hard to write an array class that has a reserve (10 lines or so), but it would be much harder the other way round.Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array.Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course.Agreed.Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length) { int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10);This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify should be different to the length property which specifies the number of elements in the array.Example.. char[] p; p.length = 10; the above adds 10 elements to the array, initializing them to the default value. This is the current behaviour. char[] p; p.reserve = 10; the above grabs enough memory for 10 elements, but does not add/initialize any.Initialization is pretty cheap for arrays. That is initialization only take a fraction of the time it requires to allocate the memory for the array. So whatever the time saved with initialization, the user will most probably not notice it.Regan.All this talk just makes me like D arrays even more. -- -Anderson: http://badmama.com.au/~anderson/
Jun 15 2004
J Anderson wrote:I do think it should be in phobos somewhere (where you don't need to include it). I probably didn't outline that enough in the previous email. That way different vendors can optimise the reserve to what works best for their program and individuals can overload it with there own optimised versions. -- -Anderson: http://badmama.com.au/~anderson/This should work. :) So why not add this to arrays then?
Jun 15 2004
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message news:cam7n9$vbv$2 digitaldaemon.com...J Anderson wrote:I agree. D lets us add "properties" to arrays so why not show how this feature can be useful!I do think it should be in phobos somewhere (where you don't need toThis should work. :) So why not add this to arrays then?include it). I probably didn't outline that enough in the previous email. That way different vendors can optimise the reserve to what works best for their program and individuals can overload it with there own optimised versions. -- -Anderson: http://badmama.com.au/~anderson/
Jun 15 2004
On Tue, 15 Jun 2004 15:15:59 +0800, J Anderson <REMOVEanderson badmama.com.au> wrote:J Anderson wrote:FYI Walter mentioned that std.outbuffer has a struct/class for doing what I want also. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/I do think it should be in phobos somewhere (where you don't need to include it). I probably didn't outline that enough in the previous email. That way different vendors can optimise the reserve to what works best for their program and individuals can overload it with there own optimised versions.This should work. :) So why not add this to arrays then?
Jun 15 2004
On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson <REMOVEanderson badmama.com.au> wrote:Regan Heath wrote:I know I can, I have.Yes, the poster was simply stating that it should be easier than that in D, after all that is the point of having a concatenation operator.You can make your own class to do that.Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array.Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course.Agreed.Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length) { int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10);This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify be different to the length property which specifies the number of elements in the array.The reason against that approach is that it requires more overhead (requiring both memory and CPU).I dont think so. For example current array behaviour if you go. a.length = 1000; a.length = 1; is to allocate space for 1000 entries and not free it. Meaning the array already stores (or calculates) it's size in memory in a seperate place to the length. Meaning all the reserve property needs to do, is the same thing the above 2 lines already does. With one possible exception, I am not sure when/if the array frees the un-used memory, so it's possible that after those lines sometime in the middle of 1000 or so concatentation ops it decides to free the excess and I'd be back at square one (almost).Having the most primitive array type is essential because then we can do things like you suggest. If this overhead was included in the primitive type then we would have to lug it around even when we don't need the overhead. Most of the time you can write your program so that you don't even need reserve tricks for efficiency by determining the amount of memory nessary.I agree totally, I don't believe my suggestion adds any overhead whatsoever.An array with a reserve would look like: struct array { uint length; uint reserve; T * pointer } compare that to: struct array { uint length; T * pointer } Not to mention all the extra operations to maintain the reserved size.I think you'll find they're already there in some form as arrays already display the behaviour I want, just not in an easy to use/understand/find form. It looks like a hack currently.Its not to hard to write an array class that has a reserve (10 lines or so), but it would be much harder the other way round.Everything counts if you do it enough times. Why do something you do not need?Example.. char[] p; p.length = 10; the above adds 10 elements to the array, initializing them to the default value. This is the current behaviour. char[] p; p.reserve = 10; the above grabs enough memory for 10 elements, but does not add/initialize any.Initialization is pretty cheap for arrays. That is initialization only take a fraction of the time it requires to allocate the memory for the array. So whatever the time saved with initialization, the user will most probably not notice it.I like em too. Writing the class/struct to give me this behaviour was trivial, easier than in C/C++. But I still believe this change is a good one, it has pro's and no cons. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/Regan.All this talk just makes me like D arrays even more.
Jun 15 2004
Regan Heath wrote:On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson <REMOVEanderson badmama.com.au> wrote:I already mentioned that way of doing it without overhead. After some time though, that memory will be re-claimed, which I don't mind.Regan Heath wrote:I know I can, I have.Yes, the poster was simply stating that it should be easier than that in D, after all that is the point of having a concatenation operator.You can make your own class to do that.Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array.Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course.Agreed.Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length) { int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10);This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to this should be different to the length property which specifies the number of elements in the array.The reason against that approach is that it requires more overhead (requiring both memory and CPU).I dont think so. For example current array behaviour if you go. a.length = 1000; a.length = 1; is to allocate space for 1000 entries and not free it. Meaning the array already stores (or calculates) it's size in memory in a seperate place to the length. Meaning all the reserve property needs to do, is the same thing the above 2 lines already does. With one possible exception, I am not sure when/if the array frees the un-used memory, so it's possible that after those lines sometime in the middle of 1000 or so concatentation ops it decides to free the excess and I'd be back at square one (almost).Because the more rules you have the more complex things become. -- -Anderson: http://badmama.com.au/~anderson/Its not to hard to write an array class that has a reserve (10 lines or so), but it would be much harder the other way round.Everything counts if you do it enough times. Why do something you do not need?Example.. char[] p; p.length = 10; the above adds 10 elements to the array, initializing them to the default value. This is the current behaviour. char[] p; p.reserve = 10; the above grabs enough memory for 10 elements, but does not add/initialize any.Initialization is pretty cheap for arrays. That is initialization only take a fraction of the time it requires to allocate the memory for the array. So whatever the time saved with initialization, the user will most probably not notice it.
Jun 16 2004