digitalmars.D - string concatenation idea.

Regan Heath (26/26) Jun 14 2004 The thread "string performance issues" by "Daniel Horn" got me thinking ...

Matthew (19/45) Jun 14 2004 It means adding another member to slices, and would complicate it in oth...

Regan Heath (10/81) Jun 14 2004 Why do you need to add another member to slices?

Stewart Gordon (31/43) Jun 15 2004

Hauke Duden (12/15) Jun 15 2004 I don't think it means that. It has already been mentioned that

Regan Heath (13/31) Jun 15 2004 Or rather simply do the allocation the above steps would cause.
Arcane Jill (21/35) Jun 16 2004 I'm not convinced, though, that this is defined behavior. It seems possi...

Hauke Duden (11/40) Jun 16 2004 I agree that it should be that way. But Walter himself has recommended

Arcane Jill (21/23) Jun 16 2004 I haven't heard Walter say that myself, but if he did, then he should kn...

J Anderson (5/32) Jun 16 2004 Good points. That is why I said reserve should go into phobos so

Ben Hinkle (9/67) Jun 16 2004 The length-juggling trick also has limitations: setting the length to 0 ...

J Anderson (9/34) Jun 14 2004 I think this would work:

Derek Parnell (10/59) Jun 14 2004 How about ...

J Anderson (4/41) Jun 14 2004 I didn't think of that, neat.

Regan Heath (7/45) Jun 14 2004 Yes, but not quite as neat as it could be, see the real problem in the

Regan Heath (6/60) Jun 14 2004 This works, but see the other thread "string performance issues" for an

J Anderson (25/88) Jun 14 2004 Personally I'd use block allocation for a problem like that. That's

Ivan Senji (7/100) Jun 14 2004 length

J Anderson (4/139) Jun 14 2004 --

Regan Heath (28/121) Jun 14 2004 Yes, the poster was simply stating that it should be easier than that in...

J Anderson (32/83) Jun 15 2004 You can make your own class to do that. The reason against that

J Anderson (8/11) Jun 15 2004 I do think it should be in phobos somewhere (where you don't need to

Ivan Senji (4/16) Jun 15 2004 I agree. D lets us add "properties" to arrays so why not show how this
Regan Heath (7/17) Jun 15 2004 FYI Walter mentioned that std.outbuffer has a struct/class for doing wha...

Regan Heath (28/124) Jun 15 2004 I know I can, I have.

J Anderson (6/109) Jun 16 2004 I already mentioned that way of doing it without overhead. After some

Regan Heath <regan netwin.co.nz> writes:

The thread "string performance issues" by "Daniel Horn" got me thinking of 
an idea for a change to arrays.

Bascially, concatenation can be slow, as it causes reallocations of the 
array. If you could pre-allocate the array then it wouldn't be as slow.

I tried this:

char[] p = "regan"

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

which was not what I was after :)

I remembered a thread on arrays requesting renaming the 'length' property 
to 'reserve' or something like that, and the idea for the addition of a 
reserve property that simply allocated memory to the array without 
changing it's length came to me. If we could go:

char[] p = "regan";

p.reserve = 10;
p ~= "fred";

and end up with a string containing

'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

and a length of 9. Then we could do fast concatenation. Otherwise we're 
left writing a String class that achieves this by setting length and using 
memcpy. I thought a design goal of D was to avoid this.

Thoughts?

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

It means adding another member to slices, and would complicate it in other ways
also, since one would have to distinguish between slices that own and slices
that
view.

I prefer a language based approach, similar to that for Java, whereby the
elements in a single concatenation statement are actually appended to a
StringBuffer. This boosts string concatenation performance enormously.

I showed how to achieve this in a similar vein, and with similarly significant
performance benefits, for C++ in my recent (June's CUJ) article "Fast,
Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au
fait with the technique.

I suggest something similar can be done for D, by transcribing ~ sequences into
calls to an underlying implementation class/API. It wouldn't be hard to do, and
the restriction would just be the same for Java and C++ (using my
fast_string_concatenator<>), in that it would only work for a single statement.
Not that that's a particularly onerous restriction, of course.

The alternative is just to have a StringBuilder class, which doesn't seem too
much of a burden either.

"Regan Heath" <regan netwin.co.nz> wrote in message
news:opr9l56hws5a2sq9 digitalmars.com...
 The thread "string performance issues" by "Daniel Horn" got me thinking of
 an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of the
 array. If you could pre-allocate the array then it wouldn't be as slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' property
 to 'reserve' or something like that, and the idea for the addition of a
 reserve property that simply allocated memory to the array without
 changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise we're
 left writing a String class that achieves this by setting length and using
 memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 -- 
 Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 12:31:48 +1000, Matthew 
<admin stlsoft.dot.dot.dot.dot.org> wrote:
 It means adding another member to slices, and would complicate it in 
 other ways
 also, since one would have to distinguish between slices that own and 
 slices that
 view.

Why do you need to add another member to slices?
Don't we already have to distinguish between slices that own and slices 
that view?

 I prefer a language based approach, similar to that for Java, whereby the
 elements in a single concatenation statement are actually appended to a
 StringBuffer. This boosts string concatenation performance enormously.

 I showed how to achieve this in a similar vein, and with similarly 
 significant
 performance benefits, for C++ in my recent (June's CUJ) article "Fast,
 Non-intrusive String Concatenation". Walter was one of the reviewers, so 
 he's au
 fait with the technique.

 I suggest something similar can be done for D, by transcribing ~ 
 sequences into
 calls to an underlying implementation class/API. It wouldn't be hard to 
 do, and
 the restriction would just be the same for Java and C++ (using my
 fast_string_concatenator<>), in that it would only work for a single 
 statement.
 Not that that's a particularly onerous restriction, of course.

 The alternative is just to have a StringBuilder class, which doesn't 
 seem too
 much of a burden either.

I thought the point of adding strings (i.e. char[]) to D was to avoid 
having other String classes?

Regan

 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opr9l56hws5a2sq9 digitalmars.com...
 The thread "string performance issues" by "Daniel Horn" got me thinking 
 of
 an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of the
 array. If you could pre-allocate the array then it wouldn't be as slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' 
 property
 to 'reserve' or something like that, and the idea for the addition of a
 reserve property that simply allocated memory to the array without
 changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise we're
 left writing a String class that achieves this by setting length and 
 using
 memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 --
 Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/




-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Regan Heath wrote:
 On Tue, 15 Jun 2004 12:31:48 +1000, Matthew 
 <admin stlsoft.dot.dot.dot.dot.org> wrote:
 
 It means adding another member to slices, and would complicate it in 
 other ways
 also, since one would have to distinguish between slices that own and 
 slices that
 view.

 
 Why do you need to add another member to slices?
 Don't we already have to distinguish between slices that own and slices 
 that view?

<snip>

My experiments show that arrays are allocated in powers of 2 or 
multiples of 2K.  (At least 1D arrays of an atomic type, otherwise I 
haven't experimented.)  There obviously is a reserve property, though 
looking at the ABI it must be of the allocated block rather than of the 
array reference itself.

My experiments with slicing showed that a slice becomes a newly 
allocated array if the length is changed, even if it doesn't exceed the 
originally sliced length.

But there are two cases I haven't tested yet:

	int[] qwert = new int[10];
	int[] yuiop = qwert;
	qwert.length = 14;	// remaining within allocated block
	// ... put some data in qwert
	yuiop.length = 13;

It would follow that qwert and yuiop still have the same start address, 
and setting the length of yuiop would overwrite qwert[10..13].

	int[] qwert = new int[10];
	int[] yuiop = qwert[0..5];
	yuiop.length = 8;

If the distinction between an owner and a viewer is simply a matter of 
whether the start address is the start of the allocated block, then 
likewise here.

Hang on ... better check if length setting and concatenation really have 
the same effect....

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Jun 15 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Matthew wrote:
 It means adding another member to slices, and would complicate it in other ways
 also, since one would have to distinguish between slices that own and slices
that
 view.

I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.

Note that the idea of a global template function is not quite 
equivalent, since the function is not in the "namespace" of the char 
type, but rather in global scope. Which means that multiple such 
functions will only overload if they are aliased.


Hauke

Jun 15 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 10:24:21 +0200, Hauke Duden <H.NS.Duden gmx.net> wrote:
 Matthew wrote:
 It means adding another member to slices, and would complicate it in 
 other ways
 also, since one would have to distinguish between slices that own and 
 slices that
 view.

 I don't think it means that. It has already been mentioned that

 s.reserve(x)

 is the same as

 s.length=x;
 s.length=oldLength;

 The compiler could simply rewrite it.

Or rather simply do the allocation the above steps would cause.

The only worry I have is that if you do this..

 s.length=x;
 s.length=oldLength;

when, if ever does it free the memory you are no longer using?

Because if it does it sometime in the middle of a few thousand concat 
operations then were almost back at square one.

 Note that the idea of a global template function is not quite 
 equivalent, since the function is not in the "namespace" of the char 
 type, but rather in global scope. Which means that multiple such 
 functions will only overload if they are aliased.

While it's a nice workaround I cannot see why we cant have a reserve 
property, IMO it makes sense for all array types, it's the obvious way to 
do it (so people looking for a way will find it), it's dead simple to 
change, it does not cause any problems with existing code...

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 15 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...
I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.


I think it's more like:

 if (x > s.length)
 {
    uint t = s.length;
    s.length = x;
    s.length = t;
 }

I'm not convinced, though, that this is defined behavior. It seems possible to
me that future (or even third-party) D-compilers would be at liberty to NOT make
this equivalence, unless it were actually part of the spec.

Also, it seems to me that a really supercharged optimizer might decide that the
above code leaves everything unchanged, and so could be removed altogether.
(Optimizers don't always have the same notion as we coders of what is intended
and what is merely a side-effect).

Finally, note that you can do:

 s = s[0..0];

instead of:

 s = null;

To set a string's length to zero without freeing the allocated memory. But
again, not only is this not documented, but the documentation actually implies
that this SHOULDN'T work. There is debate over whether this is a feature or a
bug.

Arcane Jill


PS. It occurs to me that the statement:

char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h;

would run a lot faster if it were rewritten as:

((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h);

(Sorry about the brackets - couldn't remember the associativity order of ~=).

Jun 16 2004

Hauke Duden <H.NS.Duden gmx.net> writes:

Arcane Jill wrote:
 In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...
 
I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.

 
 
 
 I think it's more like:
 
 
if (x > s.length)
{
   uint t = s.length;
   s.length = x;
   s.length = t;
}

 
 
 I'm not convinced, though, that this is defined behavior. It seems possible to
 me that future (or even third-party) D-compilers would be at liberty to NOT
make
 this equivalence, unless it were actually part of the spec.

I agree that it should be that way. But Walter himself has recommended 
the length-juggling technique in this newsgroup. I don't like it either.

That's why I think there should be a reserve property. The compiler is 
the best choice for deciding how it should be implemented. If arrays can 
never shrink fine, it can be implemented by changing the length 
property. Otherwise the compiler has to figure out the best way, 
depending on the implementation.

If there is no reserve property then people will simply use the 
undocumented behaviour because it's the simplest way.

Hauke

Jun 16 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <caovi5$23qt$1 digitaldaemon.com>, Hauke Duden says...

But Walter himself has recommended 
the length-juggling technique in this newsgroup.

I haven't heard Walter say that myself, but if he did, then he should know
better. Nobody should EVER rely on undocumented behavior, and it would be
extremely bad practice to encourage others so to do.

What Walter says on this newsgroup does not constitute the D standard. Only the
published manual may make that claim. Every time the DMD compiler deviates from
the documentation, it ceases to be standards-compliant. (Of course, Walter has
the power to change the standard as well as the implementation, but he still has
to actually DO it, not just talk about doing it).

There is nothing to stop anyone from writing a second D compiler tomorrow, and I
do not wish to go back to the bad old days of BASIC, when every implementation
of BASIC was entirely non-portable to any other because of various stupid little
quirks and features unique to each.

If it were officially documented that decreasing the length of a array is
guaranteed not to change the address of the string, and is further guaranteed
not to release any of the memory within the array's former bounds, then, and
only then, could this trick be expected to work on all present and future
standards-compliant D compilers.

Arcane Jill

(PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come
across that way sometimes by accident).

Jun 16 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Arcane Jill wrote:

In article <caovi5$23qt$1 digitaldaemon.com>, Hauke Duden says...

  

But Walter himself has recommended 
the length-juggling technique in this newsgroup.
    

I haven't heard Walter say that myself, but if he did, then he should know
better. Nobody should EVER rely on undocumented behavior, and it would be
extremely bad practice to encourage others so to do.

What Walter says on this newsgroup does not constitute the D standard. Only the
published manual may make that claim. Every time the DMD compiler deviates from
the documentation, it ceases to be standards-compliant. (Of course, Walter has
the power to change the standard as well as the implementation, but he still has
to actually DO it, not just talk about doing it).

There is nothing to stop anyone from writing a second D compiler tomorrow, and I
do not wish to go back to the bad old days of BASIC, when every implementation
of BASIC was entirely non-portable to any other because of various stupid little
quirks and features unique to each.

If it were officially documented that decreasing the length of a array is
guaranteed not to change the address of the string, and is further guaranteed
not to release any of the memory within the array's former bounds, then, and
only then, could this trick be expected to work on all present and future
standards-compliant D compilers.

Arcane Jill

(PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come
across that way sometimes by accident).
  

Good points.  That is why I said reserve should go into phobos so 
vendors can optimise there own and users can also write their own versions.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 16 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Arcane Jill wrote:

 In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...
I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.

 
 
 I think it's more like:
 
 if (x > s.length)
 {
    uint t = s.length;
    s.length = x;
    s.length = t;
 }

 
 I'm not convinced, though, that this is defined behavior. It seems
 possible to me that future (or even third-party) D-compilers would be at
 liberty to NOT make this equivalence, unless it were actually part of the
 spec.
 
 Also, it seems to me that a really supercharged optimizer might decide
 that the above code leaves everything unchanged, and so could be removed
 altogether. (Optimizers don't always have the same notion as we coders of
 what is intended and what is merely a side-effect).
 
 Finally, note that you can do:
 
 s = s[0..0];

 
 instead of:
 
 s = null;

 
 To set a string's length to zero without freeing the allocated memory. But
 again, not only is this not documented, but the documentation actually
 implies that this SHOULDN'T work. There is debate over whether this is a
 feature or a bug.

The length-juggling trick also has limitations: setting the length to 0 or
from 0 will wipe out the data ptr. Look in src/internal/gc/gc.d the
function _d_arraysetlength. For example
 s.length = 10; // reserve some space
 s.length = 0;  // sets data ptr to null
 s.length = 10; // reserve some more space
 s = s[0..0];   // preserves the data ptr
 s.length = 5;  // allocates new data ptr



 Arcane Jill
 
 
 PS. It occurs to me that the statement:
 
 char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h;
 
 would run a lot faster if it were rewritten as:
 
 ((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h);
 
 (Sorry about the brackets - couldn't remember the associativity order of
 ~=).

Jun 16 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me 
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of 
 the array. If you could pre-allocate the array then it wouldn't be as 
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' 
 property to 'reserve' or something like that, and the idea for the 
 addition of a reserve property that simply allocated memory to the 
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise 
 we're left writing a String class that achieves this by setting length 
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

I think this would work:
char[] p = "regan"
p.length = 10;
p.length = 5;
p ~= "fred";

as D doesn't clean up the memory straight away.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 14 2004

Derek Parnell <derek psych.ward> writes:

On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:
 
 The thread "string performance issues" by "Daniel Horn" got me 
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of 
 the array. If you could pre-allocate the array then it wouldn't be as 
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' 
 property to 'reserve' or something like that, and the idea for the 
 addition of a reserve property that simply allocated memory to the 
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise 
 we're left writing a String class that achieves this by setting length 
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 I think this would work:
 char[] p = "regan"
 p.length = 10;
 p.length = 5;
 p ~= "fred";
 
 as D doesn't clean up the memory straight away.

How about ...

 char[] p = "regan"
 p.length = 10;
 p[5..9] = "fred";
  
-- 
Derek
Melbourne, Australia
15/Jun/04 1:04:56 PM

Jun 14 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Derek Parnell wrote:

On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

  

Regan Heath wrote:
    

Bascially, concatenation can be slow, as it causes reallocations of 
the array. If you could pre-allocate the array then it wouldn't be as 
slow.

I tried this:

char[] p = "regan"

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

      

I think this would work:
char[] p = "regan"
p.length = 10;
p.length = 5;
p ~= "fred";

as D doesn't clean up the memory straight away.
    

How about ...

 char[] p = "regan"
 p.length = 10;
 p[5..9] = "fred";
  
  

I didn't think of that, neat.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 14 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 11:15:16 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:
 Derek Parnell wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:


 Regan Heath wrote:

 Bascially, concatenation can be slow, as it causes reallocations of 
 the array. If you could pre-allocate the array then it wouldn't be as 
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 I think this would work:
 char[] p = "regan"
 p.length = 10;
 p.length = 5;
 p ~= "fred";

 as D doesn't clean up the memory straight away.

 How about ...

 char[] p = "regan"
 p.length = 10;
 p[5..9] = "fred";

 I didn't think of that, neat.

Yes, but not quite as neat as it could be, see the real problem in the 
thread "string performance issues" by "Daniel Horn".

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:
 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 I think this would work:
 char[] p = "regan"
 p.length = 10;
 p.length = 5;
 p ~= "fred";

 as D doesn't clean up the memory straight away.

 How about ...

  char[] p = "regan"
  p.length = 10;
  p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an 
example of the real problem.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Regan Heath wrote:

 On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> 
 wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 I think this would work:
 char[] p = "regan"
 p.length = 10;
 p.length = 5;
 p ~= "fred";

 as D doesn't clean up the memory straight away.


 How about ...

  char[] p = "regan"
  p.length = 10;
  p[5..9] = "fred";


 This works, but see the other thread "string performance issues" for 
 an example of the real problem.

 Regan

Personally I'd use block allocation for a problem like that.  That's 
what you'd do in C.  Then after woods you simply trim the array to the 
size you really want (or use length = block; length = 0; beforehand).

I don't think strings should be treated as a specific type of array.  
Whatever applies to char array should also apply to every other type of 
array, except for the automatic conversion to zero terminate arrays of 
course.

Adding a reserve property could increase the string overhead, unless all 
it did was:

template reserveT(T)
{
   void reserve(inout char [] array, uint length)
   {
       int oldlen = array.length;
       array.length = length;
       array.length = oldlen;
   }
}

alias reserveT!(char).reserve reserve;

Hay, what do you know - I just solved your problem *grin*.

Now you can write:

array.reserve(10);


-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 14 2004

"Ivan Senji" <ivan.senji public.srce.hr> writes:

"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:calte9$fm0$1 digitaldaemon.com...
 Regan Heath wrote:

 On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward>
 wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting





length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 I think this would work:
 char[] p = "regan"
 p.length = 10;
 p.length = 5;
 p ~= "fred";

 as D doesn't clean up the memory straight away.


 How about ...

  char[] p = "regan"
  p.length = 10;
  p[5..9] = "fred";


 This works, but see the other thread "string performance issues" for
 an example of the real problem.

 Regan

 Personally I'd use block allocation for a problem like that.  That's
 what you'd do in C.  Then after woods you simply trim the array to the
 size you really want (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.
 Whatever applies to char array should also apply to every other type of
 array, except for the automatic conversion to zero terminate arrays of
 course.

 Adding a reserve property could increase the string overhead, unless all
 it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)

Did you mean to write:

    void reserve(inout T [] array, uint length)
:)

    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

Nice :)


 --
 -Anderson: http://badmama.com.au/~anderson/

Jun 14 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Ivan Senji wrote:

"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:calte9$fm0$1 digitaldaemon.com...
  

Regan Heath wrote:

    

On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward>
wrote:

      

On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

        

Regan Heath wrote:

          

The thread "string performance issues" by "Daniel Horn" got me
thinking of an idea for a change to arrays.

Bascially, concatenation can be slow, as it causes reallocations of
the array. If you could pre-allocate the array then it wouldn't be as
slow.

I tried this:

char[] p = "regan"

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

which was not what I was after :)

I remembered a thread on arrays requesting renaming the 'length'
property to 'reserve' or something like that, and the idea for the
addition of a reserve property that simply allocated memory to the
array without changing it's length came to me. If we could go:

char[] p = "regan";

p.reserve = 10;
p ~= "fred";

and end up with a string containing

'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

and a length of 9. Then we could do fast concatenation. Otherwise
we're left writing a String class that achieves this by setting
            





length
  

and using memcpy. I thought a design goal of D was to avoid this.

Thoughts?

            

I think this would work:
char[] p = "regan"
p.length = 10;
p.length = 5;
p ~= "fred";

as D doesn't clean up the memory straight away.
          

How about ...

 char[] p = "regan"
 p.length = 10;
 p[5..9] = "fred";
        

This works, but see the other thread "string performance issues" for
an example of the real problem.

Regan

      

Personally I'd use block allocation for a problem like that.  That's
what you'd do in C.  Then after woods you simply trim the array to the
size you really want (or use length = block; length = 0; beforehand).

I don't think strings should be treated as a specific type of array.
Whatever applies to char array should also apply to every other type of
array, except for the automatic conversion to zero terminate arrays of
course.

Adding a reserve property could increase the string overhead, unless all
it did was:

template reserveT(T)
{
   void reserve(inout char [] array, uint length)
    

Did you mean to write:

    void reserve(inout T [] array, uint length)
:)
  

Good catch.

  

   {
       int oldlen = array.length;
       array.length = length;
       array.length = oldlen;
   }
}

alias reserveT!(char).reserve reserve;

Hay, what do you know - I just solved your problem *grin*.

Now you can write:

array.reserve(10);

    

Nice :)


  

--
-Anderson: http://badmama.com.au/~anderson/
    


  


-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 14 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 12:20:31 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:

 Regan Heath wrote:

 On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> 
 wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting 
 length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 I think this would work:
 char[] p = "regan"
 p.length = 10;
 p.length = 5;
 p ~= "fred";

 as D doesn't clean up the memory straight away.


 How about ...

  char[] p = "regan"
  p.length = 10;
  p[5..9] = "fred";


 This works, but see the other thread "string performance issues" for an 
 example of the real problem.

 Regan

 Personally I'd use block allocation for a problem like that.  That's 
 what you'd do in C.

Yes, the poster was simply stating that it should be easier than that in 
D, after all that is the point of having a concatenation operator.

 Then after woods you simply trim the array to the size you really want 
 (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.

Neither do I, nor was I suggesting that. I want this property on all 
arrays, the example I gave used a char[] as that is what the original 
poster wanted it for.

 Whatever applies to char array should also apply to every other type of 
 array, except for the automatic conversion to zero terminate arrays of 
 course.

Agreed.

 Adding a reserve property could increase the string overhead, unless all 
 it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

This should work. :)

So why not add this to arrays then?

Let me re-iterate what I am suggesting...

I think all arrays should have a property which you can use to specify the 

different to the length property which specifies the number of elements in 
the array.

Example..

char[] p;
p.length = 10;

the above adds 10 elements to the array, initializing them to the default 
value.
This is the current behaviour.

char[] p;
p.reserve = 10;

the above grabs enough memory for 10 elements, but does not add/initialize 
any.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Regan Heath wrote:

 Yes, the poster was simply stating that it should be easier than that 
 in D, after all that is the point of having a concatenation operator.

 Then after woods you simply trim the array to the size you really 
 want (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.


 Neither do I, nor was I suggesting that. I want this property on all 
 arrays, the example I gave used a char[] as that is what the original 
 poster wanted it for.

 Whatever applies to char array should also apply to every other type 
 of array, except for the automatic conversion to zero terminate 
 arrays of course.


 Agreed.

 Adding a reserve property could increase the string overhead, unless 
 all it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);


 This should work. :)

 So why not add this to arrays then?

 Let me re-iterate what I am suggesting...

 I think all arrays should have a property which you can use to specify 

 should be different to the length property which specifies the number 
 of elements in the array.

You can make your own class to do that.  The reason against that 
approach is that it requires more overhead (requiring both memory and 
CPU).  Having the most primitive array type is essential because then we 
can do things like you suggest.  If this overhead was included in the 
primitive type then we would have to lug it around even when we don't 
need the overhead.  Most of the time you can write your program so that 
you don't even need reserve tricks for efficiency by determining the 
amount of memory nessary.

An array with a reserve would look like:

struct array
{
    uint length;
    uint reserve;
    T * pointer 
}

compare that to:

struct array
{
    uint length;
    T * pointer 
}

Not to mention all the extra operations to maintain the reserved size.

Its not to hard to write an array class that has a reserve (10 lines or 
so), but it would be much harder the other way round. 

 Example..

 char[] p;
 p.length = 10;

 the above adds 10 elements to the array, initializing them to the 
 default value.
 This is the current behaviour.

 char[] p;
 p.reserve = 10;

 the above grabs enough memory for 10 elements, but does not 
 add/initialize any.

Initialization is pretty cheap for arrays.  That is initialization only 
take a fraction of the time it requires to allocate the memory for the 
array.  So whatever the time saved with initialization, the user will 
most probably not notice it.

 Regan.


All this talk just makes me like D arrays even more.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 15 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

J Anderson wrote:

 This should work. :)

 So why not add this to arrays then?


I do think it should be in phobos somewhere (where you don't need to 
include it).  I probably didn't outline that enough in the previous 
email.  That way different vendors can optimise the reserve to what 
works best for their program and individuals can overload it with there 
own optimised versions.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 15 2004

"Ivan Senji" <ivan.senji public.srce.hr> writes:

"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:cam7n9$vbv$2 digitaldaemon.com...
 J Anderson wrote:

 This should work. :)

 So why not add this to arrays then?


 I do think it should be in phobos somewhere (where you don't need to

I agree. D lets us add "properties" to arrays so why not show how this
feature can be useful!

 include it).  I probably didn't outline that enough in the previous
 email.  That way different vendors can optimise the reserve to what
 works best for their program and individuals can overload it with there
 own optimised versions.

 --
 -Anderson: http://badmama.com.au/~anderson/

Jun 15 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 15:15:59 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:

 J Anderson wrote:

 This should work. :)

 So why not add this to arrays then?


 I do think it should be in phobos somewhere (where you don't need to 
 include it).  I probably didn't outline that enough in the previous 
 email.  That way different vendors can optimise the reserve to what 
 works best for their program and individuals can overload it with there 
 own optimised versions.

FYI Walter mentioned that std.outbuffer has a struct/class for doing what 
I want also.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 15 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:
 Regan Heath wrote:

 Yes, the poster was simply stating that it should be easier than that 
 in D, after all that is the point of having a concatenation operator.

 Then after woods you simply trim the array to the size you really want 
 (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.


 Neither do I, nor was I suggesting that. I want this property on all 
 arrays, the example I gave used a char[] as that is what the original 
 poster wanted it for.

 Whatever applies to char array should also apply to every other type 
 of array, except for the automatic conversion to zero terminate arrays 
 of course.


 Agreed.

 Adding a reserve property could increase the string overhead, unless 
 all it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);


 This should work. :)

 So why not add this to arrays then?

 Let me re-iterate what I am suggesting...

 I think all arrays should have a property which you can use to specify 

 be different to the length property which specifies the number of 
 elements in the array.

 You can make your own class to do that.

I know I can, I have.

 The reason against that approach is that it requires more overhead 
 (requiring both memory and CPU).

I dont think so. For example current array behaviour if you go.

a.length = 1000;
a.length = 1;

is to allocate space for 1000 entries and not free it.

Meaning the array already stores (or calculates) it's size in memory in a 
seperate place to the length.

Meaning all the reserve property needs to do, is the same thing the above 
2 lines already does.

With one possible exception, I am not sure when/if the array frees the 
un-used memory, so it's possible that after those lines sometime in the 
middle of 1000 or so concatentation ops it decides to free the excess and 
I'd be back at square one (almost).

 Having the most primitive array type is essential because then we can do 
 things like you suggest.  If this overhead was included in the primitive 
 type then we would have to lug it around even when we don't need the 
 overhead.  Most of the time you can write your program so that you don't 
 even need reserve tricks for efficiency by determining the amount of 
 memory nessary.

I agree totally, I don't believe my suggestion adds any overhead 
whatsoever.

 An array with a reserve would look like:

 struct array
 {
     uint length;
     uint reserve;
     T * pointer }

 compare that to:

 struct array
 {
     uint length;
     T * pointer }

 Not to mention all the extra operations to maintain the reserved size.

I think you'll find they're already there in some form as arrays already 
display the behaviour I want, just not in an easy to use/understand/find 
form. It looks like a hack currently.

 Its not to hard to write an array class that has a reserve (10 lines or 
 so), but it would be much harder the other way round.
 Example..

 char[] p;
 p.length = 10;

 the above adds 10 elements to the array, initializing them to the 
 default value.
 This is the current behaviour.

 char[] p;
 p.reserve = 10;

 the above grabs enough memory for 10 elements, but does not 
 add/initialize any.

 Initialization is pretty cheap for arrays.  That is initialization only 
 take a fraction of the time it requires to allocate the memory for the 
 array.  So whatever the time saved with initialization, the user will 
 most probably not notice it.

Everything counts if you do it enough times. Why do something you do not 
need?

 Regan.


 All this talk just makes me like D arrays even more.

I like em too. Writing the class/struct to give me this behaviour was 
trivial, easier than in C/C++.

But I still believe this change is a good one, it has pro's and no cons.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 15 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Regan Heath wrote:

 On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson 
 <REMOVEanderson badmama.com.au> wrote:

 Regan Heath wrote:

 Yes, the poster was simply stating that it should be easier than 
 that in D, after all that is the point of having a concatenation 
 operator.

 Then after woods you simply trim the array to the size you really 
 want (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.



 Neither do I, nor was I suggesting that. I want this property on all 
 arrays, the example I gave used a char[] as that is what the 
 original poster wanted it for.

 Whatever applies to char array should also apply to every other 
 type of array, except for the automatic conversion to zero 
 terminate arrays of course.



 Agreed.

 Adding a reserve property could increase the string overhead, 
 unless all it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);



 This should work. :)

 So why not add this to arrays then?

 Let me re-iterate what I am suggesting...

 I think all arrays should have a property which you can use to 

 this should be different to the length property which specifies the 
 number of elements in the array.


 You can make your own class to do that.


 I know I can, I have.

 The reason against that approach is that it requires more overhead 
 (requiring both memory and CPU).


 I dont think so. For example current array behaviour if you go.

 a.length = 1000;
 a.length = 1;

 is to allocate space for 1000 entries and not free it.

 Meaning the array already stores (or calculates) it's size in memory 
 in a seperate place to the length.

 Meaning all the reserve property needs to do, is the same thing the 
 above 2 lines already does.

 With one possible exception, I am not sure when/if the array frees the 
 un-used memory, so it's possible that after those lines sometime in 
 the middle of 1000 or so concatentation ops it decides to free the 
 excess and I'd be back at square one (almost).

I already mentioned that way of doing it without overhead.  After some 
time though, that memory will be re-claimed, which I don't mind.

 Its not to hard to write an array class that has a reserve (10 lines 
 or so), but it would be much harder the other way round.

 Example..

 char[] p;
 p.length = 10;

 the above adds 10 elements to the array, initializing them to the 
 default value.
 This is the current behaviour.

 char[] p;
 p.reserve = 10;

 the above grabs enough memory for 10 elements, but does not 
 add/initialize any.


 Initialization is pretty cheap for arrays.  That is initialization 
 only take a fraction of the time it requires to allocate the memory 
 for the array.  So whatever the time saved with initialization, the 
 user will most probably not notice it.


 Everything counts if you do it enough times. Why do something you do 
 not need?

Because the more rules you have the more complex things become.

-- 
-Anderson: http://badmama.com.au/~anderson/

Jun 16 2004

D Programming

C/C++ Programming

Other

digitalmars.D - string concatenation idea.