digitalmars.D - Appending char[] to char[][] has unexpected results

"Tim Keating" <mrtact gmail.com> writes:

Not sure whether this is a bug, or perhaps I'm misunderstanding 
something, but it seems like this should work:

import std.stdio;

void main()
{
	char[][] outBuf;
	auto f = File("testData.txt", "r");
	char[] buf;

	writeln("\n**** RAW OUTPUT *****");

	while (f.readln(buf))
	{
		write(buf);
		outBuf ~= buf;
	}

	writeln("\n**** BUFFERED OUTPUT *****");

	foreach (line; outBuf)
	{
		write(line);
	}
}

testData.txt is just a couple of lines of miscellaneous text. The 
expectation is that the raw output and the buffered output should 
be exactly the same... but they are not. (If anyone would like to 
see this for themselves, I stuck it in github: 
https://github.com/MrTact/CharBug.)

Changing the types of outBuf and buf to dchar works as expected. 
Changing outBuf to a string[] and appending buf.idup does as well.
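
For reference, the string[] plus idup variant mentioned above might look roughly like the sketch below. The helper name readAllLines and the scratch file charbug_demo.txt are made up for illustration (the scratch file stands in for testData.txt); only File.readln and .idup come from the post itself.

```d
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

// Hypothetical helper: reads every line of `path`, keeping an
// immutable copy of each one (terminator included).
string[] readAllLines(string path)
{
    string[] outBuf;
    char[] buf;                 // handed to readln again on every iteration
    auto f = File(path, "r");
    while (f.readln(buf))
        outBuf ~= buf.idup;     // copy before the next readln touches buf
    return outBuf;
}

void main()
{
    // Scratch file standing in for testData.txt (an assumption of this sketch).
    auto path = buildPath(tempDir(), "charbug_demo.txt");
    write(path, "first line\nsecond\n");
    scope (exit) remove(path);

    auto lines = readAllLines(path);
    assert(lines == ["first line\n", "second\n"]);
}
```

Because every stored element is its own immutable copy, later reads cannot change lines that were already collected.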

Apr 30 2013

"anonymous" <anonymous example.com> writes:

On Wednesday, 1 May 2013 at 03:54:23 UTC, Tim Keating wrote:
 Not sure whether this is a bug, or perhaps I'm misunderstanding 
 something, but it seems like this should work:

 void main()
 {
 	char[][] outBuf;
 	auto f = File("testData.txt", "r");
 	char[] buf;

 	writeln("\n**** RAW OUTPUT *****");

 	while (f.readln(buf))
 	{
 		write(buf);
 		outBuf ~= buf;
 	}

 	writeln("\n**** BUFFERED OUTPUT *****");

 	foreach (line; outBuf)
 	{
 		write(line);
 	}
 }

 testData.txt is just a couple of lines of miscellaneous text. 
 The expectation is that the raw output and the buffered output 
 should be exactly the same... but they are not. (If anyone 
 would like to see this for themselves, I stuck it in github: 
 https://github.com/MrTact/CharBug.)

 Changing the types of outBuf and buf to dchar works as 
 expected. Changing outBuf to a string[] and appending buf.idup 
 does as well.

Just outBuf ~= buf.dup; works, too. Without .dup you're 
overwriting and appending the same chunk of memory again and 
again.
From the documentation on File.readln
(<http://dlang.org/phobos/std_stdio#readln>): "Note that reusing 
the buffer means that the previous contents of it has to be 
copied if needed."
I'm a bit puzzled as to why it behaves differently with dchar.
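
The aliasing described above can be reproduced without any file I/O. In the sketch below, fillLine is a hypothetical stand-in for File.readln(buf), not the real implementation: it simply overwrites the buffer in place when the new line fits. The names fillLine and collect are invented for this illustration.

```d
import std.stdio : writeln;

// Hypothetical stand-in for File.readln(buf): writes `line` into buf,
// overwriting the existing memory in place whenever it is large enough.
void fillLine(ref char[] buf, string line)
{
    if (buf.length < line.length)
        buf.length = line.length;   // grow (may reallocate)
    buf = buf[0 .. line.length];
    buf[] = line[];                 // overwrite the contents in place
}

// Collects the lines, storing either buf itself or a copy of it.
char[][] collect(string[] lines, bool copy)
{
    char[][] outBuf;
    char[] buf;
    foreach (line; lines)
    {
        fillLine(buf, line);
        outBuf ~= copy ? buf.dup : buf;
    }
    return outBuf;
}

void main()
{
    writeln(collect(["aaa", "bbb"], false)); // both entries alias one buffer
    writeln(collect(["aaa", "bbb"], true));  // each entry owns its own copy
}
```

Appending buf stores only a slice (pointer and length), so every stored element keeps pointing at the memory that the next read overwrites. Appending buf.dup stores an independent copy instead.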

Apr 30 2013

"Tim Keating" <mrtact gmail.com> writes:

On Wednesday, 1 May 2013 at 04:33:28 UTC, anonymous wrote:
 Just outBuf ~= buf.dup; works, too. Without .dup you're 
 overwriting and appending the same chunk of memory again and 
 again.
 From the documentation on File.readln
 (<http://dlang.org/phobos/std_stdio#readln>): "Note that 
 reusing the buffer means that the previous contents of it has 
 to be copied if needed."
 I'm a bit puzzled as to why it behaves differently with dchar.

Okay, that was obviously the bit I was missing. The dchar 
situation IS baffling -- if that hadn't worked, I would have been 
more certain I was simply doing something wrong.

May 01 2013

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Wednesday, 1 May 2013 at 13:56:48 UTC, Tim Keating wrote:
 On Wednesday, 1 May 2013 at 04:33:28 UTC, anonymous wrote:
 Just outBuf ~= buf.dup; works, too. Without .dup you're 
 overwriting and appending the same chunk of memory again and 
 again.
 From the documentation on File.readln
 (<http://dlang.org/phobos/std_stdio#readln>): "Note that 
 reusing the buffer means that the previous contents of it has 
 to be copied if needed."
 I'm a bit puzzled as to why it behaves differently with dchar.

 Okay, that was obviously the bit I was missing. The dchar 
 situation IS baffling -- if that hadn't worked, I would have 
 been more certain I was simply doing something wrong.

The wchar and dchar versions don't reuse the buffer. Not sure 
why. Here's the implementation, complete with a relevant TODO:

     size_t readln(C)(ref C[] buf, dchar terminator = '\n')
         if (isSomeChar!C && !is(C == enum))
     {
         static if (is(C == char))
         {
             enforce(_p && _p.handle, "Attempt to read from an unopened file.");
             return readlnImpl(_p.handle, buf, terminator);
         }
         else
         {
             // TODO: optimize this
             string s = readln(terminator);
             if (!s.length) return 0;
             buf.length = 0;
             foreach (wchar c; s)
             {
                 buf ~= c;
             }
             return buf.length;
         }
     }

Oh dear!
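
A small sketch of why the non-char branch ends up with fresh memory each time: after buf.length = 0, the runtime will not append into the old block, since doing so could stomp on data other slices still reference, so the first ~= allocates a new one. The helpers refill and collectLines are hypothetical and only mirror the shape of the branch quoted above.

```d
import std.stdio : writeln;

// Mirrors the shape of the non-char branch: reset the length, then
// append one code point at a time.
void refill(ref dchar[] buf, string s)
{
    buf.length = 0;
    foreach (dchar c; s)
        buf ~= c;       // first append after the reset allocates a new block
}

// Stores buf directly, without .dup, the same way the original program does.
dchar[][] collectLines(string[] lines)
{
    dchar[][] outBuf;
    dchar[] buf;
    foreach (line; lines)
    {
        refill(buf, line);
        outBuf ~= buf;
    }
    return outBuf;
}

void main()
{
    // Earlier entries are left untouched because each refill moved buf
    // to a different block.
    writeln(collectLines(["aaa", "bbb"]));
}
```

That would explain why the dchar version of the original program appeared to work even without .dup.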

May 01 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 01 May 2013 13:19:16 -0700, Peter Alexander <peter.alexander.au gmail.com> wrote:

 The wchar and dchar versions don't reuse the buffer. Not sure why.  
 Here's the implementation, complete with relevant TODO

Wow, that is really awful.

Needs immediate improvement.  I would say the following code would be at  
least a band-aid fix:

             ...
             if (buf.length == buf.capacity) {
                 buf.length = 0;
                 buf.assumeSafeAppend();
             } else {
                 buf.length = 0;
             }
             foreach (wchar c; s)
             {
                 buf ~= c;
             }
             ...

Refactor as desired.

-Steve
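
To illustrate what assumeSafeAppend changes, here is a minimal sketch. It is an assumption-laden simplification: refillReusing is a hypothetical helper, and unlike the guarded form quoted above it calls assumeSafeAppend unconditionally.

```d
import std.stdio : writeln;

// Reset the slice, tell the runtime the old contents may be overwritten,
// then refill. Simplified: no capacity check before assumeSafeAppend.
dchar[] refillReusing(dchar[] buf, string s)
{
    buf.length = 0;
    buf.assumeSafeAppend();   // allow appends to reuse the existing block
    foreach (dchar c; s)
        buf ~= c;
    return buf;
}

void main()
{
    dchar[] first = "aaa"d.dup;
    dchar[] second = refillReusing(first, "bbb");

    writeln(second.ptr is first.ptr); // whether the block was reused
    writeln(first);                   // the older slice sees the new contents
}
```

With reuse in place, the wide-character overloads would show the same aliasing as the char version, so callers storing the buffer would again need .dup or .idup.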

May 01 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 01 May 2013 06:56:47 -0700, Tim Keating <mrtact gmail.com> wrote:

 On Wednesday, 1 May 2013 at 04:33:28 UTC, anonymous wrote:
 Just outBuf ~= buf.dup; works, too. Without .dup you're overwriting and  
 appending the same chunk of memory again and again.
 From the documentation on File.readln
 (<http://dlang.org/phobos/std_stdio#readln>): "Note that reusing the  
 buffer means that the previous contents of it has to be copied if  
 needed."
 I'm a bit puzzled as to why it behaves differently with dchar.

 Okay, that was obviously the bit I was missing. The dchar situation IS  
 baffling -- if that hadn't worked, I would have been more certain I was  
 simply doing something wrong.

Note that it could have worked even with UTF-8, depending on the input  
file.  Although I agree the library code is not ideal, this is not an  
excuse ;)

-Steve

May 01 2013
