digitalmars.D - File.byLine should return dups?

Graham Fawcett (35/35) Jun 04 2010 Hi folks,

Steven Schveighoffer (12/42) Jun 04 2010 The latter. File is re-using the buffer for each line, so you are seein...

Graham Fawcett (7/60) Jun 04 2010 Thanks for the response. Yes, it makes mode to favour performance for

Andrei Alexandrescu (4/11) Jun 04 2010 I wonder what we can do to automatically reject such programs.

Graham Fawcett (3/17) Jun 04 2010 isForwardPersistentRange vs. isForwardVolatileRange?
Steven Schveighoffer (6/18) Jun 05 2010 Train people that being given a mutable buffer does not mean you own it?...

Graham Fawcett (6/75) Jun 04 2010 It just occurred to me that duper/iduper could be defined as

Graham Fawcett <fawcett uwindsor.ca> writes:

Hi folks,

I expected this program to print the lines of a file in reverse order:

    // bad.d
    import std.stdio;
    import std.range;
    
    void main() {
      auto f = File("bad.d");
      foreach(char[] line; retro(array(f.byLine())))
        writeln(line);
    }

...but instead it prints a jumble of line fragments. I suspect that
it's because byLine() is not dup'ing the arrays it's reading from the
file?

This version works:

    // good.d
    import std.stdio;
    import std.range;
    
    Retro!(char[][]) retroLines(File f) {
     char[][] lines;
      foreach(line; f.byLine())
        lines ~= line.dup;		// note the .dup!
      return retro(lines);
    }
    
    void main() {
      auto f = File("good.d");
      foreach(line; retroLines(f))
        writeln(line);
    }

..but if you remove the '.dup' then it prints a similar mess.

So, is there a bug in byLine(), or am I just using it wrong?

Best,
Graham

Jun 04 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 04 Jun 2010 15:35:06 -0400, Graham Fawcett <fawcett uwindsor.ca>  
wrote:

 Hi folks,

 I expected this program to print the lines of a file in reverse order:

     // bad.d
     import std.stdio;
     import std.range;
    void main() {
       auto f = File("bad.d");
       foreach(char[] line; retro(array(f.byLine())))
         writeln(line);
     }

 ...but instead it prints a jumble of line fragments. I suspect that
 it's because byLine() is not dup'ing the arrays it's reading from the
 file?

 This version works:

     // good.d
     import std.stdio;
     import std.range;
    Retro!(char[][]) retroLines(File f) {
      char[][] lines;
       foreach(line; f.byLine())
         lines ~= line.dup;		// note the .dup!
       return retro(lines);
     }
    void main() {
       auto f = File("good.d");
       foreach(line; retroLines(f))
         writeln(line);
     }

 ..but if you remove the '.dup' then it prints a similar mess.

 So, is there a bug in byLine(), or am I just using it wrong?

The latter.  File is re-using the buffer for each line, so you are seeing  
the data get overwritten.  This is for performance reasons.  Not everyone  
wants incur heap allocations for every line of a file   As you showed,  
it's possible to get the desired behavior if you need it.  The reverse  
would be impossible.

Now, that being said, a nice addition would be to create a duper range  
that allows you to do one expression:

foreach(char[] line; retro(array(duper(f.byLine()))))

-Steve

Copied from my .announce reply :)

Jun 04 2010

Graham Fawcett <fawcett uwindsor.ca> writes:

Hi Steven,

On Fri, 04 Jun 2010 15:43:18 -0400, Steven Schveighoffer wrote:

 On Fri, 04 Jun 2010 15:35:06 -0400, Graham Fawcett <fawcett uwindsor.ca>
 wrote:
 
 Hi folks,

 I expected this program to print the lines of a file in reverse order:

     // bad.d
     import std.stdio;
     import std.range;
    void main() {
       auto f = File("bad.d");
       foreach(char[] line; retro(array(f.byLine())))
         writeln(line);
     }

 ...but instead it prints a jumble of line fragments. I suspect that
 it's because byLine() is not dup'ing the arrays it's reading from the
 file?

 This version works:

     // good.d
     import std.stdio;
     import std.range;
    Retro!(char[][]) retroLines(File f) {
      char[][] lines;
       foreach(line; f.byLine())
         lines ~= line.dup;		// note the .dup!
       return retro(lines);
     }
    void main() {
       auto f = File("good.d");
       foreach(line; retroLines(f))
         writeln(line);
     }

 ..but if you remove the '.dup' then it prints a similar mess.

 So, is there a bug in byLine(), or am I just using it wrong?

 
 The latter.  File is re-using the buffer for each line, so you are
 seeing the data get overwritten.  This is for performance reasons.  Not
 everyone wants incur heap allocations for every line of a file   As you
 showed, it's possible to get the desired behavior if you need it.  The
 reverse would be impossible.
 

Thanks for the response. Yes, it makes mode to favour performance for
the common case, and as you say it's not hard to resolve this issue.

 Now, that being said, a nice addition would be to create a duper range
 that allows you to do one expression:

 foreach(char[] line; retro(array(duper(f.byLine()))))

Yes -- duper (and iduper) for the win! Great idea.

 -Steve
 
 Copied from my .announce reply :)

Sorry again for the .announce posting. :P

Graham

Jun 04 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 06/04/2010 02:58 PM, Graham Fawcett wrote:
[snip]
 Thanks for the response. Yes, it makes mode to favour performance for
 the common case, and as you say it's not hard to resolve this issue.

 Now, that being said, a nice addition would be to create a duper range
 that allows you to do one expression:

 foreach(char[] line; retro(array(duper(f.byLine()))))

 Yes -- duper (and iduper) for the win! Great idea.

I wonder what we can do to automatically reject such programs.

Andrei

Jun 04 2010

Graham Fawcett <fawcett uwindsor.ca> writes:

On Fri, 04 Jun 2010 15:04:56 -0500, Andrei Alexandrescu wrote:

 On 06/04/2010 02:58 PM, Graham Fawcett wrote: [snip]
 Thanks for the response. Yes, it makes mode to favour performance for
 the common case, and as you say it's not hard to resolve this issue.

 Now, that being said, a nice addition would be to create a duper range
 that allows you to do one expression:

 foreach(char[] line; retro(array(duper(f.byLine()))))

 Yes -- duper (and iduper) for the win! Great idea.

 
 I wonder what we can do to automatically reject such programs.
 
 Andrei

isForwardPersistentRange vs. isForwardVolatileRange?

Graham

Jun 04 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 04 Jun 2010 16:04:56 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 06/04/2010 02:58 PM, Graham Fawcett wrote:
 [snip]
 Thanks for the response. Yes, it makes mode to favour performance for
 the common case, and as you say it's not hard to resolve this issue.

 Now, that being said, a nice addition would be to create a duper range
 that allows you to do one expression:

 foreach(char[] line; retro(array(duper(f.byLine()))))

 Yes -- duper (and iduper) for the win! Great idea.

 I wonder what we can do to automatically reject such programs.

Train people that being given a mutable buffer does not mean you own it? :)

Seriously though, it's just something you learn.  I don't think there's  
anything the compiler can do to help.

-Steve

Jun 05 2010

Graham Fawcett <fawcett uwindsor.ca> writes:

On Fri, 04 Jun 2010 19:58:43 +0000, Graham Fawcett wrote:

 Hi Steven,
 
 On Fri, 04 Jun 2010 15:43:18 -0400, Steven Schveighoffer wrote:
 
 On Fri, 04 Jun 2010 15:35:06 -0400, Graham Fawcett
 <fawcett uwindsor.ca> wrote:
 
 Hi folks,

 I expected this program to print the lines of a file in reverse order:

     // bad.d
     import std.stdio;
     import std.range;
    void main() {
       auto f = File("bad.d");
       foreach(char[] line; retro(array(f.byLine())))
         writeln(line);
     }

 ...but instead it prints a jumble of line fragments. I suspect that
 it's because byLine() is not dup'ing the arrays it's reading from the
 file?

 This version works:

     // good.d
     import std.stdio;
     import std.range;
    Retro!(char[][]) retroLines(File f) {
      char[][] lines;
       foreach(line; f.byLine())
         lines ~= line.dup;		// note the .dup!
       return retro(lines);
     }
    void main() {
       auto f = File("good.d");
       foreach(line; retroLines(f))
         writeln(line);
     }

 ..but if you remove the '.dup' then it prints a similar mess.

 So, is there a bug in byLine(), or am I just using it wrong?

 
 The latter.  File is re-using the buffer for each line, so you are
 seeing the data get overwritten.  This is for performance reasons.  Not
 everyone wants incur heap allocations for every line of a file   As you
 showed, it's possible to get the desired behavior if you need it.  The
 reverse would be impossible.
 
 

 Thanks for the response. Yes, it makes mode to favour performance for
 the common case, and as you say it's not hard to resolve this issue.
 
 Now, that being said, a nice addition would be to create a duper range
 that allows you to do one expression:

 foreach(char[] line; retro(array(duper(f.byLine()))))

 
 Yes -- duper (and iduper) for the win! Great idea.

It just occurred to me that duper/iduper could be defined as

alias map!("a.dup") duper;
alias map!("a.idup") iduper;

Thanks for suggesting duper, I like that idea very much. :)

Graham


 
 -Steve
 
 Copied from my .announce reply :)

 
 Sorry again for the .announce posting. :P
 
 Graham

Jun 04 2010

D Programming

C/C++ Programming

Other

digitalmars.D - File.byLine should return dups?