digitalmars.D - Range for files by character

Stephan Schiffels (19/19) May 20 2013 Hi,

w0rp (4/23) May 20 2013 I would try f.byChunk(n).joiner. joiner is from std.algorithm and

Stephan Schiffels (5/33) May 20 2013 Ah, wonderful. That's exactly what I needed. I think that pretty

Jonathan M Davis (13/35) May 20 2013 The reality is that what you're doing is horribly inefficient. You never...

"Stephan Schiffels" <stephan_schiffels mac.com> writes:

Hi,

I need an Input Range that iterates a file character by 
character. In bioinformatics this is often important, and having 
a D range is of course preferable to any foreach-byLine 
combination, since we can apply filters and other goodies from 
std.algorithm. In this implementation, I am simply filtering out 
newlines, as an example.

import std.stdio;
import std.conv;
import std.algorithm;

void main() {
   auto f = File("someFile.txt", "r");
   foreach(c; f.byChunk(1).filter!(a => to!char(a[0]) != '\n'))
     write(to!char(c[0]));
}

Is this the right way to do it? I was a bit surprised that 
std.stdio doesn't provide a "byChar" or "byByte" range. Is there 
a reason for this, or is this too special a need?

Stephan

May 20 2013

"w0rp" <devw0rp gmail.com> writes:

On Monday, 20 May 2013 at 21:36:41 UTC, Stephan Schiffels wrote:
> I need an Input Range that iterates a file character by 
> character.
> [...]
> Is this the right way to do it?

I would try f.byChunk(n).joiner. joiner is from std.algorithm, and 
it produces a range that joins a range of ranges, much like 
your typical array-to-string join function.
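A minimal D sketch of this suggestion, assuming ASCII input (for example a plain sequence file) so that one byte is one character; the helper `withoutNewlines` and the scratch file name are hypothetical. Each `byChunk` element is a `ubyte[]`, so `joiner` yields single `ubyte` values, and the filter is the same newline test as in the original post.

```d
import std.algorithm : filter, joiner;
import std.array : array;
import std.stdio : File, writeln;
static import std.file;

// Hypothetical helper: reads `path` in chunks of `chunkSize` bytes, flattens
// the chunks into one range of ubyte with joiner, and drops '\n' bytes.
// Assumes ASCII content, so every byte is one character.
string withoutNewlines(string path, size_t chunkSize = 4096)
{
    auto f = File(path, "rb");
    ubyte[] kept = f.byChunk(chunkSize)
        .joiner                    // ubyte[] chunks -> range of ubyte
        .filter!(b => b != '\n')   // same test as in the original post
        .array;                    // copy out, since byChunk reuses its buffer
    return cast(string) kept;      // fresh array, so the cast is safe
}

void main()
{
    enum path = "bytes_demo.txt";  // hypothetical scratch file
    std.file.write(path, "ACGT\nTTGA\nCC\n");
    scope (exit) std.file.remove(path);

    // A tiny chunk size shows joiner stitching bytes across chunk boundaries.
    writeln(withoutNewlines(path, 3)); // prints ACGTTTGACC
}
```

Because `joiner` only advances to the next chunk once the current one is used up, the chunk size can stay large while everything downstream still sees one byte at a time.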

May 20 2013

"Stephan Schiffels" <stephan_schiffels mac.com> writes:

On Monday, 20 May 2013 at 21:40:51 UTC, w0rp wrote:
> I would try f.byChunk(n).joiner. joiner is from std.algorithm 
> and it produces a range which joins a range of ranges, quite 
> like your typical array to string join function.

Ah, wonderful. That's exactly what I needed. I think that pretty 
much does what Jonathan suggested under the hood. I can also use 
byLine then, indeed...
Thanks.
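A minimal sketch of the `byLine` variant, assuming the default `byLine` behaviour of dropping the `'\n'` terminator; the helper `joinedLines` and the file name are hypothetical. Because `byLine` yields `char[]` and Phobos range functions auto-decode narrow strings, `joiner` produces `dchar` code points here rather than raw bytes.

```d
import std.algorithm : joiner;
import std.array : array;
import std.conv : to;
import std.stdio : File, writeln;
static import std.file;

// Hypothetical helper: byLine strips each '\n' (the terminator is not kept
// by default), so joining the lines yields every character except newlines.
string joinedLines(string path)
{
    auto f = File(path, "r");
    // byLine reuses its buffer, but joiner finishes the current line before
    // advancing byLine, and .array copies the characters out.
    dchar[] chars = f.byLine.joiner.array;  // elements are dchar (auto-decoding)
    return chars.to!string;                 // re-encode as UTF-8
}

void main()
{
    enum path = "lines_demo.txt";  // hypothetical scratch file
    std.file.write(path, "ACGT\nTTGA\nCC\n");
    scope (exit) std.file.remove(path);
    writeln(joinedLines(path)); // prints ACGTTTGACC
}
```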

May 20 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, May 20, 2013 23:36:39 Stephan Schiffels wrote:
>  foreach(c; f.byChunk(1).filter!(a => to!char(a[0]) != '\n'))
>    write(to!char(c[0]));
>
> Is this the right way to do it? I was a bit surprised that
> std.stdio doesn't provide a "byChar" or "byByte" range.

The reality is that what you're doing is horribly inefficient. You never really 
want to read a file a byte at a time. You want to read more along the lines of 
kilobytes at a time and then process it byte by byte. And for that, you 
basically want streams, and work has been done in that area, but it's not 
complete yet.
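A minimal sketch of that pattern without any ranges: a single reusable buffer filled by `File.rawRead`, then processed byte by byte in memory. The helper `countNonNewlines`, the 4096-byte default and the file name are assumptions for illustration.

```d
import std.stdio : File, writeln;
static import std.file;

// Hypothetical helper: counts the bytes in `path` that are not '\n',
// reading up to `bufSize` bytes per rawRead call into one reused buffer.
size_t countNonNewlines(string path, size_t bufSize = 4096)
{
    auto f = File(path, "rb");
    auto buf = new ubyte[bufSize];
    size_t count;
    for (;;)
    {
        ubyte[] got = f.rawRead(buf);  // slice of buf that was actually filled
        if (got.length == 0)           // nothing read: end of file
            break;
        foreach (b; got)               // byte-by-byte work happens in memory
            if (b != '\n')
                ++count;
    }
    return count;
}

void main()
{
    enum path = "raw_demo.txt";  // hypothetical scratch file
    std.file.write(path, "ACGT\nTTGA\nCC\n");
    scope (exit) std.file.remove(path);
    writeln(countNonNewlines(path)); // prints 10
}
```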

What you will probably need to do is create a range that wraps ByChunk so that 
the outer range returns a byte (or char) at a time, but the file gets read 
kilobytes at a time (it iterates over ByChunk's buffer until it hits the end, 
and then pops off ByChunk's front and starts at the front of the buffer again). 
And if you're stripping out newlines, you might as well just wrap ByLine 
instead of ByChunk, since that'll strip out the newlines for you.

- Jonathan M Davis
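A minimal sketch of the wrapper described above, assuming byte-oriented (for example ASCII) data; `ByteRange` and `byByte` are hypothetical names, not part of `std.stdio`. The range serves one `ubyte` at a time from ByChunk's current buffer and calls ByChunk's `popFront` only when that buffer is used up, so the file is still read in large chunks.

```d
import std.algorithm : filter;
import std.array : array;
import std.range.primitives : isInputRange;
import std.stdio : File, writeln;
static import std.file;

// Hypothetical input range of ubyte over a range of ubyte[] chunks
// (intended for File.byChunk). It walks the current chunk byte by byte
// and pulls the next chunk only when the current one is exhausted.
struct ByteRange(Chunks)
{
    private Chunks chunks;  // underlying chunk range; front is the current chunk
    private ubyte[] buf;    // unread remainder of the current chunk

    this(Chunks chunks)
    {
        this.chunks = chunks;
        if (!this.chunks.empty)
            buf = this.chunks.front;
        skipExhausted();
    }

    @property bool empty() const { return buf.length == 0; }
    @property ubyte front() const { return buf[0]; }

    void popFront()
    {
        buf = buf[1 .. $];
        skipExhausted();
    }

    // While the current chunk is used up, advance to the next non-empty one.
    private void skipExhausted()
    {
        while (buf.length == 0 && !chunks.empty)
        {
            chunks.popFront();  // ByChunk refills its buffer here
            if (!chunks.empty)
                buf = chunks.front;
        }
    }
}

// Hypothetical convenience wrapper around File.byChunk.
auto byByte(File f, size_t chunkSize = 4096)
{
    auto chunks = f.byChunk(chunkSize);
    auto r = ByteRange!(typeof(chunks))(chunks);
    static assert(isInputRange!(typeof(r)));
    return r;
}

void main()
{
    enum path = "byte_range_demo.txt";  // hypothetical scratch file
    std.file.write(path, "ACGT\nTTGA\nCC\n");
    scope (exit) std.file.remove(path);

    // The wrapper composes with std.algorithm like any other input range.
    ubyte[] kept = byByte(File(path, "rb"), 4).filter!(b => b != '\n').array;
    writeln(cast(string) kept); // prints ACGTTTGACC
}
```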

May 20 2013
