digitalmars.D - Range for files by character
- Stephan Schiffels (19/19) May 20 2013 Hi,
- w0rp (4/23) May 20 2013 I would try f.byChunk(n).joiner. joiner is from std.algorithm and
- Stephan Schiffels (5/33) May 20 2013 Ah, wonderful. That's exactly what I needed. I think that pretty
- Jonathan M Davis (13/35) May 20 2013 The reality is that what you're doing is horribly inefficient. You never...
Hi,

I need an input range that iterates over a file character by character. In bioinformatics this is often important, and a D range is of course preferable to any foreach/byLine combination, since we can apply filters and other goodies from std.algorithm. In this implementation I simply filter out newlines, as an example:

    import std.stdio;
    import std.conv;
    import std.algorithm;

    void main() {
        auto f = File("someFile.txt", "r");
        foreach(c; f.byChunk(1).filter!(a => to!char(a[0]) != '\n'))
            write(to!char(c[0]));
    }

Is this the right way to do it? I was a bit surprised that std.stdio doesn't provide a "byChar" or "byByte" range. Is there a reason for this, or is this too specialized a need?

Stephan
May 20 2013
On Monday, 20 May 2013 at 21:36:41 UTC, Stephan Schiffels wrote:
> Is this the right way to do it?

I would try f.byChunk(n).joiner. joiner is from std.algorithm and it produces a range which joins a range of ranges, quite like your typical array to string join function.
May 20 2013
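For reference, the suggestion above might look roughly like the following sketch. The chunk size of 4096 and the helper name `charsWithoutNewlines` are assumptions made for illustration; neither appears in the thread, and the helper is not a Phobos function.

```d
import std.algorithm : filter, joiner, map;
import std.stdio : File, write;

/// Flattens a range of byte chunks into single chars and drops newlines.
/// Hypothetical helper name, chosen only for this example.
auto charsWithoutNewlines(Chunks)(Chunks chunks)
{
    return chunks
        .joiner                          // range of ubyte[] -> range of ubyte
        .filter!(b => b != '\n')         // skip newline bytes
        .map!(b => cast(char) b);        // reinterpret each byte as a char
}

void main()
{
    auto f = File("someFile.txt", "r");

    // The file is read 4096 bytes at a time (an assumed buffer size),
    // but the loop still sees one character per iteration.
    foreach (c; charsWithoutNewlines(f.byChunk(4096)))
        write(c);
}
```

Because everything is lazy, only one chunk is held in memory at a time, and further std.algorithm stages can be chained onto the result.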
On Monday, 20 May 2013 at 21:40:51 UTC, w0rp wrote:
> I would try f.byChunk(n).joiner. joiner is from std.algorithm and it produces a range which joins a range of ranges, quite like your typical array to string join function.

Ah, wonderful. That's exactly what I needed. I think that pretty much does what Jonathan suggested under the hood. I can also use byLine then, indeed...

Thanks.
May 20 2013
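The byLine variant mentioned above could be sketched the same way. This assumes byLine's default behaviour of stripping the line terminator, so no explicit newline filter is needed. Note that joiner over char[] lines yields decoded dchar values rather than raw bytes.

```d
import std.algorithm : joiner;
import std.stdio : File, write;

void main()
{
    auto f = File("someFile.txt", "r");

    // byLine yields one line at a time without its terminator;
    // joiner concatenates the lines into one lazy range of characters.
    foreach (c; f.byLine.joiner)
        write(c);
}
```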
On Monday, May 20, 2013 23:36:39 Stephan Schiffels wrote:
> Is this the right way to do it? I was a bit surprised that std.stdio doesn't provide a "byChar" or "byByte" range.

The reality is that what you're doing is horribly inefficient. You never really want to read a file a byte at a time. You want to read more along the lines of kilobytes at a time and then process it byte by byte. And for that, you basically want streams, and work has been done in that area, but it's not complete yet.

What you will probably need to do is create a range that wraps ByChunk so that the outer range returns a byte (or char) at a time, but the file gets read kilobytes at a time (it iterates over ByChunk's buffer until it hits the end and then pops off ByChunk's front and starts at the front of the buffer again). And if you're stripping out newlines, you might as well just wrap ByLine instead of ByChunk, since that'll strip out the newlines for you.

- Jonathan M Davis
May 20 2013
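A hand-written wrapper of the kind described above might look like the sketch below. The names `ByByte` and `byByte` and the chunk size of 4096 are hypothetical, chosen for illustration; they are not part of Phobos. In practice `joiner` gives much the same effect, so this is mainly useful for seeing the mechanics.

```d
import std.range.primitives;
import std.stdio : File, write;

/// Hypothetical wrapper: yields one byte at a time while the wrapped
/// range delivers whole chunks.
struct ByByte(Chunks)
    if (isInputRange!Chunks && is(ElementType!Chunks : const(ubyte)[]))
{
    private Chunks chunks;
    private size_t pos;   // index into the current chunk

    this(Chunks chunks)
    {
        this.chunks = chunks;
        skipEmptyChunks();
    }

    @property bool empty() { return chunks.empty; }

    @property ubyte front() { return chunks.front[pos]; }

    void popFront()
    {
        ++pos;
        if (pos >= chunks.front.length)
        {
            chunks.popFront();   // asks the wrapped range for the next chunk
            pos = 0;
            skipEmptyChunks();
        }
    }

    private void skipEmptyChunks()
    {
        while (!chunks.empty && chunks.front.length == 0)
            chunks.popFront();
    }
}

/// Convenience function so the wrapper can be used in a UFCS chain.
auto byByte(Chunks)(Chunks chunks) { return ByByte!Chunks(chunks); }

void main()
{
    import std.algorithm : filter;

    auto f = File("someFile.txt", "r");
    foreach (b; f.byChunk(4096).byByte.filter!(x => x != '\n'))
        write(cast(char) b);
}
```

The wrapper keeps only an index into the current chunk, so the file is still read in large blocks while callers see a plain input range of bytes.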