digitalmars.D.learn - file rawRead and rawWrite in chunks example
- Jay Norwood (35/35) Aug 08 2015 I'm playing around with the range based operations and with raw
- Ali Çehreli (32/34) Aug 08 2015 When the body of the foreach loop performs something, then
- Jay Norwood (5/12) Aug 08 2015 Thanks. There are many examples of numeric to string data output
- Jay Norwood (6/11) Aug 08 2015 Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in
- Ali Çehreli (18/28) Aug 08 2015 rawRead is a member function template with one template parameter and
- Nordlöw (11/12) Aug 09 2015 Now benchmarks write and read separately:
- Nordlöw (5/7) Aug 09 2015 Yes :)
- Jay Norwood (14/21) Aug 09 2015 So, to be clear, I think you must be saying that you want to
- Jay Norwood (29/33) Aug 09 2015 I benchmarked my first results:
I'm playing around with the range based operations and with raw file io. I couldn't figure out a way to get rid of the outer foreach loops.

Nice execution time of 537 msec for this, which creates and reads back a file of about 160MB (20_000_000 doubles).

    import std.algorithm;
    import std.stdio;
    import std.conv;
    import std.math;
    import std.range;
    import std.file;
    import std.datetime;
    import std.array;

    void main()
    {
        auto fn = "numberList.db";
        auto f = File(fn,"wb");
        scope(exit) std.file.remove(fn);
        std.datetime.StopWatch sw;
        sw.start();

        foreach(elem; chunks(iota(10.5,20_000_010.5,1.0),1000000)){
            f.rawWrite(elem.array());
        }
        f.close();

        f = File(fn,"rb");
        const int n = 1000000;
        double dbv[] = new double[n];
        foreach(i; iota(10,20_000_000+10,n)){
            f.rawRead!(double)(dbv);
        }
        f.close();

        long tm = sw.peek().msecs;
        writeln("time msecs:", tm);
    }
Aug 08 2015
On 08/08/2015 04:11 PM, Jay Norwood wrote:
> I'm playing around with the range based operations and with raw file io.
> I couldn't figure out a way to get rid of the outer foreach loops.

When the body of the foreach loop performs something, then std.algorithm.each can be useful:

    import std.algorithm;
    import std.stdio;
    import std.range;
    import std.datetime;
    import std.file;    // needed for std.file.remove below

    void main()
    {
        auto fn = "numberList.db";
        std.datetime.StopWatch sw;
        sw.start();
        scope(exit) std.file.remove(fn);

        {
            auto f = File(fn,"wb");
            iota(10.5, 20_000_010.5, 1.0)
                .chunks(1000000)
                .each!(a => f.rawWrite(a.array));
        }

        {
            auto f = File(fn,"rb");
            const int n = 1000000;
            // NOTE: D-style syntax on the left-hand side
            double[] dbv = new double[n];
            // NOTE: No need to tell rawRead the type as double
            iota(10, 20_000_000 + 10, n)
                .each!(a => f.rawRead(dbv));
        }

        long tm = sw.peek().msecs;
        writeln("time msecs:", tm);
    }

Ali
Aug 08 2015
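Neither version above looks at what rawRead hands back, so the timing says nothing about whether the data survived the round trip. Here is a minimal sketch of the same write/read pattern at a much smaller scale that compares each chunk with the values that were written. The function name `roundTripOk`, the file name and the sizes are made up for illustration, and the sketch assumes `total` is a multiple of `n`.

```d
import std.algorithm : each, equal;
import std.array : array;
import std.file : remove;
import std.range : chunks, iota;
import std.stdio : File;

// Hypothetical helper: writes `total` doubles (10.5, 11.5, ...) in chunks of
// `n`, reads them back chunk by chunk, and reports whether every chunk matches.
// Assumes total is a multiple of n.
bool roundTripOk(string fn, int total, int n)
{
    scope(exit) remove(fn);

    {
        auto f = File(fn, "wb");
        iota(10.5, total + 10.5, 1.0)
            .chunks(n)
            .each!(a => f.rawWrite(a.array));
    }   // f is closed here, before the file is reopened for reading

    auto f = File(fn, "rb");
    auto dbv = new double[n];
    foreach (start; iota(0, total, n))
    {
        // rawRead returns the part of dbv that was actually filled
        auto got = f.rawRead(dbv);
        if (!got.equal(iota(start + 10.5, start + n + 10.5, 1.0)))
            return false;
    }
    return true;
}

void main()
{
    assert(roundTripOk("roundTrip.db", 10_000, 1_000));
}
```

The values are half-integers, so the exact floating-point comparison done by `equal` is safe here; for arbitrary data a tolerance-based comparison may be the better choice.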
On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
> {
>     auto f = File(fn,"wb");
>     iota(10.5, 20_000_010.5, 1.0)
>         .chunks(1000000)
>         .each!(a => f.rawWrite(a.array));
> }
>
> Ali

Thanks. There are many examples of numeric to string data output in the docs, saving byLine. Those are on the order of 30x slower than this rawWrite example. This will be more useful to many people.
Aug 08 2015
On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
>     // NOTE: No need to tell rawRead the type as double
>     iota(10, 20_000_000 + 10, n)
>         .each!(a => f.rawRead(dbv));
> }
>
> Ali

Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in a compiler error in 2.067.1. The f.rawRead!(double)(dbv) form works.

    Error: template instance rawRead!(dbv) does not match template declaration rawRead(T)(T[] buffer)
Aug 08 2015
On 08/08/2015 07:07 PM, Jay Norwood wrote:
> On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
>>     // NOTE: No need to tell rawRead the type as double
>>     iota(10, 20_000_000 + 10, n)
>>         .each!(a => f.rawRead(dbv));
>> }
>>
>> Ali
>
> Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in a
> compiler error in 2.067.1. The f.rawRead!(double)(dbv) form works.

rawRead is a member function template with one template parameter and one function parameter:

    T[] rawRead(T)(T[] buffer);

The single template parameter T is the element type of its function parameter, which is a dynamic array. In this case, function template type deduction works and the template parameter need not be provided because dbv is of type double[] and it is obvious that T is double:

    f.rawRead(dbv)    // <- compiles

It is the same thing as providing T explicitly as double:

    f.rawRead!(double)(dbv)    // <- compiles

The code that does not compile has an error because it provides dbv as a template argument (because it is in the parameter list that comes right after !):

    f.rawRead!(dbv)    // oops, dbv should be the function argument, not the template argument

Ali
Aug 08 2015
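The three call forms discussed above can be tried without touching a file. The sketch below uses a made-up function template, `zeroFill`, that has the same shape as rawRead (one template parameter T, one function parameter of type T[]); it is only a stand-in and not part of Phobos.

```d
import std.stdio : writeln;

// Hypothetical stand-in with the same shape as File.rawRead:
// template parameter T is the element type of the function parameter.
T[] zeroFill(T)(T[] buffer)
{
    buffer[] = 0;
    return buffer;
}

void main()
{
    double[] dbv = new double[4];

    // T is deduced as double from the argument type double[]
    auto deduced = zeroFill(dbv);

    // The same call with T spelled out
    auto spelled = zeroFill!(double)(dbv);

    // Both calls return a slice of the very same buffer
    assert(deduced is spelled);

    // dbv is a variable, not a type, so it cannot be the template argument
    static assert(!__traits(compiles, zeroFill!(dbv)(dbv)));

    writeln(dbv);
}
```

The `static assert` documents the failing form while still letting the program compile.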
On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
> Ali

Now benchmarks write and read separately:

https://github.com/nordlow/justd/blob/0d746b2c1800a82a61a6cb7edcabfd9664066b2c/tests/t_rawio.d

Couldn't the chunk logic be deduced as well? Something like:

    void rawWriteInAutoChunks(R)(File f, R r)
    {
        const count = preferred_disk_write_size / ElementType!R.sizeof;
        r.chunks(count).each!(a => f.rawWrite(a.array));
    }

What would a suitable value for `preferred_disk_write_size` be?
Aug 09 2015
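A runnable sketch of that idea, under one loud assumption: the thread does not settle on a value for `preferred_disk_write_size`, so the constant below (`assumedWriteBytes`, 1 MiB) is an arbitrary placeholder and not a Phobos constant or a measured optimum. The file name is likewise made up.

```d
import std.algorithm : each;
import std.array : array;
import std.file : getSize, remove;
import std.range : chunks, ElementType, iota;
import std.stdio : File;

// ASSUMPTION: 1 MiB per rawWrite call. Placeholder value, not taken from
// Phobos and not benchmarked.
enum size_t assumedWriteBytes = 1024 * 1024;

// Derives the chunk length from the element size, then writes the range
// one chunk per rawWrite call.
void rawWriteInAutoChunks(R)(File f, R r)
{
    enum count = assumedWriteBytes / ElementType!R.sizeof;
    static assert(count > 0, "element type is larger than the chunk size");
    r.chunks(count).each!(a => f.rawWrite(a.array));
}

void main()
{
    enum fn = "autoChunks.db";  // hypothetical file name
    scope(exit) remove(fn);

    {
        auto f = File(fn, "wb");
        // 300_000 doubles: two full chunks plus a shorter final one
        f.rawWriteInAutoChunks(iota(10.5, 300_010.5, 1.0));
    }

    assert(getSize(fn) == 300_000 * double.sizeof);
}
```

Keeping the byte budget in one named constant makes it easy to rerun the benchmark with other sizes.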
On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
> Couldn't the chunk logic be deduced as well?

Yes :) See update at:

https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d

> What would a suitable value for `preferred_disk_write_size` be?

Is there a suitable constant somewhere in Phobos?
Aug 09 2015
On Sunday, 9 August 2015 at 11:06:34 UTC, Nordlöw wrote:
> On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
>> Couldn't the chunk logic be deduced as well?
>
> Yes :) See update at:
>
> https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d
>
>> What would a suitable value for `preferred_disk_write_size` be?
>
> Is there a suitable constant somewhere in Phobos?

So, to be clear, I think you must be saying that you want to specify the disk chunk size separate from the array size. Is that correct?

I stepped through the original code (with the foreach loops) and I see single calls to fwrite and fread for each array.

The rawWrite is executing a single fwrite per array:

    f.rawWrite(elem.array())

    auto result = .fwrite(buffer.ptr, T.sizeof, buffer.length, _p.handle);

The rawRead is executing a single fread per array:

    immutable result = fread(buffer.ptr, T.sizeof, buffer.length, _p.handle);
Aug 09 2015
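Since each rawRead call is a single fread, the slice it returns tells how many elements that one call delivered, and it is shorter than the buffer when the file ends mid-chunk. Here is a small sketch of a read loop that relies on this instead of a precomputed iteration count; the helper name `sumDoubles` and the file name are invented for the example.

```d
import std.file : remove;
import std.stdio : File;

// Hypothetical helper: sums every double stored in the file at `path`,
// reading at most `chunkLen` elements per rawRead call.
double sumDoubles(string path, size_t chunkLen)
{
    auto f = File(path, "rb");
    auto buffer = new double[chunkLen];
    double total = 0;

    for (;;)
    {
        // rawRead returns the part of buffer that was actually filled;
        // an empty slice means end of file.
        auto got = f.rawRead(buffer);
        if (got.length == 0)
            break;
        foreach (x; got)
            total += x;
    }
    return total;
}

void main()
{
    enum fn = "partialChunks.db";  // hypothetical file name
    scope(exit) remove(fn);

    {
        auto f = File(fn, "wb");
        f.rawWrite([1.0, 2.0, 3.0, 4.0, 5.0]);
    }

    // Five elements with chunkLen 2: the last non-empty read is short
    assert(sumDoubles(fn, 2) == 15.0);
}
```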
On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
> On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
>> Ali
>
> Now benchmarks write and read separately:

I benchmarked my first results:

    D:\visd\raw\raw\Release>raw
    time write msecs:457
    time read msecs:75

This is for 160MB of data. The write includes initialization of the values. The read time is faster than my ssd drive, so I have to assume this is win7 or the ssd caching the data.

If I increase double count to 200,000,000 (to 1.6GB of data), the times are:

    D:\visd\raw\raw\Release>raw
    time write msecs:7236
    time read msecs:11979

    08/09/2015  10:12 AM  1,600,000,000 numberList.db

So that's around 220MB/sec for the writes and 133MB/sec for the reads. That's an intel 520 series 180GB ssd, but in an SATA 3Gb/s interface in a laptop. Sequential write speed for that ssd should be about 257MB/sec. Sequential read should be close to 395MB/sec for this drive on a 6Gb/sec SATA. So read speed is lower than I'd expect.

If I move this program over to my work computer, the same 1.6GB measurement returns these times below on a Samsung 840 SSD, which is on a 6Gb/sec SATA interface. I believe the 458MB/sec write speeds. I suspect the read timing is again just measuring win7's cached data.

    J:\visd>raw
    time write msecs:3489
    time read msecs:579
Aug 09 2015
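The MB/sec figures above follow from dividing the byte count by the elapsed time. A short sketch of that arithmetic for the 1.6GB runs, assuming 1 MB = 1,000,000 bytes as the quoted numbers imply; `mbPerSec` is a made-up helper name.

```d
import std.stdio : writefln;

// Throughput in MB/sec, taking 1 MB as 1_000_000 bytes (assumption that
// matches the figures quoted in the post).
double mbPerSec(ulong bytes, long msecs)
{
    return (bytes / 1e6) / (msecs / 1e3);
}

void main()
{
    // 200,000,000 doubles of 8 bytes each = 1,600,000,000 bytes
    enum ulong bytes = 200_000_000UL * double.sizeof;

    writefln("laptop write: %.0f MB/sec", mbPerSec(bytes, 7236));
    writefln("laptop read:  %.0f MB/sec", mbPerSec(bytes, 11979));
    writefln("work write:   %.0f MB/sec", mbPerSec(bytes, 3489));
    writefln("work read:    %.0f MB/sec", mbPerSec(bytes, 579));
}
```

The last line works out to well over 2 GB/sec, far beyond what a 6Gb/sec SATA link can carry, which fits the suspicion that the read is being served from the OS cache.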