digitalmars.D.learn - file rawRead and rawWrite in chunks example

reply "Jay Norwood" <jayn prismnet.com> writes:
I'm playing around with the range based operations and with raw 
file io.  I couldn't figure out a way to get rid of the outer 
foreach loops.

Nice execution time of 537 msec for this, which creates and reads 
back a file of about 160MB (20_000_000 doubles).


import std.algorithm;
import std.stdio;
import std.conv;
import std.math;
import std.range;
import std.file;
import std.datetime;
import std.array;

void main()
{

	auto fn = "numberList.db";
	auto f = File(fn,"wb");
	scope(exit) std.file.remove(fn);
	std.datetime.StopWatch sw;
	sw.start();
	
	foreach(elem; chunks(iota(10.5,20_000_010.5,1.0),1000000)){
		f.rawWrite(elem.array());
	}
	f.close();
	f = File(fn,"rb");

	const int n = 1000000;
	double dbv[] = new double[n];
	foreach(i; iota(10,20_000_000+10,n)){
		f.rawRead!(double)(dbv);
	}

	f.close();
	long tm = sw.peek().msecs;
	writeln("time msecs:", tm);

}
Aug 08 2015
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 08/08/2015 04:11 PM, Jay Norwood wrote:
 I'm playing around with the range based operations and with raw file
 io.  I couldn't figure out a way to get rid of the outer foreach loops.
When the body of the foreach loop performs something, then std.algorithm.each can be useful:

import std.algorithm;
import std.stdio;
import std.range;
import std.datetime;

void main()
{
    auto fn = "numberList.db";
    std.datetime.StopWatch sw;
    sw.start();
    scope(exit) std.file.remove(fn);

    {
        auto f = File(fn,"wb");

        iota(10.5, 20_000_010.5, 1.0)
            .chunks(1000000)
            .each!(a => f.rawWrite(a.array));
    }

    {
        auto f = File(fn,"rb");

        const int n = 1000000;
        // NOTE: D-style syntax on the left-hand side
        double[] dbv = new double[n];

        // NOTE: No need to tell rawRead the type as double
        iota(10, 20_000_000 + 10, n)
            .each!(a => f.rawRead(dbv));
    }

    long tm = sw.peek().msecs;
    writeln("time msecs:", tm);
}

Ali
Aug 08 2015
next sibling parent "Jay Norwood" <jayn prismnet.com> writes:
On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
     {
         auto f = File(fn,"wb");

         iota(10.5, 20_000_010.5, 1.0)
             .chunks(1000000)
             .each!(a => f.rawWrite(a.array));
     }

 Ali
Thanks. There are many examples of numeric to string data output in the docs, saving byLine. Those are on the order of 30x slower than this rawWrite example. This will be more useful to many people.
Aug 08 2015
prev sibling next sibling parent reply "Jay Norwood" <jayn prismnet.com> writes:
On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
         // NOTE: No need to tell rawRead the type as double
         iota(10, 20_000_000 + 10, n)
             .each!(a => f.rawRead(dbv));
     }

 Ali
Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in a compiler error in 2.067.1. The f.rawRead!(double)(dbv) form works.

Error: template instance rawRead!(dbv) does not match template declaration rawRead(T)(T[] buffer)
Aug 08 2015
parent Ali Çehreli <acehreli yahoo.com> writes:
On 08/08/2015 07:07 PM, Jay Norwood wrote:

 On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
         // NOTE: No need to tell rawRead the type as double
         iota(10, 20_000_000 + 10, n)
             .each!(a => f.rawRead(dbv));
     }

 Ali
 Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in a compiler error in 2.067.1. The f.rawRead!(double)(dbv) form works.
rawRead is a member function template with one template parameter and one function parameter:

    T[] rawRead(T)(T[] buffer);

The single template parameter T is the element type of its function parameter, which is a dynamic array. In this case, function template type deduction works and the template parameter need not be provided because dbv is of type double[] and it is obvious that T is double:

    f.rawRead(dbv)    // <- compiles

It is the same thing as providing T explicitly as double:

    f.rawRead!(double)(dbv)    // <- compiles

The code that does not compile has an error because it provides dbv as a template argument (because it is in the parameter list that comes right after !):

    f.rawRead!(dbv)    // oops, dbv should be the function argument, not the template argument

Ali
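The two compiling forms described in this reply can be checked with a small round trip. This is only a minimal sketch: the helper name `deducedMatchesExplicit` and the scratch file name `deduction_demo.db` are made up for illustration and are not part of the thread's code.

```d
import std.file : remove;
import std.stdio : File;

// Hypothetical helper: writes three doubles to a scratch file, then reads
// them back twice, once with T deduced and once with T given explicitly.
// Returns true when both reads yield the data that was written.
bool deducedMatchesExplicit(string fn)
{
    scope(exit) remove(fn);   // runs last, after the reader below is closed

    {
        auto w = File(fn, "wb");
        w.rawWrite([1.5, 2.5, 3.5]);   // T deduced as double from the array
    }   // w goes out of scope here, which closes the file

    auto r = File(fn, "rb");
    auto buf1 = new double[3];
    auto buf2 = new double[3];

    auto a = r.rawRead(buf1);            // deduced: T is double
    r.rewind();                          // back to the start of the file
    auto b = r.rawRead!(double)(buf2);   // explicit: same instantiation

    // r.rawRead!(buf1) would not compile: buf1 would be passed as the
    // template argument instead of the function argument.
    return a == b && a == [1.5, 2.5, 3.5];
}

void main()
{
    assert(deducedMatchesExplicit("deduction_demo.db"));
}
```

Both calls instantiate the same template, so which one to write is a matter of taste; the deduced form is shorter and cannot disagree with the buffer's element type.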
Aug 08 2015
prev sibling parent reply "Nordlöw" <per.nordlow gmail.com> writes:
On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
 Ali
Now benchmarks write and read separately:

https://github.com/nordlow/justd/blob/0d746b2c1800a82a61a6cb7edcabfd9664066b2c/tests/t_rawio.d

Couldn't the chunk logic be deduced as well? Something like:

void rawWriteInAutoChunks(R)(File f, R r)
{
    const count = preferred_disk_write_size / ElementType!R.sizeof;
    r.chunks(count).each!(a => f.rawWrite(a.array));
}

What would a suitable value for `preferred_disk_write_size` be?
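A compilable variant of that idea might look like the sketch below. The 64 KiB value of `preferredDiskWriteSize` is an arbitrary placeholder chosen only so the code builds; it is not a Phobos constant and not a tuned recommendation. The helper `writeDoubles` and the file name are likewise hypothetical, and a plain foreach stands in for `each`.

```d
import std.array : array;
import std.file : getSize, remove;
import std.range : chunks, ElementType, iota;
import std.stdio : File;

// ASSUMPTION: arbitrary placeholder value, not taken from Phobos or the thread.
enum size_t preferredDiskWriteSize = 64 * 1024;

// Writes the range r to f in chunks of about preferredDiskWriteSize bytes,
// deriving the element count per chunk from the range's element type.
void rawWriteInAutoChunks(R)(File f, R r)
{
    enum count = preferredDiskWriteSize / ElementType!R.sizeof;
    static assert(count > 0, "element type is larger than the chunk size");

    foreach (chunk; r.chunks(count))
        f.rawWrite(chunk.array);   // one rawWrite call per chunk
}

// Hypothetical helper: writes n doubles and returns the resulting file size.
ulong writeDoubles(string fn, size_t n)
{
    scope(exit) remove(fn);
    {
        auto f = File(fn, "wb");
        f.rawWriteInAutoChunks(iota(0.0, cast(double) n, 1.0));
    }   // f is closed here, before the size is queried
    return getSize(fn);
}

void main()
{
    // 20_000 doubles do not divide evenly into 8192-element chunks,
    // so the last chunk is a partial one.
    assert(writeDoubles("auto_chunks_demo.db", 20_000) == 20_000 * double.sizeof);
}
```

Since the chunk size is an enum, trying other values is a one-line change followed by a re-run of the benchmark.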
Aug 09 2015
next sibling parent reply "Nordlöw" <per.nordlow gmail.com> writes:
On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
 Couldn't the chunk logic be deduced as well?
Yes :) See update at: https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d
 What would a suitable value for `preferred_disk_write_size` be?
Is there a suitable constant somewhere in Phobos?
Aug 09 2015
parent "Jay Norwood" <jayn prismnet.com> writes:
On Sunday, 9 August 2015 at 11:06:34 UTC, Nordlöw wrote:
 On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
 Couldn't the chunk logic be deduced as well?
Yes :) See update at: https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d
 What would a suitable value for `preferred_disk_write_size` be?
Is there a suitable constant somewhere in Phobos?
So, to be clear, I think you must be saying that you want to specify the disk chunk size separately from the array size. Is that correct?

I stepped through the original code (with the foreach loops) and I see single calls to fwrite and fread for each array.

rawWrite executes a single fwrite per array, so for f.rawWrite(elem.array()) the call is:

auto result = .fwrite(buffer.ptr, T.sizeof, buffer.length, _p.handle);

rawRead executes a single fread per array:

immutable result = fread(buffer.ptr, T.sizeof, buffer.length, _p.handle);
Aug 09 2015
prev sibling parent "Jay Norwood" <jayn prismnet.com> writes:
On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
 On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
 Ali
Now benchmarks write and read separately:
I benchmarked my first results:

D:\visd\raw\raw\Release>raw
time write msecs:457
time read msecs:75

This is for 160MB of data. The write includes initialization of the values. The read time is faster than my SSD, so I have to assume this is Win7 or the SSD caching the data.

If I increase the double count to 200,000,000 (1.6GB of data), the times are:

D:\visd\raw\raw\Release>raw
time write msecs:7236
time read msecs:11979

08/09/2015 10:12 AM 1,600,000,000 numberList.db

So that's around 220MB/sec for the writes and 133MB/sec for the reads. That's an Intel 520 series 180GB SSD, but on a SATA 3Gb/s interface in a laptop. Sequential write speed for that SSD should be about 257MB/sec. Sequential read should be close to 395MB/sec for this drive on a 6Gb/s SATA interface. So read speed is lower than I'd expect.

If I move this program over to my work computer, the same 1.6GB measurement returns the times below on a Samsung 840 SSD, which is on a 6Gb/s SATA interface. I believe the 458MB/sec write speed. I suspect the read timing is again just measuring Win7's cached data.

J:\visd>raw
time write msecs:3489
time read msecs:579
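The MB/sec figures follow directly from the byte count and the measured times. A small sketch of that arithmetic, assuming 1 MB = 1,000,000 bytes and using a made-up helper name `mbPerSec`:

```d
import std.stdio : writefln;

// Hypothetical helper: throughput in MB/s (1 MB = 1_000_000 bytes)
// for `bytes` transferred in `msecs` milliseconds.
double mbPerSec(ulong bytes, ulong msecs)
{
    return (bytes / 1_000_000.0) / (msecs / 1000.0);
}

void main()
{
    enum ulong bytes = 1_600_000_000;   // size of numberList.db in the 1.6GB run

    // Times are the measurements reported in this post.
    writefln("laptop write: %.0f MB/s", mbPerSec(bytes, 7236));
    writefln("laptop read:  %.0f MB/s", mbPerSec(bytes, 11979));
    writefln("work write:   %.0f MB/s", mbPerSec(bytes, 3489));
    writefln("work read:    %.0f MB/s", mbPerSec(bytes, 579));

    // The work-computer read comes out far above 1000 MB/s.
    assert(mbPerSec(bytes, 579) > 1000.0);
}
```

A 6Gb/s SATA link can carry at most roughly 600 MB/s of payload, so a read figure in the thousands of MB/s is consistent with the suspicion that the data came from the OS cache instead of the drive.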
Aug 09 2015