digitalmars.D.learn - file rawRead and rawWrite in chunks example

Jay Norwood (35/35) Aug 08 2015 I'm playing around with the range based operations and with raw

Ali Çehreli (32/34) Aug 08 2015 When the body of the foreach loop performs something, then

Jay Norwood (5/12) Aug 08 2015 Thanks. There are many examples of numeric to string data output
Jay Norwood (6/11) Aug 08 2015 Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in

Ali Çehreli (18/28) Aug 08 2015 rawRead is a member function template with one template parameter and

Nordlöw (11/12) Aug 09 2015 Now benchmarks write and read separately:

Nordlöw (5/7) Aug 09 2015 Yes :)

Jay Norwood (14/21) Aug 09 2015 So, to be clear, I think you must be saying that you want to

Jay Norwood (29/33) Aug 09 2015 I benchmarked my first results:

"Jay Norwood" <jayn prismnet.com> writes:

I'm playing around with the range based operations and with raw 
file io.  I couldn't figure out a way to get rid of the outer 
foreach loops.

Nice execution time of 537 msec for this, which creates and reads 
back a file of about 160MB (20_000_000 doubles).


import std.algorithm;
import std.stdio;
import std.conv;
import std.math;
import std.range;
import std.file;
import std.datetime;
import std.array;

void main()
{

	auto fn = "numberList.db";
	auto f = File(fn,"wb");
	scope(exit) std.file.remove(fn);
	std.datetime.StopWatch sw;
	sw.start();
	
	foreach(elem; chunks(iota(10.5,20_000_010.5,1.0),1000000)){
		f.rawWrite(elem.array());
	}
	f.close();
	f = File(fn,"rb");

	const int n = 1000000;
	double dbv[] = new double[n];
	foreach(i; iota(10,20_000_000+10,n)){
		f.rawRead!(double)(dbv);
	}

	f.close();
	long tm = sw.peek().msecs;
	writeln("time msecs:", tm);

}

Aug 08 2015

Ali Çehreli <acehreli yahoo.com> writes:

On 08/08/2015 04:11 PM, Jay Norwood wrote:
 I'm playing around with the range based operations and with raw file
 io.  I couldn't figure out a way to get rid of the outer foreach loops.

When the body of the foreach loop performs something, then 
std.algorithm.each can be useful:

import std.algorithm;
import std.stdio;
import std.range;
import std.datetime;
import std.file;    // needed for std.file.remove below

void main()
{

     auto fn = "numberList.db";

     std.datetime.StopWatch sw;
     sw.start();

     scope(exit) std.file.remove(fn);

     {
         auto f = File(fn,"wb");

         iota(10.5, 20_000_010.5, 1.0)
             .chunks(1000000)
             .each!(a => f.rawWrite(a.array));
     }

     {
         auto f = File(fn,"rb");

         const int n = 1000000;

         // NOTE: D-style syntax on the left-hand side
         double[] dbv = new double[n];

         // NOTE: No need to tell rawRead the type as double
         iota(10, 20_000_000 + 10, n)
             .each!(a => f.rawRead(dbv));
     }

     long tm = sw.peek().msecs;
     writeln("time msecs:", tm);
}

Ali

Aug 08 2015

"Jay Norwood" <jayn prismnet.com> writes:

On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
     {
         auto f = File(fn,"wb");

         iota(10.5, 20_000_010.5, 1.0)
             .chunks(1000000)
             .each!(a => f.rawWrite(a.array));
     }

 Ali

Thanks. There are many examples of numeric to string data output 
in the docs, saving byLine.   Those are on the order of 30x 
slower than this rawWrite example.  This will be more useful to 
many people.

Aug 08 2015

"Jay Norwood" <jayn prismnet.com> writes:

On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
         // NOTE: No need to tell rawRead the type as double
         iota(10, 20_000_000 + 10, n)
             .each!(a => f.rawRead(dbv));
     }

 Ali

Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in 
a compiler error with 2.067.1. The f.rawRead!(double)(dbv) form 
works.

Error: template instance rawRead!(dbv) does not match template 
declaration rawRead(T)(T[] buffer)

Aug 08 2015

Ali Çehreli <acehreli yahoo.com> writes:

On 08/08/2015 07:07 PM, Jay Norwood wrote:

 On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
         // NOTE: No need to tell rawRead the type as double
         iota(10, 20_000_000 + 10, n)
             .each!(a => f.rawRead(dbv));
     }

 Ali

 Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) results in a
 compiler error with 2.067.1. The f.rawRead!(double)(dbv) form works.

rawRead is a member function template with one template parameter and 
one function parameter:

     T[] rawRead(T)(T[] buffer);

The single template parameter T is the element type of its function 
parameter, which is a dynamic array.

In this case, function template type deduction works and the template 
parameter need not be provided because dbv is of type double[] and it is 
obvious that T is double:

     f.rawRead(dbv)    // <- compiles

It is the same thing as providing T explicitly as double:

     f.rawRead!(double)(dbv)    // <- compiles

The code that does not compile has an error because it provides dbv as a 
template argument (because it is in the parameter list that comes right 
after !):

     // oops, dbv should be the function argument, not the template argument
     f.rawRead!(dbv)    // <- does not compile

Ali

Aug 08 2015
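
The deduction rule described above can be checked with a small round trip. This is a minimal sketch, not code from the thread: the helper name `roundTrip` and the file name are hypothetical, and it assumes a reasonably recent compiler and Phobos.

```d
import std.file : remove;
import std.stdio : File;

/// Hypothetical helper: writes `values` to `path`, then reads them back
/// twice, once with T deduced and once with T given explicitly.
double[][] roundTrip(string path, const double[] values)
{
    {
        auto w = File(path, "wb");
        w.rawWrite(values);
    } // w goes out of scope here, which closes the file

    scope (exit) remove(path);

    auto deduced = new double[values.length];
    auto explicitT = new double[values.length];

    auto r = File(path, "rb");
    r.rawRead(deduced);             // T deduced as double from double[]
    r.rewind();
    r.rawRead!double(explicitT);    // T spelled out; same instantiation
    // r.rawRead!(deduced);         // would not compile: a variable is not a type
    r.close();

    return [deduced, explicitT];
}

void main()
{
    auto result = roundTrip("deduction_demo.db", [1.5, 2.5, 3.5]);
    assert(result[0] == [1.5, 2.5, 3.5]);
    assert(result[1] == result[0]);
}
```

Both calls instantiate rawRead with T = double, so the two buffers end up with identical contents.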

"Nordlöw" <per.nordlow gmail.com> writes:

On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
 Ali

Now benchmarks write and read separately:

https://github.com/nordlow/justd/blob/0d746b2c1800a82a61a6cb7edcabfd9664066b2c/tests/t_rawio.d

Couldn't the chunk logic be deduced as well? Something like:

void rawWriteInAutoChunks(R)(File f, R r)
{
     const count = preferred_disk_write_size / ElementType!R.sizeof;
     r.chunks(count).each!(a => f.rawWrite(a.array));
}

What would a suitable value for `preferred_disk_write_size` be?

Aug 09 2015
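
A compilable sketch of the idea in the post above, under stated assumptions: `assumedWriteBytes` is a hypothetical tuning constant (64 KiB is only a placeholder, not a Phobos symbol and not a measured optimum), and the file name is made up for the demo.

```d
import std.algorithm : each, max;
import std.array : array;
import std.range : chunks, ElementType, iota;
import std.stdio : File;

// Hypothetical tuning value; the thread leaves the right size open.
enum size_t assumedWriteBytes = 64 * 1024;

/// Writes the range `r` to `f`, one rawWrite call per chunk, with the
/// chunk length derived from the element size.
void rawWriteInAutoChunks(R)(File f, R r)
{
    alias E = ElementType!R;
    // At least one element per chunk, even for very large element types.
    enum size_t count = max(size_t(1), assumedWriteBytes / E.sizeof);
    r.chunks(count).each!(a => f.rawWrite(a.array));
}

void main()
{
    import std.file : getSize, remove;

    enum path = "auto_chunks_demo.db";
    scope (exit) remove(path);
    {
        auto f = File(path, "wb");
        rawWriteInAutoChunks(f, iota(0.0, 100_000.0, 1.0));
    } // f is closed here
    assert(getSize(path) == 100_000 * double.sizeof);
}
```

With 8-byte doubles the assumed constant gives 8192 elements per chunk, so the last chunk of the demo range is a partial one.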

"Nordlöw" <per.nordlow gmail.com> writes:

On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
 Couldn't the chunk logic be deduced as well?

Yes :)

See update at:

https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d

 What would a suitable value for `preferred_disk_write_size` be?

Is there a suitable constant somewhere in Phobos?

Aug 09 2015

"Jay Norwood" <jayn prismnet.com> writes:

On Sunday, 9 August 2015 at 11:06:34 UTC, Nordlöw wrote:
 On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
 Couldn't the chunk logic be deduced as well?

 Yes :)

 See update at:

 https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d

 What would a suitable value for `preferred_disk_write_size` be?

 Is there a suitable constant somewhere in Phobos?

So, to be clear, I think you must be saying that you want to 
specify the disk chunk size separate from the array size.  Is 
that correct?

I stepped through the original code (with the foreach loops) and 
I see single calls to fwrite and fread for each array.

rawWrite executes a single fwrite per array for
f.rawWrite(elem.array()):

         auto result =
             .fwrite(buffer.ptr, T.sizeof, buffer.length, _p.handle);

rawRead executes a single fread per array:

         immutable result =
             fread(buffer.ptr, T.sizeof, buffer.length, _p.handle);

Aug 09 2015
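
Because each rawRead call maps to one fread, the slice it returns shows how many elements actually arrived. The sketch below uses that to read until end of file instead of counting iterations up front, so a short final chunk is handled. The function name `sumDoubles` and the file name are hypothetical, and `chunkLen` is assumed to be greater than zero.

```d
import std.algorithm : sum;
import std.file : remove;
import std.stdio : File;

/// Hypothetical helper: sums every double stored in `path`, reading at
/// most `chunkLen` elements per rawRead call.
double sumDoubles(string path, size_t chunkLen)
{
    auto f = File(path, "rb");
    auto buf = new double[chunkLen];
    double total = 0;
    while (true)
    {
        auto got = f.rawRead(buf);    // slice of buf that was actually filled
        if (got.length == 0)
            break;                    // nothing left to read
        total += got.sum;
    }
    return total;
}

void main()
{
    enum path = "short_chunk_demo.db";
    {
        auto f = File(path, "wb");
        f.rawWrite([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]);
    } // f is closed here
    scope (exit) remove(path);

    // Ten values read four at a time: chunks of 4, 4 and 2 elements.
    assert(sumDoubles(path, 4) == 55.0);
}
```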

"Jay Norwood" <jayn prismnet.com> writes:

On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
 On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
 Ali

 Now benchmarks write and read separately:

 

I benchmarked my first results:
D:\visd\raw\raw\Release>raw
time write msecs:457
time read msecs:75

This is for 160MB of data. The write includes initialization of 
the values.

The read is faster than my SSD can deliver (160MB in 75 msec is 
over 2GB/sec), so I have to assume this is Win7 or the SSD caching 
the data.

If I increase double count to 200,000,000 (to 1.6GB of data), the 
times are:
D:\visd\raw\raw\Release>raw
time write msecs:7236
time read msecs:11979

08/09/2015  10:12 AM     1,600,000,000 numberList.db

So that's around 220MB/sec for the writes and 133MB/sec for the 
reads.  That's an Intel 520 series 180GB SSD, but on a SATA 
3Gb/s interface in a laptop.  Sequential write speed for that SSD 
should be about 257MB/sec.  Sequential read should be close to 
395MB/sec for this drive on a 6Gb/sec SATA interface.  So the read 
speed is lower than I'd expect.

If I move this program over to my work computer, the same 1.6GB 
measurement returns these times below on a Samsung 840 SSD, which 
is on a 6Gb/sec SATA interface.  I believe the 458MB/sec write 
speeds. I suspect the read timing is again just measuring win7's 
cached data.

J:\visd>raw
time write msecs:3489
time read msecs:579

Aug 09 2015
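
The throughput figures quoted above follow from dividing bytes by elapsed time. A small sketch of that arithmetic, using the byte count and timings reported in the post; the function name `mbPerSec` is hypothetical and MB is taken as 10^6 bytes.

```d
import std.stdio : writefln;

/// Throughput in MB/sec (1 MB = 10^6 bytes) for `bytes` moved in `msecs`.
double mbPerSec(ulong bytes, ulong msecs)
{
    return (bytes / 1e6) / (msecs / 1e3);
}

void main()
{
    enum ulong fileBytes = 1_600_000_000;   // 200,000,000 doubles

    writefln("laptop write: %.1f MB/sec", mbPerSec(fileBytes, 7236));
    writefln("laptop read:  %.1f MB/sec", mbPerSec(fileBytes, 11979));
    writefln("work write:   %.1f MB/sec", mbPerSec(fileBytes, 3489));
    writefln("work read:    %.1f MB/sec", mbPerSec(fileBytes, 579));
}
```

The last figure comes out well above what a 6Gb/sec SATA link can carry, which is consistent with the suspicion that the read is served from cache.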
