digitalmars.D.learn - Improving IO Speed


"TJB" <broughtj gmail.com> writes:

I have a program in C++ that I am translating to D as a way to 
investigate and learn D. The program is used to process 
potentially hundreds of TBs of financial transaction data, so it 
is crucial that it be performant. Right now the C++ version is 
orders of magnitude faster.

Here is a simple example of what I am doing in D:

import std.stdio : writefln;
import std.stream;

align(1) struct TaqIdx
{
   align(1) char[10] symbol;
   align(1) int tdate;
   align(1) int begrec;
   align(1) int endrec;
}

void main()
{
   auto input = new File("T201212A.IDX");
   TaqIdx tmp;
   int count;

   while(!input.eof())
   {
     input.readExact(&tmp, TaqIdx.sizeof);
    // Do something with the data
   }
}

Do you have any suggestions for improving the speed in this 
situation?

Thank you!

TJB

Mar 14 2014

"bearophile" <bearophileHUGS lycos.com> writes:

TJB:

 Do you have any suggestions for improving the speed in this 
 situation?

I have never used readExact so far, so I don't have many 
suggestions. But try to not pack the struct.

Bye,
bearophile

Mar 14 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 14 March 2014 at 18:26:36 UTC, bearophile wrote:
 TJB:

 Do you have any suggestions for improving the speed in this 
 situation?

 I have never used readExact so far, so I don't have many 
 suggestions. But try to not pack the struct.

Given he's using a raw read, I suspect he doesn't have a choice. 
That said, depending on how heavily the struct is used, he could 
unpack the struct post-rawRead.
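
A minimal sketch of what "unpacking" could look like, assuming the packed struct is kept only as the on-disk layout and a second, naturally aligned struct is used for processing. The names TaqIdxUnpacked and unpack are hypothetical and do not come from the thread.

```d
// On-disk layout: packed, no padding between fields.
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical in-memory form: same fields, default (natural) alignment,
// so every int sits on a 4-byte boundary.
struct TaqIdxUnpacked
{
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// With natural alignment, two padding bytes follow the 10-byte symbol.
static assert(TaqIdxUnpacked.tdate.offsetof == 12);
static assert(TaqIdxUnpacked.sizeof == 24);

// Copies one packed record, field by field, into the aligned form.
TaqIdxUnpacked unpack(const ref TaqIdx p)
{
    return TaqIdxUnpacked(p.symbol, p.tdate, p.begrec, p.endrec);
}
```

Whether the extra copy pays off depends on how often each record's fields are touched after reading; for a single pass over the data it may not.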

Mar 14 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
 Do you have any suggestions for improving the speed in this 
 situation?

 Thank you!

 TJB

I expect you'd get better performance with std.stdio rather than 
std.stream. std.stream is class-based and (AFAIK) not as optimized 
for performance.

I'd make it look like this:

void main()
{
   auto input = File("T201212A.IDX"); //Not a class

   TaqIdx tmp;
   ...



 From there, I'd use either of `byChunk` or `rawRead`, I don't 
know which is most efficient.

   TaqIdx[] buf = (&tmp)[0 .. 1];
   while (input.rawRead(buf).length)
   {
     ...
   }

   or
   ubyte[] buf = (cast(ubyte*)&tmp)[0 .. TaqIdx.sizeof];
   foreach (b; input.byChunk(buf))
   {
     ...
   }

Give it a try and see if it runs faster.

Mar 14 2014

"Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:

On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
 Do you have any suggestions for improving the speed in this 
 situation?

 Thank you!

 TJB

I am not sure how std.stream buffers data (the library has been 
marked for removal, so perhaps not very efficiently), but what 
happens if you read in a large array of your TaqIdx structs with 
each read?
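
One way to try this, sketched with std.stdio's rawRead rather than std.stream. The function name countRecords and the batch size of 4096 records are assumptions made for illustration; the counting stands in for whatever processing the real program does.

```d
import std.stdio : File;

align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper: walks the file `batch` records at a time and
// returns how many complete records it saw.
size_t countRecords(string path, size_t batch = 4096)
{
    auto input = File(path, "rb");
    auto buf = new TaqIdx[](batch);   // reused for every read
    size_t total;

    for (;;)
    {
        // rawRead returns the slice of buf that was actually filled;
        // an empty slice means end of file.
        auto got = input.rawRead(buf);
        if (got.length == 0)
            break;

        foreach (ref rec; got)
        {
            // Do something with rec here.
            ++total;
        }
    }
    return total;
}
```

Calling countRecords("T201212A.IDX") from main would process the file from the original post. A trailing partial record, if the file has one, is silently ignored by this sketch.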

Mar 14 2014

"TJB" <broughtj gmail.com> writes:

On Friday, 14 March 2014 at 19:11:12 UTC, Craig Dillabaugh wrote:
 On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
 Do you have any suggestions for improving the speed in this 
 situation?

 Thank you!

 TJB

 I am not sure how std.stream buffers data (the library has been 
 marked for removal, so perhaps not very efficiently), but what 
 happens if you read in a large array of your TaqIdx structs 
 with each read.

Well, one thing that I found out by experimentation was that if I 
replace

auto input = new File("T201212A.IDX");

with

auto input = new BufferedFile("T201212A.IDX");

the performance gap vanishes. Now I have nearly identical 
execution times for the C++ and D versions. But if std.stream 
is scheduled for removal, perhaps I shouldn't be using it?

Mar 14 2014

"Artem Tarasov" <lomereiter gmail.com> writes:

Did you try the setvbuf method of std.stdio.File?
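
A small sketch of that suggestion, assuming a 1 MiB fully buffered stream is a sensible starting point. The helper name openBuffered and the buffer size are illustrative choices, not recommendations from the thread.

```d
import core.stdc.stdio : _IOFBF;
import std.stdio : File;

// Hypothetical helper: opens `path` for binary reading and asks the C
// runtime for a larger, fully buffered stream.
File openBuffered(string path, size_t bufSize = 1 << 20)
{
    auto f = File(path, "rb");
    // setvbuf has to be called before any other I/O on the stream.
    f.setvbuf(bufSize, _IOFBF);
    return f;
}
```

The returned File can then be used with rawRead or byChunk as usual. Whether a larger buffer helps at all depends on the platform's default buffering, so it is worth timing both variants.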

Mar 15 2014

"Marc Schütz" <schuetzm gmx.net> writes:

On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
 align(1) struct TaqIdx
 {
   align(1) char[10] symbol;
   align(1) int tdate;
   align(1) int begrec;
   align(1) int endrec;
 }

Won't help with speed, but you can write it with less repetition:

align(1) struct TaqIdx
{
align(1):
   char[10] symbol;
   int tdate;
   int begrec;
   int endrec;
}

The outer align(1) is still necessary to avoid the padding.
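
If the layout matters for reading the file, it can be checked at compile time. The sketch below assumes the on-disk record is exactly the 22 bytes implied by the field sizes (10 + 4 + 4 + 4); the thread does not state the record size explicitly.

```d
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Compilation fails if any padding sneaks in between or after the fields.
static assert(TaqIdx.sizeof == 22);
static assert(TaqIdx.tdate.offsetof == 10);
static assert(TaqIdx.begrec.offsetof == 14);
static assert(TaqIdx.endrec.offsetof == 18);
```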

Mar 21 2014

"flamencofantasy" <flamencofantasy gmail.com> writes:

Try this:

import std.mmfile;
import std.stdio : writeln;
scope mmFile = new MmFile("T201212A.IDX");

TaqIdx* arr = cast(TaqIdx*)mmFile[0..mmFile.length].ptr;

for (ulong i = 0; i < mmFile.length/TaqIdx.sizeof; ++i)
{
     // do something...
     writeln(arr[i].symbol);
}


On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
 Do you have any suggestions for improving the speed in this 
 situation?

 Thank you!

 TJB

May 09 2014
