www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - writef too slow?

reply Helmut Leitner <helmut.leitner wikiservice.at> writes:
I took a timeout 0.78-0.98 and when I came back I had 
missed whatever writef discussions there have been.

When I then, about a month ago, published the first dmake
version it contained lots of simple printf calls.
Contributors were quick to implement lots of useful
extensions and all of them also switched printf to writef
naturally. I was a bit ashamed for having missed that.

But now, as I'm preparing the next dmake version I did
a routine check for speed and found writef so slow in
comparision that I can hardly switch without hesitating:


  import std.stdio;
  import venus.benchmark;

  alias char[] String;

  int main(String [] args)
  {
    String s="World";
    double pi=3.14;

    DelegateBenchPrint(delegate void () { writef("writef - Hello, %s.
PI=%.2f!\n",s,pi); } ,"writef:
");
    DelegateBenchPrint(delegate void () { printf("printf - Hello, %.*s.
PI=%.2f!\n",s,pi); }
,"printf: ");
    return 0;
  }

gives 
  - about 3 usec for the printf version
  - about 12-16 usec for the writef version
when redirected to a file (900 MHz Athlon/Windows)

This factor 4-5 lets me hesitate to use writef, because 
it might actually make tools like dmake look bad.

What is the "official" situation with printf / writef?

Have you noticed a similar performance gap?

Is there a chance for a better implementation or is this
the OO price we just have to pay?

Are you willing to pay the prize?

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com
Oct 15 2004
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Helmut Leitner wrote:

 But now, as I'm preparing the next dmake version I did
 a routine check for speed and found writef so slow in
 comparision that I can hardly switch without hesitating:
[...]
  alias char[] String;
Shouldn't that be "string" ? "String" sounds like a class.
 This factor 4-5 lets me hesitate to use writef, because 
 it might actually make tools like dmake look bad.
I believe that the conversion has something to do with it: private void doFormatCallback(dchar c) { char[4] buf; char[] b; b = std.utf.toUTF8(buf, c); writeString(b); } This is called for each (UTF-32) character to be output... So it sounds more like an Unicode issue than an OOP one ? --anders
Oct 15 2004
next sibling parent reply Helmut Leitner <leitner hls.via.at> writes:
Anders F Björklund wrote:
 
 Helmut Leitner wrote:
 
 But now, as I'm preparing the next dmake version I did
 a routine check for speed and found writef so slow in
 comparision that I can hardly switch without hesitating:
[...]
  alias char[] String;
Shouldn't that be "string" ? "String" sounds like a class.
Depends on whether standards for this develop in the community. One could interpret lowercase as: - builtin - atom / scalar - not an object ..... As far as I see, there will never be a String class in D, but there will also never be an official statement about that... -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Oct 15 2004
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Helmut Leitner wrote:

 alias char[] String;
Shouldn't that be "string" ? "String" sounds like a class.
Depends on whether standards for this develop in the community.
My own humble suggestion was string for char[] and ustring for wchar[] --anders
Oct 15 2004
prev sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
 This factor 4-5 lets me hesitate to use writef, because
 it might actually make tools like dmake look bad.
I believe that the conversion has something to do with it: private void doFormatCallback(dchar c) { char[4] buf; char[] b; b = std.utf.toUTF8(buf, c); writeString(b); } This is called for each (UTF-32) character to be output... So it sounds more like an Unicode issue than an OOP one ? --anders
A small correction: the doFormatCallback is actually in std.stream and not std.stdio but the one in std.stream is essentially the same as the one in std.stdio so it probably doesn't matter. I would try running a profiler like VTune to see where the time is being spent. Either that or throw in some performance monitoring code yourself. -Ben
Oct 15 2004
parent "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <bhinkle mathworks.com> wrote in message
news:ckokq8$1uv3$1 digitaldaemon.com...
 A small correction: the doFormatCallback is actually in std.stream and not
 std.stdio but the one in std.stream is essentially the same as the one in
 std.stdio so it probably doesn't matter.
It isn't the same, the one in std.stdio has an opimization for ASCII in it. One probable culprit, however, is the fact that writef() does a lock for each character output, whereas printf() does one lock for the whole write.
Oct 15 2004
prev sibling next sibling parent Derek <derek psyc.ward> writes:
On Fri, 15 Oct 2004 13:05:57 +0200, Helmut Leitner wrote:

 I took a timeout 0.78-0.98 and when I came back I had 
 missed whatever writef discussions there have been.
 
 When I then, about a month ago, published the first dmake
 version it contained lots of simple printf calls.
 Contributors were quick to implement lots of useful
 extensions and all of them also switched printf to writef
 naturally. I was a bit ashamed for having missed that.
 
 But now, as I'm preparing the next dmake version I did
 a routine check for speed and found writef so slow in
 comparision that I can hardly switch without hesitating:
 
 
   import std.stdio;
   import venus.benchmark;
 
   alias char[] String;
 
   int main(String [] args)
   {
     String s="World";
     double pi=3.14;
 
     DelegateBenchPrint(delegate void () { writef("writef - Hello, %s.
PI=%.2f!\n",s,pi); } ,"writef:
 ");
     DelegateBenchPrint(delegate void () { printf("printf - Hello, %.*s.
PI=%.2f!\n",s,pi); }
 ,"printf: ");
     return 0;
   }
 
 gives 
   - about 3 usec for the printf version
   - about 12-16 usec for the writef version
 when redirected to a file (900 MHz Athlon/Windows)
 
 This factor 4-5 lets me hesitate to use writef, because 
 it might actually make tools like dmake look bad.
 
 What is the "official" situation with printf / writef?
 
 Have you noticed a similar performance gap?
 
 Is there a chance for a better implementation or is this
 the OO price we just have to pay?
 
 Are you willing to pay the prize?
I must be a *lot* more tolerant of /slow/ performance. On my machine, my extended dmake, with full verbose mode, compiles all of Phobos and creates a phobos library file in less than 2 seconds. Barely long enough for me to take a sip of coffee. So I guess the /slow/ performance of writef is not going to upset me too much. -- Derek Melbourne, Australia
Oct 15 2004
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
Helmut Leitner wrote:
 This factor 4-5 lets me hesitate to use writef, because 
 it might actually make tools like dmake look bad.
 
 What is the "official" situation with printf / writef?
writef is meant to be the D replacement for printf. It has support for char, wchar, and dchar, as well as (I think) imaginary numbers.
 Have you noticed a similar performance gap?
Haven't tested it.
 Is there a chance for a better implementation or is this
 the OO price we just have to pay?
Looking at the code, I'm not sure it can get much more efficient. writef does a lot more than printf does, mostly to do with detecting the types of arguments. It then passes the result off to C functions for further processing, so perhaps if the entire implementation were done in D code performance may improve a bit, but I'm skeptical.
 Are you willing to pay the prize?
C++ IOStreams have similar performance issues, but the price tends to be worth it for the type safety and such that they provide. I feel the same way about writef, though obviously I'd prefer it be faster if possible. FWIW, I've implemented the input side (readf) that could bear some testing. The link is here: http://home.f4.ca/sean/d/stdio.zip I expect it may be even more than 4 times slower than scanf, as I convert everything to UTF-32 for processing. Sadly, I don't see any easy way around this particular issue, as it seems the only completely safe way to do pattern matching. If you've any suggestions, please let me know. Sean
Oct 15 2004
prev sibling next sibling parent "Ben Hinkle" <bhinkle mathworks.com> writes:
I pointed VTune at

import std.stdio;
alias char[] String;
void writeftest() {
    String s="World";
    double pi=3.14;
    writef("writef - Hello, %s. PI = %.2f.\n",s,pi);
}
void printftest() {
    String s="World";
    double pi=3.14;
    printf("printf - Hello, %.*s. PI = %.2f.\n",s,pi);
}
int main(String [] args)
  {
    String s="World";
    double pi=3.14;
    for (int k=0;k<10000;k++) writeftest();
    for (int k=0;k<10000;k++) printftest();
    return 0;
  }

and noticed that most of the time is being spent in __fp_lock and
__fp_unlock as writef calls fputc for every character and so it looks like
it has to lock and unlock the file reference every time. Probably some
buffering mechanism (using std.stream.BufferedFile or something) would get
the speed back to printf.

-Ben
Oct 15 2004
prev sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
I've attached a modified version of std.stdio that builds up a buffer before
calling file I/O. I'd be curious if it speeds up your code. Replace
   import std.stdio;
replaced with
   import stdio2;
and compile in stdio2.d with your application.

Note the attached file doesn't handle the wide case. I only did the fputc
case. Also it uses a buffer size of 32 as a guess - bigger might waste time
initializing array elements that aren't used.

-Ben


begin 666 stdio2.d

M;6%R<RYC;VT-"B J(%!L86-E9"!I;B!T:&4 4'5B;&EC($1O;6%I;BX-"B J

M<FEV871E(&EM<&]R="!S=&0N9F]R;6%T.PT*<')I=F%T92!I;7!O<G0 <W1D
M+G5T9CL-"F-O;G-T(&EN="!"=69F97)3:7IE(#T
M=F]I9"!W<FET97 H1DE,12H 9G L(%1Y<&5);F9O6UT 87)G=6UE;G1S+"!V
M;VED*B!A<F=P='(L(&EN="!N97=L:6YE*0T*>R  (&EN="!O<FEE;G1A=&EO
M;CL-"B  ("!C:&%R6T)U9F9E<E-I>F5=(&)U9F9E<CL-"B  ("!I;G0 ;B ]


M('9O:60 8VAE8VM?8G5F9F5R*"D >PT*("  ("  ("!I9B H;B ]/2!"=69F
M97)3:7IE*2 -"B  ("  ("  >PT*("  ("  ("  ("  9FQU<VA?8G5F9F5R



M<'5T8RAD8VAA<B!C*0T*"7L-" D ("  :68 *&, /#T ,' W1BD-" D ("  
M>PT*("  ("  ("  ("  ("  (&)U9F9E<EMN*RM=(#T 8V%S="AC:&%R*6,[





M("  ("  ("!B=69F97);;BLK72 ](&-A<W0H8VAA<BEC.PT*("  ("  ("  
M("  ("  ("  ("!C:&5C:U]B=69F97(H*3L-"B\O"0D ("  <W1D+F,N<W1D
M:6\N9G!U=&,H8EMI72P 9G I.PT*("  ("  ("  ("  ("  ('T-" D ("  
M?0T*"7T-" T*"7-T9"YF;W)M870N9&]&;W)M870H)G!U=&,L(&%R9W5M96YT



M<FEE;G1A=&EO; T*("  ('L-" EV97)S:6]N("A7:6YD;W=S*0T*"7L-" D 





M(#T


M;'-E('9E<G-I;VX *&QI;G5X*0T*"7L-" D ("  =F]I9"!P=71C=RAD8VAA



M+"!A<F=P='(I.PT*"6EF("AN97=L:6YE*0T*"2  ("!S=&0N8RYS=&1I;RYF




M(&9W<FET968H1DE,12H 9G L("XN+BD-"GL-"B  ("!W<FET97 H9G L(%]A

M24Q%*B!F<"P +BXN*0T*>PT*("  ('=R:71E>"AF<"P 7V%R9W5M96YT<RP 

`
end
Oct 15 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
Ben Hinkle wrote:

 I've attached a modified version of std.stdio that builds up a buffer
 before calling file I/O. I'd be curious if it speeds up your code. Replace
    import std.stdio;
 replaced with
    import stdio2;
 and compile in stdio2.d with your application.
 
 Note the attached file doesn't handle the wide case. I only did the fputc
 case. Also it uses a buffer size of 32 as a guess - bigger might waste
 time initializing array elements that aren't used.
 
 -Ben
Looking at that code again my second modification in the non-ASCII cased is hosed. I just copy-and-pasted and forgot to update the variable name. So only call it with ASCII. I'll probably do some more experiments to see what buffer size is best and send something to Walter. -Ben
Oct 15 2004
parent reply "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:ckplmr$2sr5$1 digitaldaemon.com...
 Ben Hinkle wrote:

 I've attached a modified version of std.stdio that builds up a buffer
 before calling file I/O. I'd be curious if it speeds up your code.
Replace
    import std.stdio;
 replaced with
    import stdio2;
 and compile in stdio2.d with your application.

 Note the attached file doesn't handle the wide case. I only did the
fputc
 case. Also it uses a buffer size of 32 as a guess - bigger might waste
 time initializing array elements that aren't used.

 -Ben
Looking at that code again my second modification in the non-ASCII cased
is
 hosed. I just copy-and-pasted and forgot to update the variable name. So
 only call it with ASCII. I'll probably do some more experiments to see
what
 buffer size is best and send something to Walter.
I don't believe the buffer size is the problem. The problem is that the lock should be surround all the calls to fputc, not embedded within fputc. Look at the implementation of printf in \dm\src\core\printf.c.
Oct 15 2004
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:ckq78u$6eo$1 digitaldaemon.com...
 "Ben Hinkle" <bhinkle4 juno.com> wrote in message
 news:ckplmr$2sr5$1 digitaldaemon.com...
 Ben Hinkle wrote:

 I've attached a modified version of std.stdio that builds up a buffer
 before calling file I/O. I'd be curious if it speeds up your code.
Replace
    import std.stdio;
 replaced with
    import stdio2;
 and compile in stdio2.d with your application.

 Note the attached file doesn't handle the wide case. I only did the
fputc
 case. Also it uses a buffer size of 32 as a guess - bigger might waste
 time initializing array elements that aren't used.

 -Ben
Looking at that code again my second modification in the non-ASCII cased
is
 hosed. I just copy-and-pasted and forgot to update the variable name. So
 only call it with ASCII. I'll probably do some more experiments to see
what
 buffer size is best and send something to Walter.
I don't believe the buffer size is the problem. The problem is that the
lock
 should be surround all the calls to fputc, not embedded within fputc. Look
 at the implementation of printf in \dm\src\core\printf.c.
I just meant the buffer size shouldn't be so big that the average time spent initialzing unused elements is greater than the average time spent flushing. Off the top of my head that size is probably pretty big but who knows - it is system dependent. I'll check out the printf code - where is it? I can't find it in the dmd.zip download.
Oct 16 2004
parent "Walter" <newshound digitalmars.com> writes:
"Ben Hinkle" <bhinkle mathworks.com> wrote in message
news:cks3uo$1oao$1 digitaldaemon.com...
 "Walter" <newshound digitalmars.com> wrote in message
 news:ckq78u$6eo$1 digitaldaemon.com...
 I don't believe the buffer size is the problem. The problem is that the
lock
 should be surround all the calls to fputc, not embedded within fputc.
Look
 at the implementation of printf in \dm\src\core\printf.c.
I just meant the buffer size shouldn't be so big that the average time
spent
 initialzing unused elements is greater than the average time spent
flushing.
 Off the top of my head that size is probably pretty big but who knows - it
 is system dependent. I'll check out the printf code - where is it? I can't
 find it in the dmd.zip download.
It's on the DMC++ CD in \dm\src\core\printf.c
Oct 16 2004