c++ - stdio very slow
Heinz Saathoff <hsaat despammed.com> writes:
Hello,
I wrote a small program (NT console app) to search for filenames in my
Eudora mailbox files. The first attempt used the (fopen, fgetc, fclose)
functions, with the files opened in binary mode. The resulting program
worked but took a long time to run. OK, might be my straightforward,
naive algorithm. Then I experimented with memory-mapped files, and the
otherwise identical program was very fast. OK, might be OS overhead, so
I tried the (_open, _read, _close) functions with my own small
buffering (1K buffer). It was a bit slower than the memory-mapped
approach but still very fast. Here are the times measured:
stdio : 14.8 seconds
_read : 1.8 seconds
mmap : 0.8 seconds
Not that it matters for my small program, but I'm wondering why the
stdio fgetc function takes so much time.
- Heinz
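For reference, the comparison Heinz describes can be sketched in portable C. This is a minimal illustration, not Heinz's actual program: the function names, file path, and 4K buffer size are invented here, and only the two stdio-level variants are shown (the memory-mapped variant is OS-specific).

```c
/* Sketch: count occurrences of a byte in a file two ways --
   one fgetc() call per byte vs. one fread() call per 4K block. */
#include <stdio.h>

long count_fgetc(const char *path, int ch) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)      /* one library call per byte */
        if (c == ch) n++;
    fclose(f);
    return n;
}

long count_fread(const char *path, int ch) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char buf[4096];           /* one library call per block */
    long n = 0;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < got; i++)
            if (buf[i] == ch) n++;
    fclose(f);
    return n;
}
```

Both return the same count; the per-call overhead inside the fgetc loop is where the 14.8-vs-1.8-second gap in the thread comes from.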
"Walter" <newshound digitalmars.com> writes:
It could be that the optimal buffer size you need is not the default one
used by stdio.
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b85610ac0558da69896e3 news.digitalmars.com...
Hello,
I wrote a small program (NT console app) to search for filenames in my
Eudora mailbox files. The first attempt was to use (fopen, fgetc,
flcose) functions, files were opened in binary mode. The resulting
program worked but took a long time to run. Ok, might be my straight-
forward naive algorithm. Then I experimented with memory mapped files
and the otherwise same program was very fast. Ok, might be the OS
overhead, so I tried the (_open, _read, _close) functions with a own
small buffering (1K buffer). It was a bit slower than the memory mapped
approch but still very fast. Here are the times measured:
stdio : 14.8 seconds
_read : 1.8 seconds
mmap : 0.8 seconds
Not that it matters for my small program, but I'm wondering why the
stdio fgetc function takes so much time.
- Heinz
Heinz Saathoff <hsaat despammed.com> writes:
Hello Walter,
Walter wrote ...
It could be that the optimal buffer size you need is not the default one
used by stdio.
I only do a sequential read using fgetc. As far as I know stdio also
uses a buffer of at least 512 bytes.
Decreasing the buffer size from 1024 bytes to 512 bytes in my
(_open, _read, _close)-buffered version increases the runtime from 1.8
seconds to 1.95 seconds. That's not much of a penalty for the smallest
practical buffer size.
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b85610ac0558da69896e3 news.digitalmars.com...
Hello,
I wrote a small program (NT console app) to search for filenames in my
Eudora mailbox files. The first attempt was to use (fopen, fgetc,
flcose) functions, files were opened in binary mode. The resulting
program worked but took a long time to run. Ok, might be my straight-
forward naive algorithm. Then I experimented with memory mapped files
and the otherwise same program was very fast. Ok, might be the OS
overhead, so I tried the (_open, _read, _close) functions with a own
small buffering (1K buffer). It was a bit slower than the memory mapped
approch but still very fast. Here are the times measured:
stdio : 14.8 seconds
_read : 1.8 seconds
mmap : 0.8 seconds
Not that it matters for my small program, but I'm wondering why the
stdio fgetc function takes so much time.
Jan Knepper <jan smartsoft.us> writes:
Heinz Saathoff wrote:
I only do a sequential read using fgetc. As far as I know stdio also
uses a buffer of at least 512 bytes.
Decreasing the buffer size from 1024 byte to 512 byte in my
(_open,_read,_close)-buffered version increases the runtime from 1.8
seconds to 1.95 seconds. That's not too much for the smallest practical
buffer size.
How much are you reading at once?
fgetc only reads 1 character per call; (_)read usually reads more.
Calling into the buffered I/O system once per character versus once per
block of 128 makes a real difference.
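Jan's point can be sketched with a hypothetical helper that issues one library call per block and scans within the block, assuming the search can be phrased as finding a single byte with memchr():

```c
/* Sketch: locate the first occurrence of a byte using one fread()
   per 4K block and memchr() inside the block, instead of one
   fgetc() call per character. Names are invented for illustration. */
#include <stdio.h>
#include <string.h>

long find_first(const char *path, int ch) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    unsigned char buf[4096];
    long offset = 0;                       /* file offset of buf[0] */
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0) {
        unsigned char *p = memchr(buf, ch, got);
        if (p) {
            fclose(f);
            return offset + (long)(p - buf);
        }
        offset += (long)got;
    }
    fclose(f);
    return -1;                             /* not found */
}
```

memchr() is typically vectorized by the C library, so the inner scan costs far less than a function call per byte.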
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b85610ac0558da69896e3 news.digitalmars.com...
Hello,
I wrote a small program (NT console app) to search for filenames in my
Eudora mailbox files. The first attempt was to use (fopen, fgetc,
flcose) functions, files were opened in binary mode. The resulting
program worked but took a long time to run. Ok, might be my straight-
forward naive algorithm. Then I experimented with memory mapped files
and the otherwise same program was very fast. Ok, might be the OS
overhead, so I tried the (_open, _read, _close) functions with a own
small buffering (1K buffer). It was a bit slower than the memory mapped
approch but still very fast. Here are the times measured:
stdio : 14.8 seconds
_read : 1.8 seconds
mmap : 0.8 seconds
Not that it matters for my small program, but I'm wondering why the
stdio fgetc function takes so much time.
--
ManiaC++
Jan Knepper
But as for me and my household, we shall use Mozilla...
www.mozilla.org
Heinz Saathoff <hsaat despammed.com> writes:
Hello Jan,
Jan Knepper wrote...
Heinz Saathoff wrote:
I only do a sequential read using fgetc. As far as I know stdio also
uses a buffer of at least 512 bytes.
Decreasing the buffer size from 1024 byte to 512 byte in my
(_open,_read,_close)-buffered version increases the runtime from 1.8
seconds to 1.95 seconds. That's not too much for the smallest practical
buffer size.
How much are you reading at once?
fgetc only does 1 character per call. (_)read usually does more.
Calling the buffered I/O system for every single character or once for a
block of 128 does make a difference.
When using stdio I read one char at a time with fgetc. But internally
stdio uses a buffer too.
My simple buffered file is this:
------------------- buffered file --------------------------
#include <fcntl.h>   // _O_RDONLY, _O_BINARY
#include <io.h>      // _open, _read, _close
#include <stdio.h>   // EOF

class CppFILE
{
public:
    CppFILE() : fhandle(-1), idx(0), filled(-1) {}
    CppFILE(const char *name, const char *mode) : idx(0), filled(-1) {
        Open(name, mode);
    }
    ~CppFILE() { Close(); }

    bool Open(const char *name, const char *mode) {
        fhandle = _open(name, _O_RDONLY|_O_BINARY);  // mode ignored for now
        return fhandle >= 0;
    }
    void Close() {
        if(fhandle >= 0) {
            _close(fhandle);
            fhandle = -1;
            idx = 0;
            filled = -1;
        }
    }
    int getc();

protected:
    void Fill();

    int fhandle;
    unsigned char buffer[4096];
    int idx, filled;
};

void CppFILE::Fill()
{
    // refill only on first use or if the previous _read filled the
    // whole buffer; a short read means EOF was already reached
    if( fhandle >= 0 && (filled < 0 || filled == (int)sizeof(buffer)) ) {
        filled = _read(fhandle, buffer, sizeof(buffer));
        idx = 0;
    }
}

int CppFILE::getc()
{
    if(idx < filled) return buffer[idx++];
    Fill();
    if(idx < filled) return buffer[idx++];
    return EOF;
}
---------------- end buffered file -------------------------
Instead of fgetc I used infile.getc() to read a single char. I thought
that fgetc would do its buffering in a similar way, but it seems that
fgetc() does much more than my simple getc(). I think it's time to look
at the sources of stdio.
- Heinz
Scott Michel <scottm aero.org> writes:
Jan Knepper wrote:
How much are you reading at once?
fgetc only does 1 character per call. (_)read usually does more.
Calling the buffered I/O system for every single character or once for a
block of 128 does make a difference.
Jan's point is that a function call to fgetc() has a lot more overhead
associated with it than incrementing a pointer. The test would be a
little better balanced if you benchmarked fread() against _read().
In both the _read() and the memory mapped file case, you're reading into
a buffer and (presumably) using a character pointer to examine each
character in the buffer. This will always be faster than fgetc(), even
if fgetc() is inlined.
Heinz Saathoff <hsaat despammed.com> writes:
Hello Scott,
Scott Michel wrote...
Jan's point is that a function call to fgetc() has a lot more overhead
associated with it than incrementing a pointer. The test would be a
little better balanced if you benchmarked fread() against _read().
That fgetc() has a lot of overhead is true, but I wasn't sure why it's
nearly a factor of 10 slower than my primitive buffering approach. Walter
told me that fgetc has to be aware of multithreading. There will be some
error handling too. All of this is overhead.
When I find some time I will have a look at the sources and see what
happens.
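One way to test whether the per-call lock is really the cost: on POSIX systems the stream lock can be taken once for the whole loop, with the lock-free getc_unlocked() inside it. This is an aside, not something from the thread — flockfile() and getc_unlocked() are POSIX functions and may not have existed in the Digital Mars RTL of the time:

```c
/* Sketch: lock the stream once, then read with the non-locking
   getc_unlocked() -- removes the per-character lock overhead that
   the thread identifies as the fgetc() slowdown. POSIX-only. */
#define _POSIX_C_SOURCE 199506L
#include <stdio.h>

long count_unlocked(const char *path, int ch) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    long n = 0;
    int c;
    flockfile(f);                       /* one lock for the whole loop */
    while ((c = getc_unlocked(f)) != EOF)
        if (c == ch) n++;
    funlockfile(f);
    fclose(f);
    return n;
}
```

The result is the same as a plain fgetc() loop; only the synchronization cost per character disappears.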
In both the _read() and the memory mapped file case, you're reading into
a buffer and (presumably) using a character pointer to examine each
character in the buffer. This will always be faster than fgetc(), even
if fgetc() is inlined.
If fgetc() were implemented the way I did it in my simple buffered-file
wrapper, it would be as fast as my version. As you said, fgetc() does
more than just pick a char from a buffer and increment a pointer. I
didn't expect this overhead in the first place, but now I know not to
use fgetc() in time-critical applications.
- Heinz
Scott Michel <scottm aero.org> writes:
Heinz Saathoff wrote:
That fgetc() has much overhead is true, but I wasn't sure why it's
nearly a factor of 10 against my primitive buffering approach. Walter
told me that fgetc has to be aware of multithreading. There will be some
error handling too. All this is overhead.
When I find some time I will have a look to the sources and see what
happens.
Well, you could purchase the CD and look at the code. :-)
fgetc() for 32-bit is handcoded assembly (see src/CORE32/FPUTC.ASM).
It's about as fast as you're going to get it to run. It does deal with
multithreaded locking of the file descriptor, which is the place where
the slowdown occurs. If you look at the code for LockSemaphoreNested,
you'll see a LOCK-prefixed instruction -- this is **really** slow
because it forces a lot of synchrony in the Pentium pipeline. It's the
most conservative way of doing locking if you can't do self-modifying
code or don't offer processor-specific versions of the RTL.
Of course, you can create your own CPU-specific version of the RTL
because the build system and the code are available from the CD. For
example, you can call CMPXCHG or XADD instead of LOCK INC (because the
lock semantics are implied). If you're not in an SMP environment, you
can simply use MOV, so long as it's an aligned MOV (guaranteed atomic).
If fgetc() was implemented the way I did in my simple buffering file
wrapper it would be as fast as my version. As you told fgetc() does more
than just picking a char from a buffer and incrementing a pointer. I
didn't expect this overhead in first place but now I know not to use
fgetc() in timecritical applications.
Clearly, your own optimizations work better than the generic version,
which has to make conservative assumptions. If you're looking to be
maximally portable, stick with fgetc(), or use fread() if the size of
the object is known. fread() will amortize the penalty of calling stdio
over a larger number of bytes.
Heinz Saathoff <hsaat despammed.com> writes:
Hello Scott,
Scott Michel wrote...
Well, you could purchase the CD and look at the code. :-)
I already have. That's why I have the sources.
fgetc() for 32-bit is handcoded assembly (see src/CORE32/FPUTC.ASM.)
It's about as fast as you're going to get it to run. It does deal with
multithreaded locking of the file descriptor, which is the place where
the slowdown occurs. If you look at the code for LockSemaphoreNested,
you'll see a LOCK-prefixed instruction -- this is **really** slow
because it forces a lot of synchrony in the Pentium pipeline. It's the
most conservative way of doing locking if you can't do self-modifying
code or don't offer processor-specific versions of the RTL.
Thanks for the hint.
Of course, you can create your own CPU-specific version of the RTL
because the build system and the code is available from the CD. For
example, you can call CMPXCHG or XADD instead of LOCK INC (because the
lock semantics are implied.) If you're not in a SMP environment, you can
simply use MOV so long as it's an aligned MOV (gauranteed atomic.)
It's not necessary at the moment. The app I wrote is only used by
myself. Now that I'm aware of this bottleneck, I can avoid it.
Clearly, your own optimizations work better than the generic version,
which has to make few and conservative assumptions. If you're looking to
be maximally portable, stick with fgetc() or use fread() if the size of
the object is known. fread() will amortize the penalty of calling stdio
over a larger number of bytes.
Yes, it's always good to know where the time is spent and how small
changes in code can result in great performance gains.
- Heinz
Scott Michel <scottm aero.org> writes:
Heinz Saathoff wrote:
Yes, it's always good to know where the time is spent and how small
changes in code can result in great performance gains.
Google for the Linux kernel patch that modifies the kernel at run-time
to select the "right" atomic instructions depending on whether the
machine is SMP and the processor model/rev. Pretty cool looking stuff,
but with the XP patches that prevent modifying executable pages, I doubt
this could be easily implemented in the RTL.
-scooter
"Walter" <newshound digitalmars.com> writes:
Try setting the buffer size larger, not smaller, and make it a multiple of
4K. -Walter
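Walter's suggestion maps onto setvbuf(), which must be called after fopen() but before the first read. A small sketch (the 64K size and helper name are just examples, not anything from the thread):

```c
/* Sketch: enlarge stdio's internal buffer to a multiple of 4K
   before reading, per Walter's advice. setvbuf() must come before
   the first I/O operation on the stream. */
#include <stdio.h>
#include <stdlib.h>

long count_bigbuf(const char *path, int ch) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    char *big = malloc(64 * 1024);            /* 16 * 4K */
    if (big)
        setvbuf(f, big, _IOFBF, 64 * 1024);   /* fully buffered */
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == ch) n++;
    fclose(f);                    /* buffer must outlive the stream */
    free(big);
    return n;
}
```

A larger buffer reduces the number of OS read calls, though it does not remove the per-fgetc() locking overhead discussed elsewhere in the thread.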
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b868bd6fe18ecf99896e4 news.digitalmars.com...
Hello Walter,
Walter wrote ...
It could be that the optimal buffer size you need is not the default one
used by stdio.
I only do a sequential read using fgetc. As far as I know stdio also
uses a buffer of at least 512 bytes.
Decreasing the buffer size from 1024 byte to 512 byte in my
(_open,_read,_close)-buffered version increases the runtime from 1.8
seconds to 1.95 seconds. That's not too much for the smallest praxtical
buffer size.
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b85610ac0558da69896e3 news.digitalmars.com...
Hello,
I wrote a small program (NT console app) to search for filenames in my
Eudora mailbox files. The first attempt was to use (fopen, fgetc,
flcose) functions, files were opened in binary mode. The resulting
program worked but took a long time to run. Ok, might be my straight-
forward naive algorithm. Then I experimented with memory mapped files
and the otherwise same program was very fast. Ok, might be the OS
overhead, so I tried the (_open, _read, _close) functions with a own
small buffering (1K buffer). It was a bit slower than the memory
approch but still very fast. Here are the times measured:
stdio : 14.8 seconds
_read : 1.8 seconds
mmap : 0.8 seconds
Not that it matters for my small program, but I'm wondering why the
stdio fgetc function takes so much time.
Heinz Saathoff <hsaat despammed.com> writes:
Hello Walter,
The test with the small buffer was meant to show that my simple
buffering is still much faster than stdio's fgetc(). For stdio I didn't
change anything. As far as I know, stdio uses buffering too, unless it's
disabled. I think I will have a look at the stdio sources to find out
what happens.
- Heinz
Walter wrote...
Try setting the buffer size larger, not smaller, and make it a multiple of
4K. -Walter
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b868bd6fe18ecf99896e4 news.digitalmars.com...
Hello Walter,
Walter wrote ...
It could be that the optimal buffer size you need is not the default one
used by stdio.
I only do a sequential read using fgetc. As far as I know stdio also
uses a buffer of at least 512 bytes.
Decreasing the buffer size from 1024 byte to 512 byte in my
(_open,_read,_close)-buffered version increases the runtime from 1.8
seconds to 1.95 seconds. That's not too much for the smallest praxtical
buffer size.
"Walter" <newshound digitalmars.com> writes:
"Heinz Saathoff" <hsaat despammed.com> wrote in message
news:MPG.1b8a8600757fa24d9896e6 news.digitalmars.com...
Hello Walter,
The test with small buffer was to show that my simple buffering still is
much faster than the stdio fgetc(). For stdio I didn't change anything.
As far as I know stdio uses buffering too if not disabled. I think I
will have a look at the stdio sources to find out what happens.
- Heinz
fgetc also must do thread synchronization.