www.digitalmars.com         C & C++   DMDScript  

D - performance in std.stream

reply "Ben Hinkle" <bhinkle4 juno.com> writes:
Right now std.stream.readLine takes no inputs and returns a
char[] of the line read. Each time it gets called it builds a string
using ~=. Could there be an API that lets you pass in an
"inout char[]" argument that only gets resized if needed? That
would take a burden off the GC when reading files line by line.

That way the existing readLine could be reimplemented as:

char[] readLine()
{
   char[] result;
   readLine(result);
   return result;
}
void readLine(inout char[] result)
{
   // fill result, reallocating only if it is too small
}

The same would go for readLineW.
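
A minimal sketch of the buffer-reuse idea, written in current D syntax (where "ref" has taken over the role "inout" had for parameters). To stay self-contained it reads from an in-memory string instead of a stream; readLineInto and its parameters are hypothetical names, not part of std.stream:

```d
// Hypothetical helper: copies the next line of `src` (starting at `pos`) into
// the caller's `buf` and returns it as `line`. `buf` is reallocated only when
// it is too small, so a loop over many lines can reuse one buffer.
bool readLineInto(const(char)[] src, ref size_t pos, ref char[] buf, out char[] line)
{
    if (pos >= src.length)
        return false;                   // end of input

    size_t end = pos;
    while (end < src.length && src[end] != '\n')
        ++end;

    immutable len = end - pos;
    if (buf.length < len)
        buf.length = len;               // the only statement that may allocate

    buf[0 .. len] = src[pos .. end];    // copy into the caller's storage
    line = buf[0 .. len];
    pos = (end < src.length) ? end + 1 : end;   // step over the '\n'
    return true;
}

void main()
{
    import std.stdio : writeln;

    char[] buf = new char[64];          // allocated once, reused for every line
    char[] line;
    size_t pos = 0;
    while (readLineInto("first\nsecond\n", pos, buf, line))
        writeln(line);
}
```

The catch with this style is that each call overwrites the previous line, so a caller that wants to keep a line has to copy it (line.dup).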
-Ben
Mar 03 2004
next sibling parent reply "Walter" <walter digitalmars.com> writes:
You're right, the performance of std.stream isn't good. Want to design a new
one? <g>
Mar 08 2004
parent reply larry cowan <larry_member pathlink.com> writes:
In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
You're right, the performance of std.stream isn't good. Want to design a new
one? <g>
.. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed. OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" getc's trying to manage ungetc also). It is a neat start, but is incomplete.
Mar 09 2004
next sibling parent reply Andy Friesen <andy ikagames.com> writes:
larry cowan wrote:
 In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
 
You're right, the performance of std.stream isn't good. Want to design a new
one? <g>
.. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed. OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" getc's trying to manage ungetc also). It is a neat start, but is incomplete.
The biggest disadvantage to the bolt-in approach is that it's inherently dependent on templates. In C++ land this would translate to horrendous compile times; here in D land it means occasional linker weirdness unless you link stream.obj explicitly. Also, what's missing? I tend to suffer from blindness to hypothetical needs other than my own. :) -- andy
Mar 09 2004
parent reply larry cowan <larry_member pathlink.com> writes:
In article <c2m5kb$21oj$1 digitaldaemon.com>, Andy Friesen says...
larry cowan wrote:
 In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
 
You're right, the performance of std.stream isn't good. Want to design a new
one? <g>
.. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed. OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins". getc's trying to manage ungetc also). It is a neat start, but is incomplete.
The biggest disadvantage to the bolt-in approach is that it's inherently dependent on templates. In C++ land this would translate to horrendous compile times; here in D land it means occasional linker weirdness unless you link stream.obj explicitly. Also, what's missing? I tend to suffer from blindness to hypothetical needs other than my own. :) -- andy
The writeWChar() and writeDChar() methods won't compile. It really needs overloaded this() in read and write to specify options for file open. An open() would be nice for reopening after closing, but it's not really needed when you can just instantiate it again easily - why allow explicit close? Could a read/write be built for this? Or is this what the R and W inheritance (this(File*...)) is to support? I haven't tried to do anything with this, but there are seek requirements for switching back and forth if a read/write open is used (could be automated, but only for the common cases).
Mar 09 2004
parent reply Andy Friesen <andy ikagames.com> writes:
larry cowan wrote:

 The writeWChar() and writeDChar() methods won't compile.
Crazy. I'll take a look.
 It really needs overloaded this() in read and write to specify options for file
 open.
What sort of options? Text/binary, append, and the like?
 An open() would be nice for reopening after closing, but it's not really needed
 when you can just instantiate it again easily - why allow explicit close?
close() is necessary because D's garbage collector isn't deterministic. The destructor can clean up if you forget, but it's not a good idea to have an open file descriptor lying around like that.
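
A rough sketch of what an explicit close() looks like in current D, which offers scope(exit) to tie the call to a scope instead of to the collector. LogFile, put and writeLog are hypothetical names made up for illustration, not taken from stream.d:

```d
import std.stdio : File;

// Hypothetical wrapper class. Class instances live on the GC heap, so there is
// no guarantee about when a destructor would run.
class LogFile
{
    private File f;
    this(string path) { f = File(path, "wb"); }
    void put(string s) { f.write(s); }
    void close() { f.close(); }
}

void writeLog(string path)
{
    auto log = new LogFile(path);
    scope (exit) log.close();   // runs on every exit from this scope, exceptions included
    log.put("started\n");
}

void main()
{
    import std.file : readText, remove;

    enum path = "log.tmp";      // hypothetical scratch file
    writeLog(path);
    assert(readText(path) == "started\n");
    remove(path);
}
```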
 Could a read/write be built for this? Or is this what the R and W inheritance
 (this(File*...)) is to support?  Haven't tried to do anything with this, but
 there are seek requirements for switching back and forth if a read/write open
 is used (could be automated, but only for the common cases).
I hadn't considered simultaneous read/write; I doubt it will work. -- andy
Mar 09 2004
parent reply larry cowan <larry_member pathlink.com> writes:
In article <c2mcd0$2ebl$1 digitaldaemon.com>, Andy Friesen says...
larry cowan wrote:

 The writeWChar() and writeDChar() methods won't compile.
Crazy. I'll take a look.
streams.d(529): function toString overloads char[](char c) and char[](creal r) both match argument list for toString
 It really needs overloaded this() in read and write to specify options for file
 open.
What sort of options? Text/binary, append, and the like?
Yeah, the normal filemodes. And rw+ as indicated below... And rb and wb for windoughs.
 An open() would be nice for reopening after closing, but it's not really needed
 when you can just instantiate it again easily - why allow explicit close?
close() is necessary because D's garbage collector isn't deterministic. The destructor can clean up if you forget, but it's not a good idea to have an open file descriptor lying around like that.
Ok, reasonable, but then you should be able to reopen it (not necessarily in the same mode).
 Could a read/write be built for this? Or is this what the R and W inheritance
 (this(File*...)) is to support?  Haven't tried to do anything with this, but
 there are seek requirements for switching back and forth if a read/write open
 is used (could be automated, but only for the common cases).
I hadn't considered simultaneous read/write; I doubt it will work.
Not simultaneous - just on the same open(). You switch back and forth using a seek (maybe to current point + 0L) to clear the pointer mode, and then you can issue an opposite-mode command, e.g., read, seek, overwrite, seek, read, read, seek-back, overwrite, overwrite, ... This is used to maintain a work file of current events, or a fixed-size-record database file. You can do the same with two open file pointers (read & write to same file), but BSD Unix has supported this for over 20 years - and I think it came from even earlier at AT&T. You could support smart mode-switching, but usually you want to move the pointer back to the start of the record you just read, or to EOF, or to the front of the file (to update a status record of some kind), so that's not really useful.
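
A minimal sketch of the read, seek, overwrite pattern on a single open file, using std.stdio.File from current Phobos (which wraps a C FILE*, so the C rule about a positioning call between a read and a write applies). recSize, bumpRecord and the scratch file name are assumptions made up for illustration:

```d
import std.stdio : File;

enum recSize = 8;   // hypothetical fixed record size, in bytes

// Reads record `index`, changes it, and overwrites it in place on one open file.
void bumpRecord(File f, size_t index)
{
    immutable long offset = cast(long) index * recSize;
    ubyte[recSize] rec;

    f.seek(offset);        // position on the record
    f.rawRead(rec[]);      // read it
    rec[0]++;              // modify the in-memory copy
    f.seek(offset);        // seek back; this also separates the read from the write
    f.rawWrite(rec[]);     // overwrite the record
    f.flush();
}

void main()
{
    import std.file : remove;

    enum path = "records.tmp";      // hypothetical scratch file
    auto f = File(path, "w+b");     // one open, used for both reading and writing
    ubyte[3 * recSize] blank;       // three zeroed records
    f.rawWrite(blank[]);
    bumpRecord(f, 1);

    f.seek(0);
    ubyte[3 * recSize] all;
    f.rawRead(all[]);
    f.close();
    remove(path);
    assert(all[recSize] == 1 && all[0] == 0);   // only record 1 changed
}
```

Only the eight bytes of the one record are rewritten; the rest of the file is never copied.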
  -- andy
Mar 10 2004
parent Charles Hixson <charleshixsn earthlink.net> writes:
larry cowan wrote:
 In article <c2mcd0$2ebl$1 digitaldaemon.com>, Andy Friesen says...
 
larry cowan wrote:
...
Could a read/write be built for this? Or is this what the R and W inheritance
(this(File*...)) is to support?  Haven't tried to do anything with this, but
there are seek requirements for switching back and forth if a read/write open is
used (could be automated, but only for the common cases).
I hadn't considered simultaneous read/write; I doubt it will work.
Not simultaneous - just on the same open(). You switch back and forth using a seek (maybe to current point + 0L) to clear the pointer mode, and then you can issue an opposite-mode command, e.g., read, seek, overwrite, seek, read, read, seek-back, overwrite, overwrite, ... This is used to maintain a work file of current events, or a fixed-size-record database file. You can do the same with two open file pointers (read & write to same file), but BSD Unix has supported this for over 20 years - and I think it came from even earlier at AT&T. You could support smart mode-switching, but usually you want to move the pointer back to the start of the record you just read, or to EOF, or to the front of the file (to update a status record of some kind), so that's not really useful.
 -- andy
Yes. Being able to do block oriented read-write to the same file is very important. I'm not sure whether it should be a part of std.stream, or whether it should be in a separate package, but it's really quite important. Without that capability one can, to take an extreme example, end up copying an entire database for a simple one character change.
Mar 10 2004
prev sibling parent Ben Hinkle <bhinkle4 juno.com> writes:
On Wed, 10 Mar 2004 04:12:42 +0000 (UTC), larry cowan
<larry_member pathlink.com> wrote:

In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
You're right, the performance of std.stream isn't good. Want to design a new
one? <g>
.. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed.
Yeah - there are probably bigger performance issues and I haven't actually tested any changes. But in general creating APIs that generate lots of garbage to collect is a performance problem waiting to happen.
OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins"

getc's trying to manage ungetc also).  It is a neat start, but is incomplete.
I wasn't aware of Andy's stream.d. I'm downloading "apropos" from http://ikagames.com/andy/d/ now since it looks like that is where I can find it. -Ben
Mar 10 2004
prev sibling parent reply "Kris" <someidiot earthlink.net> writes:
I've been building a multi-layer IO package for DSC (D Servlet Container)
that's fully buffered and generates zero garbage.

The layers include:

- raw byte-array style I/O

- informal text-oriented Tokenizer I/O with text/binary conversion

- structured binary I/O (directly to/from variables) with optional endian
flipping

- structured text-oriented input with data conversion

- structured class serialization/deserialization framework

It's quite flexible: for example, one can 'read' a line into a Token (using
a LineTokenizer) and then map said Token into any of the other layers for
further slicing. One can mix and match the different layers in pretty much
any way that makes sense. There is no copying of data unless requested by
the app, and no memory allocation other than app-allocated I/O buffers. It
does random-access where the underlying system supports it (not on a
socket), and is intended to support memory-mapped buffers (for *huge*
files). It uses lookahead instead of unget... for those cases that need such
things.

I'm writing this stuff for a high-performance server (in D), so one of the
primary goals is very low runtime overhead (i.e. zero memory allocation). If
anyone's interested, I'll try to get the first major I/O cut out the door
by, say, the end of the month.

- Kris


"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:c257jd$1ph8$1 digitaldaemon.com...
 Right now std.stream.readLine takes no inputs and returns a
 char[] of the line read. Each time it gets called it builds a string
 using ~=. Can there be an API that lets you pass in an
 "inout char[]" argument that only gets resized if needed. That
 would take a burden off the GC when reading files line by line.

 That way the existing readLine could be reimplemented as:

 char[] readLine()
 {
    char[] result;
    readLine(result);
    return result;
 }
 void readLine(inout char[] result)
 {
    // fill result, reallocating only if it is too small
 }

 The same would go for readLineW.
 -Ben
Mar 10 2004
next sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Kris wrote:
 I've been building a multi-layer IO package for DSC (D Servlet Container)
 that's fully buffered and generates zero garbage.
 
 The layers include:
 
 - raw byte-array style I/O
 
 - informal text-oriented Tokenizer I/O with text/binary conversion
 
 - structured binary I/O (directly to/from variables) with optional endian
 flipping
 
 - structured text-oriented input with data conversion
 
 - structured class serialization/deserialization framework
 
 It's quite flexible: for example, one can 'read' a line into a Token (using
 a LineTokenizer) and then map said Token into any of the other layers for
 further slicing. One can mix and match the different layers in pretty much
 any way that makes sense. There is no copying of data unless requested by
 the app, and no memory allocation other than app-allocated I/O buffers. It
 does random-access where the underlying system supports it (not on a
 socket), and is intended to support memory-mapped buffers (for *huge*
 files). It uses lookahead instead of unget... for those cases that need such
 things.
 
 I'm writing this stuff for a high-performance server (in D), so one of the
 primary goals is very low runtime overhead (i.e. zero memory allocation). If
 anyone's interested, I'll try to get the first major I/O cut out the door
 by, say, the end of the month.
 
 - Kris
...

 
 
 
That sounds extremely interesting. Would it be portable, or platform specific?
Mar 10 2004
parent "Kris" <someidiot earthlink.net> writes:
 That sounds extremely interesting.  Would it be portable, or platform
 specific?
Portable.
Mar 10 2004
prev sibling next sibling parent reply "Ben Hinkle" <bhinkle4 juno.com> writes:
Sounds nice! I'm definitely interested. Some comments/questions below

"Kris" <someidiot earthlink.net> wrote in message
news:c2nl32$1mm2$1 digitaldaemon.com...
 I've been building a multi-layer IO package for DSC (D Servlet Container)
 that's fully buffered and generates zero garbage.

 The layers include:

 - raw byte-array style I/O

 - informal text-oriented Tokenizer I/O with text/binary conversion
I'm not sure what informal means here or what text/binary conversion means. Does it include scanf/printf? Any replacement for std.stream needs to have that.
 - structured binary I/O (directly to/from variables) with optional endian
 flipping

 - structured text-oriented input with data conversion
not sure what structured means but I'm guessing the data being read has a format like "rows of numbers separated by tabs with the first column being a string" or something. Is this what you mean?
 - structured class serialization/deserialization framework
I don't know what you have in mind here but it sounds like a big chunk of work. Could it go into another module?
 It's quite flexible: for example, one can 'read' a line into a Token (using
 a LineTokenizer) and then map said Token into any of the other layers for
 further slicing. One can mix and match the different layers in pretty much
 any way that makes sense. There is no copying of data unless requested by
 the app, and no memory allocation other than app-allocated I/O buffers. It
 does random-access where the underlying system supports it (not on a
 socket), and is intended to support memory-mapped buffers (for *huge*
 files). It uses lookahead instead of unget... for those cases that need such
 things.

 I'm writing this stuff for a high-performance server (in D), so one of the
 primary goals is very low runtime overhead (i.e. zero memory allocation). If
 anyone's interested, I'll try to get the first major I/O cut out the door
 by, say, the end of the month.
Mar 10 2004
parent "Kris" <someidiot earthlink.net> writes:
Below:

 I'm not sure what informal means here or what text/binary conversion
 means. Does it include scanf/printf? Any replacement for std.stream needs
 to have that.
Forgive the odd terminology Ben. In this context, informal means "loosely formatted", like a command-line or word-count input, or line-input. Formal, or structured, stipulates the input data must conform exactly. Formal inputs throw an exception when the input fails to match an expectation, whereas informal ones just return eof (or an empty Token).

Text/binary conversion is converting numbers to text and vice versa (what printf and scanf do). Here's a trivial example:

{
   TextWriter w;
   int j;

   // convert to text
   w << "number is : " << j << w.Newline;
}

Printf is supported via a thin adaptor:

   w << Printf.format("%d %d", 10, 20) << ....
 not sure what structured means but I'm guessing the data being read
 has a format like "rows of numbers separated by tabs with the
 first column being a string" or something. Is this what you mean?
Pretty much, yes. In other words, structured/formal format is rigid. There's an agreed-upon layout that is reflected in the source code. That layout might be implemented as raw binary, or as delimited text (as you suggest). If it's raw binary, then there should also be an agreed Endian layout.
 I don't know what you have in mind here but it sounds like a big
 chunk of work. Could it go into another module?
If it were a Java-style reflection approach I'd quickly agree :-) In this case it's just a couple of trivial interfaces. Each serializable class must implement a read & write method. If one were deserializing a (previously serialized) class, one would probably follow the structured/formal approach. Here's an example:

class MyClass : IReadable, IWritable
{
   private int foo;
   private float bar;

   // the IReadable interface
   void read (IReader r)
   {
      r >> foo >> bar;
   }

   // the IWritable interface
   void write (IWriter w)
   {
      w << foo << bar;
   }
}

{
   MyClass mc = new MyClass(...);
   ....
   BinaryWriter w = new BinaryWriter(...);
   w << mc;
   ...
}

Note that there are a variety of Readers and Writers (binary, text, token, etc) and all of them can be used directly against any class implementing IReadable and/or IWritable. In other words, the implementing class doesn't know, or care, whether it's serialized in binary, text, or something else.

As an aside, there's also an ISerializable interface that adds methods getGuid() and create(). This would be used when shipping class instances around a network, sans reflection (something that DSC will support for clustering).
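
To make the "class doesn't know or care" point concrete, here is a cut-down sketch that compiles with a current D compiler. It is an assumption-laden stand-in, not the DSC code: the interfaces are reduced to a hypothetical put() method in place of the << operator, only writing is shown, and the writers append to GC-managed arrays, which the real package is described as avoiding.

```d
import std.conv : to;

// Hypothetical, minimal versions of the interfaces named in the post.
interface IWriter
{
    IWriter put(int v);
    IWriter put(float v);
}

interface IWritable
{
    void write(IWriter w);
}

// Emits each value as text followed by a space.
class TextWriter : IWriter
{
    string text;
    IWriter put(int v)   { text ~= to!string(v) ~ " "; return this; }
    IWriter put(float v) { text ~= to!string(v) ~ " "; return this; }
}

// Emits each value as raw bytes in host byte order.
class BinaryWriter : IWriter
{
    ubyte[] bytes;
    IWriter put(int v)   { bytes ~= (cast(ubyte*) &v)[0 .. v.sizeof]; return this; }
    IWriter put(float v) { bytes ~= (cast(ubyte*) &v)[0 .. v.sizeof]; return this; }
}

// The serializable class only sees IWriter, never a concrete format.
class MyClass : IWritable
{
    private int foo = 7;
    private float bar = 1.5f;
    void write(IWriter w) { w.put(foo).put(bar); }
}

void main()
{
    auto mc = new MyClass;
    auto tw = new TextWriter;
    auto bw = new BinaryWriter;
    mc.write(tw);   // same write() call, text output
    mc.write(bw);   // same write() call, binary output
    assert(bw.bytes.length == int.sizeof + float.sizeof);
}
```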
Mar 10 2004
prev sibling parent reply "Carlos Santander B." <carlos8294 msn.com> writes:
"Kris" <someidiot earthlink.net> wrote in message
news:c2nl32$1mm2$1 digitaldaemon.com
| I've been building a multi-layer IO package for DSC (D
| Servlet Container) that's fully buffered and generates
| zero garbage.
| ...

Do you know when you could make a release for DSC?

-----------------------
Carlos Santander Bernal
Mar 10 2004
parent reply "Kris" <someidiot earthlink.net> writes:
Carlos,

I would hope to release an Alpha sometime in April. Initial release will not
support https, because I don't have that expertise. Any volunteers? <g>

- Kris


"Carlos Santander B." <carlos8294 msn.com> wrote in message
news:c2ok8o$efd$1 digitaldaemon.com...
 "Kris" <someidiot earthlink.net> wrote in message
 news:c2nl32$1mm2$1 digitaldaemon.com
 | I've been building a multi-layer IO package for DSC (D
 | Servlet Container) that's fully buffered and generates
 | zero garbage.
 | ...

 Do you know when you could make a release for DSC?

 -----------------------
 Carlos Santander Bernal
Mar 10 2004
parent "Carlos Santander B." <carlos8294 msn.com> writes:
"Kris" <someidiot earthlink.net> wrote in message
news:c2olf2$gn9$1 digitaldaemon.com
| Carlos,
|
| I would hope to release an Alpha sometime in April.
| Initial release will not support https, because I don't
| have that expertise. Any volunteers? <g>
|
| - Kris
|

Not here, sorry. I was just asking because right now I'm in the middle of
building an e-business site but using JSP (including https, certificates,
signed mail...), so I got curious to know if/how that could be done using
DSC. Not that I would use it: deadline is this monday and I still have to
figure out how to attach a file to a signed mail, and make Apache find the
jar for creating the graphic that will be attached. So, I still have a lot
of research to do.

-----------------------
Carlos Santander Bernal
Mar 10 2004