D - performance in std.stream
- Ben Hinkle (18/18) Mar 03 2004 Right now std.stream.readLine takes no inputs and returns a
- Walter (2/2) Mar 08 2004 You're right, the performance of std.stream isn't good. Want to design a...
- larry cowan (7/9) Mar 09 2004 .. but the stacking of chars onto the string for readLine isn't the prob...
- Andy Friesen (8/22) Mar 09 2004 The biggest disadvantage to the bolt-in approach is that it's inherently...
- larry cowan (10/32) Mar 09 2004 The writeWChar() and writeDChar() methods won't compile.
- Andy Friesen (8/17) Mar 09 2004 What sort of options? Text/binary, append, and the like?
- larry cowan (17/34) Mar 10 2004 streams.d(529): function toString overloads char[](char c) and char[](cr...
- Charles Hixson (6/29) Mar 10 2004 Yes. Being able to do block oriented read-write to the same file is
- Ben Hinkle (10/22) Mar 10 2004 Yeah - there are probably bigger performance issues and I haven't
- Kris (25/43) Mar 10 2004 I've been building a multi-layer IO package for DSC (D Servlet Container...
- Charles Hixson (4/40) Mar 10 2004 That sounds extremely interesting. Would it be portable, or platform
- Kris (1/3) Mar 10 2004 Portable.
- Ben Hinkle (14/36) Mar 10 2004 Sounds nice! I'm definitely interested. Some comments/questions below
- Kris (55/63) Mar 10 2004 Forgive the odd terminology Ben. In this context, informal means "loose...
- Carlos Santander B. (9/9) Mar 10 2004 "Kris" wrote in message
- Kris (6/15) Mar 10 2004 Carlos,
- Carlos Santander B. (19/19) Mar 10 2004 "Kris" wrote in message
Right now std.stream.readLine takes no inputs and returns a char[] of the line read. Each time it gets called it builds a string using ~=. Can there be an API that lets you pass in an "inout char[]" argument that only gets resized if needed? That would take a burden off the GC when reading files line by line. That way the existing readLine could be reimplemented as:

    char[] readLine() {
        char[] result;
        readLine(result);
        return result;
    }

    void readLine(inout char[] result) {
        [fill and only reallocate result if needed]
    }

The same would go for readLineW.

-Ben
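A minimal sketch of the buffer-reusing readLine idea from the post above, written in present-day D, where `ref` plays the role the `inout` parameter storage class played in 2004. The name `readLineInto` and the in-memory `input` source are assumptions made for illustration; this is not the std.stream API.

```d
import std.stdio : writeln;

/// Hypothetical helper: consumes one line from `input` into `buf` and
/// returns the filled part of `buf` (without the '\n' terminator).
/// `buf` is reallocated only when a line does not fit in it.
/// An empty result can mean an empty line or end of input, so callers
/// should check `input.length` to detect the end.
char[] readLineInto(ref const(char)[] input, ref char[] buf)
{
    size_t n = 0;
    while (input.length > 0)
    {
        immutable char c = input[0];
        input = input[1 .. $];
        if (c == '\n')
            break;
        if (n == buf.length)
            buf.length = buf.length ? buf.length * 2 : 16; // grow only when full
        buf[n++] = c;
    }
    return buf[0 .. n];
}

void main()
{
    const(char)[] input = "first\nsecond line\nthird";
    char[] buf = new char[32];      // allocated once by the caller
    const before = buf.ptr;

    while (input.length > 0)
    {
        char[] line = readLineInto(input, buf); // a slice of buf, not a copy
        writeln(line);
    }

    // Every line fit into the original buffer, so it was never reallocated.
    assert(buf.ptr is before);
}
```

The returned slice aliases `buf`, so it is only valid until the next call; a caller that wants to keep a line has to copy it explicitly.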
Mar 03 2004
You're right, the performance of std.stream isn't good. Want to design a new one? <g>
Mar 08 2004
In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
> You're right, the performance of std.stream isn't good. Want to design a new one? <g>

.. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed. OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" getc's trying to manage ungetc also). It is a neat start, but is incomplete.
Mar 09 2004
larry cowan wrote:
> In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
>> You're right, the performance of std.stream isn't good. Want to design a new one? <g>
> .. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed. OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" getc's trying to manage ungetc also). It is a neat start, but is incomplete.

The biggest disadvantage to the bolt-in approach is that it's inherently dependent on templates. In C++ land this would translate to horrendous compile times; here in D land it means occasional linker weirdness unless you link stream.obj explicitly.

Also, what's missing? I tend to suffer from blindness to hypothetical needs other than my own. :)

-- andy
Mar 09 2004
In article <c2m5kb$21oj$1 digitaldaemon.com>, Andy Friesen says...
> larry cowan wrote:
>> In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
>>> You're right, the performance of std.stream isn't good. Want to design a new one? <g>
>> .. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed. OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins". getc's trying to manage ungetc also). It is a neat start, but is incomplete.
> The biggest disadvantage to the bolt-in approach is that it's inherently dependent on templates. In C++ land this would translate to horrendous compile times, here in D land it means occasional linker weirdness unless you link stream.obj explicitly.
> Also, what's missing? I tend to suffer from blindness to hypothetical needs other than my own. :)
> -- andy

The writeWChar() and writeDChar() methods won't compile.

It really needs overloaded this() in read and write to specify options for file open.

An open() would be nice for reopening after closing, but it's not really needed when you can just instantiate it again easily - why allow explicit close?

Could a read/write be built for this? Or is this what the R and W inheritance (this(File*...)) is to support? Haven't tried to do anything with this, but there are seek requirements for switching back and forth if a read/write open is used (could be automated, but only for the common cases).
Mar 09 2004
larry cowan wrote:
> The writeWChar() and writeDChar() methods won't compile.

Crazy. I'll take a look.

> It really needs overloaded this() in read and write to specify options for file open.

What sort of options? Text/binary, append, and the like?

> An open() would be nice for reopening after closing, but it's not really needed when you can just instantiate it again easily - why allow explicit close?

close() is necessary because D's garbage collector isn't deterministic. The destructor can clean up if you forget, but it's not a good idea to have an open file descriptor laying around like that.

> Could a read/write be built for this? Or is this what the R and W inheritance (this(File*...)) is to support? Haven't tried to do anything with this, but there are seek requirements for switching back and forth if a read/write open is used (could be automated, but only for the common cases).

I hadn't considered simultaneous read/write; I doubt it will work.

-- andy
Mar 09 2004
In article <c2mcd0$2ebl$1 digitaldaemon.com>, Andy Friesen says...
> larry cowan wrote:
>> The writeWChar() and writeDChar() methods won't compile.
> Crazy. I'll take a look.

streams.d(529): function toString overloads char[](char c) and char[](creal r) both match argument list for toString

>> It really needs overloaded this() in read and write to specify options for file open.
> What sort of options? Text/binary, append, and the like?

Yeah, the normal filemodes. And rw+ as indicated below... And rb and wb for windoughs.

>> An open() would be nice for reopening after closing, but it's not really needed when you can just instantiate it again easily - why allow explicit close?
> close() is necessary because D's garbage collector isn't deterministic. The destructor can clean up if you forget, but it's not a good idea to have an open file descriptor laying around like that.

Ok, reasonable, but then you should be able to reopen it (not necessarily in the same mode).

>> Could a read/write be built for this? Or is this what the R and W inheritance (this(File*...)) is to support? Haven't tried to do anything with this, but there are seek requirements for switching back and forth if a read/write open is used (could be automated, but only for the common cases).
> I hadn't considered simultaneous read/write; I doubt it will work.
> -- andy

Not simultaneous - just on the same open(). You switch back and forth using a seek (maybe to current point + 0L) to clear the pointer mode and then can issue an opposite mode command, e.g., read, seek, overwrite, seek, read, read, seek-back, overwrite, overwrite, ... . This is used to maintain a work file of current events, or a fixed-size-rec database file. You can do the same with two open file pointers (read & write to same file), but bsd unix has supported this for over 20 years - and I think it came from even earlier at AT&T.

You could support smart mode-switching, but usually you want to move the pointer back to the start of the record you just read, or to eof, or to front of file (to update a status rec of some kind), so that's not really useful.
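The read, seek, overwrite pattern described above can be sketched in present-day D with `std.stdio.File` opened in an update mode. This is an illustration only, not code from std.stream or from Andy's stream.d; the file name `records.tmp`, the 8-byte record size, and the helper `updateRecord` are all assumptions made for the example.

```d
import core.stdc.stdio : SEEK_SET;
import std.file : remove;
import std.stdio : File;

enum recSize = 8; // assumed fixed record size, in bytes

/// Hypothetical helper: reads record `index`, seeks back to its start,
/// and overwrites it in place, all on the same open file.
void updateRecord(ref File f, size_t index, const(ubyte)[] newData)
{
    assert(newData.length == recSize);
    immutable offset = cast(long)(index * recSize);

    ubyte[recSize] old;
    f.seek(offset, SEEK_SET);
    f.rawRead(old[]);          // the stream is now in "read" direction

    f.seek(offset, SEEK_SET);  // the seek between read and write switches direction
    f.rawWrite(newData);       // overwrite exactly one record
    f.flush();
}

void main()
{
    enum name = "records.tmp";          // hypothetical scratch file
    auto f = File(name, "w+b");         // one open, used for reading and writing
    scope (exit) { f.close(); remove(name); }

    // Three fixed-size records.
    f.rawWrite(cast(const(ubyte)[]) "AAAAAAAABBBBBBBBCCCCCCCC");

    // Change only the middle record; nothing else is copied or rewritten.
    updateRecord(f, 1, cast(const(ubyte)[]) "bbbbbbbb");

    ubyte[3 * recSize] all;
    f.seek(0, SEEK_SET);
    f.rawRead(all[]);
    assert(cast(const(char)[]) all[] == "AAAAAAAAbbbbbbbbCCCCCCCC");
}
```

The explicit seek before each change of direction mirrors the C stdio rule for update streams, which is the requirement the post refers to.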
Mar 10 2004
larry cowan wrote:
> In article <c2mcd0$2ebl$1 digitaldaemon.com>, Andy Friesen says...
>> larry cowan wrote:
>> ...
>>> Could a read/write be built for this? Or is this what the R and W inheritance (this(File*...)) is to support? Haven't tried to do anything with this, but there are seek requirements for switching back and forth if a read/write open is used (could be automated, but only for the common cases).
>> I hadn't considered simultaneous read/write; I doubt it will work.
>> -- andy
> Not simultaneous - just on the same open(). You switch back and forth using a seek ( maybe to current point + 0L ) to clear the pointer mode and then can issue an opposite mode command, e.g., read, seek, overwrite, seek, read, read, seek-back, overwrite, overwrite, ... . This is used to maintain a work file of current events, or a fixed-size-rec database file. You can do the same with two open file pointers (read & write to same file), but bsd unix has supported this for over 20 years - and I think it came from even earlier at AT&T. You could support smart mode-switching, but usually you want to move the pointer back to the start of the record you just read, or to eof, or to front of file (to update a status rec of some kind), so that's not really useful.

Yes. Being able to do block oriented read-write to the same file is very important. I'm not sure whether it should be a part of std.stream, or whether it should be in a separate package, but it's really quite important. Without that capability one can, to take an extreme example, end up copying an entire database for a simple one character change.
Mar 10 2004
On Wed, 10 Mar 2004 04:12:42 +0000 (UTC), larry cowan <larry_member pathlink.com> wrote:
> In article <c2j7a9$2kc5$1 digitaldaemon.com>, Walter says...
>> You're right, the performance of std.stream isn't good. Want to design a new one? <g>
> .. but the stacking of chars onto the string for readLine isn't the problem. Giving it a preallocated buffer and indexing them in does not significantly improve the speed.

Yeah - there are probably bigger performance issues and I haven't actually tested any changes. But in general creating APIs that generate lots of garbage to collect is a performance problem waiting to happen.

> OTOH the stream.d that Andy Friesen wrote a while back as a template "bolt-ins" getc's trying to manage ungetc also). It is a neat start, but is incomplete.

I wasn't aware of Andy's stream.d. I'm downloading "apropos" from http://ikagames.com/andy/d/ now since it looks like that is where I can find it.

-Ben
Mar 10 2004
I've been building a multi-layer IO package for DSC (D Servlet Container) that's fully buffered and generates zero garbage. The layers include:

- raw byte-array style I/O
- informal text-oriented Tokenizer I/O with text/binary conversion
- structured binary I/O (directly to/from variables) with optional endian flipping
- structured text-oriented input with data conversion
- structured class serialization/deserialization framework

It's quite flexible: for example, one can 'read' a line into a Token (using a LineTokenizer) and then map said Token into any of the other layers for further slicing. One can mix and match the different layers in pretty much any way that makes sense. There is no copying of data unless requested by the app, and no memory allocation other than app-allocated I/O buffers.

It does random-access where the underlying system supports it (not on a socket), and is intended to support memory-mapped buffers (for *huge* files). It uses lookahead instead of unget... for those cases that need such things.

I'm writing this stuff for a high-performance server (in D), so one of the primary goals is very low runtime overhead (i.e. zero memory allocation). If anyone's interested, I'll try to get the first major I/O cut out the door by, say, the end of the month.

- Kris

"Ben Hinkle" <bhinkle4 juno.com> wrote in message news:c257jd$1ph8$1 digitaldaemon.com...
> Right now std.stream.readLine takes no inputs and returns a char[] of the line read. Each time it gets called it builds a string using ~=. Can there be an API that lets you pass in an "inout char[]" argument that only gets resized if needed? That would take a burden off the GC when reading files line by line. That way the existing readLine could be reimplemented as:
>
>     char[] readLine() {
>         char[] result;
>         readLine(result);
>         return result;
>     }
>
>     void readLine(inout char[] result) {
>         [fill and only reallocate result if needed]
>     }
>
> The same would go for readLineW.
>
> -Ben
Mar 10 2004
Kris wrote:
> I've been building a multi-layer IO package for DSC (D Servlet Container) that's fully buffered and generates zero garbage. The layers include:
>
> - raw byte-array style I/O
> - informal text-oriented Tokenizer I/O with text/binary conversion
> - structured binary I/O (directly to/from variables) with optional endian flipping
> - structured text-oriented input with data conversion
> - structured class serialization/deserialization framework
>
> It's quite flexible: for example, one can 'read' a line into a Token (using a LineTokenizer) and then map said Token into any of the other layers for further slicing. One can mix and match the different layers in pretty much any way that makes sense. There is no copying of data unless requested by the app, and no memory allocation other than app-allocated I/O buffers.
>
> It does random-access where the underlying system supports it (not on a socket), and is intended to support memory-mapped buffers (for *huge* files). It uses lookahead instead of unget... for those cases that need such things.
>
> I'm writing this stuff for a high-performance server (in D), so one of the primary goals is very low runtime overhead (i.e. zero memory allocation). If anyone's interested, I'll try to get the first major I/O cut out the door by, say, the end of the month.
>
> - Kris
> ...

That sounds extremely interesting. Would it be portable, or platform specific?
Mar 10 2004
> That sounds extremely interesting. Would it be portable, or platform specific?

Portable.
Mar 10 2004
Sounds nice! I'm definitely interested. Some comments/questions below.

"Kris" <someidiot earthlink.net> wrote in message news:c2nl32$1mm2$1 digitaldaemon.com...
> I've been building a multi-layer IO package for DSC (D Servlet Container) that's fully buffered and generates zero garbage. The layers include:
> - raw byte-array style I/O
> - informal text-oriented Tokenizer I/O with text/binary conversion

I'm not sure what informal means here or what text/binary conversion means. Does it include scanf/printf? Any replacement for std.stream needs to have that.

> - structured binary I/O (directly to/from variables) with optional endian flipping
> - structured text-oriented input with data conversion

Not sure what structured means but I'm guessing the data being read has a format like "rows of numbers separated by tabs with the first column being a string" or something. Is this what you mean?

> - structured class serialization/deserialization framework

I don't know what you have in mind here but it sounds like a big chunk of work. Could it go into another module?

> It's quite flexible: for example, one can 'read' a line into a Token (using a LineTokenizer) and then map said Token into any of the other layers for further slicing. One can mix and match the different layers in pretty much any way that makes sense. There is no copying of data unless requested by the app, and no memory allocation other than app-allocated I/O buffers. It does random-access where the underlying system supports it (not on a socket), and is intended to support memory-mapped buffers (for *huge* files). It uses lookahead instead of unget... for those cases that need such things. I'm writing this stuff for a high-performance server (in D), so one of the primary goals is very low runtime overhead (i.e. zero memory allocation). If anyone's interested, I'll try to get the first major I/O cut out the door by, say, the end of the month.
Mar 10 2004
Below:

> I'm not sure what informal means here or what text/binary conversion means. Does it include scanf/printf? Any replacement for std.stream needs to have that.

Forgive the odd terminology Ben. In this context, informal means "loosely formatted", like a command-line or word-count input, or line-input. Formal, or structured, stipulates the input data must conform exactly. Formal inputs throw an exception when the input fails to match an expectation, whereas informal ones just return eof (or an empty Token).

Text/binary conversion is converting numbers to text and vice versa (what printf and scanf do). Here's a trivial example:

    {
        TextWriter w;
        int j;

        // convert to text
        w << "number is : " << j << w.Newline;
    }

Printf is supported via a thin adaptor:

    w << Printf.format("%d %d", 10, 20) << ....

> not sure what structured means but I'm guessing the data being read has a format like "rows of numbers separated by tabs with the first column being a string" or something. Is this what you mean?

Pretty much, yes. In other words, structured/formal format is rigid. There's an agreed-upon layout that is reflected in the source code. That layout might be implemented as raw binary, or as delimited text (as you suggest). If it's raw binary, then there should also be an agreed Endian layout.

> I don't know what you have in mind here but it sounds like a big chunk of work. Could it go into another module?

If it were a Java-style reflection approach I'd quickly agree :-) In this case it's just a couple of trivial interfaces. Each serializable class must implement a read & write method. If one were deserializing a (previously serialized) class, one would probably follow the structured/formal approach. Here's an example:

    class MyClass : IReadable, IWritable
    {
        private int foo;
        private float bar;

        // the IReadable interface
        void read (IReader r)
        {
            r >> foo >> bar;
        }

        // the IWritable interface
        void write (IWriter w)
        {
            w << foo << bar;
        }
    }

    {
        MyClass mc = new MyClass(...);
        ....
        BinaryWriter w = new BinaryWriter(...);
        w << mc;
        ...
    }

Note that there are a variety of Readers and Writers (binary, text, token, etc) and all of them can be used directly against any class implementing IReadable and/or IWritable. In other words, the implementing class doesn't know, or care, whether it's serialized in binary, text, or something else.

As an aside, there's also an ISerializable interface that adds methods getGuid() and create(). This would be used when shipping class instances around a network, sans reflection (something that DSC will support for clustering).
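To make the IReadable/IWritable idea above concrete, here is a self-contained sketch in present-day D. It is not Kris's package: the interface bodies, the `put`/`get` method names (used in place of the `<<` and `>>` operators), and the `BinaryWriter`, `BinaryReader` and `TextWriter` classes below are assumptions invented for illustration, and unlike the package described above this sketch allocates freely and writes in host byte order.

```d
import std.conv : to;
import std.stdio : writeln;

// Assumed minimal interfaces; the real ones are not shown in the thread.
interface IWriter { IWriter put(int v); IWriter put(float v); }
interface IReader { IReader get(ref int v); IReader get(ref float v); }
interface IWritable { void write(IWriter w); }
interface IReadable { void read(IReader r); }

/// Appends the raw bytes of each value (host byte order, no endian flipping).
final class BinaryWriter : IWriter
{
    ubyte[] data;
    IWriter put(int v)   { return append(v); }
    IWriter put(float v) { return append(v); }
    private IWriter append(T)(T v)
    {
        data ~= (cast(ubyte*) &v)[0 .. T.sizeof];
        return this;
    }
}

/// Reads values back in the same order and layout BinaryWriter produced.
final class BinaryReader : IReader
{
    private const(ubyte)[] data;
    this(const(ubyte)[] d) { data = d; }
    IReader get(ref int v)   { return take(v); }
    IReader get(ref float v) { return take(v); }
    private IReader take(T)(ref T v)
    {
        (cast(ubyte*) &v)[0 .. T.sizeof] = data[0 .. T.sizeof];
        data = data[T.sizeof .. $];
        return this;
    }
}

/// Writes each value as text followed by a space.
final class TextWriter : IWriter
{
    string text;
    IWriter put(int v)   { text ~= v.to!string ~ " "; return this; }
    IWriter put(float v) { text ~= v.to!string ~ " "; return this; }
}

/// The class only names its fields; it does not know the output format.
class MyClass : IReadable, IWritable
{
    int foo;
    float bar;
    this(int foo, float bar) { this.foo = foo; this.bar = bar; }
    void read(IReader r)  { r.get(foo).get(bar); }
    void write(IWriter w) { w.put(foo).put(bar); }
}

void main()
{
    auto mc = new MyClass(7, 2.5f);

    auto bw = new BinaryWriter;
    mc.write(bw);                 // binary form

    auto tw = new TextWriter;
    mc.write(tw);                 // same method, text form
    writeln(tw.text);

    auto copy = new MyClass(0, 0.0f);
    copy.read(new BinaryReader(bw.data));
    assert(copy.foo == 7 && copy.bar == 2.5f);
}
```

The same `write` method feeds both writers, which is the point made in the post: the serializable class depends only on the interfaces, not on a particular encoding.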
Mar 10 2004
"Kris" <someidiot earthlink.net> wrote in message news:c2nl32$1mm2$1 digitaldaemon.com
| I've been building a multi-layer IO package for DSC (D
| Servlet Container) that's fully buffered and generates
| zero garbage.
| ...

Do you know when you could make a release for DSC?

-----------------------
Carlos Santander Bernal
Mar 10 2004
Carlos,

I would hope to release an Alpha sometime in April. Initial release will not support https, because I don't have that expertise. Any volunteers? <g>

- Kris

"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c2ok8o$efd$1 digitaldaemon.com...
> "Kris" <someidiot earthlink.net> wrote in message news:c2nl32$1mm2$1 digitaldaemon.com
> | I've been building a multi-layer IO package for DSC (D
> | Servlet Container) that's fully buffered and generates
> | zero garbage.
> | ...
>
> Do you know when you could make a release for DSC?
>
> -----------------------
> Carlos Santander Bernal
Mar 10 2004
"Kris" <someidiot earthlink.net> wrote in message news:c2olf2$gn9$1 digitaldaemon.com
| Carlos,
|
| I would hope to release an Alpha sometime in April.
| Initial release will not support https, because I don't
| have that expertise. Any volunteers? <g>
|
| - Kris
|

Not here, sorry. I was just asking because right now I'm in the middle of building an e-business site but using JSP (including https, certificates, signed mail...), so I got curious to know if/how that could be done using DSC. Not that I would use it: the deadline is this Monday and I still have to figure out how to attach a file to a signed mail, and make Apache find the jar for creating the graphic that will be attached. So, I still have a lot of research to do.

-----------------------
Carlos Santander Bernal
Mar 10 2004