digitalmars.D - readf/unformat 1.4 released
- Sean Kelly (27/27) Sep 24 2004 For those of you who don't know, readf began as an attempt at a full
- Ben Hinkle (21/48) Sep 24 2004 the
- Sean Kelly (38/44) Sep 24 2004 Format strings can also specify how to parse the incoming data. Integer...
- Ben Hinkle (23/68) Sep 24 2004 I'm
- Sean Kelly (5/12) Sep 24 2004 Yup. I didn't think about it until just now, but this may come in handy...
- Anders F Björklund (58/117) Mar 17 2005 I changed this package to break it into
- Sean Kelly (4/15) Mar 17 2005 Nice! Is this version available online?
- Anders F Björklund (13/17) Mar 17 2005 Not yet, have to clean it up and backport
- Sean Kelly (24/38) Mar 17 2005 I'll have to re-evaluate TypeInfo in DMD. I don't suppose it's working ...
- Anders F Björklund (27/48) Mar 17 2005 Maybe I explained myself badly. You can of course pass
For those of you who don't know, readf began as an attempt at a full
C99-compliant scanf implementation in D. It's since been renamed to match the
Phobos writef/format functions a bit more closely, and this version attempts to
bring usage a bit closer to readf.

What's new in this version:

- Support for negative zero and negative infinity (previous version ignored
  sign in these cases). This version also does not allow the optional sign to
  appear before "NAN" as IMO it's meaningless. So "+NAN" and "-NAN" will both
  cause an error. If you don't like this, please let me know. It is contrary
  to the C99 spec.

- Format strings are no longer necessary. Default formats are:

    %s: char arrays
    %c: char pointers
    %i: integer/bit
    %f: floating point

unFormat still will not throw an exception on parameter mismatch, but will
return immediately instead. This is the only interface issue I know of where
this package diverges from doFormat/writef. The format string parsing is still
fully scanf-compliant, so there are some redundant format specifiers. Check the
C99 spec or the included text file to get an idea of what specifiers do what.

By the way, the code currently assumes input data to be in UTF-8 or UTF-16
(native) format as it uses the Phobos toUTFXX functions for conversion. The
package includes a custom utf.d that allows delegates and consists of two
implementation files: unformat.d and stdio.d. The format string and all
incoming data are converted to UTF-32 before evaluation to facilitate
comparison.

As usual, please let me know what you think. The file is here:

http://home.f4.ca/sean/d/stdio.zip

Sean
Sep 24 2004
"Sean Kelly" <sean f4.ca> wrote in message
news:cj1of5$cjp$1 digitaldaemon.com...
> For those of you who don't know, readf began as an attempt at a full
> C99-compliant scanf implementation in D.
[...]

How does unFormat take advantage of D's _arguments feature (if it does)? I'm
not quite sure why the parsing code needs to think about %s or %i or whatever
since it can look at the type of the target variable. If it sees int* then it
parses an int and if it sees char[]* it parses a string. The only role of the
format would be to specify where to parse and where to match literal
characters.

-Ben
Sep 24 2004
In article <cj1r40$e0k$1 digitaldaemon.com>, Ben Hinkle says...
> How does unFormat take advantage of D's _arguments feature (if it does)? I'm
> not quite sure why the parsing code needs to think about %s or %i or whatever
> since it can look at the type of the target variable. If it sees int* then it
> parses an int and if it sees char[]* it parses a string. The only role of the
> format would be to specify where to parse and where to match literal
> characters.

Format strings can also specify how to parse the incoming data. Integers, for
example, have a bunch of different format specifiers for different types of
input. I chose "%i" as the default, since it's the most flexible, but "%d"
specifies decimal numbers only, "%o" is octal, you can include width
specifiers, etc. I also may have forgotten to allow a bit to be parsed as a
string (%s) to convert "true" and "false" to 1 and 0, respectively.

For a contrived example:

In the second case, 0x1 is expected to be a decimal number so the "x" is
interpreted as non-numeric. The "%*s" indicates that a string should be read
but assignment should be suppressed (which throws out the "x1"), and the final
string is read as normal because the format string has been exhausted.

So in many cases there's no need to use format specifiers. The code still uses
them internally even when one isn't supplied because it simplifies things, but
this is all invisible to the programmer.

unFormat takes advantage of the _arguments collection by using it to determine
what type is being written to (it will return if you try to read a string into
an integer, for example--writef would throw a FormatError in this situation),
and to determine what to expect if no format string is supplied.

You can also do stuff like this:

So there's no restriction on the number or the location of format strings. All
arguments are evaluated left to right.

Sean
Sep 24 2004
"Sean Kelly" <sean f4.ca> wrote in message
news:cj1thq$fd3$1 digitaldaemon.com...
[...]
> So there's no restriction on the number or the location of format strings.
> All arguments are evaluated left to right.

cool! so a C scanf call
  scanf("%d %d",&i,&j)
can be any of
  readf("%d %d",&i,&j)
  readf("%d",&i,"%d",&j)
  readf(&i,&j);
assuming i and j are ints. very nifty.
Sep 24 2004
In article <cj219m$h14$1 digitaldaemon.com>, Ben Hinkle says...
> cool! so a C scanf call
>   scanf("%d %d",&i,&j)
> can be any of
>   readf("%d %d",&i,&j)
>   readf("%d",&i,"%d",&j)
>   readf(&i,&j);
> assuming i and j are ints. very nifty.

Yup. I didn't think about it until just now, but this may come in handy for
internationalization, since the whole "%1" concept that's been talked about can
be faked just by reordering parameters.

Sean
Sep 24 2004
Sean Kelly wrote: (back in 2004-09-24, that was)

> For those of you who don't know, readf began as an attempt at a full
> C99-compliant scanf implementation in D. It's since been renamed to match
> the Phobos writef/format functions a bit more closely, and this version
> attempts to bring usage a bit closer to readf.
[...]
> unFormat still will not throw an exception on parameter mismatch, but will
> return immediately instead. This is the only interface issue I know of where
> this package diverges from doFormat/writef.

I changed this package to break it into
std.stdio.readf and std.string.unformat...

I also made it throw Exceptions on % FormatError
and parameter mismatch e.g. not passing pointers

The missing TypeInfo for pointers, and the fact
that you cannot pass _arguments and _argptr around
with GDC without losing info makes it a horrible
kludge at the moment - but it does work! ;-)
(currently unformat only works from within std.string, though...)

Sean Kelly's old declaration of unFormat was:

    int unFormat( bit delegate( out dchar ) getc,
                  bit delegate( dchar ) ungetc,
                  TypeInfo[] arguments,
                  void* argptr )

This was changed to use EOF and lose the "bit":

    void unFormat( dchar delegate() getc,
                   dchar delegate(dchar) ungetc,
                   TypeInfo[] arguments,
                   va_list argptr,
                   Mangle[] mangle = null,
                   Mangle[] mangle2 = null)

(last two parameters being part of the GDC kludge,
you should already know "va_list" from std.stdarg:)

    version (GNU) {
        // va_list might be a pointer, but assuming so is not portable.
        private import gcc.builtins;
        alias __builtin_va_list va_list;
    } else {
        alias void* va_list;
    }

You only provide two delegates: getc and ungetc,
which are very similar to their C counterparts...
(making the wrappers for fgetc and ungetc simple)

    dchar getc();
    dchar ungetc(dchar c);

Should an EOF occur, the "new" version now returns
a cast(dchar) std.c.stdio.EOF, or: 0xFFFFFFFF as UTF-32.
(which is not a valid code point, and thus "safe" here)

Otherwise, the internals work more or less as before
(except that it doesn't internalize the exceptions...)

Here is the "ideal" version of std.string.unformat,
ignoring the current GDC Mangling preprocessing hacks:

    void unformat(char[] s, ...)
    {
        size_t idx = 0, old_idx;

        dchar getc()
        {
            old_idx = idx;
            if (idx >= s.length)
                return cast(dchar) EOF;
            return std.utf.decode(s, idx);
        }

        dchar ungetc(dchar c)
        {
            idx = old_idx;
            return c;
        }

        std.unformat.unFormat(&getc, &ungetc, _arguments, _argptr);
    }

You can use this as: (very similar to "format")

    int i, j;
    unformat("1 2", "%d %d",&i,&j);
    unformat("1 2", "%d",&i,"%d",&j);
    unformat("1 2", &i,&j);
    assert(i == 1 && j == 2);

Since i and j are ints, it'll default to "%d".

Then there is the readf function, which also works as expected:

    import std.stdio;

    void main()
    {
        char[] s;
        write("What's is your name: ");
        readf("%s", &s);
        writefln("Hello, %s!", s);
    }

Which inputs/outputs something like:

    What's is your name: Anders
    Hello, Anders!

(yes, this is the actual D program)

Note that if you pass "s" instead of "&s", an Error will be thrown...
(this should stop the usual scanf bugs, with forgetting to &-prefix ?)

The program also uses the formatless version of writef called "write",
which doesn't treat '%' characters special but just prints them out...

Once the new version of GDC is out, I will try to see if the TypeInfo
passing can't be fixed for that compiler too and then post some code.

    * Copyright (C) 2004 by Sean Kelly
    * Copyright (C) 2005 by Anders F Bjoerklund
    *
    * Permission to use, copy, modify, distribute and sell this software
    * and its documentation for any purpose is hereby granted without fee,
    * provided that the above copyright notice appear in all copies and
    * that both that copyright notice and this permission notice appear
    * in supporting documentation. Author makes no representations about
    * the suitability of this software for any purpose. It is provided
    * "as is" without express or implied warranty.

Original file came from: http://home.f4.ca/sean/d/stdio.zip
(had the Open Source license agreement being duplicated above)

Thanks to Sean for doing the grunt-work with format parsing,
so we (still) don't have to rename it to "std.stdo" anymore :-)

--anders

PS. Yes, it uses pointers. Kris has already written C++-style
bitshift-operator overloads for people who want that... ?
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classAbstractReader.html
http://svn.dsource.org/svn/projects/mango/trunk/doc/html/classAbstractWriter.html

When (and if) D supports "out" arguments for variadic lists,
the code can be changed to support those "out" vars instead.
Although it would then also need some kind of R/O attribute to be
able to differentiate between format strings and string params?
Meanwhile, the pointers work just fine (and it checks the types!)
Mar 17 2005
In article <d1bhnv$ek6$1 digitaldaemon.com>, Anders F Björklund says...
> Sean Kelly wrote: (back in 2004-09-24, that was)
>> For those of you who don't know, readf began as an attempt at a full
>> C99-compliant scanf implementation in D. It's since been renamed to match
>> the Phobos writef/format functions a bit more closely, and this version
>> attempts to bring usage a bit closer to readf.
> [...]
>> unFormat still will not throw an exception on parameter mismatch, but will
>> return immediately instead. This is the only interface issue I know of
>> where this package diverges from doFormat/writef.
> I changed this package to break it into
> std.stdio.readf and std.string.unformat...

Nice! Is this version available online?

Sean
Mar 17 2005
Sean Kelly wrote:

>> I changed this package to break it into
>> std.stdio.readf and std.string.unformat...
> Nice! Is this version available online?

Not yet, have to clean it up and backport
it back into the DMD release again...
(currently it's done to a tweaked GDC, you see)
And it *really* wants TypeInfo?

Main reason was to get opinions on:
1) changed getc delegate definitions
2) throwing on exceptions on errors

Also, I still need to write the multibyte wrappers
of getc/ungetc for file based streams (regular,
as well as the wide orientation kind)

--anders
Mar 17 2005
In article <d1co48$1nlr$1 digitaldaemon.com>, Anders F Björklund says...
> Sean Kelly wrote:
>>> I changed this package to break it into
>>> std.stdio.readf and std.string.unformat...
>> Nice! Is this version available online?
> Not yet, have to clean it up and backport
> it back into the DMD release again...
> (currently it's done to a tweaked GDC, you see)
> And it *really* wants TypeInfo?

I'll have to re-evaluate TypeInfo in DMD. I don't suppose it's working yet for
pointer types? And why in the world doesn't GDC properly generate copies of the
TypeInfo array?

> Main reason was to get opinions on:
> 1) changed getc delegate definitions

I mostly created the getc/ungetc specs as they were because they were easier to
embed in boolean expressions. ie.

    if( !getc( ch ) ) return;

is easier to write than:

    if( ( ch = getc() ) == WEOF ) return;

I figured the C function would need to be wrapped either way, so this seemed a
decent gain. Especially since I think the reason the C routines are written the
way they are is because C lacks an output qualifier. But it's mostly a cosmetic
issue, so it doesn't matter much to me either way.

> 2) throwing on exceptions on errors

I mostly didn't do this with my version of the functions because I thought it
made sense that unFormat should throw the same exceptions as format, but doing
so created a dependency I wasn't happy with for an add-on library. If this
stuff made it into Phobos I fully support the idea of consistency between the
functions. This fix should be pretty easy anyway, as it just amounts to putting
a "throw" in the necessary catch blocks at the bottom of the unFormat
implementation (unFormat uses exceptions internally for flow control).

> Also, I still need to write the multibyte wrappers
> of getc/ungetc for file based streams (regular,
> as well as the wide orientation kind)

My release used the same wrappers for file i/o as it used for file i/o. Are
these functions not available in Linux?

Sean
Mar 17 2005
Sean Kelly wrote:

> I'll have to re-evaluate TypeInfo in DMD. I don't suppose it's working yet
> for pointer types?

Nope, they are all of the "TypeInfo" base class... :-(

> And why in the world doesn't GDC properly generate copies of the TypeInfo
> array

Maybe I explained myself badly. You can of course pass
_arguments and _argptr off to subroutines. It is just that
"arguments[i] is typeid(int*)" will no longer work...
The identity is lost, when doing the workaround like that.

It still works, if they are done against the original
_arguments and in the same module (I'm a little shady
on the details why that is so, just *that* it is so...)

If the TypeInfo/typeid was working, all would be cool.

> I figured the C function would need to be wrapped either way, so this seemed
> a decent gain. Especially since I think the reason the C routines are
> written the way they are is because C lacks an output qualifier. But it's
> mostly a cosmetic issue, so it doesn't matter much to me either way.

The read/write functions in std.stream work like you describe,
with out parameters (they throw Exceptions on EOF, instead of
returning a bit, but that's just a matter of preference...)

Just thought that "dchar getc()" was a better match for
"putc(dchar)", and that there seemed to be a lot of checking
for eof spread out in the code ? That's all.

> I mostly didn't do this with my version of the functions because I thought
> it made sense that unFormat should throw the same exceptions as format, but
> doing so created a dependency I wasn't happy with for an add-on library.

Yeah, it did mean hacking a few things in std.format...
And there are some *nasty* circular dependencies going on,
and double if not trouble defines of things like "stdin" and "va_list"
Had to resort to e.g. "alias std.stdarg.va_list va_list;"

> If this stuff made it into Phobos I fully support the idea of consistency
> between the functions. This fix should be pretty easy anyway, as it just
> amounts to putting a "throw" in the necessary catch blocks at the bottom of
> the unFormat implementation (unFormat uses exceptions internally for flow
> control).

I did leave the overflow checks in, but most should be passed further ?

>> Also, I still need to write the multibyte wrappers
>> of getc/ungetc for file based streams (regular,
>> as well as the wide orientation kind)
> My release used the same wrappers for file i/o as it used for file i/o. Are
> these functions not available in Linux?

Yes, I just didn't loop over the bytes to reassemble UTF-32 (yet)

Either way, readf and unformat *definitely* have a place
in a future release of Phobos - next to writef and format...

Just need to get the TypeInfo stuff completed first ?
(the major part being adding ti for pointer types...)

--anders
Mar 17 2005