digitalmars.D - unFormat marginally complete
- Sean Kelly (52/52) Jul 29 2004 http://home.f4.ca/sean/d/unformat.d
- Sean Kelly (4/5) Aug 04 2004 I just realized I'd misread a part of the scanf spec. I've fixed the
- pragma (15/19) Aug 05 2004 Looks pretty useful. I like it. I haven't had a chance to run with it ...
- Sean Kelly (9/29) Aug 05 2004 Everything is done internally in terms of dchars, so hopefully the funct...
- Arcane Jill (7/12) Aug 05 2004 Nope, whitespace is locale independent. You only have to import
- Sean Kelly (19/29) Aug 06 2004 By the way. I like that doFormat doesn't require a format string at all...
http://home.f4.ca/sean/d/unformat.d

The D compiler is currently a bit weird with templates and stdarg, so to use unformat.d in 0.97 you have to compile in std.format.d as well. If anyone feels inclined to play with it, please let me know if stuff is broken, you'd like the exceptions to match doFormat, etc.

Prototypes:

    int unFormat( bit delegate( out dchar ) getc, bit delegate( dchar ) ungetc, TypeInfo[] arguments, void* argptr );
    int sreadf( ... );             // first va_arg is string, second is format
    int freadf( FILE* buf, ... );  // first va_arg is format
    int readf( ... );              // first va_arg is format (console input)

Ways in which unFormat differs from vscanf (and possibly doFormat):

- The format string can be either UTF-8, UTF-16, or UTF-32.
- If there is a mismatch between the arguments and the format specification, the function will return and will not evaluate the rest of the format string.
- unFormat will return prematurely on an input failure (if getc returns false), an argument mismatch, or a UTF conversion error. UtfError exceptions will not be passed out of the function.

For reference, the conversion specifiers are:

d, u: An optionally signed decimal integer.
i: An optionally signed integer. Base can be decimal, hex, or octal and will be detected automatically. If the input is preceded by 0x or 0X then the number will be interpreted as hex. If the input is preceded only by 0 then the number will be interpreted as octal. Any other value will be interpreted as decimal.
o: An optionally signed octal integer.
x, X: An optionally signed hex integer.
a, e, f, g, A, E, F, G: An optionally signed floating point number, infinity, or NaN. Examples: 1 -5.6 1.2e5 0x3p-2 0X1234 NAN INF infinity
c: A single UTF-32 character, or sequence of characters if the width modifier is present.
s: A sequence of non-whitespace characters.
[: Defines a scanset. Contents can be single characters or a range indicated by a hyphen. Examples: [a-z] indicates the set of characters whose numeric values lie between a and z, inclusive. [abc123] indicates the characters a, b, c, 1, 2, and 3.
p: A pointer in hex format without the leading 0x.
n: Returns the number of UTF-32 characters read from the input stream.
%: Matches a single % character.
Jul 29 2004
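The getc/ungetc delegate pair and the %i base-detection rule from the first post can be sketched in present-day D. This is an illustration only, not the code in unformat.d: StringSource and readAutoInt are hypothetical names, bool stands in for the 2004-era bit type, and treating a bare "0x" with no hex digits after it as a failure is a simplifying assumption.

```d
// Hypothetical in-memory source exposing the getc/ungetc pair that the
// unFormat prototype expects (bool replaces the old `bit` type).
struct StringSource
{
    dstring data;
    size_t pos;

    bool getc(out dchar c)
    {
        if (pos >= data.length) return false; // input failure
        c = data[pos++];
        return true;
    }

    bool ungetc(dchar c)
    {
        if (pos == 0) return false;
        --pos; // assumes c is the character most recently read
        return true;
    }
}

// Sketch of %i: pick the base from the prefix, then accumulate digits.
// Returns false if no digits could be read.
bool readAutoInt(bool delegate(out dchar) getc,
                 bool delegate(dchar) ungetc,
                 out long value)
{
    dchar c;
    if (!getc(c)) return false;

    bool negative = false;
    if (c == '+' || c == '-')
    {
        negative = (c == '-');
        if (!getc(c)) return false;
    }

    int base = 10;
    bool sawDigit = false;
    bool more = true;              // false once the input is exhausted
    if (c == '0')
    {
        base = 8;                  // leading 0 alone means octal
        sawDigit = true;           // a lone "0" is still a number
        more = getc(c);
        if (more && (c == 'x' || c == 'X'))
        {
            base = 16;             // 0x or 0X means hex
            sawDigit = false;      // assumption: digits must follow "0x"
            more = getc(c);
        }
    }

    long result = 0;
    while (more)
    {
        int digit;
        if (c >= '0' && c <= '9')      digit = cast(int)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = cast(int)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = cast(int)(c - 'A') + 10;
        else                           digit = base; // not a digit at all

        if (digit >= base)
        {
            ungetc(c);             // leave the terminator for the caller
            break;
        }
        result = result * base + digit;
        sawDigit = true;
        more = getc(c);
    }

    if (!sawDigit) return false;
    value = negative ? -result : result;
    return true;
}
```

A caller would pass `&src.getc` and `&src.ungetc` for some `StringSource src`. Because the first non-digit is pushed back with ungetc, whatever follows the number stays available to the next conversion.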
Sean Kelly wrote:
> http://home.f4.ca/sean/d/unformat.d

I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.

Sean
Aug 04 2004
In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
> Sean Kelly wrote:
>> http://home.f4.ca/sean/d/unformat.d
> I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.

Looks pretty useful. I like it. I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

One critique though: why check all your exception instances (Underflow, BadFmt, etc) for each call of unFormat? You can set all these up ahead of time in a static block outside your function, without breaking encapsulation too badly. That way you can prevent redundant allocations (which you've already done) plus eliminate all those extra "if" statements. :)

- Pragma
Aug 05 2004
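The static-block suggestion above might look roughly like the following in present-day D. The names Underflow and BadFmt come from the thread, but their definitions, the base class UnFormatError, and the checkSpecifier helper are all hypothetical; this is a sketch of the idea, not the code in unformat.d.

```d
// Hypothetical exception hierarchy; only the names Underflow and BadFmt
// appear in the thread.
class UnFormatError : Exception
{
    this(string msg) { super(msg); }
}

class Underflow : UnFormatError
{
    this() { super("not enough arguments"); }
}

class BadFmt : UnFormatError
{
    this() { super("invalid format specifier"); }
}

// Module-level instances, allocated once instead of being checked for
// null on every call.
private Underflow underflowError;
private BadFmt badFmtError;

// Module constructor: runs before main(), so the instances always exist
// by the time any function in this module is called.
static this()
{
    underflowError = new Underflow;
    badFmtError = new BadFmt;
}

// Hypothetical caller: no "if (instance is null) instance = new ..." needed.
void checkSpecifier(dchar spec, size_t remainingArgs)
{
    if (remainingArgs == 0) throw underflowError;
    if (spec != 'd' && spec != 's') throw badFmtError;
}
```

One trade-off worth noting: a reused exception object does not carry fresh per-throw context such as file and line, so this pattern suits cases where the exception type alone tells the caller what went wrong.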
In article <cetftt$1nd2$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot com> says...
> In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
>> Sean Kelly wrote:
>>> http://home.f4.ca/sean/d/unformat.d
>> I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.
> Looks pretty useful. I like it. I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

Everything is done internally in terms of dchars, so hopefully the functions will be able to correctly recognize all whitespace chars. I know there may also be some locale dependent whitespace sequences (Jill?) but as D doesn't have any concept of locales yet, that will have to wait.

> One critique though: why check all your exception instances (Underflow, BadFmt, etc) for each call of unFormat? You can set all these up ahead of time in a static block outside your function, without breaking encapsulation too badly. That way you can prevent redundant allocations (which you've already done) plus eliminate all those extra "if" statements. :)

Good point. I think I'm still in a C++ mindset as far as statics are concerned. I'll make this change today :)

Sean
Aug 05 2004
In article <ceti16$1oa9$1 digitaldaemon.com>, Sean Kelly says...
> Everything is done internally in terms of dchars, so hopefully the functions will be able to correctly recognize all whitespace chars. I know there may also be some locale dependent whitespace sequences (Jill?)

Nope, whitespace is locale independent. You only have to import etc.unicode.unicode and call isWhitespace(dchar). But I'd suggest waiting until next week because I'm planning to finally get the linkable library + header files together this weekend, which will make things somewhat easier for you.

> but as D doesn't have any concept of locales yet, that will have to wait.

It will have soon, but as I said, it's not relevant to whitespace.

Arcane Jill
Aug 05 2004
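As a rough illustration of locale-independent whitespace detection on dchars: the sketch below uses std.uni.isWhite from present-day Phobos as a stand-in for the isWhitespace(dchar) call in etc.unicode.unicode mentioned above. That substitution is an assumption, since the two functions come from different libraries, and skipWhite is a hypothetical helper name.

```d
import std.uni : isWhite;

// Returns the number of leading whitespace code points in input.
// Working on dchars means each element is a whole code point, so
// non-ASCII spaces are handled the same way as ' ' and '\t'.
size_t skipWhite(dstring input)
{
    size_t n = 0;
    while (n < input.length && isWhite(input[n]))
        ++n;
    return n;
}
```

For example, `skipWhite(" \t\u00A0\u2003x"d)` counts the space, the tab, the no-break space, and the em space, with no locale consulted at any point.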
In article <cetftt$1nd2$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot com> says...
> In article <ces7ck$s6f$1 digitaldaemon.com>, Sean Kelly says...
>> Sean Kelly wrote:
>>> http://home.f4.ca/sean/d/unformat.d
>> I just realized I'd misread a part of the scanf spec. I've fixed the code and re-uploaded it with another unit test.
> Looks pretty useful. I like it. I haven't had a chance to run with it myself, so I'll have to ask: do you have any provisions for reading or handling whitespace?

By the way, I like that doFormat doesn't require a format string at all. Since I was working off the scanf spec I didn't do anything about that with unFormat. I assume that doFormat can handle things like this:

    doFormat( &get, "hello world", 1, "%d", 2 );

and would print:

    hello world12

I suppose the equivalent bit for unFormat would be:

    char[] buf; int x, y; float f;
    unFormat( &get, &unget, &buf, &x, "%2d", &y, &f );

which would read a string, an integer, an int with width 2, and a float. The only thing I don't know offhand is if I can tell a char** from a char* using TypeInfo (for %p). In any case, would people like this syntax rather than having to specify a format string? I think I may start on it today just to see how it goes.

Sean
Aug 06 2004
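The format-less call proposed above depends on telling argument types apart at run time. A minimal sketch of that dispatch, assuming present-day D's `_arguments` array in a D-style variadic function: classify and its labels are hypothetical, and nothing here reads input or reflects how unformat.d is actually written.

```d
// Hypothetical: label each variadic argument the way a format-less
// unFormat might, using only the run-time TypeInfo of each argument.
string[] classify(...)
{
    string[] kinds;
    foreach (ti; _arguments)
    {
        if (ti == typeid(string))        kinds ~= "format fragment";
        else if (ti == typeid(char[]*))  kinds ~= "string target";
        else if (ti == typeid(int*))     kinds ~= "int target";
        else if (ti == typeid(float*))   kinds ~= "float target";
        else if (ti == typeid(char**))   kinds ~= "pointer target";
        else                             kinds ~= "unsupported";
    }
    return kinds;
}
```

On the open question in the post: `typeid(char**)` and `typeid(char*)` are distinct TypeInfo objects in present-day D, so a comparison like the one above can separate them. Whether the 2004 compiler behaved the same way is not something this sketch establishes.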