digitalmars.D.learn - sscanf() and using \ in format = no workie?
- AEon (19/19) Apr 04 2005 Until D supports a native version of sscanf(), I have been experimenting
- Regan Heath (35/54) Apr 04 2005 Do you have a copy of MSDN? or similar ANSI C documentation? If not, fin...
- AEon (24/67) Apr 04 2005 Thank you for taking the time to explicitly code the above.
- Regan Heath (16/71) Apr 04 2005 "MicroSoft Developer Network" it's the documentation that comes with MS ...
- AEon (23/42) Apr 07 2005 Nope... the documentation I have seem to be quite old and pretty close
- Regan Heath (18/57) Apr 07 2005 You might be right: "Note that %[a-z] and %[z-a] are interpreted as
Until D supports a native version of sscanf(), I have been experimenting with the command some more, and noticed that a sscanf() format containing a \ fails:

    char[] line = " 0:00 InitGame: \\gamename\\baseq3\\blah\\mapname\\q3tourney2\\protocol\\";
    char[] junk, map;
    char[] mtempl = "\\mapname\\%s\\";
    junk.length = line.length;
    map.length = line.length;
    int ret = sscanf( line, mtempl, map.ptr );
    printf("\n \"%.*s\"\n -> Ret: %d Map: \"%.*s\" Junk: \"%.*s\"\n", line, ret, map, junk );

or, with a second template:

    char[] mtempl = "Initgame: %s\\mapname\\%s\\";
    int ret = sscanf( line, mtempl, junk.ptr, map.ptr );

Neither of the two mtempl templates above will match the line and grab the map name "q3tourney2". IIRC I had exactly the same problem long ago, and then gave up on sscanf. :(

Any idea what I am doing wrong?

AEon
Apr 04 2005
On Mon, 04 Apr 2005 09:01:21 +0200, AEon <aeon2001 lycos.de> wrote:
> Until D supports a native version of sscanf(), I have been experimenting [...]
> Any idea what I am doing wrong?

Do you have a copy of MSDN, or similar ANSI C documentation? If not, find some :)

From the MSDN sscanf docs:

"%s: String, up to first white-space character (space, tab or newline). To read strings not delimited by space characters, use set of square brackets ([ ]), as discussed following Table R.7."

So "%s" means characters up until the next space. You can use "%[a-z0-9]" to stop on any character other than those specified within the []. So...

    import std.c.stdio;
    import std.string;

    void main()
    {
        char[] line = "0:00 InitGame: \\gamename\\baseq3\\blah\\mapname\\q3tourney2\\protocol\\";
        char[] mtempl = "\\mapname\\%[a-z0-9]";
        char[] junk, map;
        int ret;

        map.length = line.length;
        junk.length = line.length;

        // sscanf only matches literal text at the current position,
        // so first skip to the key (no check for "not found").
        ret = line.find("\\mapname");
        // %[a-z0-9] reads characters from the set and stops at the next '\'.
        ret = sscanf( toStringz(line[ret..$]), toStringz(mtempl), map.ptr );
        printf("\n \"%.*s\"\n -> Ret: %d Map: \"%.*s\"\n", line, ret, map);
    }

should get you what you want.

Notes:
- I have used toStringz; this function converts a D string into a C string. Technically it's not required above, because static strings in D have a null terminator appended (not sure why).
- I have used 'find' to find the \\mapname part of the string (and done no error checking).

Regan
Apr 04 2005
Regan Heath wrote:
> Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)

MSDN? I have The Waite Group's Essential Guide to ANSI C.

> From the MSDN sscanf docs: [...]
> should get you what you want. [...]

Thank you for taking the time to explicitly code the above.

I actually did use a find command to do the above quickly, using only D string functions. The above was more a general test of sscanf() for writing a programmable parser.

Alas, "\\mapname\\%[a-z0-9]" is just not good enough, since the names could contain just about any character below ASCII 127.

And I noticed something else:

    " 34:03 Kill: 1 0 7: AEon - gXp killed pezen by MOD_ROCKET_SPLASH"

A log line that contains blanks in names will completely mess up a sscanf() with the format:

    int ret = sscanf( line, "%d:%02d Kill: %d %d %d: %s" ~ " killed " ~ "%s" ~ "by MOD_ROCKET_SPLASH",
                      &min, &sec, &pl1, &pl2, &mod, fragger.ptr, fragged.ptr );

fragger will be "AEon" and not "AEon - gXp".

I then noted that at the very least I would need "%[a-zA-Z0-9]", but when I add a blank, "%[a-zA-Z 0-9]", to allow for names with blanks, sscanf totally messes up the read. It seems there are certain limits to sscanf's usefulness. :(

Slowly I seem to recall why I dropped sscanf(): for the reasons above, only by now I had forgotten them. Hmpf. Back to my old code, where I hack the lines apart by hand.

AEon
Apr 04 2005
On Mon, 04 Apr 2005 11:20:52 +0200, AEon <aeon2001 lycos.de> wrote:
> Regan Heath wrote:
>> Do you have a copy of MSDN? or similar ANSI C documentation? If not, find some :)
>
> MSDN? I have The Waite Group's Essential Guide to ANSI C.

"Microsoft Developer Network": it's the documentation that comes with MS Visual Studio. Does your guide have documentation for sscanf? Something like what I quoted below about %s and %[a-z]?

>> [...]
> Thank you for taking the time to explicitly code the above.

NP. I just copy/pasted your code and tried to make it work.

> I actually did use a find command to quickly do the above only using D string commands. The above was more a general application test for sscanf() to write a programmable parser.

I understand.

> Alas "\\mapname\\%[a-z0-9]" is just not good enough, since the names could contain just about any character < ASCII 127.

Hmm...

> And I noticed something else: [...]
> Seems like there are certain limitations of the sscanf usefulness. :(

Indeed. Can you change the format of the log lines? If so, choose a delimiter that cannot appear in the log data and use that to delimit the fields. If you cannot, then scan for the keywords, i.e.

    "Kill: %d %d %d:" <here be player name> "killed" <here be opponent name> "by" <here be device>

(you're probably doing this already, right?)

> Slowly I seem to recall why I dropped the use of sscanf() [...] Back to my old code where I hack the lines apart by hand.

Have you tried the regexp library, std.regexp? I'm no regexp expert, but you might be able to use the "split" function in std.regexp to parse your log lines. Not sure.

Regan
Apr 04 2005
Regan Heath wrote:
> "Microsoft Developer Network": it's the documentation that comes with MS Visual Studio. Does your guide have the documentation to sscanf? Something like what I quoted below about %s and %[a-z]?

Nope... the documentation I have seems to be quite old and pretty close to the original ANSI C definitions, without any of the more modern frills (like C99, e.g.). The %[a-z] syntax is not mentioned, which is why it surprised me. Are you sure this actually *is* ANSI C, and not some later addition that D just "happens" to also support?

>> Seems like there are certain limitations of the sscanf usefulness. :(
> Indeed. Can you change the format of the log lines?

Alas, I can't change the logs: they come from 30-odd games, and every game engine comes with its own format. Some log files, like those from Half-Life, e.g., put names in double quotes, making them a *lot* easier to parse.

> If so, choose a delimiter that cannot appear in the log data and use that to delimit the fields. If you cannot, then scan for the keywords, i.e.
> "Kill: %d %d %d:" <here be player name> "killed" <here be opponent name> "by" <here be device>
> (you're probably doing this already, right?)

I am... that is why I was surprised sscanf did not work. Using the keywords should delimit the %s-marked player names in a completely obvious way. But it does not. I will probably write my own function that does something like sscanf, but in a more "solid" way.

I noticed that sscanf works quite well with numbers (%d): "1234:32 Kill" and " 1234:32Kill" will both work just fine with a "%d:%d Kill" format.

> Have you tried the regexp library? std.regexp. I'm no regexp expert but you might be able to use the "split" function in std.regexp to parse your log lines. Not sure.

A split on the "keywords" would already help. But indeed, regular expressions are still something I need to look into. IIRC, wasn't there some discussion that regular expressions were not yet fully implemented in D?

AEon
Apr 07 2005
On Thu, 07 Apr 2005 10:43:41 +0200, AEon <aeon2001 lycos.de> wrote:
> Are you sure this actually *is* ANSI C, and not some later addition that D just "happens" to also support?

You might be right: "Note that %[a-z] and %[z-a] are interpreted as equivalent to %[abcde...z]. This is a common scanf function extension, but note that the ANSI standard does not require it."

It's important to note that D technically doesn't 'support' it: D is link-compatible with C, and the C libraries that come with D, i.e. the Digital Mars ones, 'support' it. When you call ANSI C functions from D, you're calling C functions. Hence the problems with char[], which is not a C string.

> Some log files, like those from Half-Life, e.g., put names in double quotes, making them a *lot* easier to parse.

Those Half-Life guys have proven to be quite smart on a number of occasions; one more doesn't surprise me.

> I will probably write my own function that does something like sscanf, but in a more "solid" way.

Have you tried: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/19835

> I noticed that sscanf works quite well with numbers (%d): "1234:32 Kill" and " 1234:32Kill" will both work just fine with a "%d:%d Kill" format.

Strings can contain numbers, numbers cannot contain letters; numbers are easy :)

> IIRC, wasn't there some discussion that regular expressions were not yet fully implemented in D?

There was a lot of discussion about further integrating them, but I believe the std.regexp library does a fairly good job already.

Regan
Apr 07 2005