digitalmars.D.learn - String Parsing with \" in a ".." text line
- AEon (27/27) Mar 20 2005 I started to code by parser, a *lot* easier with D, commands like
- Regan Heath (6/10) Mar 20 2005 I think you have to write your own version of split, one that allows
- AEon (12/20) Mar 21 2005 :)... will take a while to get a useful version written, since I am stil...
- Stewart Gordon (14/45) Mar 21 2005 Is this a third-party file format? If not, why not define a format
- AEon (9/43) Mar 21 2005 True... it is totally up to me to define the format, I felt that was the...
- Stewart Gordon (8/14) Mar 21 2005 I actually meant the find and rfind (oops, where did findr come from?)
- David Medlock (7/47) Mar 21 2005 Why not just use an existing scripting language for your configuration
- Stewart Gordon (11/16) Mar 21 2005 Around two years ago I invented a configuration language called
- AEon (4/9) Mar 21 2005 Is that not a tad overkill... I only want to define a few variables and ...
- Regan Heath (17/28) Mar 21 2005 The simplest possible format...
- AEon (16/29) Mar 21 2005 Hmm... sure that would work, but space is definately something that need...
- Regan Heath (20/53) Mar 21 2005 In the label/setting name?
- Derek Parnell (210/249) Mar 21 2005 I have a module that will 'tokenize' lines that will probably suit your
- AEon (5/7) Mar 22 2005 What kind of format it that. Looks like something similar to tar, shar o...
- Regan Heath (6/12) Mar 22 2005 It's uuencoded. Search the web for a utility commonly called uudecode. O...
- J C Calvarese (7/16) Mar 22 2005 Decoding the fun way:
- Derek Parnell (9/17) Mar 22 2005 It ain't "my" format but a very commonly used one - UUEncode. Most news
- AEon (5/18) Mar 22 2005 I had not been suggesting you invented it ;)...
- AEon (74/75) Mar 22 2005 A few questions about your code:
- Derek Parnell (78/170) Mar 22 2005 The D method to resolve such ambiguities is to fully qualify the referen...
- AEon (19/19) Mar 22 2005 Derek Parnell,
- J C Calvarese (22/27) Mar 22 2005 I'm not sure what you mean by "archive". It's not like Walter clears out...
- AEon (13/31) Mar 23 2005 True enough, and now that I finally have a newsreader installed,
- J C Calvarese (15/39) Mar 23 2005 I made that transition myself a few years ago (web forums ->
- AEon (16/18) Mar 25 2005 :9
I started to code my parser, a *lot* easier with D; commands like std.string.split work miracles. But I am still wondering how to optimize parsing, in this case of a configuration file:
<code>
// comments
[General]
game "Quake III Arena"
gameInfo "Retail, Rocket Arena III, Q3: Team Arena"
gameOpt "-q3a" // *** comment
gameMode "16" // comments
</code>
I do a std.string.find(line, "game") to find out if the line contains my key-variable. And then a
char[][] splitLine = std.string.split(line, "\"");
accessing the value of the var of interest via splitLine[1].
Now that is fine and dandy. But when I want to allow the user to use double quotes (") in the config file, this will turn ugly, since the above split does not distinguish between " and \".
Any ideas how to elegantly read the var/value pairs should the value contain a \"? (In C I did some very evil manual hacking to make that work.)
Thanx.
AEon
Mar 20 2005
On Mon, 21 Mar 2005 01:27:31 +0000 (UTC), AEon <AEon_member pathlink.com> wrote:Any ideas how to elegantly read the var/value pairs should the value contain a \"? (In C I did some very evil manual hacking to make that work).I think you have to write your own version of split, one that allows "escaped" characters. Once written I'd recommend it for inclusion into std.string. Regan
Mar 20 2005
Regan Heath says...
>> Any ideas how to elegantly read the var/value pairs should the value contain a \"? (In C I did some very evil manual hacking to make that work).
> I think you have to write your own version of split, one that allows "escaped" characters. Once written I'd recommend it for inclusion into std.string.
:)... will take a while to get a useful version written, since I am still learning about all the goodies in std.string. Basically what could be useful would be a
char[][] splitx(char[] stringtosplit, char[] delimiter, char[] nonDelimiters)
of sorts:
splitx( line, "\"", "\\\"");
A simpler solution would be to use another delimiter in my config files. But that would leave the problem that any delimiter could also be needed in the text. If I find anything useful, will post the code.
AEon
Mar 21 2005
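A minimal sketch of the escape-aware split discussed above. It is written in current D (using `string` rather than the 2005-era `char[]`), and `splitEscaped` is a hypothetical name chosen for illustration, not a function in std.string. It assumes a backslash escapes whatever single character follows it.

```d
import std.stdio;

/// Splits `line` on `delim`. An `esc` character makes the character
/// after it literal, so \" never splits. Hypothetical helper, not
/// part of Phobos.
string[] splitEscaped(string line, char delim = '"', char esc = '\\')
{
    string[] parts;
    char[] current;
    for (size_t i = 0; i < line.length; i++)
    {
        char c = line[i];
        if (c == esc && i + 1 < line.length)
        {
            current ~= line[i + 1]; // keep the escaped char, drop the backslash
            i++;
        }
        else if (c == delim)
        {
            parts ~= current.idup;  // close the current field
            current = null;
        }
        else
        {
            current ~= c;
        }
    }
    parts ~= current.idup;          // last field (may be empty)
    return parts;
}

void main()
{
    auto fields = splitEscaped(`game "Quake \"III\" Arena" // comment`);
    writeln(fields[1]); // the text between the first pair of unescaped quotes
}
```

As with the plain split, the value of interest ends up in element 1; the difference is only that escaped quotes stay inside the field.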
AEon wrote:I started to code by parser, a *lot* easier with D, commands like std.string.split work mirracles. But I am still wondering how to optimize parsing, in this case of a configuration file: <code> // comments [General] game "Quake III Arena" gameInfo "Retail, Rocket Arena III, Q3: Team Arena" gameOpt "-q3a" // *** comment gameMode "16" // comments </code>Is this a third-party file format? If not, why not define a format that's that little bit easier to parse? I'd be inclined to go for something resembling Windows .ini files. But if you still want to do it this way....I do a std.string.find(line, "game") to find out if the line contains my key-variable.Which won't work if "game" is somewhere in the value, not in the key. How about checking whether the line _begins_ with "game"?And then a char[][] splitLine = std.string.split(line, "\""); accessing the value of the var of interest via splitLine[1] Now that is fine and dandy. But when I want to allow the user to use double quotes (") in the config file, this will turn ugly, since the above split does not differ between " and \".<snip> By using split for this you're making life difficult for yourself. How about just picking out the first and last quotes, using find and findr? Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 21 2005
Stewart Gordon... good points
>> <code> // comments [General] game "Quake III Arena" gameInfo "Retail, Rocket Arena III, Q3: Team Arena" gameOpt "-q3a" // *** comment gameMode "16" // comments </code>
> Is this a third-party file format? If not, why not define a format that's that little bit easier to parse? I'd be inclined to go for something resembling Windows .ini files. But if you still want to do it this way....
True... it is totally up to me to define the format. I felt that was the easiest way to format the cfg file, and it is easy to read.
>> I do a std.string.find(line, "game") to find out if the line contains my key-variable.
> Which won't work if "game" is somewhere in the value, not in the key. How about checking whether the line _begins_ with "game"?
I had thought of that, but then forgot to check for it. Sigh :)
>> And then a char[][] splitLine = std.string.split(line, "\""); accessing the value of the var of interest via splitLine[1] Now that is fine and dandy. But when I want to allow the user to use double quotes (") in the config file, this will turn ugly, since the above split does not differ between " and \".
> <snip> By using split for this you're making life difficult for yourself. How about just picking out the first and last quotes, using find and findr?
Well as long as there is no \" in the line, split will do the job much quicker. Just checked, you are talking regular expression. Still need to learn about those.
AEon
Mar 21 2005
AEon wrote:
<snip>
>> By using split for this you're making life difficult for yourself. How about just picking out the first and last quotes, using find and findr?
> Well as long as there is no \" in the line, split will do the job much quicker. Just checked, you are talking regular expression. Still need to learn about those.
I actually meant the find and rfind (oops, where did findr come from?) in std.string, not std.regexp.
Stewart.
-- 
My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 21 2005
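The "line begins with the key, then take everything between the first and last quote" idea can be sketched as follows. This is an illustration in current D, where the searches are spelled `indexOf` and `lastIndexOf` instead of the `find`/`rfind` named in the thread; `quotedValue` is a hypothetical helper. It assumes no trailing comment contains a quote, and it leaves any \" sequences in the value untouched.

```d
import std.stdio;
import std.string : indexOf, lastIndexOf, strip;
import std.algorithm.searching : startsWith;

/// Returns the text between the first and the last double quote of
/// `line`, or null if the line does not begin with `key` or carries
/// no quoted value. Hypothetical helper for illustration only.
string quotedValue(string line, string key)
{
    line = line.strip();
    if (!line.startsWith(key) || line.length == key.length)
        return null;
    // The key must end here, so that "game" does not match "gameInfo".
    char next = line[key.length];
    if (next != ' ' && next != '\t' && next != '"')
        return null;
    auto first = line.indexOf('"');
    auto last = line.lastIndexOf('"');
    if (first < 0 || last <= first)
        return null;
    return line[first + 1 .. last];
}

void main()
{
    writeln(quotedValue(`gameInfo "Retail, Rocket Arena III, Q3: Team Arena"`, "gameInfo"));
    writeln(quotedValue(`game "say \"hi\""`, "game")); // inner \" pairs are kept as-is
}
```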
AEon wrote:I started to code by parser, a *lot* easier with D, commands like std.string.split work mirracles. But I am still wondering how to optimize parsing, in this case of a configuration file: <code> // comments [General] game "Quake III Arena" gameInfo "Retail, Rocket Arena III, Q3: Team Arena" gameOpt "-q3a" // *** comment gameMode "16" // comments </code> I do a std.string.find(line, "game") to find out if the line contains my key-variable. And then a char[][] splitLine = std.string.split(line, "\""); accessing the value of the var of interest via splitLine[1] Now that is fine and dandy. But when I want to allow the user to use double quotes (") in the config file, this will turn ugly, since the above split does not differ between " and \". Any ideas how to elegantly read the var/value pairs should the value contain a \"? (In C I did some very evil manual hacking to make that work). Thanx. AEonWhy not just use an existing scripting language for your configuration files? I would recommend Small (http://www.compuphase.com/small.htm) or Lua (http://www.lua.org/). This scripting language would be useful within your game as well. -David
Mar 21 2005
David Medlock wrote: <snip>Why not just use an existing scripting language for your configuration files? I would recommend Small (http://www.compuphase.com/small.htm) or Lua (http://www.lua.org/).Around two years ago I invented a configuration language called Configur8. It's basically a slightly more powerful version of Windows INI files (with one or two syntactical differences). It's no match for a scripting language, but is perfect for stuff like the above appears to be. I haven't yet created a D interface, but I plan to do it at some point. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Mar 21 2005
David Medlock says...Why not just use an existing scripting language for your configuration files? I would recommend Small (http://www.compuphase.com/small.htm) or Lua (http://www.lua.org/). This scripting language would be useful within your game as well.Is that not a tad overkill... I only want to define a few variables and log file obituaries, that need to be as readable as possible. AEon
Mar 21 2005
On Mon, 21 Mar 2005 16:44:36 +0000 (UTC), AEon <AEon_member pathlink.com> wrote:
> David Medlock says...
>> Why not just use an existing scripting language for your configuration files? I would recommend Small (http://www.compuphase.com/small.htm) or Lua (http://www.lua.org/). This scripting language would be useful within your game as well.
> Is that not a tad overkill... I only want to define a few variables and log file obituaries, that need to be as readable as possible.
The simplest possible format...
If you assume your values cannot contain \r\n and your labels/settings cannot contain spaces then you can simply use the following format:
label<space>value<\r\n>
and parse it by calling "find" on each line, looking for a space, and assuming the rest of the line (minus the \r\n) is the value.
If you decide later on that you need \r\n in your values you can encode them as \, r, \, n eg.
label<space>regan\r\nwas\r\nhere<\r\n>
In general the fewer special characters you define, the fewer special cases you have to handle in values. Further if you can pick characters you will never want to use in values you don't have to handle any special cases at all.
Regan
Mar 21 2005
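A minimal sketch of the label<space>value format described above, in current D. The `Setting` struct and `parseSetting` function are hypothetical names; the sketch assumes the line terminator has already been stripped and that a literal backslash-r-backslash-n sequence in the value stands for an encoded line break.

```d
import std.stdio;
import std.string : indexOf;
import std.array : replace;

/// One parsed line. Hypothetical type for illustration.
struct Setting
{
    string label;
    string value;
}

/// Splits `line` at the first space: everything before it is the label,
/// everything after it is the value. Returns false if there is no label.
bool parseSetting(string line, out Setting s)
{
    auto sp = line.indexOf(' ');
    if (sp <= 0)
        return false;
    s.label = line[0 .. sp];
    // Decode the four-character sequence \r\n back into a real line break.
    s.value = line[sp + 1 .. $].replace(`\r\n`, "\r\n");
    return true;
}

void main()
{
    Setting s;
    if (parseSetting(`name regan\r\nwas\r\nhere`, s))
        writefln("%s = %s", s.label, s.value);
}
```

Because only the first space is special, values may contain spaces freely, which is the point of keeping the set of special characters small.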
Regan Heath says...
> The simplest possible format... If you assume your values cannot contain \r\n and your labels/settings cannot contain spaces then you can simply use the following format: label<space>value<\r\n> and parse it by calling "find" on each line, looking for a space, and assuming the rest of the line (minus the \r\n) is the value. If you decide later on that you need \r\n in your values you can encode them as \, r, \, n eg. label<space>regan\r\nwas\r\nhere<\r\n> In general the fewer special characters you define, the fewer special cases you have to handle in values. Further if you can pick characters you will never want to use in values you don't have to handle any special cases at all.
Hmm... sure that would work, but space is definitely something that needs to be used. But your example would be a nightmare to read. The config file format is supposed to be read and changed by not only myself, but any casual stats user.
A way to do it:
[varlable]
values of the variable the whole line not allowing // comments
Sure, that would be trivial to do. But that is also a bit less nice to read.
My stats will have quite extensive 4-5 columns of obituaries using " as a delimiter. So a good way to get rid of \" would help. In my ANSI C code, it simply was not possible, and I was lucky enough not to require it (or I had to hack my code to allow for it).
A quite elegant way to solve the problem would be to disallow tabs as values, and use the tab char to separate the columns. Problem with that is, the user might accidentally add a tab and never notice it.
AEon
Mar 21 2005
On Mon, 21 Mar 2005 23:10:05 +0000 (UTC), AEon <AEon_member pathlink.com> wrote:
> Regan Heath says...
>> The simplest possible format... If you assume your values cannot contain \r\n and your labels/settings cannot contain spaces then you can simply use the following format: label<space>value<\r\n> and parse it by calling "find" on each line, looking for a space, and assuming the rest of the line (minus the \r\n) is the value. If you decide later on that you need \r\n in your values you can encode them as \, r, \, n eg. label<space>regan\r\nwas\r\nhere<\r\n> In general the fewer special characters you define, the fewer special cases you have to handle in values. Further if you can pick characters you will never want to use in values you don't have to handle any special cases at all.
> Hmm... sure that would work, but space is definitely something that needs to be used.
In the label/setting name?
> But your example would be a nightmare to read.
I don't think so, but I guess this is personal preference. If you like, replace <space> with <tab>, or allow both.
> The config file format is supposed to be read and changed by not only myself, but any casual stats user.
KISS (Keep It Simple Stupid) - no insult intended. The more people who have to edit it, the simpler you should attempt to make it. Alternately provide a simple program to read/write it and get them to use that.
> A quite elegant way to solve the problem would be to disallow tabs as values, and use the tab char to separate the columns. Problem with that is, the user might accidentally add a tab and never notice it.
- Treat consecutive tabs as 1 tab.
- Ignore trailing tabs.
I can see 1 potential problem. Depending on the text editor and length of the values in the file the columns might not line up in the text file. However, Excel and I imagine other spreadsheet style programs can load/save tab separated value text files, also comma separated text files i.e.
a,b,c
d,e,f
..etc..
Regan
Mar 21 2005
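The two tab rules above (consecutive tabs count as one, trailing tabs are ignored) can be sketched like this in current D. `splitTabs` is a hypothetical name, and the sketch additionally drops leading tabs, which the thread does not discuss.

```d
import std.stdio;

/// Splits `line` into columns on runs of tab characters. Empty columns
/// produced by doubled, leading, or trailing tabs are never emitted.
/// Hypothetical helper for illustration.
string[] splitTabs(string line)
{
    string[] cols;
    size_t start = 0;
    bool inField = false;
    foreach (i, c; line)
    {
        if (c == '\t')
        {
            if (inField)
            {
                cols ~= line[start .. i]; // a tab closes the open column
                inField = false;
            }
        }
        else if (!inField)
        {
            start = i;                    // first character of a new column
            inField = true;
        }
    }
    if (inField)
        cols ~= line[start .. $];         // last column, no trailing tab
    return cols;
}

void main()
{
    foreach (col; splitTabs("0\t\t killed by MOD_SHOTGUN\tShotgun\tSG\t\t"))
        writefln("`%s`", col);
}
```

An accidentally doubled tab therefore does no harm, though a stray tab inside a value would still split it in two.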
On Mon, 21 Mar 2005 01:27:31 +0000 (UTC), AEon wrote:
> I started to code my parser, a *lot* easier with D, commands like std.string.split work miracles. But I am still wondering how to optimize parsing, in this case of a configuration file: <code> // comments [General] game "Quake III Arena" gameInfo "Retail, Rocket Arena III, Q3: Team Arena" gameOpt "-q3a" // *** comment gameMode "16" // comments </code> I do a std.string.find(line, "game") to find out if the line contains my key-variable. And then a char[][] splitLine = std.string.split(line, "\""); accessing the value of the var of interest via splitLine[1] Now that is fine and dandy. But when I want to allow the user to use double quotes (") in the config file, this will turn ugly, since the above split does not differ between " and \". Any ideas how to elegantly read the var/value pairs should the value contain a \"? (In C I did some very evil manual hacking to make that work). Thanx.
I have a module that will 'tokenize' lines that will probably suit your needs. I've attached the code but if you can't fetch it that way, let me know and I'll make it available on the web.
-- 
Derek
Melbourne, Australia
22/03/2005 1:46:54 PM
begin 644 test.d
[uuencoded data for test.d]
end
begin 644 linetoken.d
[uuencoded data for linetoken.d]
end
Mar 21 2005
Derek Parnell says...
> 22/03/2005 1:46:54 PM
> begin 644 test.d
What kind of format is that? Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
AEon
Mar 22 2005
On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon <AEon_member pathlink.com> wrote:
> Derek Parnell says...
>> 22/03/2005 1:46:54 PM
>> begin 644 test.d
> What kind of format is that? Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
It's uuencoded. Search the web for a utility commonly called uudecode. Or, if you have WinACE, rename/save the data as a .uue file, right click on it in Windows Explorer and use the WinACE "extract here" option.
Regan
Mar 22 2005
Regan Heath wrote:On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon <AEon_member pathlink.com> wrote:...Derek Parnell says...It's uuencoded. Search the web for a utility commonly called uudecode. Or, if you have WinACE rename/save the data as a .uue file right click on it in windows explorer and use the winace "extract here" option. ReganDecoding the fun way: http://www.dsource.org/tutorials/index.php?show_example=146 -- Justin (a/k/a jcc7) http://jcc_7.tripod.com/d/
Mar 22 2005
On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon wrote:
> Derek Parnell says...
>> 22/03/2005 1:46:54 PM
>> begin 644 test.d
> What kind of format is that? Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
It ain't "my" format but a very commonly used one - UUEncode. Most news readers can handle it but I'll make the file available on the web (for now).
http://www.users.bigpond.com/ddparnell/linetoken.d
-- 
Derek Parnell
Melbourne, Australia
22/03/2005 11:49:15 PM
Mar 22 2005
In article <18k6r1o1h3k7g.1478moloz1j32.dlg 40tude.net>, Derek Parnell says...
> On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon wrote:
>> Derek Parnell says...
>>> 22/03/2005 1:46:54 PM
>>> begin 644 test.d
>> What kind of format is that? Looks like something similar to tar, shar or something. Those I would copy/paste into a test file and unpack them via TotalCommander. But your format?
> It ain't "my" format but a very commonly used one - UUEncode. Most news readers can handle it but I'll make the file available on the web (for now). http://www.users.bigpond.com/ddparnell/linetoken.d
I had not been suggesting you invented it ;)...
Copy/paste, name file uue works just fine with TotalCommander. Had totally forgotten about uue.
AEon
Mar 22 2005
Derek Parnell says...
> begin 644 linetoken.d
A few questions about your code:
char[][] TokenizeLine(char[] pSource, char[] pDelim = ",", char[] pComment = "//")
As I understand this, pDelim and pComment can be set when calling TokenizeLine(), but need not be, since both have "default" values? If so, another very useful code example.
int find(dchar[] pStringToScan, dchar pCharToFind)
I noted that you defined your own find() function. Generally, would that function conflict with those defined in the std lib? Or do user-defined functions automatically shadow lib functions?
Amazing piece of code (will take me a while to read/understand). From your example output test cases:
TokenizeLine(" abc, def , ghi, ") // default Delim "," Comment is "//"
--> {"abc", "def", "ghi", ""}
split(" abc, def , ghi, ", ","), and then applying strip() on every element, would do the same, but not as elegantly :)
TokenizeLine("character or spaces to be \t inserted", "")
--> {"character", "or", "spaces", "to", "be", "inserted"}
An empty delimiter seems to be an alias for \t and " " (space)? Nice! (Just noted from your info: However, if DelimChar is an empty string, then tokens are delimited by any group of one or more white-space characters. By default, DelimChar is ",".) Duplicating that with split() would be tough.
TokenizeLine(" abc; def , ghi; ", ";")
--> {"abc", "def , ghi", "" }
Noting, you seem to be calling something like strip() though not exactly that function.
TokenizeLine(" abc, [def , ghi] ") // default Delim "," Comment is "//"
--> {"abc", "[", "def , ghi"}
(Explanation: If a token begins with a bracket (parenthesis, square, or brace), then you will get back two tokens. The first is the opening bracket as a single character string, and the second is all the characters up to, but not including, the matching end bracket, taking nested brackets (of the same type) into consideration.)
Would not:
--> {"abc", "[", "def , ghi", "]"}
or even
--> {"abc", "def , ghi" }
be "neater"?
TokenizeLine(" abc, [def , [ghi, jkl] ] ")
--> {"abc", "[", "def , [ghi, jkl] "}
Anything in brackets is treated literally (i.e. as is), so nested brackets are not interpreted. OK. So if you actually wanted to use [] or () in strings, and that may well happen often, one would actually need to "escape" those in some way? I am not sure that the special treatment brackets require will always be convenient.
TokenizeLine(` abc, "def , ghi" , jkl `)
--> {"abc", `"`, "def , ghi", "jkl"}
TokenizeLine(` "moo" \t " oi\"nk\" " \t "ladida " //Comment`, `"`, `//`)
0-->``
1-->`moo`
2-->`t`
3-->`oi"nk"`
4-->`t`
5-->`ladida`
Wishlist:
0: Would wish to not have element 0.
1: fine, but should be token element 0.
2: \t tab no longer recognized, tab should have been ignored
3: Perfect
4: same as 2
5: Perfect
"6": comment ignored, fine
I would hope to do this to a line:
`"moo" <whitespaces> " oi\"nk\" " <whitespaces> "ladida "//Comment`
->0: `moo`
->1: `oi"nk"`
->2: `ladida`
Presently it would not be possible to rely on a specific column to contain the info of a specific double quote pair. Could that be made possible?
Thanx for your work...
AEon
Mar 22 2005
On Tue, 22 Mar 2005 14:14:12 +0000 (UTC), AEon wrote:
> Derek Parnell says...
>> begin 644 linetoken.d
> A few questions about your code: char[][] TokenizeLine(char[] pSource, char[] pDelim = ",", char[] pComment = "//") As I understand this, pDelim and pComment can be set on calling via TokenizeLine(), but need not since both have "default" values? If so, another very useful code example. int find(dchar[] pStringToScan, dchar pCharToFind) I noted that you defined your own find() function. Generally would that function conflict with those defined in the std lib? Or do user-defined functions automatically shadow lib functions?
The D method to resolve such ambiguities is to fully qualify the reference with the package/module name, such as ...
lPos = util.linetoken.find(lResult, lToken);
[snip]
> TokenizeLine("character or spaces to be \t inserted", "") --> {"character", "or", "spaces", "to", "be", "inserted"} An empty delimiter seems to be an alias for \t and " " (space)? Nice! (Just noted from your info: However, if DelimChar is an empty string, then tokens are delimited by any group of one or more white-space characters. By default, DelimChar is ",".) Duplicating that with split() would be tough.
Yes, the empty delimiter uses groups of one or more whitespace characters to act as a single delimiter. If you really need *just* the space character to be the delimiter then use that, same with tabs.
> TokenizeLine(" abc; def , ghi; ", ";") --> {"abc", "def , ghi", "" } Noting, you seem to be calling something like strip() though not exactly that function. TokenizeLine(" abc, [def , ghi] ") // default Delim "," Comment is "//" --> {"abc", "[", "def , ghi"} (Explanation: If a token begins with a bracket (parenthesis, square, or brace), then you will get back two tokens. The first is the opening bracket as a single character string, and the second is all the characters up to, but not including, the matching end bracket, taking nested brackets (of the same type) into consideration.) Would not: --> {"abc", "[", "def , ghi", "]"} or even --> {"abc", "def , ghi" } be "neater"?
Often, when parsing the tokens you need to know if a token was enclosed in brackets or quotes. By supplying the opening bracket or quote in the returned tokens, you can quickly see which were bracketed tokens. Also, there is no need to supply the closing bracket or quote as you know what that would have been by the opening bracket or quote. In other words, if you come across a token of "{" you know the next token was enclosed in braces, so you don't need to see the final brace.
> TokenizeLine(" abc, [def , [ghi, jkl] ] ") --> {"abc", "[", "def , [ghi, jkl] "} Anything in brackets is treated literally (i.e. as is), so nested brackets are not interpreted. OK. So if you actually wanted to use [] or () in strings, and that may well happen often, one would actually need to "escape" those in some way? I am not sure that the special treatment brackets require will always be convenient.
There are two ways (at least) to do that. First method is to use the Escape character (the back-slash "\").
TokenizeLine(` abc, \[def , [ghi, jkl] ] `)
--> { "abc", "[def", "ghi, jkl", "]" }
TokenizeLine(`abc, def\, ghi, jkl`)
--> { "abc", "def, ghi", "jkl"}
Note only 3 tokens.
The other way is to enclose it inside a different sort of bracket/quote.
TokenizeLine(`He said, '"Let's go down to the river".`, ``)
--> { `He`, `said,`, `'`, `"Let's go down to the river".` }
> TokenizeLine(` "moo" \t " oi\"nk\" " \t "ladida " //Comment`, `"`, `//`) 0-->`` 1-->`moo` 2-->`t` 3-->`oi"nk"` 4-->`t` 5-->`ladida` Wishlist: 0: Would wish to not have element 0. 1: fine, but should be token element 0. 2: \t tab no longer recognized, tab should have been ignored 3: Perfect 4: same as 2 5: Perfect "6" comment ignored, fine
Well, you said that the token delimiter was the double-quote. Also, you used the 'raw' string format so the sequence "\t" is not a tab but literally a backslash-t combination. So this would have been broken up like this ...
` ` `moo` ` \t ` ` oi\"nk\" ` ` \t ` `ladida ` ` //Comment`
Then when leading and trailing spaces are removed you get ...
`` `moo` `\t` `oi\"nk\"` `\t` `ladida` `//Comment`
Then applying escaped characters
`` `moo` `t` `oi"nk"` `t` `ladida` `//Comment`
Then when removing comments ...
`` `moo` `t` `oi"nk"` `t` `ladida`
> I would hope to do this to a line: `"moo" <whitespaces> " oi\"nk\" " <whitespaces> "ladida "//Comment` ->0: `moo` ->1: `oi"nk"` ->2: `ladida`
Toks = TokenizeLine(`"moo" <whitespaces> " oi\"nk\" " <whitespaces> "ladida "//Comment`, "");
// Toks --> { `"`, `moo`, `"`, ` oi"nk" `, `"`, `ladida ` }
int i;
foreach(char[] aTok; Toks)
{
    // Skip the tokens that only mark an opening quote.
    if (aTok != `"`)
    {
        writefln("->%d: `%s`", i, std.string.strip(aTok));
        i++;
    }
}
> Presently it would not be possible to rely on a specific column to contain the info of a specific double quote pair. Could that be made possible?
I suppose so, but it is designed to handle free form text and not column-delimited stuff.
-- 
Derek Parnell
Melbourne, Australia
23/03/2005 1:40:47 AM
Mar 22 2005
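For readers who only want the three-field result AEon asks for above, here is a minimal sketch of a quote-aware field extractor. It is not Derek's TokenizeLine module and does not reproduce its behavior. The name `quotedFields` is hypothetical, and the code is written against current D (D2) Phobos rather than the 2005 library used in the thread. It assumes fields are always wrapped in double quotes, that `\"` is the only escape that matters, and that `//` outside a quoted field starts a comment.

```d
import std.string : strip;

/// Hypothetical helper: returns the stripped contents of each "..." field
/// in a line. A backslash inside a field makes the next character literal,
/// and a // found outside a field ends the scan.
/// An unterminated trailing field is silently dropped.
string[] quotedFields(string line)
{
    string[] fields;
    char[] current;
    bool inQuotes = false;

    for (size_t i = 0; i < line.length; i++)
    {
        char c = line[i];
        if (inQuotes)
        {
            if (c == '\\' && i + 1 < line.length)
            {
                current ~= line[++i];          // keep the escaped character as is
            }
            else if (c == '"')
            {
                fields ~= strip(current.idup); // closing quote ends the field
                current = null;
                inQuotes = false;
            }
            else
            {
                current ~= c;
            }
        }
        else if (c == '"')
        {
            inQuotes = true;                   // opening quote starts a field
        }
        else if (c == '/' && i + 1 < line.length && line[i + 1] == '/')
        {
            break;                             // rest of the line is a comment
        }
        // Anything else outside quotes (spaces, tabs, stray text) is skipped.
    }
    return fields;
}

unittest
{
    // Raw (backtick) string, so \t and \" are literal backslash sequences.
    auto f = quotedFields(`"moo" \t " oi\"nk\" " \t "ladida " //Comment`);
    assert(f == ["moo", `oi"nk"`, "ladida"]);
}
```

Because every field lands at a fixed index, this form also answers the "specific column" question: the second field of a well-formed line is always element 1. The unittest can be run with `dmd -unittest -main -run file.d`.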
Derek Parnell,

At least in my case:

  [Weapons]
  "0" " killed by MOD_SHOTGUN" "Shotgun" "SG"
  "1" " killed by MOD_GAUNTLET" "Gauntlet" "G"

something quite simple just occurred to me. When using std.string.splitlines() to read complete lines from a text file, you will *never* encounter a \n in the line, since that would have placed the content on another line. So when you have something like this:

  "1" " killed \" \" by MOD_GAUNTLET" "Gauntlet" "G"

you could replace every \" with \n and be sure that the line will not lose any information. Then

  char[][] spline = split(line, "\"");

And finally replace any \n back to \" (or right back to ", depending on how you want to use the spline elements).

Obviously your code is a lot more flexible, but as we all strive for "KISS" ;) my idea should work quite well.

BTW: I have been noting, many feedback posts should really be archived, especially all the very useful code-examples.

AEon
Mar 22 2005
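The placeholder trick AEon describes can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the function name `splitQuoted` is made up, it uses `replace` and `split` from current Phobos (`std.array`) rather than the 2005 `std.string` versions, and it assumes the line contains no newline of its own and has balanced unescaped quotes. Splitting on `"` leaves the text outside the quotes at even indices and the quoted contents at odd indices, so only the odd ones are kept.

```d
import std.array : replace, split;

/// Hypothetical helper: extracts the contents of each "..." field by
/// hiding escaped quotes behind a newline placeholder before splitting.
string[] splitQuoted(string line)
{
    // 1. Hide every escaped quote (\") behind '\n', which cannot occur
    //    in a line that was produced by splitting a file into lines.
    string masked = line.replace(`\"`, "\n");

    // 2. Split on the remaining, unescaped double quotes.
    string[] parts = masked.split(`"`);

    // 3. Odd-indexed parts were inside quotes; restore the hidden quotes.
    string[] fields;
    foreach (i, part; parts)
    {
        if (i % 2 == 1)
            fields ~= part.replace("\n", `"`);
    }
    return fields;
}

unittest
{
    auto f = splitQuoted(`"1" " killed \" \" by MOD_GAUNTLET" "Gauntlet" "G"`);
    assert(f == ["1", ` killed " " by MOD_GAUNTLET`, "Gauntlet", "G"]);
}
```

Unlike a character-by-character scanner, this keeps the leading space inside ` killed ...`, which suits obituary strings that are appended to a player name. One limitation of the sketch: a literal backslash placed directly before a closing quote would be mistaken for an escaped quote.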
AEon wrote:
> Derek Parnell,
>
> BTW: I have been noting, many feedback posts should really be archived, especially all the very useful code-examples.
>
> AEon

I'm not sure what you mean by "archive". It's not like Walter clears out these newsgroups at the end of every month.

Walter has even produced some handy index pages such as http://www.digitalmars.com/d/archives/digitalmars/D/index.html

They are particularly useful because Google indexes them (http://www.digitalmars.com/d/archives/advancedsearch.html). Apparently, he hasn't spun out "archives" for this particular newsgroup yet, but I'm sure he will eventually.

On the other hand, if by "archive" you mean gathering together snippets of code, the dsource tutorials project has already done some of this: http://www.dsource.org/tutorials/. If you think new examples are being added too slowly, you're welcome to start adding some yourself. :)

Also, there's an ever-growing amount of useful information available at Wiki4D. Two of my favorite pages:
http://www.prowiki.org/wiki4d/wiki.cgi?NewsDmD
http://www.prowiki.org/wiki4d/wiki.cgi?ErrorMessages

Everyone is invited to add to and/or update the wiki content, too. It's much easier than writing HTML, and quite self-explanatory.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Mar 22 2005
J C Calvarese wrote:
> AEon wrote:
>> BTW: I have been noting, many feedback posts should really be archived, especially all the very useful code-examples.
>
> I'm not sure what you mean by "archive". It's not like Walter clears out these newsgroups at the end of every month.

True enough, and now that I finally have a newsreader installed (Mozilla Thunderbird), I can download and search the posts. I had not been aware of this, since I never used newsgroups before :). I am more of a Forum guy.

> Walter has even produced some handy index pages such as http://www.digitalmars.com/d/archives/digitalmars/D/index.html They are particularly useful because Google indexes them (http://www.digitalmars.com/d/archives/advancedsearch.html). Apparently, he hasn't spun out "archives" for this particular newsgroup yet, but I'm sure he will eventually.

Personally I hope Walter does not have to "waste" his time with things like that too much, giving him more time to work on D. So if the updates are less regular, that is fine.

> On the other hand, if by "archive" you mean gathering together snippets of code, the dsource tutorials projects has already done some of this: http://www.dsource.org/tutorials/. If you think new examples are being added too slowly, you're welcome to start adding some yourself. :)

That was what I had been thinking about. And I would normally help with this, but I am desperately trying to recode some 1000+ hours of AEstats coding to D, and that takes up all my time. But once that is done I'd be glad to help. By then I should have a more solid grasp of D as well.

AEon
Mar 23 2005
AEon wrote:
> J C Calvarese wrote:
>> ...
>
> True enough, and now that I finally have a newsreader installed (Mozilla Thunderbird), I can download and search the posts. I had not been aware of this, since I never used newsgroups before :). I am more of a Forum guy.

I made that transition myself a few years ago (web forums -> newsreader). I still regularly use the web interface when I'm away from my home computer, but I much prefer Thunderbird when it's available.

>> Apparently, he hasn't spun out "archives" for this particular newsgroup yet, but I'm sure he will eventually.
>
> Personally I hope Walter does not have to "waste" his time with things like that too much, giving him more time to work on D. So if the updates are less regular, that is fine.

I think it's automated to where it isn't much effort for him. He probably just pushes a button every month or so. (I don't want him to waste a lot of time on it either, but it's nice to have Google index the newsgroup messages.)

>> On the other hand, if by "archive" you mean gathering together snippets of code, the dsource tutorials projects has already done some of this: http://www.dsource.org/tutorials/. If you think new examples are being added too slowly, you're welcome to start adding some yourself. :)
>
> That was what I had been thinking about. And I would normally help with this, but I am desperately trying to recode some 1000+ hours of AEstats coding to D, and that takes up all my time. But once that is done I'd be glad to help. Till then I should have a more solid grasp of D as well.
>
> AEon

Good. No pressure, I was just suggesting some easy ways to collaborate. AEstats sounds like an interesting use of time, too. ;)

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Mar 23 2005
J C Calvarese wrote:
> No pressure, I was just suggesting some easy ways to collaborate. AEstats sounds like an interesting use of time, too. ;)

:9

In 4 days I have been able to do more in AEstats (in D) than I was able to do in AEstats++ (in C) in 3-4 weeks. Since in D I no longer need to use pointers, strings come for free, and D has very powerful, easy-to-use string functions, most of my code is basically error checking, e.g. for a config using invalid syntax, missing double quotes and the like. The code itself is very minimal. This is the way C should always have been!

I already have all the hardcoded obituaries replaced with configuration obituaries that are read on the fly. Sure, this is not really a big deal, but trying to code that in C made me weep...

And the best part: AEstats (then called AEstats++) will sooner or later be database driven via MySQL... and that should also be a *lot* simpler to do than in C++ code.

AEon
Mar 25 2005