digitalmars.D.learn - String Parsing with \" in a ".." text line

AEon (27/27) Mar 20 2005 I started to code by parser, a *lot* easier with D, commands like

Regan Heath (6/10) Mar 20 2005 I think you have to write your own version of split, one that allows

AEon (12/20) Mar 21 2005 :)... will take a while to get a useful version written, since I am stil...

Stewart Gordon (14/45) Mar 21 2005 Is this a third-party file format? If not, why not define a format

AEon (9/43) Mar 21 2005 True... it is totally up to me to define the format, I felt that was the...

Stewart Gordon (8/14) Mar 21 2005 I actually meant the find and rfind (oops, where did findr come from?)

David Medlock (7/47) Mar 21 2005 Why not just use an existing scripting language for your configuration

Stewart Gordon (11/16) Mar 21 2005 Around two years ago I invented a configuration language called
AEon (4/9) Mar 21 2005 Is that not a tad overkill... I only want to define a few variables and ...

Regan Heath (17/28) Mar 21 2005 The simplest possible format...

AEon (16/29) Mar 21 2005 Hmm... sure that would work, but space is definately something that need...

Regan Heath (20/53) Mar 21 2005 In the label/setting name?

Derek Parnell (210/249) Mar 21 2005 I have a module that will 'tokenize' lines that will probably suit your

AEon (5/7) Mar 22 2005 What kind of format it that. Looks like something similar to tar, shar o...

Regan Heath (6/12) Mar 22 2005 It's uuencoded. Search the web for a utility commonly called uudecode. O...

J C Calvarese (7/16) Mar 22 2005 Decoding the fun way:

Derek Parnell (9/17) Mar 22 2005 It ain't "my" format but a very commonly used one - UUEncode. Most news

AEon (5/18) Mar 22 2005 I had not been suggesting you invented it ;)...

AEon (74/75) Mar 22 2005 A few questions about your code:

Derek Parnell (78/170) Mar 22 2005 The D method to resolve such ambiguities is to fully qualify the referen...

AEon (19/19) Mar 22 2005 Derek Parnell,

J C Calvarese (22/27) Mar 22 2005 I'm not sure what you mean by "archive". It's not like Walter clears out...

AEon (13/31) Mar 23 2005 True enough, and now that I finally have a newsreader installed,

J C Calvarese (15/39) Mar 23 2005 I made that transition myself a few years ago (web forums ->

AEon (16/18) Mar 25 2005 :9

AEon <AEon_member pathlink.com> writes:

I started to code my parser, a *lot* easier with D, commands like
std.string.split work miracles.

But I am still wondering how to optimize parsing, in this case of a
configuration file:

<code>
// comments
[General]
game		"Quake III Arena"
gameInfo	"Retail, Rocket Arena III, Q3: Team Arena"
gameOpt		"-q3a"			// *** comment
gameMode	"16"
// comments
</code>

I do a 

   std.string.find(line, "game")

to find out if the line contains my key-variable. And then a

  char[][] splitLine = std.string.split(line, "\"");

accessing the value of the var of interest via

 splitLine[1]

Now that is fine and dandy. But when I want to allow the user to use double
quotes (") in the config file, this will turn ugly, since the above split does
not distinguish between " and \".

Any ideas how to elegantly read the var/value pairs should the value contain a
\"?

(In C I did some very evil manual hacking to make that work).

Thanx.

AEon

Mar 20 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 21 Mar 2005 01:27:31 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 Any ideas how to elegantly read the var/value pairs should the value  
 contain a
 \"?

 (In C I did some very evil manual hacking to make that work).

I think you have to write your own version of split, one that allows  
"escaped" characters. Once written I'd recommend it for inclusion into  
std.string.
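
A minimal sketch of such an escape-aware split, written for present-day D rather than the 2005 library used in this thread. The name splitEscaped and its parameters are hypothetical (nothing like it is claimed to exist in std.string), and it assumes a backslash escapes whatever single character follows it.

```d
import std.stdio;

/// Splits `line` on `delim`, but treats `escape` followed by any character
/// as that literal character. Hypothetical helper, not part of std.string.
string[] splitEscaped(string line, char delim = '"', char escape = '\\')
{
    string[] parts;
    char[] current;
    for (size_t i = 0; i < line.length; ++i)
    {
        immutable c = line[i];
        if (c == escape && i + 1 < line.length)
        {
            // Keep the escaped character literally and skip the escape itself.
            current ~= line[i + 1];
            ++i;
        }
        else if (c == delim)
        {
            // An unescaped delimiter ends the current part.
            parts ~= current.idup;
            current = null;
        }
        else
        {
            current ~= c;
        }
    }
    parts ~= current.idup; // text after the last delimiter
    return parts;
}

void main()
{
    auto parts = splitEscaped(`gameInfo "He said \"hi\"" // note`);
    writeln(parts[1]); // He said "hi"
}
```

As with the plain split, the value of interest ends up in element 1. Note that the escapes are consumed, so a literal backslash in a value would have to be written as two backslashes.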

Regan

Mar 20 2005

AEon <AEon_member pathlink.com> writes:

Regan Heath says...

 Any ideas how to elegantly read the var/value pairs should the value  
 contain a
 \"?

 (In C I did some very evil manual hacking to make that work).

I think you have to write your own version of split, one that allows  
"escaped" characters. Once written I'd recommend it for inclusion into  
std.string.

:)... will take a while to get a useful version written, since I am still
learning about all the goodies in std.string.


Basically what could be useful would be a

char[][] splitx(char[] stringToSplit, char[] delimiter, char[] nonDelimiters)

of sorts:

splitx( line, "\"", "\\\"");

A simpler solution would be to use another delimiter in my config files. But
that would leave the problem, that any delimiter could also be needed in the
text.

If I find anything useful, will post the code.

AEon

Mar 21 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

AEon wrote:
 I started to code my parser, a *lot* easier with D, commands like
 std.string.split work miracles.
 
 But I am still wondering how to optimize parsing, in this case of a
 configuration file:
 
 <code>
 // comments
 [General]
 game		"Quake III Arena"
 gameInfo	"Retail, Rocket Arena III, Q3: Team Arena"
 gameOpt		"-q3a"			// *** comment
 gameMode	"16"
 // comments
 </code>

Is this a third-party file format?  If not, why not define a format 
that's that little bit easier to parse?  I'd be inclined to go for 
something resembling Windows .ini files.  But if you still want to do it 
this way....

 I do a 
 
    std.string.find(line, "game")
 
 to find out if the line contains my key-variable.

Which won't work if "game" is somewhere in the value, not in the key. 
How about checking whether the line _begins_ with "game"?

 And then a
 
   char[][] splitLine = std.string.split(line, "\"");
 
 accessing the value of the var of interest via
 
  splitLine[1]
 
 Now that is fine and dandy. But when I want to allow the user to use double
 quotes (") in the config file, this will turn ugly, since the above split does
 not distinguish between " and \".

<snip>

By using split for this you're making life difficult for yourself.  How 
about just picking out the first and last quotes, using find and findr?
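
Both suggestions in this post (test the key at the start of the line, then take the text between the first and last quote) can be sketched roughly as follows. This assumes present-day Phobos, where the std.string search functions are called indexOf and lastIndexOf; hasKey and quotedValue are hypothetical helper names.

```d
import std.array : split;
import std.stdio;
import std.string : indexOf, lastIndexOf;

/// True when the first whitespace-separated word of `line` is exactly `key`.
/// Comparing the whole word avoids "game" also matching "gameInfo".
bool hasKey(string line, string key)
{
    auto words = line.split(); // no separator given: split on whitespace
    return words.length > 0 && words[0] == key;
}

/// Returns the text between the first and the last double quote of `line`,
/// or null when the line does not contain two quotes.
string quotedValue(string line)
{
    immutable first = line.indexOf('"');
    immutable last = line.lastIndexOf('"');
    if (first < 0 || last <= first)
        return null;
    return line[first + 1 .. last];
}

void main()
{
    string line = `gameInfo "Say \"hi\" twice"`;
    writeln(hasKey(line, "game"));     // false
    writeln(hasKey(line, "gameInfo")); // true
    writeln(quotedValue(line));        // Say \"hi\" twice
}
```

The escapes stay in the returned value, and a trailing comment that itself contains a quote would confuse the last-quote search, so comments would have to be cut off first.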

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.

Mar 21 2005

AEon <AEon_member pathlink.com> writes:

Stewart Gordon...

good points

 <code>
 // comments
 [General]
 game		"Quake III Arena"
 gameInfo	"Retail, Rocket Arena III, Q3: Team Arena"
 gameOpt		"-q3a"			// *** comment
 gameMode	"16"
 // comments
 </code>

Is this a third-party file format?  If not, why not define a format 
that's that little bit easier to parse?  I'd be inclined to go for 
something resembling Windows .ini files.  But if you still want to do it 
this way....

True... it is totally up to me to define the format, I felt that was the easiest
way to format the cfg file, and is easy to read.

 I do a 
 
    std.string.find(line, "game")
 
 to find out if the line contains my key-variable.

Which won't work if "game" is somewhere in the value, not in the key. 
How about checking whether the line _begins_ with "game"?

I had thought of that, but then forgot to check for it. Sigh :)


 And then a
 
   char[][] splitLine = std.string.split(line, "\"");
 
 accessing the value of the var of interest via
 
  splitLine[1]
 
 Now that is fine and dandy. But when I want to allow the user to use double
 quotes (") in the config file, this will turn ugly, since the above split does
 not distinguish between " and \".

<snip>

By using split for this you're making life difficult for yourself.  How 
about just picking out the first and last quotes, using find and findr?

Well as long as there is no \" in the line, split will do the job much quicker.
Just checked, you are talking regular expression. Still need to learn about
those.

AEon

Mar 21 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

AEon wrote:
<snip>
 By using split for this you're making life difficult for yourself.  How 
 about just picking out the first and last quotes, using find and findr?

 
 Well as long as there is no \" in the line, split will do the job much quicker.
 Just checked, you are talking regular expression. Still need to learn about
 those.

I actually meant the find and rfind (oops, where did findr come from?) 
in std.string, not std.regexp.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.

Mar 21 2005

David Medlock <amedlock nospam.org> writes:

AEon wrote:
 <snip>


Why not just use an existing scripting language for your configuration 
files?

I would recommend Small (http://www.compuphase.com/small.htm) or
Lua (http://www.lua.org/).

This scripting language would be useful within your game as well.

-David

Mar 21 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

David Medlock wrote:
<snip>
 Why not just use an existing scripting language for your configuration 
 files?
 
 I would recommend Small (http://www.compuphase.com/small.htm) or
 Lua (http://www.lua.org/).

Around two years ago I invented a configuration language called 
Configur8.  It's basically a slightly more powerful version of Windows 
INI files (with one or two syntactical differences).  It's no match for 
a scripting language, but is perfect for stuff like the above appears to be.

I haven't yet created a D interface, but I plan to do it at some point.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.

Mar 21 2005

AEon <AEon_member pathlink.com> writes:

David Medlock says...

Why not just use an existing scripting language for your configuration 
files?

I would recommend Small (http://www.compuphase.com/small.htm) or
Lua (http://www.lua.org/).

This scripting language would be useful within your game as well.

Is that not a tad overkill... I only want to define a few variables and log file
obituaries, that need to be as readable as possible.

AEon

Mar 21 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 21 Mar 2005 16:44:36 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 David Medlock says...

 Why not just use an existing scripting language for your configuration
 files?

 I would recommend Small (http://www.compuphase.com/small.htm) or
 Lua (http://www.lua.org/).

 This scripting language would be useful within your game as well.

 Is that not a tad overkill... I only want to define a few variables and  
 log file
 obituaries, that need to be as readable as possible.

The simplest possible format...

If you assume your values cannot contain \r\n and your labels/settings  
cannot contain spaces then you can simply use the following format:

label<space>value<\r\n>

and parse it by calling "find" on each line, looking for a space, and  
assuming the rest of the line (minus the \r\n) is the value.

If you decide later on that you need \r\n in your values you can encode  
them as \, r, \, n eg.

label<space>regan\r\nwas\r\nhere<\r\n>

In general the fewer special characters you define, the fewer special  
cases you have to handle in values. Further if you can pick characters you  
will never want to use in values you don't have to handle any special  
cases at all.
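
A rough sketch of a parser for this label<space>value format, assuming present-day Phobos. Setting and parseSetting are hypothetical names, and the encoded line break is assumed to be exactly the four characters \, r, \, n as described above.

```d
import std.array : replace;
import std.stdio;
import std.string : chomp, indexOf;

/// One label/value pair from the config file.
struct Setting
{
    string label;
    string value;
}

/// Parses "label<space>value". Returns false when the line has no label.
bool parseSetting(string line, out Setting s)
{
    line = line.chomp();               // drop the trailing \r\n (or \n)
    immutable sep = line.indexOf(' '); // the first space ends the label
    if (sep <= 0)
        return false;
    s.label = line[0 .. sep];
    // Decode the four-character sequence \r\n back into a real line break.
    s.value = line[sep + 1 .. $].replace(`\r\n`, "\r\n");
    return true;
}

void main()
{
    Setting s;
    if (parseSetting("label regan\\r\\nwas\\r\\nhere\r\n", s))
        writefln("%s = %s", s.label, s.value);
}
```

Everything after the first space belongs to the value, so values may contain spaces; only labels may not.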

Regan

Mar 21 2005

AEon <AEon_member pathlink.com> writes:

Regan Heath says...
The simplest possible format...

If you assume your values cannot contain \r\n and your labels/settings  
cannot contain spaces then you can simply use the following format:

label<space>value<\r\n>

and parse it by calling "find" on each line, looking for a space, and  
assuming the rest of the line (minus the \r\n) is the value.

If you decide later on that you need \r\n in your values you can encode  
them as \, r, \, n eg.

label<space>regan\r\nwas\r\nhere<\r\n>

In general the fewer special characters you define, the fewer special  
cases you have to handle in values. Further if you can pick characters you  
will never want to use in values you don't have to handle any special  
cases at all.

Hmm... sure that would work, but space is definitely something that needs to be
used. But your example would be a nightmare to read. The config file format is
supposed to be read and changed by not only myself, but any casual stats user.

A way to do it:

[variable]
the value of the variable takes up the whole line, with no // comments allowed

Sure, that would be trivial to do. But it is also a bit less nice to read.

My stats will have quite extensive 4-5 columns of obituaries using " as a
delimiter. So a good way to get rid of \" would help. In my ANSI C code, it
simply was not possible, and I was lucky enough not to require it (or I had to
hack my code to allow for it).

A quite elegant way to solve the problem would be to disallow tabs as values,
and use the tab char to separate the columns. Problem with that is, the user
might accidentally add a tab and never notice it.

AEon

Mar 21 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 21 Mar 2005 23:10:05 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 Regan Heath says...
 The simplest possible format...

 If you assume your values cannot contain \r\n and your labels/settings
 cannot contain spaces then you can simply use the following format:

 label<space>value<\r\n>

 and parse it by calling "find" on each line, looking for a space, and
 assuming the rest of the line (minus the \r\n) is the value.

 If you decide later on that you need \r\n in your values you can encode
 them as \, r, \, n eg.

 label<space>regan\r\nwas\r\nhere<\r\n>

 In general the fewer special characters you define, the fewer special
 cases you have to handle in values. Further if you can pick characters  
 you
 will never want to use in values you don't have to handle any special
 cases at all.

 Hmm... sure that would work, but space is definitely something that needs to be
 used.

In the label/setting name?

 But your example would be a nightmare to read.

I don't think so, but I guess this is personal preference.
If you like, replace <space> with <tab>, or allow both.

 The config file format is
 supposed to be read and changed by not only myself, but any casual stats  
 user.

KISS (Keep It Simple Stupid) - no insult intended. The more people who  
have to edit it, the simpler you should attempt to make it.

Alternately provide a simple program to read/write it and get them to use  
that.

 A quite elegant way to solve the problem would be to disallow tabs as values,
 and use the tab char to separate the columns. Problem with that is, the user
 might accidentally add a tab and never notice it.

- Treat consecutive tabs as 1 tab.
- Ignore trailing tabs.
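
The two rules above could be sketched like this, assuming present-day Phobos; tabColumns is a hypothetical name.

```d
import std.algorithm.iteration : filter, splitter;
import std.array : array;
import std.stdio;

/// Splits a line into tab-separated columns. Runs of consecutive tabs count
/// as a single separator, and trailing (or leading) tabs are ignored.
string[] tabColumns(string line)
{
    return line.splitter('\t')
               .filter!(col => col.length > 0) // drop the empty pieces between tabs
               .array;
}

void main()
{
    writeln(tabColumns("a\t\tb\tc\t\t")); // ["a", "b", "c"]
}
```

The price of this leniency is that a column can never be empty, so an intentionally blank value would need some placeholder text.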

I can see 1 potential problem. Depending on the text editor and length of  
the values in the file the columns might not line up in the text file.  
However, Excel and, I imagine, other spreadsheet-style programs can  
load/save tab-separated value text files, and also comma-separated text files,  
i.e.

a,b,c
d,e,f
..etc..

Regan

Mar 21 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 21 Mar 2005 01:27:31 +0000 (UTC), AEon wrote:

 <snip>

I have a module that will 'tokenize' lines that will probably suit your
needs. I've attached the code but if you can't fetch it that way, let me
know and I'll make it available on the web.

-- 
Derek
Melbourne, Australia
22/03/2005 1:46:54 PM
begin 644 test.d
[uuencoded attachment data omitted; truncated in this archive]
end


begin 644 linetoken.d
[uuencoded attachment data omitted; truncated in this archive]
end

Mar 21 2005

AEon <AEon_member pathlink.com> writes:

Derek Parnell says...

22/03/2005 1:46:54 PM
begin 644 test.d

What kind of format is that? Looks like something similar to tar, shar or
something. Those I would copy/paste into a test file and unpack them via
TotalCommander. But your format?

AEon

Mar 22 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon <AEon_member pathlink.com>  
wrote:
 Derek Parnell says...

 22/03/2005 1:46:54 PM
 begin 644 test.d

 What kind of format is that? Looks like something similar to tar, shar or
 something. Those I would copy/paste into a test file and unpack them via
 TotalCommander. But your format?

It's uuencoded. Search the web for a utility commonly called uudecode. Or,  
if you have WinACE rename/save the data as a .uue file right click on it  
in windows explorer and use the winace "extract here" option.

Regan

Mar 22 2005

J C Calvarese <jcc7 cox.net> writes:

Regan Heath wrote:
 On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon 
 <AEon_member pathlink.com>  wrote:
 
 Derek Parnell says...


...
 It's uuencoded. Search the web for a utility commonly called uudecode. 
 Or,  if you have WinACE rename/save the data as a .uue file right click 
 on it  in windows explorer and use the winace "extract here" option.
 
 Regan

Decoding the fun way:
http://www.dsource.org/tutorials/index.php?show_example=146


-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Mar 22 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon wrote:

 Derek Parnell says...
 
22/03/2005 1:46:54 PM
begin 644 test.d

 
 What kind of format is that? Looks like something similar to tar, shar or
 something. Those I would copy/paste into a test file and unpack them via
 TotalCommander. But your format?

It ain't "my" format but a very commonly used one - UUEncode. Most news
readers can handle it but I'll make the file available on the web (for
now).

  http://www.users.bigpond.com/ddparnell/linetoken.d


-- 
Derek Parnell
Melbourne, Australia
22/03/2005 11:49:15 PM

Mar 22 2005

AEon <AEon_member pathlink.com> writes:

In article <18k6r1o1h3k7g.1478moloz1j32.dlg 40tude.net>, Derek Parnell says...
On Tue, 22 Mar 2005 12:43:44 +0000 (UTC), AEon wrote:

 Derek Parnell says...
 
22/03/2005 1:46:54 PM
begin 644 test.d

 
 What kind of format is that? Looks like something similar to tar, shar or
 something. Those I would copy/paste into a test file and unpack them via
 TotalCommander. But your format?

It ain't "my" format but a very commonly used one - UUEncode. Most news
readers can handle it but I'll make the file available on the web (for
now).

  http://www.users.bigpond.com/ddparnell/linetoken.d

I had not been suggesting you invented it ;)...

Copy/paste into a file named .uue works just fine with TotalCommander. Had totally
forgotten about uue.

AEon

Mar 22 2005

AEon <AEon_member pathlink.com> writes:

Derek Parnell says...

begin 644 linetoken.d

A few questions about your code:


  char[][] TokenizeLine(char[] pSource, char[] pDelim = ",", char[] pComment =
"//")

As I understand this, pDelim and pComment can be set on calling via
TokenizeLine(), but need not since both have "default" values? If so, another
very useful code example.


 int find(dchar[] pStringToScan, dchar pCharToFind)

I noted that you defined your own find() function. Generally would that function
conflict with those defined in the std lib? Or do user-defined functions
automatically shadow lib functions?


Amazing piece of code (will take me a while to read/understand). From your example
output test cases:

TokenizeLine(" abc, def , ghi, ") // default Delim ","  Comment is "//"
--> {"abc", "def", "ghi", ""}

split(" abc, def , ghi, ", ","), and then apply strip() on every element, would
do the same, but not as elegantly :)


TokenizeLine("character    or spaces to be \t inserted", "")
--> {"character", "or", "spaces", "to", "be", "inserted"}

An empty delimiter seems to be an alias for \t and " " (space)? Nice!
(Just noted from your info: However, if DelimChar is an empty string, then
tokens are delimited by any group of one or more white-space characters. By
default, DelimChar is ",".)
Duplicating that with split() would be tough.


TokenizeLine(" abc; def , ghi; ", ";")
--> {"abc", "def , ghi", "" }

Noting, you seem to be calling something like strip() though not exactly that
function.


TokenizeLine(" abc, [def , ghi]        ") // default Delim ","  Comment is "//"
--> {"abc", "[", "def , ghi"}

(Explanation: If a token begins with a bracket (parenthesis, square, or brace),
then you will get back two tokens. The first is the opening bracket as a single
character string, and the second is all the characters up to, but not including,
the matching end bracket, taking nested brackets (of the same type) into
consideration.)

Would not:

--> {"abc", "[", "def , ghi", "]"}

or even

--> {"abc", "def , ghi" }

be "neater"?


TokenizeLine(" abc, [def , [ghi, jkl] ]  ")
--> {"abc", "[", "def , [ghi, jkl] "}

Anything in brackets is treated literally (i.e. as is), so nested brackets are
not interpreted. OK.

So if you actually wanted to use [] or () in strings, and that may well happen
often, one would actually need to "escape" those in some way? I am not sure that
the special treatment brackets require will always be convenient.

TokenizeLine(` abc, "def , ghi" , jkl `)
--> {"abc", `"`, "def , ghi", "jkl"}


TokenizeLine(` "moo"  \t " oi\"nk\"  " \t "ladida " //Comment`, `"`, `//`)
0-->``
1-->`moo`
2-->`t`
3-->`oi"nk"`
4-->`t`
5-->`ladida`

Wishlist:
0: Would wish to not have element 0.
1: fine, but should be element token 0.
2: \t tab no longer recognized, tab should have been ignored
3: Perfect
4: same as 2
5: Perfect
"6" comment ignored, fine

I would hope to do this to a line:

`"moo" <whitespaces> " oi\"nk\"  " <whitespaces> "ladida "//Comment`

->0: `moo`
->1: `oi"nk"`
->2: `ladida`

Presently it would not be possible to rely on a specific column containing the
contents of a specific double-quote pair.

Could that be made possible?
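
A rough sketch of the behaviour wished for above, independent of linetoken.d and assuming present-day Phobos. quotedFields is a hypothetical name; the sketch assumes that \ escapes the next character inside quotes, that // outside quotes starts a comment, and that anything else outside quotes is simply skipped.

```d
import std.stdio;
import std.string : strip;

/// Collects the contents of each "..." pair on a line, one field per pair.
string[] quotedFields(string line)
{
    string[] fields;
    char[] current;
    bool inQuotes = false;
    for (size_t i = 0; i < line.length; ++i)
    {
        immutable c = line[i];
        if (!inQuotes)
        {
            if (c == '"')
                inQuotes = true;
            else if (c == '/' && i + 1 < line.length && line[i + 1] == '/')
                break; // the rest of the line is a comment
            // any other character outside quotes is ignored
        }
        else if (c == '\\' && i + 1 < line.length)
        {
            current ~= line[++i]; // escaped character, taken literally
        }
        else if (c == '"')
        {
            fields ~= strip(current).idup; // closing quote ends the field
            current = null;
            inQuotes = false;
        }
        else
        {
            current ~= c;
        }
    }
    return fields; // an unterminated quote is silently dropped
}

void main()
{
    auto f = quotedFields(`"moo"  \t " oi\"nk\"  " \t "ladida "//Comment`);
    foreach (i, s; f)
        writefln("->%s: `%s`", i, s);
}
```

With this approach field N always holds the contents of the N-th quote pair, whatever sits between the pairs.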

Thanx for your work...

AEon

Mar 22 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 22 Mar 2005 14:14:12 +0000 (UTC), AEon wrote:

 Derek Parnell says...
 
begin 644 linetoken.d

 
 A few questions about your code:
 
 
   char[][] TokenizeLine(char[] pSource, char[] pDelim = ",", char[] pComment =
 "//")
 
 As I understand this, pDelim and pComment can be set on calling via
 TokenizeLine(), but need not since both have "default" values? If so, another
 very useful code example.
 
 
  int find(dchar[] pStringToScan, dchar pCharToFind)
 
 I noted that you defined your own find() function. Generally would that function
 conflict with those defined in the std lib? Or do user-defined functions
 automatically shadow lib functions?

The D method to resolve such ambiguities is to fully qualify the reference
with the package/module name, such as ...

    lPos = util.linetoken.find(lResult, lToken);
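
A small self-contained illustration of full qualification, assuming present-day D. The local indexOf below is a hypothetical stand-in for a user-defined function that shares its name with a library one, and the static import is used here only to make the qualified call explicit.

```d
import std.stdio;
static import std.string; // names from std.string must be fully qualified

/// Hypothetical local function with the same name as std.string.indexOf.
ptrdiff_t indexOf(const(char)[] haystack, char needle)
{
    foreach (i, c; haystack)
        if (c == needle)
            return i;
    return -1;
}

void main()
{
    writeln(indexOf("a,b", ','));            // the local function: 1
    writeln(std.string.indexOf("a,b", ',')); // the library function, fully qualified: 1
}
```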

[snip]
 
 TokenizeLine("character    or spaces to be \t inserted", "")
 --> {"character", "or", "spaces", "to", "be", "inserted"}
 
 An empty delimiter seems to be an alias for \t and " " (space)? Nice!
 (Just noted from your info: However, if DelimChar is an empty string, then
 tokens are delimited by any group of one or more white-space characters. By
 default, DelimChar is ",".)
 Duplicating that with split() would be tough.

Yes, the empty delimiter uses groups of one or more whitespace characters
to act as a single delimiter. If you really need *just* the space character
to be the delimiter then use that, same with tabs. 

 
 TokenizeLine(" abc; def , ghi; ", ";")
 --> {"abc", "def , ghi", "" }
 
 Noting, you seem to be calling something like strip() though not exactly that
 function.
 
 
 TokenizeLine(" abc, [def , ghi]        ") // default Delim ","  Comment is "//"
 --> {"abc", "[", "def , ghi"}
 
 (Explanation: If a token begins with a bracket (parenthesis, square, or brace),
 then you will get back two tokens. The first is the opening bracket as a single
 character string, and the second is all the characters up to, but not including,
 the matching end bracket, taking nested brackets (of the same type) into
 consideration.)
 
 Would not:
 
 --> {"abc", "[", "def , ghi", "]"}
 
 or even
 
 --> {"abc", "def , ghi" }
 
 be "neater"?

Often, when parsing the tokens you need to know if a token was enclosed in
brackets or quotes. By supplying the opening bracket or quote in the
returned tokens, you can quickly see which were bracketed tokens. Also,
there is no need to supply the closing bracket or quote as you know what
that would have been by the opening bracket or quote. In other words, if
you come across a token of "{" you know the next token was enclosed in
braces, so you don't need to see the final brace.
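A consumer of such a token list might look roughly like the sketch below. It assumes present-day D (D2); labelTokens is a hypothetical helper name, and the set of opener characters is a guess based on the description above.

```d
import std.stdio : writeln;

/// Labels each token as plain or enclosed, assuming the convention described
/// above: an opener token announces that the NEXT token was enclosed by it.
string[] labelTokens(string[] toks)
{
    string[] labels;
    for (size_t i = 0; i < toks.length; i++)
    {
        string t = toks[i];
        bool isOpener = t == "[" || t == "(" || t == "{" || t == `"` || t == "'";
        if (isOpener && i + 1 < toks.length)
        {
            labels ~= "enclosed by " ~ t ~ ": " ~ toks[i + 1];
            i++; // the enclosed token has been consumed as well
        }
        else
        {
            labels ~= "plain: " ~ t;
        }
    }
    return labels;
}

void main()
{
    // Token list as given in the post for ` abc, [def , ghi] `.
    foreach (label; labelTokens(["abc", "[", "def , ghi"]))
        writeln(label);
}
```

Since the closing bracket is implied by the opener, the loop never needs to look for it.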
 
 
 TokenizeLine(" abc, [def , [ghi, jkl] ]  ")
 --> {"abc", "[", "def , [ghi, jkl] "}
 
 Anything in brackets is treated literally (i.e. as is), so nested brackets are
 not interpreted. OK.
 
 So if you actually wanted to use [] or () in strings, and that may well happen
 often, one would actually need to "escape" those in some way? I am not sure that
 the special treatment brackets require will always be convenient.

There are two ways (at least) to do that. First method is to use the Escape
character (the back-slash "\").  

  TokenizeLine(` abc, \[def , [ghi, jkl] ]  `)
 --> { "abc", "[def", "ghi, jkl", "]" }

  TokenizeLine(`abc, def\, ghi, jkl`)
 --> { "abc", "def, ghi", "jkl"} Note only 3 tokens.
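The effect of an escaped delimiter can be sketched in a few lines. This is a simplified illustration in present-day D (D2), not the code from the tokenizer module: splitEscaped is a hypothetical name, and brackets, quotes and comments are deliberately ignored.

```d
import std.stdio : writeln;
import std.string : strip;

/// Splits `line` at `delim`; a backslash makes the next character literal,
/// so an escaped delimiter stays inside the current token.
string[] splitEscaped(string line, char delim = ',')
{
    string[] result;
    char[] cur;
    bool escaped = false;
    foreach (char ch; line)
    {
        if (escaped)
        {
            cur ~= ch; // taken literally, whatever it is
            escaped = false;
        }
        else if (ch == '\\')
        {
            escaped = true; // drop the backslash, remember the state
        }
        else if (ch == delim)
        {
            result ~= strip(cur).idup; // token finished
            cur = null;
        }
        else
        {
            cur ~= ch;
        }
    }
    result ~= strip(cur).idup; // last token
    return result;
}

void main()
{
    writeln(splitEscaped(`abc, def\, ghi, jkl`)); // three tokens
}
```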

The other way is to enclose it inside a different sort of bracket/quote.

  TokenizeLine(`He said, '"Let's go down to the river".`,  ``)
 --> { `He`, `said,`,  `'`, `"Let's go down to the river".` }


 TokenizeLine(` "moo"  \t " oi\"nk\"  " \t "ladida " //Comment`, `"`, `//`)
 0-->``
 1-->`moo`
 2-->`t`
 3-->`oi"nk"`
 4-->`t`
 5-->`ladida`
 
 Wishlist:
 0: Would wish to not have element 0.
 1: fine, but should be element token 0.
 2: \t tab no longer recognized, tab should have been ignored
 3: Perfect
 4: same as 2
 5: Perfect
 "6" comment ignored, fine

Well, you said that the token delimiter was the double-quote. Also, you
used the 'raw' string format so the sequence "\t" is not a tab but
literally a backslash-t combination. 
So this would have been broken up like this ...

` `
`moo`
`  \t `
` oi\"nk\"  `
` \t "`
`ladida `
` //Comment`

Then when leading and trailing spaces are removed you get ...
``
`moo`
`\t`
`oi\"nk\"`
`\t`
`ladida`
`//Comment`

Then applying escaped characters
``
`moo`
`t`
`oi"nk"`
`t`
`ladida`
`//Comment`

Then when removing comments ...
``
`moo`
`t`
`oi"nk"`
`t`
`ladida`
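The raw-string point is easy to check in isolation. A small sketch, assuming present-day D (D2) string literal rules:

```d
import std.stdio : writeln;

void main()
{
    writeln(`\t`.length);    // 2: a backslash followed by the letter t
    writeln("\t".length);    // 1: a single tab character
    writeln(`\"` == "\\\""); // true: both are a backslash followed by a quote
}
```

So in a backtick string the tokenizer really does see a backslash and a "t", which is why the escape step turns it into a plain `t`.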



 I would hope to do this to a line:
 
 `"moo" <whitespaces> " oi\"nk\"  " <whitespaces> "ladida "//Comment`
 
 ->0: `moo`
 ->1: `oi"nk"`
 ->2: `ladida`
 

Toks = TokenizeLine(`"moo" <whitespaces> " oi\"nk\"  " <whitespaces> "ladida "//Comment`, "");
// Toks --> { `"`, `moo`, `"`, ` oi"nk"  `, `"`, `ladida` }
int i;
foreach(char[] aTok; Toks)
{
   if (aTok != `"`) 
   {
       writefln("->%d: `%s`", i, std.string.strip(aTok));
       i++;
   }
}


 Presently it would not be possible to rely on a specific column to contain the
 info of a specific double quote pair.
 
 Could that be made possible?

I suppose so, but it is designed to handle free form text and not
column-delimited stuff. 

-- 
Derek Parnell
Melbourne, Australia
23/03/2005 1:40:47 AM

Mar 22 2005

AEon <AEon_member pathlink.com> writes:

Derek Parnell,

At least in my case:

[Weapons]
"0"	" killed   by MOD_SHOTGUN"	"Shotgun"	"SG"
"1"	" killed   by MOD_GAUNTLET"	"Gauntlet"	"G"

something quite simple just occurred to me. When using std.string.splitlines() to
read complete lines from a text file, you will *never* encounter a \n in the
line, since that would have placed the content on another line. 

So when you have something like this:

"1"	" killed \" \" by MOD_GAUNTLET"	"Gauntlet"	"G"

You could do a replace of \" with \n and be sure that the line will not lose any
information.

Then char[][] spline = split(line, "\""); And finally replace any \n back to \"
(or right back to " depending how you want to use the spline elements).

Obviously your code is a lot more flexible, but as we all strive for "KISS" ;)
my idea should work quite well.
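The replace/split/replace idea can be sketched as follows, assuming present-day D (D2) and std.array; quotedFields is a hypothetical helper name. It relies on the line being well formed (balanced quotes) and does not handle a literal backslash directly before a closing quote.

```d
import std.array : replace, split;
import std.stdio : writeln;

/// Returns the contents of each "..." pair in `line`, with \" turned into ".
string[] quotedFields(string line)
{
    // A line read from a file cannot contain \n, so it is a safe placeholder.
    string masked = line.replace(`\"`, "\n");

    string[] fields;
    foreach (i, part; masked.split(`"`))
    {
        // Odd-numbered pieces lie between an opening and a closing quote.
        if (i % 2 == 1)
            fields ~= part.replace("\n", `"`);
    }
    return fields;
}

void main()
{
    string line = `"1"` ~ "\t" ~ `" killed \" \" by MOD_GAUNTLET"` ~ "\t"
                ~ `"Gauntlet"` ~ "\t" ~ `"G"`;
    foreach (i, f; quotedFields(line))
        writeln(i, ": [", f, "]");
}
```

Taking only the odd-numbered pieces also addresses the column question raised earlier: field N of the config line is always element N of the result, regardless of the whitespace between the quoted parts.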

BTW: I have been noting, many feedback posts should really be archived,
especially all the very useful code-examples.

AEon

Mar 22 2005

J C Calvarese <jcc7 cox.net> writes:

AEon wrote:
 Derek Parnell,
 BTW: I have been noting, many feedback posts should really be archived,
 especially all the very useful code-examples.
 
 AEon

I'm not sure what you mean by "archive". It's not like Walter clears out 
these newsgroups at the end of every month.

Walter has even produced some handy index pages such as 
http://www.digitalmars.com/d/archives/digitalmars/D/index.html

They are particularly useful because Google indexes them 
(http://www.digitalmars.com/d/archives/advancedsearch.html).

Apparently, he hasn't spun out "archives" for this particular newsgroup 
yet, but I'm sure he will eventually.

On the other hand, if by "archive" you mean gathering together snippets 
of code, the dsource tutorials project has already done some of this: 
http://www.dsource.org/tutorials/. If you think new examples are being 
added too slowly, you're welcome to start adding some yourself. :)

Also, there's an ever-growing amount of useful information available at 
Wiki4D. Two of my favorite pages:
http://www.prowiki.org/wiki4d/wiki.cgi?NewsDmD
http://www.prowiki.org/wiki4d/wiki.cgi?ErrorMessages

Everyone is invited to add and/or update to the wiki content, too. It's 
much easier than writing HTML -- and quite self-explanatory.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Mar 22 2005

AEon <aeon2001 lycos.de> writes:

J C Calvarese wrote:

 AEon wrote:
 BTW: I have been noting, many feedback posts should really be archived,
 especially all the very useful code-examples.

 
 I'm not sure what you mean by "archive". It's not like Walter clears out 
 these newsgroups at the end of every month.

True enough, and now that I finally have a newsreader installed, 
(Mozilla Thunderbird), I can download and search the posts. I had not 
been aware of this, since I never used newsgroups before :). I am more 
of a Forum guy.

 Walter has even produced some handy index pages such as 
 http://www.digitalmars.com/d/archives/digitalmars/D/index.html
 
 They are particularly useful because Google indexes them 
 (http://www.digitalmars.com/d/archives/advancedsearch.html).
 
 Apparently, he hasn't spun out "archives" for this particular newsgroup 
 yet, but I'm sure he will eventually.

Personally I hope Walter does not have to "waste" his time with things like 
that too much, giving him more time to work on D. So if the updates are 
less regular that is fine.

 On the other hand, if by "archive" you mean gathering together snippets 
 of code, the dsource tutorials projects has already done some of this: 
 http://www.dsource.org/tutorials/. If you think new examples are being 
 added too slowly, you're welcome to start adding some yourself. :)

That was what I had been thinking about. And I would normally help with 
this, but I am desperately trying to recode some 1000+ hours of AEstats 
coding to D, and that takes up all my time. But once that is done I'd be 
glad to help. Till then I should have a more solid grasp of D as well.

AEon

Mar 23 2005

J C Calvarese <jcc7 cox.net> writes:

AEon wrote:
 J C Calvarese wrote:

...
 True enough, and now that I finally have a newsreader installed, 
 (Mozilla Thunderbird), I can download and search the posts. I had not 
 been aware of this, since I never used newsgroups before :). I am more 
 of a Forum guy.

I made that transition myself a few years ago (web forums -> 
newsreader). I still regularly use the web interface when I'm away from 
my home computer, but I much prefer Thunderbird when it's available.

 Apparently, he hasn't spun out "archives" for this particular 
 newsgroup yet, but I'm sure he will eventually.

 
 Personally I hope Walter does not have to "waste" his time with things like 
 that too much, giving him more time to work on D. So if the updates are 
 less regular that is fine.

I think it's automated to where it isn't much effort for him. He 
probably just pushes a button every month or so. (I don't want him to 
waste a lot of time on it either, but it's nice to have Google index the 
newsgroup messages.)

 On the other hand, if by "archive" you mean gathering together 
 snippets of code, the dsource tutorials projects has already done some 
 of this: http://www.dsource.org/tutorials/. If you think new examples 
 are being added too slowly, you're welcome to start adding some 
 yourself. :)

 
 
 That was what I had been thinking about. And I would normally help with 
 this, but I am desperately trying to recode some 1000+ hours of AEstats 
 coding to D, and that takes up all my time. But once that is done I'd be 
 glad to help. Till then I should have a more solid grasp of D as well.
 
 AEon

Good.

No pressure, I was just suggesting some easy ways to collaborate. 
AEstats sounds like an interesting use of time, too. ;)

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Mar 23 2005

AEon <aeon2001 lycos.de> writes:

J C Calvarese wrote:

 No pressure, I was just suggesting some easy ways to collaborate. 
 AEstats sounds like an interesting use of time, too. ;)

:9

In 4 days I have been able to do more in AEstats (in D), than I was able 
to do in AEstats++ (in C) in 3-4 weeks. Since in D I no longer need to 
use pointers, strings are for free, and D has very powerful easy to use 
string functions, most of my code is basically error checking, e.g. 
config using invalid syntax, missing double quotes and the like. The 
code itself is very minimal.

This is the way C should always have been!

I already have all the hardcoded obituaries replaced with configuration 
obituaries that are read on the fly. Sure, this is not really a big deal, 
but trying to code that in C made me weep...

And the best part, AEstats (then called AEstats++) will sooner or later 
be database driven via MySQL... and that also should be a *lot* simpler 
to do than in C++ code.

AEon

Mar 25 2005
