digitalmars.D.learn - How would I optimize this parser?
- T.D.Spenser (99/99) Oct 30 2010 I wrote a simple parser in D. The language for it looks like this:
- bearophile (10/11) Oct 30 2010 Very good :-)
- spir (60/61) Oct 31 2010 some cases an Appender helps, and in some cases some workaround may be n...
- Nick Sabalausky (3/5) Oct 31 2010 std.algorithm. But the docs for it do need to be improved.
- spir (13/20) Oct 31 2010 I find it very strange that map/filter/reduce take *strings* as func exp...
- Simen kjaeraas (9/25) Oct 31 2010 They take both, in fact:
- bearophile (25/28) Oct 31 2010 But that needs to be compile-time constant, so if you have several funct...
- Steven Schveighoffer (6/13) Nov 02 2010 I don't think it needs to be a compile-time constant.
- Rainer Deyke (5/9) Oct 31 2010 In other words: function literals in D are too verbose to be convenient.
- spir (28/36) Nov 01 2010 True. But it is very hard to find a notation that is at the same time:
- T.D.Spenser (12/23) Oct 31 2010 something to optimize for.
- bearophile (12/14) Oct 31 2010 Good. Your code looks good, well formatted and readable.
- bearophile (15/15) Oct 31 2010 If you import:
- bearophile (6/7) Oct 31 2010 The program allocates 169_000 TextNode and 245_001 TagNode. Just alloca...
- bearophile (3/3) Oct 31 2010 Converting the nodes to tagged structs, and allocating the nodes with a ...
- spir (9/10) Nov 01 2010 ey cause reallocations.
- bearophile (4/5) Nov 01 2010 In some situations that's not handy to do, because you don't know what's...
- bearophile (5/6) Nov 01 2010 I have answered the wrong question sorry :-) My answer was about the tre...
- T.D.Spenser (31/47) Nov 02 2010 I'll reply to several posts at once.
- bearophile (50/71) Nov 02 2010 Java code, prints "12":
I wrote a simple parser in D. The language for it looks like this: [tagName content content [childTagName] [childTagName more content]] The parser is here: http://thoughtdispenser.net/files/parser.d Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)? begin 644 parser.d M<'5B;&EC(&-L87-S(%-T871I8U!A<G-E<B!["B` ("!P=6)L:6, <W1A=&EC M(%1A9TYO9&4 <&%R<V4H<W1R:6YG(&-O9&4I>PH ("` ("` (%1A9TYO9&4 M<F]O="`](&YE=R!486=.;V1E*"(B*3L*("` ("` ("!I9B`H8V]D92YL96YG M=& /B`P*2!["B` ("` ("` ("` (&EN="!C;VYT96YT16YD(#T <&%R<V5" M9"`\(&-O9&4N;&5N9W1H*2!["B` ("` ("` ("` ("` ("!T:')O=R!N97< M17AC97!T:6]N*")4:&5R92!A<F4 97AT<F$ 8VQO<VEN9R!B<F%C:V5T<R(I M.PH ("` ("` ("` ("!]"B` ("` ("` ?0H ("` ("` (')E='5R;B!R;V]T M.PH ("` ?0H*("` ('-T871I8R!P=6)L:6, :6YT('!A<G-E0F]D>2AS=')I M;F< =&5X="P 5&%G3F]D92!C=7)R96YT3F]D92P :6YT('!O<VET:6]N*7L* M("` ("` ("!W:&EL92`H<&]S:71I;VX /"!T97AT+FQE;F=T:"D >PH ("` M("` ("` ("!I9B`H=&5X=%MP;W-I=&EO;ET /3T )ULG*2!["B` ("` ("` M("` ("` ("!P;W-I=&EO;B`]('!A<G-E3F]D92AT97AT+"!C=7)R96YT3F]D M92P <&]S:71I;VX *R`Q*3L*("` ("` ("` ("` ("` ('!O<VET:6]N*RL[ M"B` ("` ("` ("` ('T 96QS92!I9B`H=&5X=%MP;W-I=&EO;ET /3T )UTG M*2!["B` ("` ("` ("` ("` ("!R971U<FX <&]S:71I;VX["B` ("` ("` M("` ('T 96QS92!["B` ("` ("` ("` ("` ("!P;W-I=&EO;B`]('!A<G-E M5&5X="AT97AT+"!C=7)R96YT3F]D92P <&]S:71I;VXI.PH ("` ("` ("` M("!]"B` ("` ("` ?0H ("` ("` (')E='5R;B!T97AT+FQE;F=T:#L*("` M('T*"B` ("!S=&%T:6, <'5B;&EC(&EN="!P87)S951E>'0H<W1R:6YG('1E M>'0L(%1A9TYO9&4 8W5R<F5N=$YO9&4L(&EN="!S=&%R="E["B` ("` ("` M+R]A<W-E<G0H=&5X=%MS=&%R=%T (3T )ULG("8F('1E>'1;<W1A<G1=("$] M("==)RD["B` ("` ("` :6YT('!O<VET:6]N(#T <W1A<G0["B` ("` ("` M=VAI;&4 *'!O<VET:6]N(#P =&5X="YL96YG=& I('L*("` ("` ("` ("` M:68 *'1E>'1;<&]S:71I;VY=(#T]("=;)R!\?"!T97AT6W!O<VET:6]N72`] M/2`G72<I('L*("` ("` ("` ("` ("` (&)R96%K.PH ("` ("` ("` ("!] M"B` ("` ("` ("` ('!O<VET:6]N*RL["B` ("` ("` ?0H ("` ("` ('-T M<FEN9R!T97AT0VAA<G, /2!T97AT6W-T87)T("XN('!O<VET:6]N73L*("` M("` ("!497AT3F]D92!T97AT3F]D92`](&YE=R!497AT3F]D92AT97AT0VAA M<G,I.PH ("` ("` (&-U<G)E;G1.;V1E+F%D9$-H:6QD*'1E>'1.;V1E*3L* M("` ("` ("!R971U<FX <&]S:71I;VX["B` ("!]" H ("` <W1A=&EC('!U M8FQI8R!I;G0 9V5T3F%M945N9"AS=')I;F< =&5X="P :6YT('!O<VET:6]N M*7L*("` ("` ("!W:&EL92`H<&]S:71I;VX /"!T97AT+FQE;F=T:"D >PH M("` ("` ("` ("!S=VET8V *'1E>'1;<&]S:71I;VY=*2!["B` ("` ("` M("` ("` ("!C87-E("< )RP )UQT)RP )UQN)RP )UQR)RP )UTG+"`G6R<Z M"B` ("` ("` ("` ("` ("` ("` <F5T=7)N('!O<VET:6]N.PH ("` ("` M("` ("` ("` 8G)E86L["B` ("` ("` ("` ("` ("!D969A=6QT.B`*("` M("` ("` ("` ("` ("` ("`O+V1O(&YO=&AI;F<L(&YE960 =&AI<R!T;R!A M=F]I9"!S:6QL>2!E<G)O<G,*("` ("` ("` ("` ?0H ("` ("` ("` ("!P M;W-I=&EO;BLK.PH ("` ("` ('T*("` ("` ("`O+V%S<V5R="AF86QS92D[ M"B` ("` ("` =&AR;W< ;F5W($5X8V5P=&EO;B B4V]M92!T86=S(&%R92!N M;W0 8VQO<V5D(BD["B` ("!]" H ("` <W1A=&EC('!U8FQI8R!I;G0 <&%R M<V5.;V1E*'-T<FEN9R!T97AT+"!486=.;V1E('!A<F5N=$YO9&4L(&EN="!S M=&%R="E["B` ("` ("` :6YT('!O<VET:6]N(#T 9V5T3F%M945N9"AT97AT M+"!S=&%R="D["B` ("` ("` <W1R:6YG(&YA;64 /2!T97AT6W-T87)T("XN M('!O<VET:6]N73L*("` ("` ("!486=.;V1E('1H:7-.;V1E(#T ;F5W(%1A M9TYO9&4H;F%M92D[" H ("` ("` (&EF("AT97AT6W!O<VET:6]N72`A/2`G M6R< )B8 =&5X=%MP;W-I=&EO;ET (3T )UTG*2!["B` ("` ("` ("` ('!O M<VET:6]N*RL[("\O<VMI<"!T:&4 =VAI=&5S<&%C92!C:&%R('1H870 <V5P M87)A=&5S(&YA;64 86YD(&)O9'D*("` ("` ("!]" H ("` ("` (&EN="!T M86=%;F0 /2!P87)S94)O9'DH=&5X="P =&AI<TYO9&4L('!O<VET:6]N*3L* M("` ("` ("!I9B`H=&%G16YD(#X]('1E>'0N;&5N9W1H*2!["B` ("` ("` M("` ('1H<F]W(&YE=R!%>&-E<'1I;VXH(E-O;64 =&%G(&ES;B=T(&-L;W-E M9"(I.PH ("` ("` ('T*("` ("` ("!P87)E;G1.;V1E+F%D9$-H:6QD*'1H M:7-.;V1E*3L*("` ("` ("!R971U<FX =&%G16YD.PH ("` ?0I]" IP=6)L M:6, 86)S=')A8W0 8VQA<W, 3F]D92!["B` ("!.;V1E('!A<F5N=#L*("` M(`H ("` <'5B;&EC(&%B<W1R86-T('-T<FEN9R!T;U-T<FEN9R I.PH ("` M"B` ("!P=6)L:6, 3F]D92!G971087)E;G0H*7L*("` ("` ("!R971U<FX M<&%R96YT.PH ("` ?0I]" IP=6)L:6, 8VQA<W, 5&%G3F]D92`Z($YO9&4 M>PH ("` <W1R:6YG(&YA;64["B` ("!.;V1E6UT 8VAI;&1R96X[" H ("` M<'5B;&EC('1H:7,H<W1R:6YG(&YA;64L($YO9&5;72!C:&EL9')E;BXN+BE[ M("\O8V%N)W0 :&%V92!T=V\ 8V]N<W1R=6-T;W)S(&QI:V4 :6X 2F%V80H M("` ("` ('1H:7,N;F%M92`](&YA;64["B` ("` ("` :68 *&-H:6QD<F5N M:6QD<F5N.PH ("` ("` ("` ("!F;W)E86-H("A.;V1E(&-H:6QD(#L 8VAI M;&1R96XI('L*("` ("` ("` ("` ("` (&-H:6QD+G!A<F5N="`]('1H:7,[ M"B` ("` ("` ("` ('T*("` ("` ("!]"B` ("!]"B` ("`*("` ('!U8FQI M8R!S=')I;F< 9V5T3F%M92 I>PH ("` ("` (')E='5R;B!N86UE.PH ("` M?0H*("` ('!U8FQI8R!.;V1E(&=E=$-H:6QD*&EN="!P;W-I=&EO;BE["B` M("` ("` <F5T=7)N(&-H:6QD<F5N6W!O<VET:6]N73L*("` ('T*"B` ("!P M=6)L:6, 5&%G3F]D95M=(&=E=$-H:6QD<F5N*'-T<FEN9R!N86UE*7L*("` M("` ("!486=.;V1E6UT ;6%T8VAE<SL*("` ("` ("!F;W)E86-H("A.;V1E M(&-H:6QD(#L 8VAI;&1R96XI('L*("` ("` ("` ("` 875T;R!C:&EL9%1A M9R`](&-A<W0H5&%G3F]D92D 8VAI;&0["B` ("` ("` ("` (&EF("AC:&EL M9%1A9R`A:7, ;G5L;"`F)B!C:&EL9%1A9RYG971.86UE*"D /3T ;F%M92D M>R`O+W1R:6-K>2P <VEN8V4 8V%S="!I<R!U<V5D('1O(&-H96-K('1Y<&4* M("` ("` ("` ("` ("` (&UA=&-H97, ?CT 8VAI;&1486<["B` ("` ("` M("` ('T*("` ("` ("!]"B` ("` ("` <F5T=7)N(&UA=&-H97,["B` ("!] M("` ("` 8VAI;&1R96X ?CT ;F5W0VAI;&0["B` ("` ("` ;F5W0VAI;&0N M<&%R96YT(#T =&AI<SL*("` ('T*"B` ("!P=6)L:6, <W1R:6YG('1O4W1R M:6YG*"E["B` ("` ("` <W1R:6YG(&-O;G1E;G0 /2!N86UE('X (B`B.PH M("` ("` (&9O<F5A8V *$YO9&4 8VAI;&0 .R!C:&EL9')E;BD >PH ("` M("` ("` ("!C;VYT96YT('X](&-H:6QD+G1O4W1R:6YG*"D["B` ("` ("` M?0H ("` ("` (&EF("AP87)E;G0 :7, ;G5L;"D >PH ("` ("` ("` ("!R M971U<FX 8V]N=&5N=#L*("` ("` ("!]"B` ("` ("` <F5T=7)N(");(B!^ M(&-O;G1E;G0 ?B`B72(["B` ("!]"GT*"G!U8FQI8R!C;&%S<R!497AT3F]D M92`Z($YO9&4 >PH ("` <W1R:6YG(&-O;G1E;G0[" H ("` =&AI<RAS=')I M;F< 8V]N=&5N="E["B` ("` ("` =&AI<RYC;VYT96YT(#T 8V]N=&5N=#L* M("` ('T*"B` ("!P=6)L:6, <W1R:6YG('1O4W1R:6YG*"E["B` ("` ("` 8<F5T=7)N(&-O;G1E;G0["B` ("!]"GT* ` end
Oct 30 2010
T.D.Spenser:Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?Very good :-) First suggestions: - Generally compile your D code using the -w switch (you have missing some overrides and more things) - Add a main() with some benchmarking examples to your program, so I will have something to optimize for. - Use the -profile compiler switch on the benchmarking examples to see what's slow. - Use final classes or final methods where possible and keep an eye on memory allocations. Consider using structs in some places instead of classes. - Keep in mind that ~ and ~= in arrays isn't a very fast operation. In some cases an Appender helps, and in some cases some workaround may be needed. See you later, bearophile
Oct 30 2010
On Sat, 30 Oct 2010 23:00:59 -0400 bearophile <bearophileHUGS lycos.com> wrote:- Keep in mind that ~ and ~=3D in arrays isn't a very fast operation. In =some cases an Appender helps, and in some cases some workaround may be need= ed. A use case of join (in std.string) is indeed to build a text from snippets. But I personly use this func most often to build a text representation of l= ist of elements of any kind. For this, join falls short: * no auto-conversion to string * no delimiters Thus, I think the following may be useful in the stdlib (aside join): =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D string listText(T)(T[] elements, string sep) { uint l =3D elements.length ; if (l =3D=3D 0) return "" ; string[] texts ; texts.length =3D l ; foreach (uint i, T element ; elements) texts[i] =3D to!string(elements[i]) ; return join(texts, sep) ; } string listText(T)(T[] elements, string sep, string leftDelim, string right= Delim) { return format("%s%s%s", leftDelim, listText(elements, sep), rightDelim)= ; } // testing struct Symbol { string k; int v; string toString () { return format("%s:%s", this.k, to!string(this.v)) ; } } void main () { int[] ints =3D [1,2,3] ; writeln(listText(ints , "---")) ; writeln(listText(ints , " " , "(",")")) ; Symbol[3] symbols =3D [Symbol("a",1) , Symbol("b",2) , Symbol("c",3)] ; writeln(listText(symbols , " | " , "{ "," }")) ; } // writes: 1---2---3 (1 2 3) { a:1 | b:2 | c:3 } =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D This works sensibly with any custom type where toString is defined. (else y= ou get the type name ;-) I would also love a mapping-func param, to allow expressing things like lis= tText(ints, square, " "); but without optional parameters (AFAIK), combinat= ions are problematic. Also, I could not find functional methods like map, filter, reduce in std.f= unctional. Where else? Also not in std.array. Map would be handy above to s= tring-ify. And remove the need for a map func param in listText (I would be= happy to contribute them if someone guides me on how to participate.) Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Oct 31 2010
"spir" <denis.spir gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn puremagic.com...Also, I could not find functional methods like map, filter, reduce in std.functional. Where else? Also not in std.array.std.algorithm. But the docs for it do need to be improved.
Oct 31 2010
On Sun, 31 Oct 2010 17:31:54 -0400 "Nick Sabalausky" <a a.a> wrote:"spir" <denis.spir gmail.com> wrote in message=20 news:mailman.45.1288523296.21107.digitalmars-d-learn puremagic.com...I find it very strange that map/filter/reduce take *strings* as func expres= sions, instead of function literals: int[] arr =3D [ 1, 2, 3, 4 ]; auto squares =3D map!("a * a")(arr); Why then have func literal? Func parameter is precisely the place where pro= grammers like to use them, isn't it? Isn't this a sign that D func literals= are not considered as practical enough? Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.comAlso, I could not find functional methods like map, filter, reduce in std.functional. Where else? Also not in std.array.=20 std.algorithm. But the docs for it do need to be improved.
Oct 31 2010
spir <denis.spir gmail.com> wrote:On Sun, 31 Oct 2010 17:31:54 -0400 "Nick Sabalausky" <a a.a> wrote:They take both, in fact: auto cubes = map!((a){ return a*a*a; })(arr);"spir" <denis.spir gmail.com> wrote in message news:mailman.45.1288523296.21107.digitalmars-d-learn puremagic.com...I find it very strange that map/filter/reduce take *strings* as func expressions, instead of function literals: int[] arr = [ 1, 2, 3, 4 ]; auto squares = map!("a * a")(arr);Also, I could not find functional methods like map, filter, reduce in std.functional. Where else? Also not in std.array.std.algorithm. But the docs for it do need to be improved.Why then have func literal? Func parameter is precisely the place where programmers like to use them, isn't it? Isn't this a sign that D func literals are not considered as practical enough?For very short functions, strings are better, because of the length of the 'return' keyword. Had D instead always returned the result of the last line of a function (unless specified to be of void return type), map/filter/reduce would likely not take strings. -- Simen
Oct 31 2010
Simen kjaeraas:They take both, in fact: auto cubes = map!((a){ return a*a*a; })(arr);But that needs to be compile-time constant, so if you have several functions, you need to put them inside a typetuple, or duplicate the code. And some idioms are just not possible. You may see it well here regarding the "compose": http://rosettacode.org/wiki/First-class_functions#D import std.stdio, std.typetuple, std.functional; private import std.math; void main() { // wrappers needed as not all built-in functions // have same signature, eg pure/nothrow auto sin = (real x) { return std.math.sin(x); }; auto asin = (real x) { return std.math.asin(x); }; auto cos = (real x) { return std.math.cos(x); }; auto acos = (real x) { return std.math.acos(x); }; auto cube = (real x) { return x ^^ 3; }; auto cbrt = (real x) { return std.math.cbrt(x); }; alias TypeTuple!(sin, cos, cube) dir; alias TypeTuple!(asin, acos, cbrt) inv; foreach (i, f; dir) writefln("%6.3f", compose!(f, inv[i])(0.5)); } Bye, bearophile
Oct 31 2010
On Sun, 31 Oct 2010 19:20:04 -0400, bearophile <bearophileHUGS lycos.com> wrote:Simen kjaeraas:I don't think it needs to be a compile-time constant. I think one of the most horrible problems with documenting D templates is it's impossible to tell what's possible without trying it. -SteveThey take both, in fact: auto cubes = map!((a){ return a*a*a; })(arr);But that needs to be compile-time constant, so if you have several functions, you need to put them inside a typetuple, or duplicate the code. And some idioms are just not possible.
Nov 02 2010
On 10/31/2010 16:57, Simen kjaeraas wrote:For very short functions, strings are better, because of the length of the 'return' keyword. Had D instead always returned the result of the last line of a function (unless specified to be of void return type), map/filter/reduce would likely not take strings.In other words: function literals in D are too verbose to be convenient. This language-level shortcoming is plastered over at the library level. -- Rainer Deyke - rainerd eldwood.com
Oct 31 2010
On Sun, 31 Oct 2010 20:24:59 -0600 Rainer Deyke <rainerd eldwood.com> wrote:On 10/31/2010 16:57, Simen kjaeraas wrote:True. But it is very hard to find a notation that is at the same time: * compact * general * clear In many language communities, there are endless discussions of this topic. = One common issue is the contradiction between compact and general. Oftentim= es people come up with a cooool format without realising they only consider= their little egotic use case (typically one single arg, one single expr, e= tc...). Also, the need for clarity is difficult to achieve for historical reasons i= n languages of the C & Pascal lines: namely that func defs (and type defs) = were basically special cased (because then they were not first-class elemen= ts): string Text (...) {...} instead of Text =3D func string (...) {...} This makes theoretically superior formats difficult to integrate nicely wit= h the general syntax of the language. The solution is indeed to get rid of = the legacy syntax and use only literals (1), but... Denis (1) Once the bug that prevents using literals in func defs, using the form auto f =3D function type (params) {block} is solved, we do not need the legacy format at all anymore. -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.comFor very short functions, strings are better, because of the length of the 'return' keyword. Had D instead always returned the result of the last line of a function (unless specified to be of void return type), map/filter/reduce would likely not take strings.=20 In other words: function literals in D are too verbose to be convenient. This language-level shortcoming is plastered over at the library level.
Nov 01 2010
== Quote from bearophile (bearophileHUGS lycos.com)'s articleT.D.Spenser:overrides and more things)Unfortunately, it's quite slow. Can anyone point out what might be the issue(s)?Very good :-) First suggestions: - Generally compile your D code using the -w switch (you have missing some- Add a main() with some benchmarking examples to your program, so I will havesomething to optimize for.- Use the -profile compiler switch on the benchmarking examples to see what's slow. - Use final classes or final methods where possible and keep an eye on memoryallocations. Consider using structs in some places instead of classes.- Keep in mind that ~ and ~= in arrays isn't a very fast operation. In somecases an Appender helps, and in some cases some workaround may be needed.See you later, bearophileThanks for replying. Here is a version with a main function and sample file to parse: http://thoughtdispenser.net/files/parser-test.7z I fixed the warnings, but didn't change anything else yet. Having built-in profiler is nice, I didn't know about it. Although, the files it spits out are quite strange. If I read them right, it seems the parser spends most time constructing nodes for the document's object model.
Oct 31 2010
T.D.Spenser:Here is a version with a main function and sample file to parse:Good. Your code looks good, well formatted and readable. But before you try to optimize code you have write some tests, otherwise you can't be sure you are not breaking the code, it's like driving with closed eyes. D has both unittests and design by contract, add them (it's also a good training in using such important things). I will try to look at your code even without that necessary safety net, but I can't be sure I am not breaking your code, so I hope you will give me some tests&contracts too.Although, the files it spits out are quite strange.I think they are normal enough for a profiler. GCC produces a not too much different file. Notes on your code: Is this a critique of the D way to tell if an object is of a class? :-) auto childTag = cast(TagNode) child; if (childTag !is null && childTag.getName() == name) { //tricky, since cast is used to check type In D you may have many constructors, so what were you trying to do here? public this(string name, Node[] children...) { //can't have two constructors like in Java Bye, bearophile
Oct 31 2010
If you import: import core.memory: GC; And then you disable the GC just before parsing: GC.disable(); The parsing runtime on my PC becomes about one third. Disabling the GC when you have to build a large data structure and re-enabling it after the creation is a normal optimization both in CPython and D. You may also import: import std.c.stdlib: exit; And add this last line to the main: exit(0); to kill garbage collection at the end to remove the final collection time you weren't timing (but was there). Now the total running time is about 0.3 seconds instead of 1.1 seconds. All this means that your code is GC-bound :-) D GC is far worse than the Java one. I will try to optimize other things. Later, bearophile
Oct 31 2010
bearophile:Now the total running time is about 0.3 seconds instead of 1.1 seconds.The program allocates 169_000 TextNode and 245_001 TagNode. Just allocating two dynamic arrays of them, even with disabled GC, takes about 0.16 seconds of the about 0.30 of running time. The children arrays inside TagNode receive a total of 414_000 appends, they cause reallocations. I'll try to study the code some more. Bye, bearophile
Oct 31 2010
Converting the nodes to tagged structs, and allocating the nodes with a memory pool data structure (that allocates large chunks from the C heap), seems to reduce the total running time to about 0.15 seconds. I don't know a good language to write this kind of code. Bye, bearophile
Oct 31 2010
On Sun, 31 Oct 2010 22:02:10 -0400 bearophile <bearophileHUGS lycos.com> wrote:The children arrays inside TagNode receive a total of 414_000 appends, th=ey cause reallocations. Then an optimization would be to somehow grossly predict array sizes and pr= eallocate? Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Nov 01 2010
spir:Then an optimization would be to somehow grossly predict array sizes and preallocate?In some situations that's not handy to do, because you don't know what's a good size to preallocate. So I suggest a more general solution, to create a struct that inside keeps a dynamic array of pointers to fixed-sized chunks of structs. The size of the chunk size is chosen at compile time to be "good" (close to 64 KB, for example). A member function of this struct may return a pointer to a new struct on request, and allocate a new chunk when the last allocated chunk is full. Another member function may clear the data structure with no deallocation (just resetting the pointer to the first free struct), and another member function may really free all the chunks. This data structure may allocate its chunks from the GC heap or the C heap. This is the data structure I have used for this parser. Bye, bearophile
Nov 01 2010
spir:Then an optimization would be to somehow grossly predict array sizes and preallocate?I have answered the wrong question sorry :-) My answer was about the tree node allocations. I have not studied enough the allocation patterns of those arrays, so I can't answer. You may collect better statistics yourself. Bye, bearophile
Nov 01 2010
I'll reply to several posts at once. The code comments you quoted weren't meant as a criticism. They are mostly notes to myself in case I want to document my learning experience with D on some personal website no one ever visits. I'll post criticisms separately if/when I have any. :) When I was speaking about several constructors, I referred to stuff like this: public TagNode(String name){ this.name = name; this.children = new ArrayList<Node>(1); } public TagNode(String name, Node... children){ this.children = Arrays.asList(children); for (Node child : children) { child.parent = this; } } It's allowed in Java, but I can see why this can be prohibited. You're right about unit tests. One thing that surprised me is that unit tests are run when the program is run. For some reason I thought they were run immediately after the compilation with the appropriate flag. == Quote from bearophile (bearophileHUGS lycos.com)'s articleIf you import: import core.memory: GC; And then you disable the GC just before parsing: GC.disable(); The parsing runtime on my PC becomes about one third.Interesting. I didn't think about GC, because there aren't any objects that get unreferenced. But, of course, garbage collector can't know about that until it runs. (Well, unless there is a mechanism for marking objects as non-collectible manually.) Tangent question. Is there a way to disable GC per application thread?Disabling the GC when you have to build a large data structure and re-enablingit after the creation is a normal optimization both in CPython and D.You may also import: import std.c.stdlib: exit; And add this last line to the main: exit(0); to kill garbage collection at the end to remove the final collection time youweren't timing (but was there).Now the total running time is about 0.3 seconds instead of 1.1 seconds. All this means that your code is GC-bound :-) D GC is far worse than the Java one. I will try to optimize other things. Later, bearophileThanks for such informative replies. I can't quite keep up on the weekdays, but I will experiment with the suggestions this weekend and probably post an update of some sort.
Nov 02 2010
T.D.Spense:public TagNode(String name){ this.name = name; this.children = new ArrayList<Node>(1); } public TagNode(String name, Node... children){ this.children = Arrays.asList(children); for (Node child : children) { child.parent = this; } } It's allowed in Java, but I can see why this can be prohibited.Java code, prints "12": class Node {} class Main { public void foo(String name) { System.out.print("1"); } public void foo(String name, Node... nodes) { System.out.print("2"); } public static void main(String[] args) { Main m = new Main(); m.foo("hello"); m.foo("hello", new Node()); } } In D the T[]... syntax means zero or more T, so if you supply zero T, the compiler can't know what of the two methods you are calling, so it's statically forbidden. This prints 22 in Java: class Node {} class Main { public void foo(String name, Node... nodes) { System.out.print("2"); } public static void main(String[] args) { Main m = new Main(); m.foo("hello"); m.foo("hello", new Node()); } } Here I prefer how D is designed, it looks tidier (often it's the opposite, theYou're right about unit tests. One thing that surprised me is that unit tests are run when the program is run. For some reason I thought they were run immediately after the compilation with the appropriate flag.If you compile your code with: dmd foo.d And then you run the program: foo.exe or: ./foo Your unit tests aren't run. If you instead compile with this: dmd -unittest foo.d And then you run the program, the unittests are run and then the program is run. In D2 with a version(unittest) you may disable the running as you like.Interesting. I didn't think about GC, because there aren't any objects that get unreferenced. But, of course, garbage collector can't know about that until it runs.Right, the program builds the data structure and then now and then the GC transversed it all and marks all objects as reachable. This takes a lot of time.(Well, unless there is a mechanism for marking objects as non-collectible manually.)A simple way to make objects not collectable is to allocate them from the C heap (there is a kind of placement new in D).Tangent question. Is there a way to disable GC per application thread?I am not expert on this, I think there is no way yet. Maybe it will be added. Bye, bearophile
Nov 02 2010