digitalmars.D.learn - Lexer in D
- Namespace (15/15) Mar 02 2013 For one of our courses this year we have a very cool topic: Lexer
- Dmitry Olshansky (7/20) Mar 02 2013 See this one I'm currently investing my time into:
- Namespace (2/2) Mar 02 2013 I'm already using these flags. But thanks for your Link I will
- David (4/19) Mar 02 2013 Making `types` and `keywords` an array might improve the speed a little ...
- Dmitry Olshansky (8/27) Mar 02 2013 That would be slower as canFind is linear search.
- Namespace (4/6) Mar 02 2013 I don't understand how you could do this without AA's and how
- Namespace (18/18) Mar 02 2013 I sorted both, types and keywords, alphabetical and wrote my own
- Namespace (40/43) Mar 02 2013 I think I get it.
- Dmitry Olshansky (28/66) Mar 02 2013 No-no and no.
- Namespace (7/31) Mar 02 2013 I don't understand.
- Dmitry Olshansky (19/53) Mar 02 2013 No.
- Dmitry Olshansky (5/38) Mar 02 2013 return true;
- Namespace (4/4) Mar 02 2013 I hope I understand you right this time:
- Dmitry Olshansky (28/32) Mar 02 2013 This is a mess of conditionals is what a direct array of 256 entries
- Namespace (6/31) Mar 02 2013 I changed it and merged them together.
- Namespace (2/2) Mar 02 2013 It is sufficient if my array has only 26 units, because I check
- Dmitry Olshansky (22/56) Mar 02 2013 Obviously, you still largely don't :)
- Zach the Mystic (3/39) Mar 02 2013 And it doesn't even need to:
- Dmitry Olshansky (6/42) Mar 02 2013 Yes, obviously AA is becoming a central pain point that needs fixing. If...
- Namespace (14/14) Mar 02 2013 I never used the -profile flag before (only -conv) so I want to
- Namespace (11/11) Mar 02 2013 I noted it too late: decode comes from readText: readText
- Namespace (6/6) Mar 02 2013 I think that thanks to your suggestions isKeyword and isType are
- Dmitry Olshansky (8/14) Mar 02 2013 I'd repeat that I think it makes no sense to separately treat isType. In...
- Namespace (12/18) Mar 03 2013 I did this and I refactored a lot of my code.
- Dmitry Olshansky (6/22) Mar 03 2013 Simple - don't use array append and don't produce and array.
- Namespace (4/8) Mar 03 2013 But hasn't each range an array internally?
- Namespace (6/15) Mar 03 2013 Appender does not work for my Token struct because I have
- Dmitry Olshansky (4/19) Mar 03 2013 Just rip off the const why would you need it anyway?
- Namespace (4/5) Mar 03 2013 The data in 'tokens' are final, so why they should not be const?
- Dmitry Olshansky (9/14) Mar 03 2013 Then just return const(Token) like simple full const. In general I'd
- Namespace (7/7) Mar 03 2013 It's quite amazing, with Appender I reach between 115 and 125
- Namespace (9/9) Mar 04 2013 I've installed cygwin to use the time benchmark function and that
- Namespace (23/23) Mar 04 2013 With reserving of 60% of the file size in memory I get a big
- Dmitry Olshansky (7/29) Mar 04 2013 Keep in mind the CPU you are testing on. What's you chip?
- Namespace (3/3) Mar 04 2013 Yes, I know. But I do not think I have a chance to get even
- Dmitry Olshansky (11/14) Mar 04 2013 One thing that all of these lexers do (dmd's orignal and Dscanner) is
- Namespace (4/4) Mar 04 2013 Yes, but I'm not as familiar with ranges. But I want to learn
- Dmitry Olshansky (6/10) Mar 04 2013 Basically you copy Token structs and sometimes reallocate the whole
- Namespace (14/14) Mar 11 2013 ------------------------
- Namespace (109/109) Mar 11 2013 ------------------------
- FG (4/7) Mar 11 2013 Probably just because D's Appender has a better growth scheme than yours...
- Namespace (7/17) Mar 11 2013 D's:
- David (1/2) Mar 11 2013 What about realloc?
- Namespace (4/7) Mar 11 2013 That's what I use. But it moves your old memory into the new
- monarch_dodra (10/18) Mar 11 2013 extends if possible.
- Namespace (45/54) Mar 11 2013 Cool, nice to know.
- monarch_dodra (7/8) Mar 11 2013 Profile?
- Namespace (6/12) Mar 11 2013 My profile output looks like this: http://dpaste.1azy.net/0409f0bc
- Namespace (18/18) Mar 13 2013 Array has an horrible code...
- Dmitry Olshansky (4/22) Mar 14 2013 --
- Namespace (7/7) Mar 14 2013 Yes, and it's really fun. :)
- Dmitry Olshansky (7/14) Mar 14 2013 And my is 15 with Dscanner. But it's stuck at around 21-22.
- Artur Skawina (4/15) Mar 14 2013 What are you both actually measuring? IOW what happens with the
- Dmitry Olshansky (8/23) Mar 14 2013 I'm counting and/or filtering out specific tokens. The latter is a tiny
- Namespace (3/3) Mar 15 2013 So far, my lexer is pure exercise.
- Namespace (6/9) Mar 21 2013 I'm almost finished. In my few tests, my little parser detects
- Namespace (2/11) Mar 23 2013 Moved to https://github.com/Dgame/DAT/
- Namespace (7/19) Mar 24 2013 It's time for a beta. Would be nice if someone takes a look at
- Dmitry Olshansky (7/27) Mar 24 2013 Have no time - looks interesting but I highly doubt it can do what is
- Namespace (11/14) Mar 24 2013 Yes you're right, it's going to take some time before all would
- Brian Schott (3/17) Mar 24 2013 I have similar goals for the Dscananer tool, so I may steal some
- Namespace (4/22) Mar 24 2013 I'm glad to hear that. If I can help let me know, would be happy
- Dmitry Olshansky (4/11) Mar 14 2013 So you got rid of array creation. About time ;)
- Namespace (1/2) Mar 15 2013 Yes, that was the only way to get closer to your measured time. :D
- Brian Schott (5/7) Mar 04 2013 https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest....
- Dmitry Olshansky (5/12) Mar 04 2013 Great, thanks!
- Dmitry Olshansky (10/17) Mar 03 2013 By having a struct or generally any object that in fact works as
- Jonathan M Davis (17/18) Mar 03 2013 Many do, but some don't. Take std.algorithm.ioto, or the ranges in std.r...
- Dmitry Olshansky (5/16) Mar 02 2013 That's why Walter promotes profilers. They help verify hypothesis on
For one of our courses this year we have a very cool topic: Lexer and Compiler. In the past I have already tried some experiments in this direction with Romulus and Remus, but I never measured my lexing time. That's why I wanted to start again and write a lexer (so far purely for the learning effect). I looked at the code of some other lexers that are also written in D to get some inspiration, but currently my lexer is a little bit slow; it needs, for example, ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who would help me improve my lexer. Code: https://github.com/Dgame/Puzzle/blob/master/lexer.d Thanks in advance. :)
Mar 02 2013
02-Mar-2013 13:33, Namespace writes:
> [...] currently my Lexer is a little bit slow, it requires for example ~200 msecs for std.datetime. So I wanted to ask whether there are nice people who help me to improve our lexer. [...]

See this one I'm currently investing my time into: https://github.com/Hackerpilot/Dscanner/tree/range-based-lexer Other than this, use cachegrind and compile with ldc/gdc :) Also for dmd check that the flags are "-release -O -inline -noboundscheck" -- Dmitry Olshansky
Mar 02 2013
I'm already using these flags. But thanks for your link, I will take a look at it. :)
Mar 02 2013
On 02.03.2013 10:33, Namespace wrote:
> [...] currently my Lexer is a little bit slow, it requires for example ~200 msecs for std.datetime. [...]

Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords)
Mar 02 2013
02-Mar-2013 15:04, David writes:
> Making `types` and `keywords` an array might improve the speed a little bit: if (id in types) -> if (types.canFind(id)) (same with keywords)

That would be slower, as canFind is a linear search. A simpler way is to use the first letter and length to select a bucket of keywords. But otherwise, yes, the first advice is: never use the built-in AA, as these take quite some time to allocate. Plus they are not that fast to look up, as they use mod-prime, not mod-power-of-2, to pick a bucket. -- Dmitry Olshansky
Mar 02 2013
> That would be slower as canFind is linear search.

With canFind it takes ~300 msecs. :/ Any idea how I could improve it?

> Simpler way is use first letter and length to select a bucket of keywords.

I don't understand how you could do this without AAs and how this could be more performant than canFind.
Mar 02 2013
I sorted both types and keywords alphabetically and wrote my own 'canFind':

bool canFind(const size_t size)(ref const string[size] values, string value) pure nothrow {
    if (value[0] < 'm' || value[0] == '_') {
        foreach (string _val; values) {
            if (_val == value)
                return true;
        }
    } else {
        foreach_reverse (string _val; values) {
            if (_val == value)
                return true;
        }
    }
    return false;
}

That brings ~20 msecs, so I have ~280 msecs instead of ~300. But it isn't perfect.
Mar 02 2013
> That would be slower as canFind is linear search. Simpler way is use first letter and length to select a bucket of keywords.

I think I get it. Did you mean something like this?

[code]
immutable ubyte[char] typeMap;
immutable ubyte[char] keywordMap;

static this() {
    typeMap = ['d' : 7, 'l' : 16, 'i' : 12, 'u' : 20, 'b' : 0, 'f' : 10,
               'r' : 17, 'v' : 25, 'c' : 2, 's' : 18, 'w' : 26];
    keywordMap = ['_' : 0, 'a' : 6, 'b' : 12, 'c' : 14, 'd' : 20, 'e' : 26,
                  'f' : 30, 'g' : 36, 'i' : 37, 'l' : 45, 'm' : 46, 'n' : 49,
                  'o' : 52, 'p' : 54, 'r' : 60, 's' : 62, 't' : 69, 'u' : 77,
                  'v' : 79, 'w' : 81];
}

bool isKeyword(string value) pure nothrow {
    if (value[0] !in keywordMap)
        return false;
    foreach (string kword; keywords[keywordMap[value[0]] .. $]) {
        if (kword == value)
            return true;
        if (kword[0] != value[0])
            return false;
    }
    return false;
}

bool isType(string value) pure nothrow {
    if (value[0] !in typeMap)
        return false;
    foreach (string type; types[typeMap[value[0]] .. $]) {
        if (type == value)
            return true;
        if (type[0] != value[0])
            return false;
    }
    return false;
}
[/code]

I'm now at ~230 msecs.
Mar 02 2013
02-Mar-2013 18:53, Namespace writes:
> I think I get it. Did you mean something like this? [...] I'm now at ~230 msecs.

No-no and no. Just translate the list of keywords/types into a bunch of switch-cases; that will give you near-optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length, then by first char. Even simpler, if you don't want to go for CTFE code generation, is to make a plain array of arrays. The length of the array is exactly 256, i.e. even this should have decent speed:

string[][256] keywords = [
    null, null, ...
    //at index 'i'
    ["int", "if", "invariant", "idouble", "ifloat" ...] //array of strings starting with that char
    ...
];

bool isKeyword(string value) {
    string[] toSearch = keywords[value[0]];
    return toSearch.canFind(value);
}

Better yet, only store the [1..$] parts of the keywords and search for value[1..$]. Even faster is to do the same for each possible length of keyword. -- Dmitry Olshansky
Mar 02 2013
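A self-contained version of the 256-entry table sketched above might look like this; only a couple of buckets are filled in, purely for illustration:

```d
import std.algorithm : canFind;

// One bucket per possible first byte. Indexing by value[0] directly
// replaces hashing and per-call conditionals with a single array lookup.
// Only the 'i' and 'w' buckets are populated here as an illustration.
immutable string[][256] keywordTable = () {
    string[][256] t;
    t['i'] = ["if", "int", "invariant", "import"];
    t['w'] = ["while", "with"];
    return t;
}();

bool isKeyword(string value) {
    // Unpopulated buckets are null; canFind on a null slice is false.
    return keywordTable[value[0]].canFind(value);
}

void main() {
    assert(isKeyword("while"));
    assert(isKeyword("int"));
    assert(!isKeyword("hello"));
    assert(!isKeyword("x"));
}
```

The table is built once at compile time (CTFE), so the lookup itself is branch-free up to the final linear scan of a small bucket.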
> No-no and no. Just translate list of keywords/types to a bunch of switch-cases that will give you near optimal performance. Use CTFE to construct that string of D code. In fact I proposed to do just 2 levels of switches: first switch by length then by first char.

I don't understand. You don't mean something like:

switch (value) {
    case "if": isKeyword = true; break;
    ...

right?

> Even simple if you don't want to go for CTFE code-generation is to make plain array of array. [...]

Ok, I will try it. Thank you.
Mar 02 2013
02-Mar-2013 22:09, Namespace writes:
> I don't understand. You don't mean something like: switch (value) { case "if": isKeyword = true; break; ... right?

No.

switch (value.length) {
case 2:
    switch (value[0]) {
    case 'i':
        switch (value[1]) {
        case 'f': return value.length == 2;
        case ...
        ...
        }
    ...
case 3:
    ...
}

-- Dmitry Olshansky
Mar 02 2013
02-Mar-2013 22:15, Dmitry Olshansky writes:
> case 'f': return value.length == 2;

return true;

I edited the post to include the length switch before the per-char switch. -- Dmitry Olshansky
Mar 02 2013
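Spelled out fully, the two-level switch (length first, then characters) looks like this for a handful of keywords. This is a hand-written sketch of what a CTFE generator would emit; the keyword set shown is just illustrative:

```d
// Two-level dispatch: switch on length first, then on characters.
// Once inside a length case, only the characters need checking.
bool isKeyword(string value) pure nothrow {
    switch (value.length) {
    case 2:
        switch (value[0]) {
        case 'i': return value[1] == 'f';   // "if"
        case 'd': return value[1] == 'o';   // "do"
        default:  return false;
        }
    case 3:
        // A generator would switch per character here too.
        return value == "int" || value == "for" || value == "try";
    default:
        return false;
    }
}

void main() {
    assert(isKeyword("if"));
    assert(isKeyword("for"));
    assert(!isKeyword("ix"));
    assert(!isKeyword("hello"));
}
```

The compiler can turn dense char switches into jump tables, which is why this tends to beat both AA lookup and linear search.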
I hope I understand you right this time: http://dpaste.1azy.net/4c2e4428 Was this your idea? With this I reached between 215 and 230 msecs.
Mar 02 2013
02-Mar-2013 23:01, Namespace writes:
> I hope I understand you right this time: http://dpaste.1azy.net/4c2e4428 Was this your idea? With this I reached between 215 and 230 msecs.

This is a mess of conditionals, which is what a direct array of 256 entries would avoid:

int idx;
if (value[0] != '_') {
    idx = value[0] - 'a';
    if (idx == 26)
        return false;
} else {
    idx = 26;
}
if (idx < 0 || idx > 26) {
    // debug writeln("kword: ", idx, ':', value[0]);
    return false;
}
if (keywords[idx] is null)
    return false;
return keywords[idx].canFind(value);

Gaining some speed in the process. Plus another layer of array to discern keywords by length. You see why I suggested generating the code in the first place? ;) BTW, what's the reason to separate keywords and type keywords? They are processed the same in the lexer, and only the parser somewhere up above knows what to do with them regardless. Just return different token values for each. -- Dmitry Olshansky
Mar 02 2013
> This is a mess of conditionals is what a direct array of 256 entries would avoid: [...] BTW what's the reason to separate keywords and type keywords? [...] Just return different token values for each.

I changed it and merged them together. Also I now use an array of 256 entries, but I must keep the check if idx is < 0, because 'D' - 'a' is negative. And yes, I see what you meant. ^^ Code: http://dpaste.1azy.net/317241c0 I still reach 215 - 230 msecs.
Mar 02 2013
It is sufficient if my array has only 26 entries, because I check only lower-case letters.
Mar 02 2013
03-Mar-2013 00:03, Namespace writes:
> I changed it and merged them together. Also I now use an array of 256 entries, but I must keep the check if idx is < 0, because 'D' - 'a' is negative. And yes, I see what you meant. ^^

Obviously, you still largely don't :) Use the full byte to do the lookup, dammit. Saving a few array slots doesn't bring your speed up. Using the full byte *directly* as the index does speed things up, because you don't have to do *any* computations and *conditionals*. Conditionals that can easily go both ways are the worst thing performance-wise. I don't know how to phrase that better.

> Code: http://dpaste.1azy.net/317241c0 I still reach 215 - 230 msecs.

Use profiling to determine what takes most of the time. If you don't like callgrind/cachegrind (I seriously don't know why not use it), try dmd's -profile. Looking at your main code I see some spaghetti code and passing an index by pointer. Don't do that - encapsulate the lexing state in some kind of struct, then the index would be implicitly passed to all functions via the this pointer. It may very well be the case that you are bound by some other code. In any case, after some heroic efforts Dscanner lexes std.datetime in 24 msecs, and that on a Linux VM powered by the oldish AMD Phenom II. What kind of CPU do you use? -- Dmitry Olshansky
Mar 02 2013
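The "encapsulate the lexing state in a struct" advice amounts to something like the following minimal sketch (all names here are hypothetical, not the thread's actual code). The cursor travels implicitly via the this pointer instead of being passed around as a pointer argument:

```d
// Instead of free functions taking (string text, size_t* index),
// bundle the input and the cursor together.
struct Lexer {
    string text;
    size_t pos;

    bool empty() const { return pos >= text.length; }
    char peek() const { return text[pos]; }

    void skipWhitespace() {
        // `pos` is reached through `this` - no pointer juggling.
        while (!empty && (peek == ' ' || peek == '\n' || peek == '\t'))
            ++pos;
    }

    string readWord() {
        size_t start = pos;
        while (!empty && peek != ' ' && peek != '\n' && peek != '\t')
            ++pos;
        return text[start .. pos]; // a slice, no copy
    }
}

void main() {
    auto lx = Lexer("  foo bar");
    lx.skipWhitespace();
    assert(lx.readWord() == "foo");
    lx.skipWhitespace();
    assert(lx.readWord() == "bar");
}
```

Besides being cleaner, this layout is a natural stepping stone to the range interface discussed later in the thread: empty/front/popFront can be added to the same struct.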
On Saturday, 2 March 2013 at 11:59:39 UTC, Dmitry Olshansky wrote:
> But otherwise, yes, the first advice is never use built-in AA as these take quite some time to allocate.

And it doesn't even need to: http://d.puremagic.com/issues/show_bug.cgi?id=9522
Mar 02 2013
02-Mar-2013 23:50, Zach the Mystic writes:
> And it doesn't even need to: http://d.puremagic.com/issues/show_bug.cgi?id=9522

Yes, obviously the AA is becoming a central pain point that needs fixing. If you are inclined to help, I suggest you join forces with H.S. Teoh and Dicebot and help fix the AA. -- Dmitry Olshansky
Mar 02 2013
I never used the -profile flag before (only -cov), so I want to ask if I understand it right: http://dpaste.1azy.net/4382097b It seems that the call which is listed on line 133

1589328 214949 214948 0 pure trusted dchar std.utf.decode!(const(immutable(char)[])).decode(ref const(immutable(char)[]), ref uint)

takes the most time. Am I right? And why is something decoded? Should I use dchar instead of char? And I don't take anything that is already finished, because I want to become more familiar with the matter. I want to see what you have to watch out for in order to get performance. The 24 msecs would obviously be a dream, but I would be pleased with 50-100 msecs. :D
Mar 02 2013
I noted it too late: decode comes from readText; readText validates your text. I will now use std.file.read. The new trace output is here, and it seems you're right, isKeyword and isType (yes, I decided to separate them again and keep both in the lexer) take a lot of time: http://dpaste.1azy.net/0d6aff6b

154874 106186 95403 0 pure nothrow bool Puzzle.Lexer.isKeyword(immutable(char)[])
159688 78020 69289 0 pure nothrow bool Puzzle.Lexer.isType(immutable(char)[])

I never thought that. :)
Mar 02 2013
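The readText/read difference mentioned above can be seen in a small sketch (the temporary file name is made up for the example): readText validates the file as UTF up front, while std.file.read returns the raw bytes unvalidated, so the cast skips that cost.

```d
import std.file : read, readText, write, remove;

void main() {
    write("tmp_sample.d", "int main() { return 0; }");
    scope(exit) remove("tmp_sample.d");

    // readText validates the whole file as correct UTF-8 up front.
    string validated = readText("tmp_sample.d");

    // read returns the raw bytes as void[]; the cast skips validation,
    // which is fine when the input is known to be valid ASCII/UTF-8.
    string raw = cast(string) read("tmp_sample.d");

    assert(validated == raw); // same contents either way
}
```

The trade-off: with the cast, malformed UTF-8 surfaces later (or never) instead of as an immediate UTFException.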
I think that thanks to your suggestions isKeyword and isType are entirely inlined, because they aren't listed in the trace.log anymore (http://dpaste.1azy.net/b94b19ff). The only thing which makes trouble is the isNext function. Maybe I should also use a StringStream (implemented as a range). What do you think?
Mar 02 2013
03-Mar-2013 03:52, Namespace writes:
> I think that thanks to your suggestions isKeyword and isType are entirely inlined, because they aren't listed in the trace.log anymore (http://dpaste.1azy.net/b94b19ff). [...]

I'd repeat that I think it makes no sense to treat isType separately. In any case there is way more to types than the built-in ones, and it is the job of the parser (to assume types) and the semantic step (to type-check and infer). Another thing is to run -profile without -inline to understand the structure of time spent per subroutine better. -- Dmitry Olshansky
Mar 02 2013
> I'd repeat that I think it makes no sense to separately treat isType. [...]

Yes, I've understood, but currently I want it so.

> Another thing is to run -profile without -inline to understand the structure of time spent per each subroutine better.

I did this and I refactored a lot of my code. Now I get 170 - 180 msecs, and this is my current trace.log: http://dpaste.1azy.net/b94b19ff But currently I have no more ideas how I could gain more performance. Maybe I should disable the GC while looping through the text? Or maybe I should allocate more space for the Token array (e.g. toks.length = 100;)? I hope that you or anyone else has further ideas. But anyway, thanks for the help! :)
Mar 03 2013
03-Mar-2013 18:28, Namespace writes:
> But currently I have no more ideas how I could gain more performance. [...]

Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster. -- Dmitry Olshansky
Mar 03 2013
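The Appender!(Token[]) suggestion, in a self-contained sketch (the Token layout here is made up for the example):

```d
import std.array : Appender, appender;

struct Token {
    string lexeme;
    uint line;
}

void main() {
    // Appender tracks its own capacity, avoiding the repeated
    // reallocation checks of naive `tokens ~= tok` appending.
    Appender!(Token[]) tokens = appender!(Token[])();
    tokens.reserve(1024); // hypothetical estimate of the token count

    foreach (word; ["int", "main", "(", ")"])
        tokens.put(Token(word, 1));

    assert(tokens.data.length == 4);
    assert(tokens.data[0].lexeme == "int");
}
```

reserve() up front is what later gives the big win discussed below: with a decent capacity estimate, appending never reallocates at all.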
> Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated.

But doesn't each range have an array internally? Or how does that work? I'll try to use Appender, but this lazy forward range sounds interesting.
Mar 03 2013
On Sunday, 3 March 2013 at 16:19:27 UTC, Namespace wrote:
> I'll try to use Appender, but this lazy forward range sounds interesting.

Appender does not work for my Token struct, because I have const/immutable members, so I cannot add anything. Example: http://dpaste.1azy.net/21f100d3 Without 'const' it works fine.
Mar 03 2013
03-Mar-2013 20:36, Namespace writes:
> Appender does not work for my Token struct, because I have const/immutable members, so I cannot add anything.

Just rip off the const - why would you need it anyway? -- Dmitry Olshansky
Mar 03 2013
> Just rip off the const - why would you need it anyway?

The data in 'tokens' is final, so why shouldn't it be const? What exactly is the problem if there are const members? Is this a known problem? Otherwise, I will go and investigate it.
Mar 03 2013
03-Mar-2013 20:45, Namespace writes:
> The data in 'tokens' is final, so why shouldn't it be const?

Then just return const(Token), like simple full const. In general I'd avoid marking data as bitwise const unless that is actually true. For all you know, some algorithm up above might tweak existing token values directly.

> What exactly is the problem if there are const members? Is this a known problem?

It's quite old and ugly: a lot of things would need to bit-blit stuff over it and fail to do so. -- Dmitry Olshansky
Mar 03 2013
It's quite amazing: with Appender I reach between 115 and 125 msecs. I adapted the Appender for myself and solved the problem with memcpy/memmove. That works fine. But is there anything else I can do (besides the LazyForwardRange) to gain more performance?
Mar 03 2013
I've installed cygwin to use the `time` benchmark command, and these are the results:

$ time ./Lexer.exe
real 0m0.225s
user 0m0.015s
sys 0m0.046s

I also read in the std.d.lexer thread that the use of 'char' isn't that good for a lexer, because dmd wants to decode 'char' to 'dchar'. The usage of ubyte should be better. Is that right?
Mar 04 2013
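For context on the char vs. ubyte question: D's range machinery auto-decodes `string` elements to `dchar` (that is where the std.utf.decode calls in the trace came from), while a byte view of the same data can be scanned without any decoding. A minimal sketch, assuming the input is ASCII-dominated source code:

```d
import std.string : representation;

void main() {
    string source = "int x;";

    // Iterating a string with foreach (dchar c; ...) or via range
    // primitives decodes UTF-8 on the fly. The byte view does not.
    immutable(ubyte)[] bytes = source.representation;

    size_t letters;
    foreach (b; bytes)                      // plain byte loop, no decoding
        if ((b | 0x20) >= 'a' && (b | 0x20) <= 'z')
            ++letters;

    assert(letters == 4); // 'i', 'n', 't', 'x'
}
```

For a D lexer this is safe because every token-delimiting character is ASCII; multi-byte UTF-8 sequences only appear inside identifiers, strings, and comments, where they can be skipped as opaque bytes.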
With reserving 60% of the file size in memory I get a big speed boost.

------------------------
Total time (ms): 17544.6
Repetitions    : 200
Sample mode    : 87 (121 occurrences)
Median time    : 87.3835
Avg time       : 87.7232
Std dev.       : 1.77905
Minimum        : 84.238
Maximum        : 101.919
95% conf.int.  : [84.2363, 91.21] e = 3.48687
99% conf.int.  : [83.1407, 92.3057] e = 4.58252
EstimatedAvg95%: [87.4766, 87.9697] e = 0.246559
EstimatedAvg99%: [87.3991, 88.0472] e = 0.324033

But I'm still away from:

Avg time : 69.3088

http://forum.dlang.org/thread/dpdgcycrgfspcxenzrjf@forum.dlang.org?page=8#post-abncwoqtpnaoohihkliw@forum.dlang.org

I've also tried 10%, 40% and 80%, but 60% seems optimal. And I think I've reached my performance limit. I could implement the LazyForwardRange and, if anyone can confirm the statement about char <-> dchar decoding and that the usage of ubyte would be (crucially) better, I will try this too.
Mar 04 2013
04-Mar-2013 16:17, Namespace writes:
> With reserving 60% of the file size in memory I get a big speed boost. [...] But I'm still away from: Avg time : 69.3088

Keep in mind the CPU you are testing on. What's your chip? In any case, Dscanner has been improved to take far, far less than 60ms. Brian listed graphs showing under 20 ms on his CPU for all of std, and <10 ms for every module except std.datetime. -- Dmitry Olshansky
Mar 04 2013
Yes, I know. But I don't think I have a chance to get even closer to it. At least I don't know how. I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+.
Mar 04 2013
04-Mar-2013 22:58, Namespace пишет:
> Yes, I know. But I don't think I have a chance to get even closer to it. At least I don't know how.

One thing that all of these lexers do (dmd's original and Dscanner) is get tokens as you need them - lazily. You will always be slower if you take time to append them to some array (no matter how). The idiomatic way, as I said, is to use ranges.

> I'm on an AMD Athlon 64 X2 Dual Core Processor 5400+.

Mm, that's not that far from mine. But Brian has got something even more powerful :) You can easily download and test Dscanner on your CPU so that you get numbers that are actually comparable.

-- Dmitry Olshansky
Mar 04 2013
Yes, but I'm not that familiar with ranges yet. But I want to learn more about them and then try it. How much do you think the use of ranges would gain compared to my current method? 30%?
Mar 04 2013
04-Mar-2013 23:19, Namespace пишет:
> Yes, but I'm not that familiar with ranges. But I want to learn more about them and try it then. How much would the use of ranges gain compared to my current method? 30%?

Basically you copy Token structs and sometimes reallocate the whole array (I guess at 60% you don't reallocate, but waste time on GC.malloc, which zeroes out the memory block). My estimate is no less than 10-15%.

-- Dmitry Olshansky
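For comparison, Dmitry's reallocation point can be sketched with std.array.Appender and an up-front reserve; the Token struct and the element count here are placeholders, not the actual lexer's types:

```d
import std.array : appender;

struct Token { string text; }

void main()
{
    auto tokens = appender!(Token[])();
    // Reserving up front avoids repeated reallocations (and the
    // zeroing of each fresh GC allocation) while appending.
    tokens.reserve(100_000); // e.g. an estimate derived from the file size

    foreach (i; 0 .. 100_000)
        tokens.put(Token("tok"));

    assert(tokens.data.length == 100_000);
}
```

Even with reserve, every token is still copied into the array once, which is the cost a lazy range avoids entirely.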
Mar 04 2013
------------------------
Total time (ms): 11374.4
Repetitions    : 200
Sample mode    : 56 (89 ocurrences)
Median time    : 56.312
Avg time       : 56.872
Std dev.       : 2.60698
Minimum        : 53.978
Maximum        : 81.602
95% conf.int.  : [51.7624, 61.9816]  e = 5.10959
99% conf.int.  : [50.1569, 63.5871]  e = 6.71514
EstimatedAvg95%: [56.5107, 57.2333]  e = 0.361303
EstimatedAvg99%: [56.3972, 57.3468]  e = 0.474832

I'm getting closer.
Mar 11 2013
------------------------
Total time (ms): 10941.9
Repetitions    : 200
Sample mode    : 54 (118 ocurrences)
Median time    : 54.5855
Avg time       : 54.7094
Std dev.       : 1.33935
Minimum        : 52.535
Maximum        : 67.206
95% conf.int.  : [52.0843, 57.3345]  e = 2.62509
99% conf.int.  : [51.2594, 58.1593]  e = 3.44995
EstimatedAvg95%: [54.5238, 54.895]  e = 0.185622
EstimatedAvg99%: [54.4654, 54.9533]  e = 0.243948

And even a bit closer. I've also done some benchmarks, and I wonder why my appender is much faster if I enable memory reservation. My Appender: http://dpaste.1azy.net/1bb34d9a
I'm also very interested to hear some improvement tips. In each benchmark this struct is appended 1_000_000 times:

struct B {
public:
    const int b;
}

Without memory reserve:

My Appender:
------------------------
Total time (ms): 7638.67
Repetitions    : 200
Sample mode    : 38 (68 ocurrences)
Median time    : 38.0545
Avg time       : 38.1934
Std dev.       : 1.56827
Minimum        : 35.859
Maximum        : 47.586
95% conf.int.  : [35.1196, 41.2671]  e = 3.07375
99% conf.int.  : [34.1538, 42.233]  e = 4.0396
EstimatedAvg95%: [37.976, 38.4107]  e = 0.217347
EstimatedAvg99%: [37.9077, 38.479]  e = 0.285643

D's Appender:
------------------------
Total time (ms): 6972.66
Repetitions    : 200
Sample mode    : 34 (97 ocurrences)
Median time    : 34.7795
Avg time       : 34.8633
Std dev.       : 1.41623
Minimum        : 32.96
Maximum        : 49.579
95% conf.int.  : [32.0876, 37.6391]  e = 2.77576
99% conf.int.  : [31.2153, 38.5113]  e = 3.64797
EstimatedAvg95%: [34.667, 35.0596]  e = 0.196276
EstimatedAvg99%: [34.6054, 35.1213]  e = 0.257951

D array:
------------------------
Total time (ms): 22868.2
Repetitions    : 200
Sample mode    : 110 (198 ocurrences)
Median time    : 113.658
Avg time       : 114.341
Std dev.       : 1.97184
Minimum        : 113.011
Maximum        : 133.978
95% conf.int.  : [110.477, 118.206]  e = 3.86473
99% conf.int.  : [109.262, 119.42]  e = 5.07911
EstimatedAvg95%: [114.068, 114.615]  e = 0.273277
EstimatedAvg99%: [113.982, 114.7]  e = 0.359147

With memory reserve:

My Appender:
------------------------
Total time (ms): 6197.48
Repetitions    : 200
Sample mode    : 30 (88 ocurrences)
Median time    : 30.617
Avg time       : 30.9874
Std dev.       : 2.86913
Minimum        : 28.583
Maximum        : 64.06
95% conf.int.  : [25.364, 36.6108]  e = 5.62339
99% conf.int.  : [23.597, 38.3778]  e = 7.39039
EstimatedAvg95%: [30.5897, 31.385]  e = 0.397634
EstimatedAvg99%: [30.4648, 31.51]  e = 0.52258

D's Appender:
------------------------
Total time (ms): 6628.31
Repetitions    : 200
Sample mode    : 32 (104 ocurrences)
Median time    : 32.6775
Avg time       : 33.1415
Std dev.       : 3.00344
Minimum        : 31.506
Maximum        : 57.493
95% conf.int.  : [27.2549, 39.0282]  e = 5.88663
99% conf.int.  : [25.4052, 40.8779]  e = 7.73635
EstimatedAvg95%: [32.7253, 33.5578]  e = 0.416248
EstimatedAvg99%: [32.5945, 33.6886]  e = 0.547042

D array:
------------------------
Total time (ms): 22849.9
Repetitions    : 200
Sample mode    : 110 (198 ocurrences)
Median time    : 113.025
Avg time       : 114.249
Std dev.       : 8.40906
Minimum        : 112.355
Maximum        : 222.982
95% conf.int.  : [97.768, 130.731]  e = 16.4815
99% conf.int.  : [92.5891, 135.91]  e = 21.6603
EstimatedAvg95%: [113.084, 115.415]  e = 1.16542
EstimatedAvg99%: [112.718, 115.781]  e = 1.53162
Mar 11 2013
On 2013-03-11 16:11, Namespace wrote:
> I've also done some benchmarks. And I wonder why my appender is much faster if I enable memory reservation.

Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try comparing how often that happens.

> I'm also very interested to hear some improvement tips.

Well, you may add a check whether alloc/realloc actually succeeded. :)
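A "growth scheme" here just means how capacity grows per reallocation. A minimal sketch of the usual geometric policy (the function name and the 1.5x factor are illustrative, not Appender's actual implementation):

```d
// Geometric growth amortizes reallocations: each step buys
// proportionally more room, so appending n elements costs only
// O(log n) reallocations instead of one per append.
size_t grownCapacity(size_t current, size_t needed)
{
    size_t cap = current ? current : 16; // start small but nonzero
    while (cap < needed)
        cap += cap / 2 + 1; // grow by ~1.5x; doubling is also common
    return cap;
}

unittest
{
    assert(grownCapacity(0, 1) == 16);
    assert(grownCapacity(16, 17) >= 17);
}
```

Growing by a fixed amount instead would be linear in the number of reallocations, which matches "you reallocate too often".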
Mar 11 2013
On Monday, 11 March 2013 at 15:21:31 UTC, FG wrote:
> Probably just because D's Appender has a better growth scheme than yours and you reallocate too often. Try to compare how often that happens.

D's: 24 extends, 9 moves (qalloc + memcpy). My Appender: 25 moves. But I don't think you can extend a memory block with C functions.

> Well, you may add a check whether alloc/realloc actually succeeded. :)

Maybe. :)
Mar 11 2013
> But I do not think you can extend a memory block with C functions.

What about realloc?
Mar 11 2013
On Monday, 11 March 2013 at 15:51:58 UTC, David wrote:
>> But I do not think you can extend a memory block with C functions.
> What about realloc?

That's what I use. But it moves your old memory into the newly allocated block. Or does it first check whether the current memory block can be extended?
Mar 11 2013
On Monday, 11 March 2013 at 15:56:37 UTC, Namespace wrote:
> That's what I use. But it moves your old memory into the newly allocated block. Or does it first check whether the current memory block can be extended?

It extends in place if possible. http://dpaste.dzfl.pl/f86e5db6

import core.stdc.stdlib;

void main() {
    auto p1 = malloc(14);
    auto p2 = realloc(p1, 15);
    assert(p1 is p2);
}
Mar 11 2013
> It extends in place if possible. http://dpaste.dzfl.pl/f86e5db6

Cool, nice to know. Currently I have this benchmark:

Without memory reserve:
------------------------
Total time (ms): 6779.16
Repetitions    : 200
Sample mode    : 33 (73 ocurrences)
Median time    : 33.689
Avg time       : 33.8958
Std dev.       : 1.86754
Minimum        : 31.414
Maximum        : 52.237
95% conf.int.  : [30.2355, 37.5561]  e = 3.66031
99% conf.int.  : [29.0853, 38.7063]  e = 4.81047
EstimatedAvg95%: [33.637, 34.1546]  e = 0.258823
EstimatedAvg99%: [33.5556, 34.2359]  e = 0.340151

With memory reserve:
------------------------
Total time (ms): 5945.1
Repetitions    : 200
Sample mode    : 29 (68 ocurrences)
Median time    : 29.3215
Avg time       : 29.7255
Std dev.       : 2.083
Minimum        : 27.458
Maximum        : 45.32
95% conf.int.  : [25.6429, 33.8081]  e = 4.08261
99% conf.int.  : [24.36, 35.0909]  e = 5.36546
EstimatedAvg95%: [29.4368, 30.0142]  e = 0.288684
EstimatedAvg99%: [29.3461, 30.1049]  e = 0.379396

But I don't gain that much:
------------------------
Total time (ms): 10833
Repetitions    : 200
Sample mode    : 53 (105 ocurrences)
Median time    : 53.848
Avg time       : 54.1652
Std dev.       : 1.72681
Minimum        : 52.118
Maximum        : 69.703
95% conf.int.  : [50.7807, 57.5497]  e = 3.38448
99% conf.int.  : [49.7172, 58.6131]  e = 4.44796
EstimatedAvg95%: [53.9259, 54.4045]  e = 0.239319
EstimatedAvg99%: [53.8507, 54.4797]  e = 0.314518

Seems I was optimizing the wrong spot.
Mar 11 2013
On Monday, 11 March 2013 at 16:37:00 UTC, Namespace wrote:
> Seems I was optimizing the wrong spot.

Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll? I don't think it will necessarily do better, but it doesn't hurt to give it a roll.
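For reference, the Array suggestion looks roughly like this; std.container.Array manages its storage with malloc/free rather than the GC, much like the hand-rolled appender discussed above (the Token struct and count are placeholders):

```d
import std.container : Array;

struct Token { string text; }

void main()
{
    Array!Token tokens;
    tokens.reserve(1_000_000); // same up-front reservation idea as before
    foreach (i; 0 .. 1_000_000)
        tokens.insertBack(Token("b"));
    assert(tokens.length == 1_000_000);
}
```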
Mar 11 2013
> Profile? Also, note that appender is a tool for appending stuff *into an array*. If you don't actually need an array at the end, then you could give Array a roll?

My profile output looks like this: http://dpaste.1azy.net/0409f0bc
Since I have already discussed most of the stuff in this thread, I don't know what I could still improve in the lexer. But I hadn't yet thought of Array. Since my Appender already uses malloc and free, I could learn something from the implementation of Array. Thanks for the suggestion.
Mar 11 2013
Array's code is horrible... I decided to start from scratch. So I took a close look at dmd's lexer and took over the basic functionality. Result for std.datetime (without profiling so far):

------------------------
Total time (ms): 5971.66
Repetitions    : 200
Sample mode    : 28 (101 ocurrences)
Median time    : 28.89
Avg time       : 29.8583
Std dev.       : 3.37222
Minimum        : 27.958
Maximum        : 70.583
95% conf.int.  : [23.2489, 36.4677]  e = 6.60943
99% conf.int.  : [21.172, 38.5446]  e = 8.68627
EstimatedAvg95%: [29.3909, 30.3257]  e = 0.467357
EstimatedAvg99%: [29.2441, 30.4725]  e = 0.614212
Mar 13 2013
14-Mar-2013 01:19, Namespace пишет:
> Array's code is horrible... I decided to start from scratch. So I took a close look at dmd's lexer and took over the basic functionality. Result for std.datetime (without profiling so far): [...]

It's getting better!

-- Dmitry Olshansky
Mar 14 2013
Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have ideas on how I could improve it. The last trace.log is also there: https://github.com/Dgame/Puzzle/blob/master/trace.log My goal is about 20 msecs. :D
Mar 14 2013
14-Mar-2013 21:20, Namespace пишет:
> Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d Maybe some of you have ideas on how I could improve it. [...] My goal is about 20 msecs. :D

And mine is 15 with Dscanner. But it's stuck at around 21-22. But both of us can just cheat and buy a newer PC :) So what we need to count is the number of syscalls plus cycles spent in user mode to have a meaningful metric.

-- Dmitry Olshansky
Mar 14 2013
On 03/14/13 18:24, Dmitry Olshansky wrote:
> And mine is 15 with Dscanner. But it's stuck at around 21-22. But both of us can just cheat and buy a newer PC :)

What are you both actually measuring? IOW, what happens with the resulting tokens?

artur
Mar 14 2013
15-Mar-2013 01:17, Artur Skawina пишет:
> What are you both actually measuring? IOW, what happens with the resulting tokens?

I'm counting and/or filtering out specific tokens. The latter is a tiny bit slower. The other options of Dscanner are to generate e.g. HTML highlighting for code or to measure token frequencies. I haven't timed those, as their speed doesn't entirely depend on the lexer as is.

-- Dmitry Olshansky
Mar 14 2013
So far, my lexer is purely an exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code.
Mar 15 2013
On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote:
> So far, my lexer is purely an exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code.

I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. It is still a beta. But it's a lot of fun. :D
Mar 21 2013
On Thursday, 21 March 2013 at 11:36:45 UTC, Namespace wrote:On Friday, 15 March 2013 at 08:20:18 UTC, Namespace wrote:Moved to https://github.com/Dgame/DAT/So far, my lexer is pure exercise. But my goal is actually to filter variables and functions, to see if they are ever used in the code.I'm almost finished. In my few tests, my little parser detects function, struct and variable declarations and expressions. It can resolve rvalue references (by creating temporary variables) and detects unused/underused variables. But it is still a beta. But it makes a lot of fun. :D
Mar 23 2013
On Saturday, 23 March 2013 at 15:39:50 UTC, Namespace wrote:
> Moved to https://github.com/Dgame/DAT/

It's time for a beta. It would be nice if someone took a look at DAT or tried it. I'm sure there are still some bugs, but maybe this little recreational project is useful to someone. I had a lot of fun. It's tested with 'simple.d', std.stdio and std.datetime.
Mar 24 2013
24-Mar-2013 20:08, Namespace пишет:
> It's time for a beta. It would be nice if someone took a look at DAT or tried it. I'm sure there are still some bugs, but maybe this little recreational project is useful to someone. I had a lot of fun. It's tested with 'simple.d', std.stdio and std.datetime.

I have no time - it looks interesting, but I highly doubt it can do what is promised. Still, take the time to write a small formal announcement for the beta. Reusing an old, fizzled-out thread about a D lexer to discuss a tool created on top of it is bound to get very little feedback.

-- Dmitry Olshansky
Mar 24 2013
> I have no time - it looks interesting, but I highly doubt it can do what is promised. Still, take the time to write a small formal announcement for the beta.

Yes, you're right; it's going to take some time before everything works in general. It has held up in my small test runs, but D is incredibly complex. I will also stop working on it very soon, but maybe the code or the idea will inspire someone else to write something more mature for the D community. That's also the reason why I'm asking for feedback in this fizzled-out thread. :) But once you've taken a look at it, I would be happy if you could tell me what I should improve next time.
Mar 24 2013
On Sunday, 24 March 2013 at 16:54:22 UTC, Namespace wrote:
> [...] maybe the code or the idea will inspire someone else to write something more mature for the D community. [...]

I have similar goals for the Dscanner tool, so I may steal some of your ideas.
Mar 24 2013
On Sunday, 24 March 2013 at 22:16:45 UTC, Brian Schott wrote:
> I have similar goals for the Dscanner tool, so I may steal some of your ideas.

I'm glad to hear that. If I can help, let me know; I'd be happy to gain more knowledge in this area. And I would be glad if some of my ideas became reality.
Mar 24 2013
14-Mar-2013 21:20, Namespace пишет:
> Yes, and it's really fun. :) By the way, the code can be found here: https://github.com/Dgame/Puzzle/blob/master/DLexer.d [...] My goal is about 20 msecs. :D

So you got rid of array creation. About time ;)

-- Dmitry Olshansky
Mar 14 2013
> So you got rid of array creation. About time ;)

Yes, that was the only way to get closer to your measured time. :D
Mar 15 2013
On Monday, 4 March 2013 at 18:10:48 UTC, Dmitry Olshansky wrote:
> Brian posted graphs showing under 20 ms on his CPU for all modules, and under 10 ms for every module in std except std.datetime.

https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh

I checked in the script that I use to generate the data for the graph last night. You can copy/paste its output into a spreadsheet (using tabs as field separators).
Mar 04 2013
05-Mar-2013 01:35, Brian Schott пишет:
> https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/perftest.sh I checked in the script that I use to generate the data for the graph last night. You can copy/paste its output into a spreadsheet (using tabs as field separators).

Great, thanks! I'll see if I can make a plot from it with std-dev shown as well.

-- Dmitry Olshansky
Mar 04 2013
03-Mar-2013 20:19, Namespace пишет:
>> Simple - don't use array append and don't produce an array. Just produce a lazy forward range that is iterated. At the very least use Appender!(Token[]); it ought to be much faster.
> But hasn't each range an array internally?

Of course not, or the whole range concept would be meaningless.

> Or how does that work?

By having a struct, or generally any object, that in fact works as a generator of tokens: popFront lexes a new token, front gives the current token, and empty indicates whether there are no more tokens. See std.range for some simple ranges. If you throw in a .save method you can replay tokenization from some point in the past (= some older state; think lookahead when parsing).

> I try to use Appender, but this lazy forward range sounds interesting.

-- Dmitry Olshansky
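The generator Dmitry describes can be sketched as a tiny forward range that "lexes" whitespace-separated words on demand. A real lexer would do proper tokenization inside popFront, but the range plumbing is the same (Token is a placeholder type, and the slicing assumes ASCII input):

```d
import std.string : stripLeft, indexOf;

struct Token { string text; }

struct TokenRange
{
    private string src;    // remaining, unconsumed input
    private Token current;
    private bool done;

    this(string src) { this.src = src; popFront(); }

    @property bool empty() const { return done; }
    @property Token front() const { return current; }

    // Lexes the next token on demand: no token array is ever built.
    void popFront()
    {
        src = src.stripLeft();
        if (src.length == 0) { done = true; return; }
        auto n = src.indexOf(' ');
        if (n < 0) n = src.length;
        current = Token(src[0 .. n]);
        src = src[n .. $];
    }

    // Copying the struct snapshots the state -> forward range (lookahead).
    @property TokenRange save() { return this; }
}

void main()
{
    import std.algorithm : equal, map;
    auto r = TokenRange("int x = 1;");
    assert(r.map!(t => t.text).equal(["int", "x", "=", "1;"]));
}
```

Because each token lives only while it is the front element, the parser drives the lexer and no intermediate array (or appender) is needed.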
Mar 03 2013
On Sunday, March 03, 2013 17:19:25 Namespace wrote:
> But hasn't each range an array internally?

Many do, but some don't. Take std.range.iota, or the ranges in std.random, for instance. They're generative. And if all ranges had arrays underneath the hood, infinite ranges wouldn't be possible except for something like std.range.cycle. Something like

struct Range {
    @property int front() { return _i; }
    void popFront() { ++_i; }
    @property bool empty() { return _i == int.max; }
    private int _i;
}

would be a valid range. In the case of a lexer, you generate the next token on each popFront call, so you consume the range of characters that you're lexing as you go, but don't need to allocate memory for more than the current token at a time.

- Jonathan M Davis
Mar 03 2013
03-Mar-2013 03:12, Namespace пишет:
> I noted it too late: decode comes from readText: readText validates your text. I will now use std.file.read.

That's why Walter promotes profilers. They help verify hypotheses about performance and constantly surprise people with what they actually find.

> The new trace output is here and it seems you're right, isKeyword and isType (yes, I'd decided to separate them again and keep both in the lexer) take a lot of time: http://dpaste.1azy.net/0d6aff6b
> 154874 106186 95403 0 pure nothrow bool Puzzle.Lexer.isKeyword(immutable(char)[])
> 159688 78020 69289 0 pure nothrow bool Puzzle.Lexer.isType(immutable(char)[])
> I never thought that. :)

-- Dmitry Olshansky
Mar 02 2013