digitalmars.D.learn - regex format string problem

reply yawniek <dlang srtnwz.com> writes:
hi!

how can i format a string using the captures from a regular 
expression?
basically, i want to make this pass:
https://gist.github.com/f17647fb2f8ff2261d42


context: i'm trying to write an implementation of 
https://github.com/ua-parser
where the regular expressions as well as the format strings are 
given.
Nov 22 2015
parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
I take it that browscap [0] does not do what you want?
I have a generator at [1].
Feel free to steal.

Also, once you do get yours working, you'll want to use ctRegex and generate a file with all of the patterns in it. That'll increase performance significantly.

Regarding regex, if you want a named sub-part, use:
(?P<text>[a-z]*)
where [a-z]* is just an example.

I would recommend learning how input ranges work. They are how you get the matches out, e.g.

    import std.regex, std.stdio;

    auto rgx = ctRegex!`([a-z])[123]`;
    foreach (match; "b3".matchAll(rgx)) {
        writeln(match.hit);
    }

Or something along those lines; I did it off the top of my head.

[0] https://github.com/rikkimax/Cmsed/blob/master/tools/browser_detection/browscap.ini
[1] https://github.com/rikkimax/Cmsed/blob/master/tools/browser_detection/generator.d
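
A minimal sketch of the two ideas above: iterating the input range returned by matchAll, and reading a capture by name as well as by position. It assumes std.regex's documented `(?P<name>...)` spelling for named groups. The pattern, the input, and the helper name `describeMatches` are made up for illustration.

```d
import std.regex;
import std.stdio : writeln;

/// Hypothetical helper: one description string per match found in `input`.
string[] describeMatches(string input)
{
    // One named group ("name") followed by one plain numbered group.
    auto re = regex(`(?P<name>[a-z]+)/(\d+)`);

    string[] result;
    // matchAll returns an input range; each element holds one match's captures.
    foreach (m; input.matchAll(re))
    {
        // m.hit is the whole match, m["name"] the named group,
        // m[2] the second group by position (named groups are numbered too).
        result ~= m.hit ~ " -> " ~ m["name"] ~ " " ~ m[2];
    }
    return result;
}

void main()
{
    foreach (line; describeMatches("firefox/42 opera/12"))
        writeln(line);
}
```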
Nov 22 2015
parent reply yawniek <dlang srtnwz.com> writes:
Hi Rikki,

On Monday, 23 November 2015 at 03:57:06 UTC, Rikki Cattermole 
wrote:
 I take it that browscap [0] does not do what you want?
 I have a generator at [1].
 Feel free to steal.
This looks interesting, thanks for the hint. However, it might be a bit limited: i have 15M+ different user agents with all kinds of weird cases, and sometimes not even the extensive ua-core regexes work. (if you're interested in testing, let me know)
 Also once you do get yours working, you'll want to use ctRegex 
 and generate a file with all of them in it. That'll increase 
 performance significantly.
that was my plan.
 Regarding regex, if you want a named sub-part, use:
 (?P<text>[a-z]*)
 where [a-z]* is just an example.

 I would recommend learning how input ranges work. They are 
 how you get the matches out, e.g.

 auto rgx = ctRegex!`([a-z])[123]`;
 foreach (match; "b3".matchAll(rgx)) {
     writeln(match.hit);
 }
i'm aware of how this works; the problem is a different one: i have a second string (the format string) that contains $n placeholders, which can occur in any order. of course i could just write another regex to replace them, job done. but from looking at std.regex this seems to be built in; i just failed to get it to work properly, see my gist. i had hoped this would be a one-liner.
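
One possible way to get there, as a sketch rather than a definitive answer: std.regex's replaceFirst already expands $1, $2, ... in its format argument, so the expanded format can be recovered by slicing the unmatched text off both ends of its result. The helper name `formatCaptures`, the pattern, and the format string below are hypothetical and are not taken from the gist or from ua-parser.

```d
import std.regex;
import std.stdio : writeln;

/// Hypothetical helper: expands `fmt` ($1, $2, ...) with the captures of the
/// first match of `re` in `input`; returns null when nothing matches.
string formatCaptures(RegEx)(string input, RegEx re, string fmt)
{
    auto m = input.matchFirst(re);
    if (m.empty)
        return null;

    // replaceFirst yields pre ~ expanded format ~ post, so cutting off the
    // unmatched text on both sides leaves only the expanded format.
    auto replaced = input.replaceFirst(re, fmt);
    return replaced[m.pre.length .. $ - m.post.length];
}

void main()
{
    auto re = regex(`(Firefox)/(\d+)\.(\d+)`);
    // The placeholders may appear in any order in the format string.
    writeln(formatCaptures("Mozilla/5.0 (X11) Firefox/42.0", re, "$2.$3 ($1)"));
}
```

This matches twice (once for the captures, once inside replaceFirst), which keeps the sketch short at some cost in speed.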
Nov 23 2015
parent Rikki Cattermole <alphaglosined gmail.com> writes:
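So like this?

    import std.regex;
    import std.stdio : readln, writeln, write, stdout;
    import std.string : chomp;

    auto REG = ctRegex!(`(\S+)(?: (.*))?`);

    void main() {
        for (;;) {
            write("> ");
            stdout.flush();

            // chomp strips the trailing newline; an empty line (or EOF) ends the loop.
            string line = readln().chomp;
            if (line.length == 0)
                return;

            // $1 in the format string expands to the first capture group.
            writeln("< ", line.replaceAll(REG, "Unknown program: $1"));
        }
    }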
Nov 23 2015