digitalmars.D.learn - Parse a String given some delimiters

Alfred Newman <alfredonewman gmail.com> writes:
Hello,

I'm migrating some Python code to D, but I'm stuck at a dead end...

Sorry for posting some .py lines here, but I have some doubts 
about the best (fastest) way to do this in D.

Calling parsertoken("_My   input.string", " _,.", 2) should 
return "input".
Calling parsercount("Dlang=-rocks!", " =-") should return 2.

def parsertoken(istring, idelimiters, iposition):
    """
    Return a specific token of a given input string,
    considering its position and the provided delimiters

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :param iposition: position of the token
    :return: token
    """
    # Replace every delimiter with a space, then split on whitespace
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    return vlist[iposition]

def parsercount(istring, idelimiters):
    """
    Return the number of tokens in the input string,
    considering the delimiters provided

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :return: the number of tokens found
    """
    # Replace every delimiter with a space, then split on whitespace
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    return len(vlist)

Thanks in advance
Oct 30 2016
Lodovico Giaretta <lodovico giaretart.net> writes:
On Sunday, 30 October 2016 at 20:50:47 UTC, Alfred Newman wrote:
> Hello,
>
> I'm migrating some Python code to D, but I'm stuck at a dead 
> end...
>
> Sorry for posting some .py lines here, but I have some 
> doubts about the best (fastest) way to do this in D.
>
> Calling parsertoken("_My   input.string", " _,.", 2) should 
> return "input".
> Calling parsercount("Dlang=-rocks!", " =-") should return 2.
>
> Thanks in advance

You can take inspiration from the following snippet:

=============================================
import std.stdio, std.regex;

void main()
{
    string s = "Hello.World !";
    auto ss = split(s, regex(`\.| `));
    ss.length.writeln;
    ss[1].writeln;
}
=============================================
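
The regex in the snippet above matches a single '.' or a single space, so runs of delimiters (and a leading delimiter, as in "_My   input.string") would probably leave empty tokens behind. A minimal sketch of one way to adapt it to the delimiter set from the original question, assuming a character class with `+` plus a filter for empty tokens is acceptable. The name `regexTokens` is made up for illustration and is not a Phobos function:

```d
import std.algorithm : filter;
import std.array : array;
import std.regex : regex, split;
import std.stdio : writeln;

// Hypothetical helper: split on one or more of the delimiters " _,."
// and discard any empty tokens (e.g. the one before a leading '_').
string[] regexTokens(string s)
{
    return split(s, regex(`[ _,.]+`))
        .filter!(t => t.length > 0)
        .array;
}

void main()
{
    auto tokens = regexTokens("_My   input.string");
    writeln(tokens.length); // 3
    writeln(tokens[1]);     // input
}
```

Inside a character class the '.' is a literal dot, so it does not need escaping here.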
Oct 30 2016
sarn <sarn theartofmachinery.com> writes:
On Sunday, 30 October 2016 at 20:50:47 UTC, Alfred Newman wrote:
> Hello,
>
> I'm migrating some Python code to D, but I'm stuck at a dead 
> end...
>
> Sorry for posting some .py lines here, but I have some 
> doubts about the best (fastest) way to do this in D.

The "splitter" generic function sounds like what you want. The basic versions use a fixed separator, but you can make more complex separators using either the regex version, or the function version. The function version is simplest for what you're doing. Check out the first example here:
https://dlang.org/phobos/std_algorithm_iteration.html#.splitter.3
(You'd use "among" instead of plain ==)

Also check out "walkLength" for getting the number of tokens.

However, if you really care about speed, I suggest changing the API. With your API, if you want to get multiple tokens from a string, you have to split the string every single time. Why not just return the full range? You can use "array" to return a proper array instead of an ad hoc range struct.
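
A rough sketch of what that could look like, using the function version of "splitter" with "among", then "array" and "walkLength". The helper name `tokenize` and the hard-coded delimiter set are assumptions made only for this example:

```d
import std.algorithm.comparison : among;
import std.algorithm.iteration : filter, splitter;
import std.array : array;
import std.range : empty, walkLength;
import std.stdio : writeln;

// Hypothetical helper: split once and hand back every token.
// The delimiter set " _,." is hard-coded purely for illustration.
string[] tokenize(string input)
{
    return input
        .splitter!(c => c.among(' ', '_', ',', '.') != 0)
        .filter!(token => !token.empty) // drop empty tokens between adjacent delimiters
        .array;
}

void main()
{
    auto tokens = tokenize("_My   input.string");
    writeln(tokens.length); // 3
    writeln(tokens[1]);     // input

    // walkLength counts the tokens lazily, without building an array.
    auto lazyCount = "Dlang=-rocks!"
        .splitter!(c => c.among(' ', '=', '-') != 0)
        .filter!(token => !token.empty)
        .walkLength;
    writeln(lazyCount); // 2
}
```

Because `tokenize` returns the whole array, the caller splits the string once and can then index it or take its length as often as needed.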
Oct 30 2016
Ali Çehreli <acehreli yahoo.com> writes:
On 10/30/2016 01:50 PM, Alfred Newman wrote:
> [...]

Here is something along the lines of what others have suggested:

auto parse(R, S)(R range, S separators) {
    import std.algorithm : splitter, filter, canFind;
    import std.range : empty;

    return range.splitter!(c => separators.canFind(c)).filter!(token => !token.empty);
}

unittest {
    import std.algorithm : equal;

    assert(parse("_My input.string", " _,.").equal([ "My", "input", "string" ]));
}

auto parsertoken(R, S)(R range, S separator, size_t position) {
    import std.range : drop;

    return parse(range, separator).drop(position).front;
}

unittest {
    import std.algorithm : equal;

    assert(parsertoken("_My input.string", " _,.", 1).equal("input"));
}

auto parsercount(R, S)(R range, S separator) {
    import std.algorithm : count;

    return parse(range, separator).count;
}

unittest {
    assert(parsercount("Dlang=-rocks!", " =-") == 2);
}

void main() {
}

Ali
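
Two details of the code above are worth noting: `drop(position).front` counts from 0 (position 1 gives "input"), whereas the original question asked for position 2 to give "input", and calling `front` on an empty range is an error when the position is past the last token. A minimal sketch of a variant that counts from 1 and returns a fallback value instead, assuming plain `string` arguments. The name `tokenAt` and its `fallback` parameter are hypothetical:

```d
import std.algorithm : canFind, filter, splitter;
import std.range : drop, empty;
import std.stdio : writeln;

// Same tokenizer as in the post above.
auto parse(R, S)(R range, S separators)
{
    return range.splitter!(c => separators.canFind(c))
                .filter!(token => !token.empty);
}

// Hypothetical helper (not part of Phobos): `position` counts from 1,
// and `fallback` is returned when there is no such token.
string tokenAt(string input, string separators, size_t position,
               string fallback = "")
{
    if (position == 0)
        return fallback;
    // drop() stops early if the range has fewer elements than requested.
    auto rest = parse(input, separators).drop(position - 1);
    return rest.empty ? fallback : rest.front;
}

void main()
{
    writeln(tokenAt("_My   input.string", " _,.", 2));           // input
    writeln(tokenAt("_My   input.string", " _,.", 9, "<none>")); // <none>
}
```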
Oct 30 2016
Alfred Newman <alfredonewman gmail.com> writes:
On Sunday, 30 October 2016 at 23:47:54 UTC, Ali Çehreli wrote:
> On 10/30/2016 01:50 PM, Alfred Newman wrote:
>> [...]
> Here is something along the lines of what others have suggested:
>
> auto parse(R, S)(R range, S separators) {
>     import std.algorithm : splitter, filter, canFind;
>     import std.range : empty;
> [...]

Thank you all. Ali, that's exactly what I was looking for. Phobos is awesome (and pretty huge).

Cheers
Oct 30 2016