digitalmars.D.learn - Fuzzy string matching?
- Andrej Mitrovic (15/15) Jul 15 2011 Is there any such method in Phobos?
- dsmith (18/33) Jul 15 2011 Until recently, you could easily use std.regexp.search(target_string, fi...
- Jonathan M Davis (3/13) Jul 15 2011 std.regex is std.regexp's replacement.
- dsmith (9/17) Jul 15 2011 Could you demonstrate how to use std.regex for pattern matching, prefera...
- Jonathan M Davis (8/17) Jul 15 2011 I'm afraid that I've never used either std.regexp or std.regex, so I'm n...
- Dmitry Olshansky (17/35) Jul 16 2011 You and apparently an awful lot of people hit this, the thing is that
- Mike Wey (5/20) Jul 16 2011 You could try std.algorithm.levenshteinDistance to check how much the
Is there any such method in Phobos? I have to rename some files based on a string array of known names, which need to be fuzzy-matched to file names, and then rename the files to the matches. E.g.:

    string[] strings = ["food", "lamborghini", "architecture"]

files on system:

    .\foo.ext
    .\lmbrghinione.ext
    .\archtwo.ext

and if there's a fuzzy match then the matched files would be renamed to:

    .\food.ext
    .\lamborghini.ext
    .\architecture.ext

Perhaps there's a C library I can use for this?
Jul 15 2011
Until recently, you could easily use std.regexp.search(target_string, find_string), but regexp is apparently no longer in Phobos. I seek a simple substitute. std.algorithm.canFind might work, as it returns bool. Maybe try something like:

    foreach(str; strings)
        foreach(fls; system_files)
            if(std.algorithm.canFind(fls, str))   // usage needs verification
                str ~= ".ext";

On Saturday 16 July 2011 05:07:38 dsmith wrote:
> Until recently, you could easily use std.regexp.search(target_string,
> find_string), but regexp is apparently no longer in phobos. I seek a
> simple substitute. std.algorithm.canFind might work, as it is bool.
> Maybe try something like:
> foreach(str; strings)
>     foreach(fls; system_files)
>         if(std.algorithm.canFind(fls, str))   // usage needs verification
>             str ~= ".ext";

std.regex is std.regexp's replacement.

- Jonathan M Davis
Jul 15 2011

Could you demonstrate how to use std.regex for pattern matching, preferably with a bool method? My usage of std.regex.match yields this error:

    core.exception.AssertError /usr/include/d/dmd/phobos/std/regex.d(1796): 4294967295 .. 4294967295 vs. 5

My usage is:

    auto m = match(long_string, regex(str));
    writeln(m.hit);

On Saturday 16 July 2011 06:17:56 dsmith wrote:
> Could you demonstrate how to use std.regex for pattern matching,
> preferably with a bool method? My usage of std.regex.match yields this
> error: core.exception.AssertError
> /usr/include/d/dmd/phobos/std/regex.d(1796): 4294967295 .. 4294967295 vs. 5
> My usage is:
> auto m = match(long_string, regex(str));
> writeln(m.hit);

I'm afraid that I've never used either std.regexp or std.regex, so I'm not familiar with the usage of either one. There's every chance that this is a bug rather than misuse on your part.
I'd advise posting a question about it separately (so that people are more likely to see it) with an appropriate subject, and there's a decent chance that someone who's actually familiar with std.regex will answer your question.

- Jonathan M Davis
Jul 15 2011

On 16.07.2011 10:17, dsmith wrote:
> Could you demonstrate how to use std.regex for pattern matching,
> preferably with a bool method? My usage of std.regex.match yields this
> error: core.exception.AssertError
> /usr/include/d/dmd/phobos/std/regex.d(1796): 4294967295 .. 4294967295 vs. 5
> My usage is:
> auto m = match(long_string, regex(str));
> writeln(m.hit);

You, and apparently an awful lot of other people, hit this. The thing is that the .hit method returns the _matched slice_ of the string if there is a match and asserts otherwise. (There is also the issue that this assert's message is of _very_ poor quality.) As it stands now, regex works like ranges: you need to check whether the match is empty before you use it. So if all you want to do is a test:

    auto m = match(long_string, regex(str));
    writeln(!m.empty); // substitute for "there was a match"

Thinking more about this, it should be in the synopsis part of the std.regex docs on d-p-l.org, along with something like:

    // uses range syntax to iterate over all matches (so empty is checked)
    foreach(m; match("abc", regex(r"\w", "g")))
        writeln(m.hit); // here m.hit is guaranteed to hold something (and not assert)

-- Dmitry Olshansky
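The empty check described in the reply above can be sketched as follows. This is only an illustration, and it assumes a more recent Phobos than the one discussed in the thread, where std.regex offers matchFirst and matchAll alongside match. The helper name hasMatch and the sample input are made up for the example.

```d
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: true when `pattern` matches anywhere in `haystack`.
// The result of matchFirst is a range that is empty when nothing matched.
bool hasMatch(string haystack, string pattern)
{
    return !matchFirst(haystack, regex(pattern)).empty;
}

void main()
{
    auto longString = "the quick brown fox"; // hypothetical input

    writeln(hasMatch(longString, `qu\w+`)); // a word starting with "qu"
    writeln(hasMatch(longString, `\d+`));   // no digits in the input

    // Read .hit only after confirming that the match is not empty.
    auto m = matchFirst(longString, regex(`qu\w+`));
    if (!m.empty)
        writeln(m.hit);

    // Iterating over all matches: every element holds a match,
    // so .hit is safe inside the loop.
    foreach (c; matchAll("abc", regex(`\w`)))
        writeln(c.hit);
}
```

The patterns use backtick (raw) strings so that `\w` and `\d` reach the regex engine unchanged; in a double-quoted D string, `\w` is not a valid escape sequence.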
Jul 16 2011

On 07/16/2011 01:17 AM, Andrej Mitrovic wrote:
> Is there any such method in Phobos? I have to rename some files based on
> a string array of known names which need to be fuzzy-matched to file
> names and then rename the files to the matches. E.g.:
> string[] strings = ["food", "lamborghini", "architecture"]
> files on system:
> .\foo.ext
> .\lmbrghinione.ext
> .\archtwo.ext
> and if there's a fuzzy match then the matched files would be renamed to:
> .\food.ext
> .\lamborghini.ext
> .\architecture.ext
> Perhaps there's a C library I can use for this?

You could try std.algorithm.levenshteinDistance to check how much the two names differ.

-- Mike Wey
Jul 16 2011
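A rough sketch of how the levenshteinDistance suggestion might be applied to the names from the original question. The helpers normalizedDistance and closestName are hypothetical, not part of Phobos, and dividing the distance by the longer length is an assumption added for this example rather than something proposed in the thread. The sketch only picks a name; it does not rename any files.

```d
import std.algorithm : levenshteinDistance, max;
import std.path : stripExtension;
import std.stdio : writeln;

// Hypothetical helper: edit distance scaled to [0, 1] by the longer input.
// .length counts code units, which equals the character count for ASCII names.
double normalizedDistance(string a, string b)
{
    immutable longest = max(a.length, b.length);
    if (longest == 0)
        return 0.0;
    return cast(double) levenshteinDistance(a, b) / longest;
}

// Hypothetical helper: the known name with the smallest normalized distance.
string closestName(string stem, string[] known)
{
    string best;
    double bestScore = double.infinity;
    foreach (name; known)
    {
        immutable score = normalizedDistance(stem, name);
        if (score < bestScore)
        {
            bestScore = score;
            best = name;
        }
    }
    return best;
}

void main()
{
    auto names = ["food", "lamborghini", "architecture"];
    foreach (file; ["foo.ext", "lmbrghinione.ext", "archtwo.ext"])
        writeln(file, " -> ", closestName(stripExtension(file), names), ".ext");
}
```

The scaling matters for abbreviations: with the raw distance, "archtwo" is 7 edits away from both "food" and "architecture", so the tie would go to whichever name is listed first. A real tool would probably also want a cut-off score above which a file is left alone.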