digitalmars.D.learn - path matching problem

Charles Hixson <charleshixsn earthlink.net> writes:
Is there a better way to do this?  (I want to find files that match any 
of some extensions and don't match any of several other strings, or are 
not in some directories.):

  import std.file;

  ...

  string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
  string[] exclude = ["/template/", "biblio.txt", "categories.txt",
                      "subjects.txt", "/toCDROM/"];

  int limit = 1;
  // Iterate a directory in depth
  foreach (string name; dirEntries(sDir, exts, SpanMode.depth))
  {  bool excl = false;
     foreach (string part; exclude)
     {  if (part in name)
        {  excl = true;
           break;
        }
     }
     if (excl) break;
etc.
Nov 27 2012
"Joshua Niehus" <jm.niehus gmail.com> writes:
Maybe this?

  import std.algorithm, std.array, std.regex;
  import std.stdio, std.file;

  void main() {
      enum string[] exts = [`".txt"`, `".utf8"`, `".utf-8"`,
                            `".TXT"`, `".UTF8"`, `".UTF-8"`];
      enum string exclude =
          `r"/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/"`;

      auto x = dirEntries("/path", SpanMode.depth)
          .filter!(`endsWith(a.name,` ~ exts.join(",") ~ `)`)
          .filter!(`std.regex.match(a.name,` ~ exclude ~ `).empty`);

      writeln(x);
  }
Nov 27 2012
Charles Hixson <charleshixsn earthlink.net> writes:
That's a good approach, except that I want to step through the matching paths rather than accumulate them in a vector... though the filter documentation could mean that it returns an iterator. So I could replace

  writeln(x);

with

  foreach (string name; x) { ... }

and x wouldn't have to hold all the matching strings at the same time.

But why the chained filters, rather than using the option provided by dirEntries for one of them? Is it faster? Just the way you usually do things? (Which I accept as a legitimate answer. I can see that that approach would be more flexible.)
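
On the question of whether filter hands back something iterator-like: as far as I know, std.algorithm's filter returns a lazy range that tests elements only as they are requested, so nothing is accumulated up front. The sketch below tries to show that without touching the file system; the helper name firstMultiplesOfSeven and the numbers are made up for illustration and are not from the thread.

```d
import std.algorithm : filter;
import std.array : array;
import std.range : iota, take;
import std.stdio : writeln;

// Hypothetical helper: the first `count` multiples of seven.
int[] firstMultiplesOfSeven(size_t count)
{
    // About two billion candidates, far too many to collect first.
    // filter only examines as many as take() asks for.
    auto lazyMatches = iota(0, int.max).filter!(n => n % 7 == 0);
    return lazyMatches.take(count).array;
}

void main()
{
    // Stepping through the results one at a time, as with foreach over x.
    foreach (n; firstMultiplesOfSeven(3))
        writeln(n);
}
```

The same reasoning should carry over to a filtered dirEntries range: a foreach over it visits one matching entry at a time.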
Nov 27 2012
"Joshua Niehus" <jm.niehus gmail.com> writes:
On Tuesday, 27 November 2012 at 23:43:43 UTC, Charles Hixson 
wrote:
 But why the chained filters, rather than using the option 
 provided by dirEntries for one of them?  Is it faster?  Just 
 the way you usually do things? (Which I accept as a legitimate 
 answer.  I can see that that approach would be more flexible.)
Ignorance... You're right, I didn't realize that dirEntries had that filter option; you should use that. I doubt the double .filter would affect performance at all (might even slow it down for all I know :)

//update:

  import std.algorithm, std.array, std.regex;
  import std.stdio, std.file;

  void main() {
      string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
      enum string exclude =
          `r"/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/"`;

      dirEntries("/path", exts, SpanMode.depth)
          .filter!(` std.regex.match(a.name,` ~ exclude ~ `).empty `)
          .writeln();
  }
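
For readers who find the string-mixin predicates hard to follow, here is a minimal sketch of the same idea with a lambda predicate and a regex compiled once. It is one possible variant, not the code posted in the thread: the helper isWanted, the constant excludePattern, and taking the start directory from the command line are all assumptions made for illustration.

```d
import std.algorithm : filter;
import std.file : dirEntries, SpanMode;
import std.regex : matchFirst, regex, Regex;
import std.stdio : writeln;

// Same alternatives as the exclude pattern above, without the extra
// quoting that the string-mixin predicate needed.
enum excludePattern =
    `/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/`;

// Hypothetical helper: true when no excluded fragment occurs in path.
bool isWanted(string path, Regex!char excluded)
{
    return matchFirst(path, excluded).empty;
}

void main(string[] args)
{
    // Assumption: start directory comes from the command line, else ".".
    string sDir = args.length > 1 ? args[1] : ".";
    string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
    auto excluded = regex(excludePattern); // compiled once, reused per entry

    // dirEntries applies the glob; filter drops the excluded paths lazily.
    auto wanted = dirEntries(sDir, exts, SpanMode.depth)
        .filter!(entry => isWanted(entry.name, excluded));

    foreach (entry; wanted)
        writeln(entry.name);
}
```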
Nov 27 2012
"jerro" <a a.com> writes:
You could replace the inner loop with something like:

  bool excl = exclude.any!(part => name.canFind(part));

There may even be an easier way to do it; take a look at std.algorithm.
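
A rough sketch of how that line might sit inside the original loop follows. Two things differ from the code in the question, and both are assumptions about intent: the in operator is not defined for plain strings in D, so canFind does the substring test, and continue is used instead of break, since break would end the whole traversal at the first excluded file. The helper isExcluded and the command-line handling are invented for the example.

```d
import std.algorithm : any, canFind;
import std.file : dirEntries, SpanMode;
import std.stdio : writeln;

// Hypothetical helper wrapping the suggested one-liner.
bool isExcluded(string name, string[] exclude)
{
    return exclude.any!(part => name.canFind(part));
}

void main(string[] args)
{
    // Assumption: start directory comes from the command line, else ".".
    string sDir = args.length > 1 ? args[1] : ".";
    string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
    string[] exclude = ["/template/", "biblio.txt", "categories.txt",
                        "subjects.txt", "/toCDROM/"];

    foreach (string name; dirEntries(sDir, exts, SpanMode.depth))
    {
        if (isExcluded(name, exclude))
            continue; // skip this file, keep walking the tree
        writeln(name);
    }
}
```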
Nov 27 2012
Charles Hixson <charleshixsn earthlink.net> writes:
std.algorithm seems to generally be running the match in the opposite direction, if I'm understanding it properly. (Dealing with D templates is always confusing to me.) OTOH, I couldn't find the string any method, so I'm not really sure what you're proposing, though it does look attractive.

Still, though your basic approach sounds good, the suggestion of Joshua Niehus would let me filter out the strings that didn't fit before entering the loop. There's probably no real advantage to doing it that way, but it does seem more elegant.

(You were right, though. That is in std.algorithm.)
Nov 27 2012
"jerro" <a a.com> writes:
 You could replace the inner loop with something like:

 bool excl = exclude.any!(part => name.canFind(part));
 std.algorithm seems to generally be running the match in the 
 opposite direction, if I'm understanding it properly.  (Dealing 
 with D templates is always confusing to me.)  OTOH, I couldn't 
 find the string any method, so I'm not really sure what you're 
 proposing, though it does look attractive.
I don't understand what you mean by running the match in the opposite direction, but I'll explain how my line of code works. First of all, it is equivalent to:

  any!(part => canFind(name, part))(exclude);

The feature that lets you write it the way I did in my previous post is called uniform function call syntax (often abbreviated to UFCS) and is described at http://www.drdobbs.com/cpp/uniform-function-call-syntax/232700394.

canFind(name, part) returns true if name contains part.

(part => canFind(name, part)) is a short syntax for (part){ return canFind(name, part); }

any!(condition)(range) returns true if condition is true for any element of range.

So the line of code in my previous post sets excl to true if name contains any of the strings in exclude. If you know all the strings you want to exclude in advance, it is easier to do that with a regex like Joshua did.

If you want to learn about D templates, try this tutorial: https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/dtemplates.pdf?raw=true
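
The equivalence described above can be checked with a small sketch that writes the test three ways: with UFCS, as ordinary nested calls, and as the explicit loop both stand in for. The function names exclUfcs, exclPlain and exclLoop and the sample paths are invented for the illustration.

```d
import std.algorithm : any, canFind;

// UFCS form, as written in the earlier post.
bool exclUfcs(string name, string[] exclude)
{
    return exclude.any!(part => name.canFind(part));
}

// The same expression written as ordinary function calls.
bool exclPlain(string name, string[] exclude)
{
    return any!(part => canFind(name, part))(exclude);
}

// The explicit loop that both expressions replace.
bool exclLoop(string name, string[] exclude)
{
    foreach (part; exclude)
        if (canFind(name, part))
            return true;
    return false;
}

void main()
{
    string[] exclude = ["/template/", "biblio.txt"];
    foreach (name; ["/docs/template/a.txt", "/docs/biblio.txt", "/docs/a.txt"])
    {
        // All three forms should agree for every sample path.
        immutable expected = exclLoop(name, exclude);
        assert(exclUfcs(name, exclude) == expected);
        assert(exclPlain(name, exclude) == expected);
    }
}
```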
 Still, though your basic approach sounds good, the suggestion 
 of Joshua Niehus would let me filter out the strings that 
 didn't fit before entering the loop.  There's probably no real 
 advantage to doing it that way, but it does seem more elegant.
I agree, it is more elegant.
Nov 27 2012
Charles Hixson <charleshixsn earthlink.net> writes:
Thanks for the tutorial link, I'll give it a try. (Whee! A 182 page tutorial!) Those things, though, don't seem to stick in my mind. I learned programming in FORTRAN IV, and I don't seem to be able to force either templates, Scheme, or Haskell into my way of thinking about programming. (Interestingly, classes and structured programming fit without problems.)

The link to the Walter article in Dr. Dobb's is interesting. I intend to read it first.

OTOH, I still don't know where "any" is documented. It's clearly some sort of template instantiation, but it doesn't seem to be defined in either std.string or std.object (or anywhere else I've thought to check). And it looks as if it would be something very useful to know.
Nov 28 2012
Philippe Sigaud <philippe.sigaud gmail.com> writes:
 Thanks for the tutorial link, I'll give it a try. (Whee!  A 182 page
 tutorial!)
Well, it *started* as a tutorial. Then people sent me code :)
Nov 28 2012
"jerro" <a a.com> writes:
 OTOH, I still don't know where "any" is documented.  It's 
 clearly some sort of template instantiation, but it doesn't 
 seem to be defined in either std.string or std.object (or 
 anywhere else I've thought to check).  And it look as if it 
 would be something very useful to know.
It's documented here: http://dlang.org/phobos/std_algorithm.html#any
Nov 28 2012