digitalmars.D.learn - path matching problem

Charles Hixson <charleshixsn earthlink.net> writes:

Is there a better way to do this? (I want to find files that match any 
of several extensions, don't contain any of several other strings, and 
are not in certain directories.):

  import std.algorithm : canFind;
  import std.file;

...

  string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
  string[] exclude = ["/template/", "biblio.txt", "categories.txt",
         "subjects.txt", "/toCDROM/"];

  int limit = 1;
  // Iterate a directory in depth
  foreach (string name; dirEntries(sDir, exts, SpanMode.depth))
  {  bool excl = false;
     foreach (string part; exclude)
     {  if (name.canFind(part))   // does name contain part?
        {  excl = true;
           break;
        }
     }
     if (excl) continue;   // skip this file but keep scanning
etc.

Nov 27 2012

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Tuesday, 27 November 2012 at 19:40:56 UTC, Charles Hixson 
wrote:
 Is there a better way to do this?  (I want to find files that 
 match any of some extensions and don't match any of several 
 other strings, or are not in some directories.):

maybe this:?

import std.algorithm, std.regex;
import std.stdio, std.file;
void main()
{
    enum string[] exts = [".txt", ".utf8", ".utf-8", ".TXT", ".UTF8", ".UTF-8"];
    enum string exclude =
        `/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/`;

    // Keep entries with a wanted extension, then drop excluded paths.
    auto x = dirEntries("/path", SpanMode.depth)
        .filter!(a => exts.any!(e => a.name.endsWith(e)))
        .filter!(a => matchFirst(a.name, exclude).empty);

    writeln(x);
}
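A minimal sketch (not from the original post) that pulls the two predicates out into ordinary functions and runs them over a hard-coded list of paths instead of a real directory, so they can be checked in isolation. The helper names hasWantedExt and notExcluded and the sample paths are hypothetical. It relies on endsWith accepting several suffixes and returning the 1-based index of the one that matched (0 for none), and on the std.regex overload of matchFirst that takes the pattern as a string.

```d
import std.algorithm : endsWith, filter;
import std.array : array;
import std.regex : matchFirst;
import std.stdio : writeln;

// Alternation of the fragments to reject (same pattern as in the post).
enum excludePattern = `/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/`;

// Hypothetical helper: true if the path ends with one of the wanted extensions.
// With several needles, endsWith returns the 1-based index of the matching
// needle, or 0 if none matches.
bool hasWantedExt(string path)
{
    return path.endsWith(".txt", ".utf8", ".utf-8", ".TXT", ".UTF8", ".UTF-8") != 0;
}

// Hypothetical helper: true if the path contains none of the excluded fragments.
bool notExcluded(string path)
{
    return matchFirst(path, excludePattern).empty;
}

void main()
{
    // Hypothetical sample paths standing in for dirEntries results.
    auto paths = [
        "/path/notes.txt",
        "/path/template/form.txt",
        "/path/biblio.txt",
        "/path/image.png",
        "/path/book.UTF-8",
    ];
    auto kept = paths.filter!hasWantedExt.filter!notExcluded.array;
    writeln(kept); // ["/path/notes.txt", "/path/book.UTF-8"]
}
```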

Nov 27 2012

Charles Hixson <charleshixsn earthlink.net> writes:

On 11/27/2012 01:31 PM, Joshua Niehus wrote:
 maybe this:?

That's a good approach, except that I want to step through the matching 
paths rather than accumulate them in an array... though the filter 
documentation seems to say that it returns a lazy range (an iterator) 
rather than an array. If so, I could replace

writeln(x);

by

foreach (string name; x)
{
	...
}

and x would never have to hold all the matching strings at the same time.
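A minimal sketch, assuming a current Phobos, of the laziness in question: filter only evaluates its predicate as the range is consumed, so stopping a foreach early leaves the rest of the input unexamined. It uses integers instead of paths to stay self-contained; the names divisibleBy3, firstMatches and the counter calls are hypothetical. dirEntries itself also returns an input range, so the same reasoning applies to the path pipeline.

```d
import std.algorithm : filter;
import std.range : iota;
import std.stdio : writeln;

int calls; // hypothetical counter: how many times the predicate has run

bool divisibleBy3(int n)
{
    ++calls;
    return n % 3 == 0;
}

// Walks a filtered range of a million numbers but stops after `wanted`
// matches. Returns the matches seen; `calls` shows how much work was done.
int[] firstMatches(size_t wanted)
{
    calls = 0;
    int[] seen;
    // Building x does at most one predicate check, depending on the
    // Phobos version; everything else happens inside the loop.
    auto x = iota(0, 1_000_000).filter!divisibleBy3;
    foreach (n; x)
    {
        seen ~= n;
        if (seen.length == wanted)
            break; // the remaining elements are never examined
    }
    return seen;
}

void main()
{
    auto found = firstMatches(3);
    writeln(found, " after ", calls, " predicate calls");
}
```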

But why the chained filters, rather than using the option provided by 
dirEntries for one of them?  Is it faster?  Just the way you usually do 
things? (Which I accept as a legitimate answer.  I can see that that 
approach would be more flexible.)

Nov 27 2012

"Joshua Niehus" <jm.niehus gmail.com> writes:

On Tuesday, 27 November 2012 at 23:43:43 UTC, Charles Hixson 
wrote:
 But why the chained filters, rather than using the option 
 provided by dirEntries for one of them?  Is it faster?  Just 
 the way you usually do things? (Which I accept as a legitimate 
 answer.  I can see that that approach would be more flexible.)

Ignorance...
You're right, I didn't realize that dirEntries had that filter 
option; you should use that. I doubt the double .filter would 
affect performance at all (it might even slow it down, for all I know 
:)

//update:
import std.algorithm, std.regex;
import std.stdio, std.file;
void main()
{
   string exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";
   enum string exclude =
     `/template/|biblio\.txt|categories\.txt|subjects\.txt|/toCDROM/`;

   // dirEntries' glob selects the extensions; filter drops excluded paths.
   dirEntries("/path", exts, SpanMode.depth)
     .filter!(a => matchFirst(a.name, exclude).empty)
     .writeln();
}
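As far as I know, the pattern form of dirEntries tests each entry's base name with std.path.globMatch, where * matches any run of characters and {a,b,...} matches any one of the listed alternatives. The sketch below (the helper name matchesExts and the sample paths are hypothetical) checks the glob from this post directly. It forces CaseSensitive.yes so the result does not depend on the OS; as I understand it, dirEntries uses the OS default, which is case-insensitive on Windows.

```d
import std.path : CaseSensitive, baseName, globMatch;
import std.stdio : writeln;

// The glob from the post.
enum exts = "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}";

// Hypothetical helper: tests a path's base name against the glob,
// case-sensitively regardless of platform.
bool matchesExts(string path)
{
    return globMatch!(CaseSensitive.yes)(baseName(path), exts);
}

void main()
{
    foreach (p; ["/path/notes.txt", "/path/book.UTF-8",
                 "/path/notes.Txt", "/path/notes.txt.bak"])
        writeln(p, " -> ", matchesExts(p));
}
```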

Nov 27 2012

"jerro" <a a.com> writes:

On Tuesday, 27 November 2012 at 19:40:56 UTC, Charles Hixson 
wrote:
 Is there a better way to do this?  (I want to find files that 
 match any of some extensions and don't match any of several 
 other strings, or are not in some directories.):

You could replace the inner loop with something like:

bool excl = exclude.any!(part => name.canFind(part));

There may be an even easier way to do it; take a look at 
std.algorithm.
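A minimal sketch of this suggestion dropped into a loop like the original one, with a hard-coded list standing in for the dirEntries results (the helper name isExcluded and the sample paths are hypothetical). Excluded files are skipped with continue; break would end the whole scan at the first excluded file.

```d
import std.algorithm : any, canFind;
import std.stdio : writeln;

// Fragments from the original post; a path containing any of them is skipped.
enum string[] exclude = ["/template/", "biblio.txt", "categories.txt",
                         "subjects.txt", "/toCDROM/"];

// Hypothetical helper: true if name contains any of the excluded fragments.
bool isExcluded(string name)
{
    return exclude.any!(part => name.canFind(part));
}

void main()
{
    // Hypothetical paths standing in for the dirEntries results.
    foreach (name; ["/docs/a.txt", "/docs/template/b.txt",
                    "/cd/toCDROM/c.txt", "/docs/biblio.txt"])
    {
        if (isExcluded(name))
            continue; // skip this file, keep scanning
        writeln(name);
    }
}
```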

Nov 27 2012

Charles Hixson <charleshixsn earthlink.net> writes:

On 11/27/2012 01:34 PM, jerro wrote:
 You could replace the inner loop with something like:

 bool excl = exclude.any!(part => name.canFind(part));

 There may be an even easier way to do it; take a look at std.algorithm.

std.algorithm seems to generally be running the match in the opposite 
direction, if I'm understanding it properly. (Dealing with D templates 
is always confusing to me.) OTOH, I couldn't find the string "any" 
method, so I'm not really sure what you're proposing, though it does 
look attractive.

Still, though your basic approach sounds good, the suggestion of Joshua 
Niehus would let me filter out the paths that don't fit before 
entering the loop. There's probably no real advantage to doing it that 
way, but it does seem more elegant. (You were right, though. That is 
in std.algorithm.)
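Putting the pieces of the thread together, here is a minimal end-to-end sketch, assuming POSIX "/" path separators: the dirEntries glob picks the extensions, a filter using the any/canFind test drops excluded paths before the loop, and a foreach consumes the result one entry at a time. The scratch directory name path_matching_demo and the helper names findTextFiles and runDemo are hypothetical.

```d
import std.algorithm : any, canFind, filter, sort;
import std.file : SpanMode, dirEntries, mkdirRecurse, rmdirRecurse, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

// Fragments from the original post; assumes POSIX '/' separators.
enum string[] exclude = ["/template/", "biblio.txt", "categories.txt",
                         "subjects.txt", "/toCDROM/"];

// Walks root depth-first, keeps names matching the glob and containing no
// excluded fragment, and returns them relative to root, sorted.
string[] findTextFiles(string root)
{
    auto files = dirEntries(root, "*.{txt,utf8,utf-8,TXT,UTF8,UTF-8}", SpanMode.depth)
        .filter!(e => !exclude.any!(part => e.name.canFind(part)));

    string[] found;
    foreach (string name; files) // entries are produced one at a time
        found ~= name[root.length .. $];
    found.sort();
    return found;
}

// Hypothetical scratch tree under the system temp directory.
string[] runDemo()
{
    auto root = buildPath(tempDir, "path_matching_demo");
    mkdirRecurse(buildPath(root, "docs"));
    mkdirRecurse(buildPath(root, "template"));
    scope (exit) rmdirRecurse(root);

    write(buildPath(root, "docs", "notes.txt"), "x");
    write(buildPath(root, "docs", "biblio.txt"), "x");
    write(buildPath(root, "docs", "image.png"), "x");
    write(buildPath(root, "template", "form.txt"), "x");
    return findTextFiles(root);
}

void main()
{
    writeln(runDemo());
}
```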

Nov 27 2012

"jerro" <a a.com> writes:

 You could replace the inner loop with something like:

 bool excl = exclude.any!(part => name.canFind(part));


 std.algorithm seems to generally be running the match in the 
 opposite direction, if I'm understanding it properly. (Dealing 
 with D templates is always confusing to me.) OTOH, I couldn't 
 find the string "any" method, so I'm not really sure what you're 
 proposing, though it does look attractive.

I don't understand what you mean by running the match in the 
opposite direction, but I'll explain how my line of code works. 
First of all, it is equivalent to:

any!(part => canFind(name, part))(exclude);

The feature that lets you write it the way I did in my 
previous post is called uniform function call syntax (often 
abbreviated to UFCS) and is described at 
http://www.drdobbs.com/cpp/uniform-function-call-syntax/232700394.

canFind(name, part) returns true if name contains part.

(part => canFind(name, part)) is a short syntax for 
(part){ return canFind(name, part); }

any!(condition)(range) returns true if condition is true for any 
element of range.

So the line of code in my previous post sets excl to true if name 
contains any of the strings in exclude. If you know all the 
strings you want to exclude in advance, it is easier to do that 
with a regex like Joshua did.
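To make these equivalences concrete, here is a small sketch (the function names and the shortened exclude list are hypothetical) that writes the same test three ways: with UFCS, as an ordinary template call, and with the lambda spelled out as a function literal. All three should agree for every input.

```d
import std.algorithm : any, canFind;

// Hypothetical, shortened exclude list.
enum string[] exclude = ["/template/", "biblio.txt"];

// UFCS: exclude.any!(...) is rewritten to any!(...)(exclude).
bool viaUfcs(string name)
{
    return exclude.any!(part => name.canFind(part));
}

// The same call without UFCS.
bool viaPlainCall(string name)
{
    return any!(part => canFind(name, part))(exclude);
}

// The lambda written out as a function literal with an explicit body.
bool viaFunctionLiteral(string name)
{
    return any!((part) { return canFind(name, part); })(exclude);
}

void main()
{
    import std.stdio : writeln;
    foreach (name; ["/a/template/x.txt", "/a/notes.txt"])
        writeln(name, ": ", viaUfcs(name), " ", viaPlainCall(name), " ",
                viaFunctionLiteral(name));
}
```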

If you want to learn about D templates, try this tutorial:

https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/dtemplates.pdf?raw=true

 Still, though your basic approach sounds good, the suggestion 
 of Joshua Niehus would let me filter out the strings that 
 didn't fit before entering the loop.  There's probably no real 
 advantage to doing it that way, but it does seem more elegant.

I agree, it is more elegant.

Nov 27 2012

Charles Hixson <charleshixsn earthlink.net> writes:

On 11/27/2012 06:45 PM, jerro wrote:
 If you want to learn about D templates, try this tutorial:

 https://github.com/PhilippeSigaud/D-templates-tutorial/blob/master/dtemplates.pdf?raw=true


Thanks for the tutorial link, I'll give it a try. (Whee! A 182-page 
tutorial!) Those things, though, don't seem to stick in my mind. I 
learned programming in FORTRAN IV, and I don't seem to be able to force 
templates, Scheme, or Haskell into my way of thinking about 
programming. (Interestingly, classes and structured programming fit 
without problems.)

The link to Walter's article in Dr. Dobb's is interesting. I intend to 
read it first.

OTOH, I still don't know where "any" is documented. It's clearly some 
sort of template instantiation, but it doesn't seem to be defined in 
either std.string or std.object (or anywhere else I've thought to 
check). And it looks as if it would be something very useful to know.

Nov 28 2012

Philippe Sigaud <philippe.sigaud gmail.com> writes:

 Thanks for the tutorial link, I'll give it a try. (Whee!  A 182 page
 tutorial!)


Well, it *started* as a tutorial. Then people sent me code :)

Nov 28 2012

"jerro" <a a.com> writes:

 OTOH, I still don't know where "any" is documented.  It's 
 clearly some sort of template instantiation, but it doesn't 
 seem to be defined in either std.string or std.object (or 
 anywhere else I've thought to check). And it looks as if it 
 would be something very useful to know.

It's documented here:

http://dlang.org/phobos/std_algorithm.html#any

Nov 28 2012
