www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 8680] New: SpanMode.breadth incorrect

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680

           Summary: SpanMode.breadth incorrect
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: Jesse.K.Phillips+D gmail.com



11:44:24 PDT ---
Using Linux 64bit with a reiserFS the following code does not span in a breadth
manner. Testing on Windows 32bit it works as expected with on exception, all
directories are not listed before spanning.

import std.file;
import std.stdio;

void main() {
    mkdir("a");
    mkdir("a/b");
    mkdir("a/c");
    mkdir("a/c/z");
    std.file.write("a/1.txt", "");
    std.file.write("a/2.txt", "");
    std.file.write("a/b/1.txt", "");
    std.file.write("a/b/2.txt", "");
    std.file.write("a/c/1.txt", "");
    std.file.write("a/c/z/1.txt", "");
    foreach(string file; dirEntries("a/", SpanMode.breadth))
        writeln(file);

    rmdirRecurse("a");
}

   // Expected Approximation
    // a/2.txt
    // a/1.txt
    // a/b
    // a/c
    // a/c/1.txt
    // a/c/z
    // a/c/z/1.txt
    // a/b/1.txt
    // a/b/2.txt
    //
    // Actual
    // a/c
    // a/c/z
    // a/c/z/1.txt
    // a/c/1.txt
    // a/b
    // a/b/2.txt
    // a/b/1.txt
    // a/2.txt
    // a/1.txt

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 17 2012
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680




18:43:39 PDT ---
A breadth first traversal is not a good algorithm for filesystem traversal. The
described implementation is useful. As shown Linux ReiserFS does not traverse
as described in the documentation.

SpanMode.breadth should be deprecated for not be a breadth traversal.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 17 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680


entheh cantab.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |entheh cantab.net



I originally identified the bug. First I'd like to comment more on the
terminology, and give my proposed fix.

The description in the documentation matches the implementation, and reflects
what a user would find most useful. There is no implementation bug that I'm
aware of (see below for more discussion of this).

The only issue is naming. The SpanMode enum constants 'depth' and 'breadth'
actually describe 'postorder depth-first' and 'preorder depth-first'
respectively. That means the word 'depth' carries insufficient information to
identify it, and 'breadth' is just wrong. If you're unfamiliar with tree
traversal algorithms and can't immediately see this, then please review the
examples and terminology on http://en.wikipedia.org/wiki/Tree_traversal .
(Pretty much everyone missed the point on the newsgroup - please do this if
you're at all uncertain!)

So here's my proposed fix:

enum SpanMode {
  shallow,
  preorder,
  postorder;

   deprecated alias preorder breadth;
   deprecated alias postorder depth;
}

As for the behaviour on Linux - I don't think this is actually violating the
spec. Probably what you're complaining about is that the directories 'c' and
'b' are coming before the files '2.txt' and '1.txt' which are siblings.
However, the spec doesn't say anything about a directory's position relative to
its siblings. It only talks about a directory's position relative to its
children. See the difference:

For preorder depth-first:
  a/b
must come before
  a/b/1.txt

But
  a/b
has no ordering constraint with
  a/1.txt

If 'directories first' or 'directories last' is required, then that would be
more of a feature request than a bug :) It could probably be generalised to
child sorting - the user could supply a comparator which (if present) is used
to sort the children in each directory.

Hope that helps :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 18 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680




08:04:40 PDT ---

 As for the behaviour on Linux - I don't think this is actually violating the
 spec. Probably what you're complaining about is that the directories 'c' and
 'b' are coming before the files '2.txt' and '1.txt' which are siblings.
 However, the spec doesn't say anything about a directory's position relative to
 its siblings. It only talks about a directory's position relative to its
 children. See the difference:
Looks like you are still correct. I'm still looking at the desire for useful traversal of a filesystem.
 If 'directories first' or 'directories last' is required, then that would be
 more of a feature request than a bug :) It could probably be generalised to
 child sorting - the user could supply a comparator which (if present) is used
 to sort the children in each directory.
I do not see sorting as a solution. I really do not care if the dir is before the file. What is useful is recieving all children before traversing children. I usually end up ignoring directories anyway. So I'm thinking your origional suggestion for childrenFirst and parentFirst is actuall what is useful and does not need too abide by actual traversal rules, only usefulness. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 18 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680





 Looks like you are still correct. I'm still looking at the desire for useful
 traversal of a filesystem.
OK, let's explore that:
 I do not see sorting as a solution. I really do not care if the dir is before
 the file. What is useful is recieving all children before traversing children.
Are you saying you want a real breadth-first search to be implemented? Fine, we can do that, if it's useful. Note though that it'll intersperse children, e.g. a a/1 a/2 a/1/A a/1/B a/2/A a/2/B a/1/A/... a/1/B/... a/2/A/... a/2/B/... a/1/A/.../... etc. - so directories 1 and 2 become increasingly interleaved.
 I usually end up ignoring directories anyway. So I'm thinking your origional
 suggestion for childrenFirst and parentFirst is actuall what is useful and does
 not need too abide by actual traversal rules, only usefulness.
I intended those as alternative names instead of 'postorder' and 'preorder' (respectively) - I didn't intend to suggest any other behaviour. What behaviour do you want? Do you want all of the first-level files and directories (as if starting a breadth-first), followed by all subsequent levels in depth-first order? See I would argue that that's very specific, not intuitive in the general case, and best implemented using two separate calls to the function with different arguments: auto firstLevel = dirEntries("a", SpanMode.shallow); foreach (file; firstLevel) { if (file.isDir()) { ... = dirEntries(file, SpanMode.preorder); } } Or do you want something else? I think this needs clarifying, since there's obviously a misunderstanding in our interpretation of "parentFirst", and "usefulness" is a goal, not a spec. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 18 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680




20:59:48 PDT ---

 Are you saying you want a real breadth-first search to be implemented? Fine, we
 can do that, if it's useful. Note though that it'll intersperse children, e.g.
No, but someone else did, which is fine. Probably not really needed as a standard traversal type though.
 I intended those as alternative names instead of 'postorder' and 'preorder'
 (respectively) - I didn't intend to suggest any other behaviour.
I know, but they are descriptive of the two most useful traversal types I can think of.
 What behaviour do you want?
I have only had two types of traversals. I don't care: In this type of traversal I just need all the files and couldn't care less when they come to me. In all cases I've never wished to have the directory name (but don't wish for a traversal that leaves it out). Parent First: Give me everything in the Parent First, then you can traverse the children. Almost as you described, except it would be starting a breadth first from each child. The other traversal types are likely useful in other situations. I was close to instead of using Parent First choosing Child First: Give me what is in the Child First before that in the parent. This turns the Parent First on its head.
 I would argue that that's very specific, not
 intuitive in the general case, and best implemented using two separate calls to
 the function with different arguments:
 
 auto firstLevel = dirEntries("a", SpanMode.shallow);
 foreach (file; firstLevel) {
   if (file.isDir()) {
     ... = dirEntries(file, SpanMode.preorder);
   }
 }
This isn't as you described nor what I want. You'll notice all you did was create a preorder search where you'll never see the top level directories. I'm looking more at: auto firstLevel = dirEntries("a", SpanMode.shallow); foreach (file; firstLevel) { if (file.isDir()) { secondLevel ~= dirEntries(file, SpanMode.shallow); } } foreach(sl; secondLevel) foreach (file; sl) { if (file.isDir()) { thirdLevel ~= dirEntries(file, SpanMode.shallow); } } foreach(tl; thirdLevel) foreach (file; tl) { if (file.isDir()) { thirdLevel ~= dirEntries(file, SpanMode.shallow); } } .... On until you get tired of trying to predict how many levels this directory actually has. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 18 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680




07:03:06 PDT ---
Sorry, you can ignore my code, that's what I get for copy pasting a guess at
what it would be. That would be breadth first, which is close, instead they'd
need to be some loop that when finished with the files do the same thing for
the first subdirectory.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 19 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680





 I know, but they are descriptive of the two most useful traversal types I can
 think of.
I think they are unambiguous, but I have a feeling I know what traversal you want.
 What behaviour do you want?
I have only had two types of traversals. I don't care: In this type of traversal I just need all the files and couldn't care less when they come to me. In all cases I've never wished to have the directory name (but don't wish for a traversal that leaves it out).
I suspect we don't need to offer the 'I don't care' option explicitly. People may as well be specific. Could add a documentation note that the depth-first options will generally be slightly cheaper than breadth-first.
 Parent First: Give me everything in the Parent First, then you can traverse the
 children. Almost as you described, except it would be starting a breadth first
 from each child.
There are two ways you could get this effect. You could do this: recurse(dir) { foreach (file; dirEntries(dir, shallow)) { result~=file; } foreach (file; dirEntries(dir, shallow)) { if (file.isDir()) recurse(file); } } Or you could do this, which is equivalent if you (the user) are ignoring directories themselves and only care about files: recurse(dir) { auto list=dirEntries(dir, shallow); sortDirectoriesLast(list); foreach (file; list) { result~=file; if (file.isDir()) recurse(file); } } This is the 'standard preorder depth-first with sorting' that I described before. Or you could offer both these implementations (and the corresponding reversed ones). For the sorting case, I was merely suggesting that the sort could be put in the user's control, so they can sort alphabetically or by timestamp or however they want.
 The other traversal types are likely useful in other situations. I was close to
 instead of using Parent First choosing
 
 Child First: Give me what is in the Child First before that in the parent. This
 turns the Parent First on its head.
So the above options can be reversed in all possible permutations to give what you want here.
 I would argue that that's very specific, not
 intuitive in the general case, and best implemented using two separate calls to
 the function with different arguments:
 
 auto firstLevel = dirEntries("a", SpanMode.shallow);
 foreach (file; firstLevel) {
   if (file.isDir()) {
     ... = dirEntries(file, SpanMode.preorder);
   }
 }
This isn't as you described nor what I want. You'll notice all you did was create a preorder search where you'll never see the top level directories.
I see, you're reading it as if "..." is the final result, but I guess I meant to imply that this was user-code and would make use of "firstLevel" directly as well as later recursing into it. I can't actually remember :) But yes, I thought you were describing shallow first level followed by preorder for each child - and if not, then by describing it, I helped you understand what I misunderstood, so you could help me understand what you actually wanted. Hopefully. :)
 Sorry, you can ignore my code, that's what I get for copy pasting a guess at
 what it would be.
We're even :)
 That would be breadth first, which is close, instead they'd
 need to be some loop that when finished with the files do the same thing for
 the first subdirectory.
Presumably for each subdirectory in turn? So to conclude, I'd recommend making do with my second example above if it fits your needs - with obviously the sorting more customisable. But if you think there's a strong case for the first example, then do it. Remember though that anyone can get any of these results by writing their own recursive function and repeatedly calling 'shallow', so I wouldn't go too overboard with different options. (Consider whether you even want the sorting callback, since again, the effect can be achieved by just writing it out.) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 19 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680





 I think they are unambiguous, but I have a feeling I know what traversal you
 want.
I apologise, I meant to say I think they are ambiguous. 'parentFirst' could mean you want the *direct contents* of each parent directory before any subdirectories are considered (your interpretation), or it could mean you want the parent directory *itself* before any of its children (files or subdirectories) are considered (my original interpretation). If you want a name for 'contents of each directory before contents of subdirectories', then maybe... hmm... directChildrenFirst and directChildrenLast? They're not great names - other suggestions welcome. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 19 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680




10:00:49 PDT ---

 I suspect we don't need to offer the 'I don't care' option explicitly. People
 may as well be specific. Could add a documentation note that the depth-first
 options will generally be slightly cheaper than breadth-first.
I was not trying to claim traversals that should be available, only emphasizing the ones I've used.
 Parent First: Give me everything in the Parent First, then you can traverse the
 children. Almost as you described, except it would be starting a breadth first
 from each child.
There are two ways you could get this effect. You could do this:
listdir was deprecated for a very good reason. Would you now kindly consider the steps to fold this into a Range. We can implement any traversal with shallow, why bother adding depth and breadth or anything else? I'm very glade you have brought up the naming and implementation issue as I am relying on a behavior which isn't defined, and now I can plan to correct this. But it would be nice if the standard library included the use case I would be using.
 For the sorting case, I was merely suggesting that the sort could be put
 in the user's control, so they can sort alphabetically or by timestamp or
 however they want.
With those examples I see this as a very useful addition. Just not supporting the traversal I use.
 So the above options can be reversed in all possible permutations to give what
 you want here.
And that is even easier if the traversal is already a range.
 So to conclude, I'd recommend making do with my second example above if it fits
 your needs - with obviously the sorting more customisable. But if you think
 there's a strong case for the first example, then do it. Remember though that
 anyone can get any of these results by writing their own recursive function and
 repeatedly calling 'shallow', so I wouldn't go too overboard with different
 options. (Consider whether you even want the sorting callback, since again, the
 effect can be achieved by just writing it out.)
This argument applies to so many languages that I hope you reconsider your position "it can be done with primitives already." Why did you use foreach? Do we really need the function we have goto. In the beginning I was confused with your issues, to clear up I just wanted to explain the traversal types I do use so whoever gets around to fixing this may consider including them. But the main issue of Breadth not being breadth needs resolved. I can fend for myself, just don't want to. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 20 2012
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8680





 listdir was deprecated for a very good reason. Would you now kindly consider
 the steps to fold this into a Range.
Oh wow, I didn't realise dirEntries supported bidirectional iteration. That's interesting. A reverse breadth-first iteration would be a special thing to try and implement, and I can't see it usefully doing anything other than precalculating the lot. Reversing the order of files in a shallow iteration is of questionable use if the sorting of entries isn't defined anyway (I suppose only if you rely on the order being consistent and call it twice). Reversing preorder and postorder depth-first will give you postorder and preorder respectively (along with a reversed child ordering which is meaningless as above). I wonder if anyone has actually used the reversibility of dirEntries? If we had the sorting option, here's how I'd see it working (for forward iteration): - Always lazily iterate the recursion into subdirectories. - When sorting is not requested, also lazily iterate each directory's direct contents. - When sorting IS requested, each directory's direct contents will need requesting in a block - OR alternatively, the sort is specified as a set of parameters which are passed directly to the underlying OS (if the underlying OS has the corresponding options of course), and then it can still be done lazily. Actually I have no idea if the current implementations are able to be lazy within a single directory. The 'Finish reporting each directory's contents before processing its children' option (weaker than breadth-first) will not impede laziness, but will require the directory's subdirectories to be buffered up and remembered (if they aren't already precalculated). As for the reverse iteration option - I can see that it was easy to implement for the existing depth-first options, but it would be annoying to implement for breadth-first. While it's not impossible to implement, I would view it as acceptable for reverse breadth-first to be defined as a runtime error. I don't think I can convert this to code without knowing how the underlying API looks. I have a feeling it would have to be done separately for different OS's. But it should be trivial - my main goal was to define the behaviour precisely enough to avoid misunderstanding, which I think we've done with the latest pseudocode? You didn't say explicitly if you want both the options I suggested, but I guess you like the first one (where you buffer up a directory's subdirectories and then process them once you've finished that directory's direct contents)? What I can do is propose an API that I think covers all the use cases: enum SpanMode { shallow, ///f1, d1, d2, f2 depthFirstPreorder, ///f1, d1, d1/sd, d1/sd/sf, d2, d2/sd, d2/sd/sf, f2 depthFirstPostorder, ///f1, d1/sd/sf, d1/sd, d1, d2/sd/sf, d2/sd, d2, f2 contiguousLevelsPreorder, ///f1, d1, d2, f2, d1/sd, d1/sd/sf, d2/sd, d2/sd/sf contiguousLevelsPostorder,///d1/sd/sf, d1/sd, d2/sd/sf, d2/sd, f1, d1, d2, f2 breadthFirst, ///f1, d1, d2, f2, d1/sd, d2/sd, d1/sd/sf, d2/sd/sf //This syntax apparently isn't supported yet... deprecated breadth = depthFirstPreorder, deprecated depth = depthFirstPostorder, } ///If supplied, levelSorter is applied to the whole result in case of 'shallow', ///otherwise each 'shallow' result computed internally along the way. ///Note that supplying a levelSorter may impede the algorithm's laziness. auto dirEntries(string path, SpanMode mode, bool followSymlink = true, void delegate(T[]) levelSorter = null); //T being string or DirEntry, ideally put in caller's control if possible
 We can implement any traversal with shallow, why bother adding depth and
 breadth or anything else?
 
 I'm very glade you have brought up the naming and implementation issue as I am
 relying on a behavior which isn't defined, and now I can plan to correct this.
 But it would be nice if the standard library included the use case I would be
 using.
Agreed, provided it can be done in a way that's simple to understand. I shied away from it earlier because I couldn't think of decent names and I thought it would do more harm than good if the names were bad. But I think contiguousLevels is clear.
 For the sorting case, I was merely suggesting that the sort could be put
 in the user's control, so they can sort alphabetically or by timestamp or
 however they want.
With those examples I see this as a very useful addition. Just not supporting the traversal I use.
Incorporated above :)
 So the above options can be reversed in all possible permutations to give what
 you want here.
And that is even easier if the traversal is already a range.
Well you say that, but... foreach_reverse simultaneously reverses the file order in addition to the preorder/postorder distinction, and you might only want to reverse one of those properties. I'm not sure people will realistically use foreach_reverse very much (except maybe with SpanMode.shallow and a sorter, or on an OS that provides useful sorting by default).
 This argument applies to so many languages that I hope you reconsider your
 position "it can be done with primitives already." Why did you use foreach? Do
 we really need the function we have goto.
No, my point was that features are only valuable if they're easy to understand, and I was merely struggling to think of an API that made them easy to understand. :) (Poor devs who end up fixing this bug - it now contains a very long discussion and a multitude of separate requests, from 'names are wrong' to several new feature requests! Sorry ><) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 22 2012