digitalmars.D - SpanMode uses incorrect terminology (breadth)
- Ben Davis (18/18) Sep 16 2012 According to http://dlang.org/phobos/std_file.html :
- Andrej Mitrovic (4/12) Sep 16 2012 I don't understand why "depth" is used in the first place. If you have
- Andrei Alexandrescu (4/22) Sep 16 2012 What would be an example illustrating that "breadth" is doing the wrong
- Jesse Phillips (39/42) Sep 16 2012 Linux 64bit:
- Dmitry Olshansky (9/34) Sep 17 2012 Shouldn't be hard to add "true" breadth first then.
- Ben Davis (3/14) Sep 17 2012 Yeah, I think that would work for true breadth-first. :)
- Andrei Alexandrescu (5/48) Sep 17 2012 Thanks, that does seem to be a bug. Please make sure it's in bugzilla.
- Jesse Phillips (17/21) Sep 17 2012 I should have noted that I'm using ReiserFS, as it use to fail
- Ben Davis (4/25) Sep 17 2012 These results also show a correct preorder depth-first search.
- Jesse Phillips (5/19) Sep 17 2012 You can probably just update. But I fail to see what is
- Ben Davis (67/98) Sep 17 2012 I have a feeling most people have missed the point here. Thanks for the
- Jesse Phillips (4/9) Sep 17 2012 Ah sorry, yes. I didn't pay too close attention as I've never
- David Piepgrass (7/23) Sep 18 2012 Actually I prefer breadth-first search when searching the file
- Mehrdad (3/28) Sep 18 2012 We really need iterative deepening.
- Andrei Alexandrescu (3/26) Sep 18 2012 Yes, that was my intent too.
- Mehrdad (8/19) Oct 08 2012 I just wanted to point out, BFS is generally a pretty BAD idea
- Ben Davis (10/33) Sep 18 2012 Fair point. :)
According to http://dlang.org/phobos/std_file.html : depth Spans the directory depth-first, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. breadth Spans the directory breadth-first, i.e. the content of any subdirectory is spanned right after that subdirectory itself. That's not what breadth-first means at all. See http://en.wikipedia.org/wiki/Tree_traversal which has the correct definition: breadth-first means ALL directories at one level are returned before ANY child of ANY directory is considered. Wikipedia uses the terms preorder and postorder for what's intended here, but those are a bit obtuse. Something like parentFirst and childrenFirst would get the message across clearly. I don't much mind what the names change to, but the current completely wrong situation really irks me! Deprecate it at once! :P Ben :)
Sep 16 2012
On 9/16/12, Ben Davis <entheh cantab.net> wrote:According to http://dlang.org/phobos/std_file.html : depth Spans the directory depth-first, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. breadth Spans the directory breadth-first, i.e. the content of any subdirectory is spanned right after that subdirectory itself.I don't understand why "depth" is used in the first place. If you have "shallow" then the opposite of that is "deep", not "depth". I always make that typo when using spanmode.
Sep 16 2012
On 9/16/12 2:57 PM, Ben Davis wrote:According to http://dlang.org/phobos/std_file.html : depth Spans the directory depth-first, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. breadth Spans the directory breadth-first, i.e. the content of any subdirectory is spanned right after that subdirectory itself. That's not what breadth-first means at all. See http://en.wikipedia.org/wiki/Tree_traversal which has the correct definition: breadth-first means ALL directories at one level are returned before ANY child of ANY directory is considered. Wikipedia uses the terms preorder and postorder for what's intended here, but those are a bit obtuse. Something like parentFirst and childrenFirst would get the message across clearly. I don't much mind what the names change to, but the current completely wrong situation really irks me! Deprecate it at once! :P Ben :)What would be an example illustrating that "breadth" is doing the wrong thing? Andrei
Sep 16 2012
What would be an example illustrating that "breadth" is doing the wrong thing? AndreiLinux 64bit: import std.file; import std.stdio; void main() { mkdir("a"); mkdir("a/b"); mkdir("a/c"); mkdir("a/c/z"); std.file.write("a/1.txt", ""); std.file.write("a/2.txt", ""); std.file.write("a/b/1.txt", ""); std.file.write("a/b/2.txt", ""); std.file.write("a/c/1.txt", ""); std.file.write("a/c/z/1.txt", ""); foreach(string file; dirEntries("a/", SpanMode.breadth)) writeln(file); rmdirRecurse("a"); // Expected Approximation // a/2.txt // a/1.txt // a/b // a/c // a/c/1.txt // a/c/z // a/c/z/1.txt // a/b/1.txt // a/b/2.txt // // Actual // a/c // a/c/z // a/c/z/1.txt // a/c/1.txt // a/b // a/b/2.txt // a/b/1.txt // a/2.txt // a/1.txt }
Sep 16 2012
On 17-Sep-12 09:30, Jesse Phillips wrote:Shouldn't be hard to add "true" breadth first then. Since it's a stack based visitation one just needs to instead use queue (FIFO) and change code so that it puts all directories on given level into the queue and only then picks next one from queue.What would be an example illustrating that "breadth" is doing the wrong thing? Andrei// Expected Approximation // a/2.txt // a/1.txt // a/b // a/c // a/c/1.txt // a/c/z // a/c/z/1.txt // a/b/1.txt // a/b/2.txt // // Actual // a/c // a/c/z // a/c/z/1.txt // a/c/1.txt // a/b // a/b/2.txt // a/b/1.txt // a/2.txt // a/1.txtP.S. Sorry for pinging you by mail. Damn Thunderbird UI caught me by surprise again :( -- Dmitry Olshansky
Sep 17 2012
On 17/09/2012 19:15, Dmitry Olshansky wrote:On 17-Sep-12 09:30, Jesse Phillips wrote: > >> What would be an example illustrating that "breadth" is doing the >> wrong thing? >> >> Andrei > Shouldn't be hard to add "true" breadth first then. Since it's a stack based visitation one just needs to instead use queue (FIFO) and change code so that it puts all directories on given level into the queue and only then picks next one from queue.Yeah, I think that would work for true breadth-first. :) Doubt anyone actually needs that though...?
Sep 17 2012
On 9/17/12 1:30 AM, Jesse Phillips wrote:Thanks, that does seem to be a bug. Please make sure it's in bugzilla. Probably the best way to go is to adjust the behavior so it matches the specification. AndreiWhat would be an example illustrating that "breadth" is doing the wrong thing? AndreiLinux 64bit: import std.file; import std.stdio; void main() { mkdir("a"); mkdir("a/b"); mkdir("a/c"); mkdir("a/c/z"); std.file.write("a/1.txt", ""); std.file.write("a/2.txt", ""); std.file.write("a/b/1.txt", ""); std.file.write("a/b/2.txt", ""); std.file.write("a/c/1.txt", ""); std.file.write("a/c/z/1.txt", ""); foreach(string file; dirEntries("a/", SpanMode.breadth)) writeln(file); rmdirRecurse("a"); // Expected Approximation // a/2.txt // a/1.txt // a/b // a/c // a/c/1.txt // a/c/z // a/c/z/1.txt // a/b/1.txt // a/b/2.txt // // Actual // a/c // a/c/z // a/c/z/1.txt // a/c/1.txt // a/b // a/b/2.txt // a/b/1.txt // a/2.txt // a/1.txt }
Sep 17 2012
On Monday, 17 September 2012 at 18:20:55 UTC, Andrei Alexandrescu wrote:Thanks, that does seem to be a bug. Please make sure it's in bugzilla. Probably the best way to go is to adjust the behavior so it matches the specification. AndreiI should have noted that I'm using ReiserFS, as it use to fail completely to span subdirs in the past. Anyway. Testing on windows shows a smaller issue. http://d.puremagic.com/issues/show_bug.cgi?id=8680 Files show before spanning but directories are shown before spanning itself. a/1.txt a/2.txt a/b a/b\1.txt a/b\2.txt a/c a/c\1.txt a/c\z a/c\z\1.txt
Sep 17 2012
On 17/09/2012 19:47, Jesse Phillips wrote:On Monday, 17 September 2012 at 18:20:55 UTC, Andrei Alexandrescu wrote:These results also show a correct preorder depth-first search. The bug report as it stands is misleading; should I post a follow-up on the bug tracker, or should we close it and start again?Thanks, that does seem to be a bug. Please make sure it's in bugzilla. Probably the best way to go is to adjust the behavior so it matches the specification. AndreiI should have noted that I'm using ReiserFS, as it use to fail completely to span subdirs in the past. Anyway. Testing on windows shows a smaller issue. http://d.puremagic.com/issues/show_bug.cgi?id=8680 Files show before spanning but directories are shown before spanning itself. a/1.txt a/2.txt a/b a/b\1.txt a/b\2.txt a/c a/c\1.txt a/c\z a/c\z\1.txt
Sep 17 2012
On Monday, 17 September 2012 at 23:55:17 UTC, Ben Davis wrote:On 17/09/2012 19:47, Jesse Phillips wrote:You can probably just update. But I fail to see what is missleading? Did I get the expected results wrong? Pre-order depth first does not satisfy breadth first so what am I missin? Sorry Im on a tablet making organization hard.a/1.txt a/2.txt a/b a/b\1.txt a/b\2.txt a/c a/c\1.txt a/c\z a/c\z\1.txtThese results also show a correct preorder depth-first search. The bug report as it stands is misleading; should I post a follow-up on the bug tracker, or should we close it and start again?
Sep 17 2012
I have a feeling most people have missed the point here. Thanks for the example - it's a good one to work with. The 'expected approximation' was a bit of a mix of traversal strategies, so I've snipped it out below and put my own examples. Hope this helps clarify what I was getting at: On 17/09/2012 06:30, Jesse Phillips wrote:[snip] Here are the CORRECT tree traversal patterns: Breadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...? Depth-first search has two useful variants, preorder and postorder. Preorder depth-first (currently wrongly called SpanMode.breadth): a/b a/b/1.txt a/b/2.txt a/c a/c/z a/c/z/1.txt a/c/1.txt a/1.txt a/2.txt Defining property: every directory is immediately followed by all its children. Postorder depth-first (currently only called SpanMode.depth): a/b/1.txt a/b/2.txt a/b a/c/z/1.txt a/c/z a/c/1.txt a/c a/1.txt a/2.txt Defining property: every directory is immediately *preceded* by all its children. Going back to the 'actual' output you quoted:What would be an example illustrating that "breadth" is doing the wrong thing? AndreiLinux 64bit: import std.file; import std.stdio; void main() { mkdir("a"); mkdir("a/b"); mkdir("a/c"); mkdir("a/c/z"); std.file.write("a/1.txt", ""); std.file.write("a/2.txt", ""); std.file.write("a/b/1.txt", ""); std.file.write("a/b/2.txt", ""); std.file.write("a/c/1.txt", ""); std.file.write("a/c/z/1.txt", ""); foreach(string file; dirEntries("a/", SpanMode.breadth)) writeln(file); rmdirRecurse("a");// Actual // a/c // a/c/z // a/c/z/1.txt // a/c/1.txt // a/b // a/b/2.txt // a/b/1.txt // a/2.txt // a/1.txtThis looks like a preorder depth-first iteration (albeit with child order reversed at each level). The implementation is correct in that it matches the spec as documented. The problem is that the spec is using the term 'breadth' wrongly. Here is what I would do to fix it (sorry if I got the syntax wrong): enum SpanMode { shallow, preorder, postorder; deprecated alias preorder breadth; deprecated alias postorder depth; } I initially said that possibly preorder and postorder weren't clear, but they grew on me very quickly. I think people could learn that "preorder means parents first" or "preorder is the one I usually want". Also, these are the standard terms (refer to the Wikipedia link), so once you know them, you can use them for everything, both in and outside of :D. Thanks, Ben :)
Sep 17 2012
On Monday, 17 September 2012 at 23:27:22 UTC, Ben Davis wrote:Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?Ah sorry, yes. I didn't pay too close attention as I've never wanted to traverse a tree in that manner. I'll update the title of the issue, add your comments.
Sep 17 2012
Breadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Sep 18 2012
On Tuesday, 18 September 2012 at 14:24:30 UTC, David Piepgrass wrote:We really need iterative deepening.Breadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Sep 18 2012
On 9/18/12 10:25 AM, David Piepgrass wrote:Yes, that was my intent too. AndreiBreadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Sep 18 2012
On Tuesday, 18 September 2012 at 15:19:08 UTC, Andrei Alexandrescu wrote:On 9/18/12 10:25 AM, David Piepgrass wrote:I just wanted to point out, BFS is generally a pretty BAD idea for searching a file system. You instead probably want to implement DFS, and then alter it slightly to use iterative deepening instead of doing plain breadth-first search. That way you won't be keeping open a million OS file handles.Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.Yes, that was my intent too. Andrei
Oct 08 2012
On 18/09/2012 15:25, David Piepgrass wrote:Fair point. :) So there is a case for implementing breadth-first iteration (or as Mehrdad suggests, iterative deepening - it's just a CPU-memory tradeoff). However, the currently implemented 'breadth' (preorder depth-first) is probably used a lot, and someone might be relying on its current behaviour (e.g. to create a tree view). So how do we proceed without silently breaking existing code? Do we use 'breadthFirst' instead of 'breadth'? Do we just gulp and change it? Do we rename 'SpanMode' to 'TreeMode' or something?Breadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Sep 18 2012
On 18/09/2012 21:08, Ben Davis wrote:On 18/09/2012 15:25, David Piepgrass wrote:So maybe we'd have: enum SpanMode { shallow, depthFirstPreorder, depthFirstPostorder, breadthFirst; deprecated alias depthFirstPostorder depth; deprecated alias depthFirstPreorder breadth; } Also don't forget the intent conveyed by the current docs: "depth Spans the directory depth-first, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. breadth Spans the directory breadth-first, i.e. the content of any subdirectory is spanned right after that subdirectory itself." This would need to be rewritten to say something like: "depthFirstPostorder Spans the directory depth-first postorder, i.e. the content of any subdirectory is spanned before that subdirectory itself. Useful e.g. when recursively deleting files. depthFirstPreorder Spans the directory depth-first preorder, i.e. the content of any subdirectory is spanned right after that subdirectory itself. breadthFirst Spans the directory breadth-first, i.e. all children are returned first, followed by all grandchildren, followed by all great-grandchildren, and so on. Useful e.g. when searching for a file that is likely not to be very deep." I personally won't object if everyone feels comfortable with reusing 'breadth' and changing existing behaviour. I just have a feeling it's generally considered a bad thing. Otherwise I'm all for shortening the names (perhaps by just omitting 'First' from the depth ones). :)Fair point. :) So there is a case for implementing breadth-first iteration (or as Mehrdad suggests, iterative deepening - it's just a CPU-memory tradeoff). However, the currently implemented 'breadth' (preorder depth-first) is probably used a lot, and someone might be relying on its current behaviour (e.g. to create a tree view). So how do we proceed without silently breaking existing code? Do we use 'breadthFirst' instead of 'breadth'? Do we just gulp and change it? Do we rename 'SpanMode' to 'TreeMode' or something?Breadth-first (probably never required): a/b a/c a/1.txt a/2.txt a/b/1.txt a/b/2.txt a/c/z a/c/1.txt a/c/z/1.txt Defining property: number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?Actually I prefer breadth-first search when searching the file system. When I search an entire volume, inevitably the (depth-first) search gets stuck in a few giant, deep directories like the source code of Mono or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
Sep 18 2012
Another thing worth mentioning occurred to me today. Whoever implements breadth-first - don't get caught out by symlinks. For the depth-first variants, you can just detect them by looking in the stack. For the breadth-first, you won't be able to detect them by looking in the queue that replaced the stack. :)
Sep 18 2012