www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 7002] New: std.path needs a isValidFilePath function

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002

           Summary: std.path needs a isValidFilePath function
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: andrej.mitrovich gmail.com



13:10:38 PST ---
I currently see no way of verifying that a given path is a path to a *file* and
not a folder, IOW I want this assert to pass:

assert(isValidFilePath(r"C:\folder\file"))

but the following one to fail, because 'file' in this case is syntactically a
folder:

assert(isValidFilePath(r"C:\folder\file\"))

Using basename and isValidFilename isn't a workaround as basename strips the
slash:

auto file = r"C:\folder\file\";
assert(file.baseName.isValidFilename);  // passes, baseName returns "file"

Either we implement a isValidFilePath function, or a "fileName" function which
could be used like so:

auto file = r"C:\folder\file\";
assert(file.fileName.isValidFilename);  // this would fail, as expected

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 24 2011
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



00:57:48 PST ---
Err, no actually what happened was I was reading:
isValidFilename

and the docs said it verifies *directories* as well, but it doesn't. It
verifies filenames only, and this is what I need.

The docs confused me, sorry.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 26 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |



01:11:38 PST ---
Double confusion for the win.

isValidFilename checks filenames, not file paths. I was right the first time.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 26 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com
           Platform|Other                       |All
         OS/Version|Windows                     |All



PST ---
The _only_ way to know for sure if a path points to a file is to use std.file -
typically either std.file.isFile on a string or isFile on a std.file.DirEntry.

The fact that a path does not end in a directory separator does not say
anything about whether it's a directory or a file. As such, depending on
whether it does or not seems like a bad idea. A path which _does_ end in a
directory separator _is_ directory (assuming that the path is valid), but the
lack of one means nothing.

isValidFilename essentially assumes that it's getting the base name of a file
or directory and verifies that it's valid. isValidPath essentilally splits on
the directory separator and calls isValidFilename on each of those components
(I believe that it's somewhat more complicated than that, but that's
essentially what it's doing. So, isValidFilename is going to return false if
the name that it's given contains a directory separator and isValidPath doesn't
care about the directory separator, since it's effectively splitting on it. In
_neither_ case are they concerned about whether ending in a directory separator
or not indicates a file or a directory.

So, if you want a function which returns whether a particular path is
definitively for a directory or not (i.e. ends in a directory separator), then
a new one needs to be created as opposed to changing what any of the current
functions are doing. But it seems to me that if you want to do _that_, all you
need to do is

assert(path.endsWith(dirSeparator));

What are you looking for beyond that?

As I said, the _lack_ of a directory separator doesn't mean that the path is
for a file, so you can only _really_ know if you use std.file.isFile. So, given
that and the fact that endsWith will tell you whether a path ends with a
directory separator, I'm not sure what more you're really looking for here.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002




20:30:54 PST ---
I don't need to know if it's a file, that's std.file's business. I'm talking
strictly about string processing.

I want to reject paths that are clearly directories and not files. In other
words, *accept* paths which are valid file paths. I know well that a file can
be extensionless.

assert(path.endsWith(dirSeparator)) is what I want (but a call to isValidPath
beforehand is also required). I think it's standard enough to be included as a
"isValidFilePath" function.

========

Consider a build system where the user can specify a path to an output file,
even if that path or file might not exist:

build --output=C:\dir\name   <- pass
build --output=C:\dir\name\  <- reject this

I can't use isValidFilename here, it will reject both of these cases because it
only works on "name" and "name.ext" variants, if it sees any separators it
rejects it.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002




20:39:12 PST ---
One more thing, your solution doesn't work because a user might pass a
different dir separator, IOW on Windows this would pass while it should be
rejected:
C:\dir\name/

It needs to be in the library so people don't reinvent it all the damn time.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002




20:42:34 PST ---
bool isValidFilePath(string filePath)
{
    immutable end = filePath[$-1];
    return (filePath.isValidPath && 
            end != '/' && end != '\\' && 
            filePath.baseName.isValidFilename);
}

But I can't tell if this is 100% right.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002




PST ---
My point was that the lack of a directory separator on the end says _nothing_
about whether a path refers to a directory or a file. I can do both
/home/jmdavis and /home/jmdavis/ and _both_ refer to a directory on my system.

And as for across OSes, I can legally put a backslash in a filename on Linux.
It would be very wrong to say that there was anything invalid about having one
at the end of a file name except on Windows. path.endsWith(dirSeparator) tells
you exactly whether the file name ends with a directory separator or not. If
you want to check that a path is valid and does not end with a directory
separator, then do

assert(isValidPath(path) && !path.endsWith(dirSeparator));

I don't see why a new function is needed for this, _especially_ when it's
perfectly legal to refer to directories _without_ a directory separator on the
end of the path. isValidPath _already_ makes sure that an incorrect directory
separator isn't used.

I don't see how you can really be using paths alone to try and determine
whether a path refers to a file or not when the path doesn't give you enough
information to do that. It would make no sense for std.path to try and
determine that IMHO precisely because the information isn't there. You need to
actually check the disk for that - which is what std.file.isFile is for.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002




23:09:18 PST ---
I think there's some confusion here. When I say "valid file path" I mean
*syntactically* valid, not whether they exist and whether something that looks
like a file path is actually a directory on a hard drive.


 My point was that the lack of a directory separator on the end says _nothing_
 about whether a path refers to a directory or a file. I can do both
 /home/jmdavis and /home/jmdavis/ and _both_ refer to a directory on my system.
I would reject `/home/jmdavis/` as a valid file path on Windows only. /home/jdavis would be accepted as a valid file path, even if it's *actually* a directory. I could easily do a std.file.isDir check *after* that if I need to.
 And as for across OSes, I can legally put a backslash in a filename on Linux.
 It would be very wrong to say that there was anything invalid about having one
 at the end of a file name except on Windows. 
On Linux this would pass as a valid file path. Putting it in the library means it will work in a cross-platform and platform-specific way. Forward slashes are also *valid* on Windows, and yet dirSeparator only returns r"\" on Windows.
 path.endsWith(dirSeparator) tells
 you exactly whether the file name ends with a directory separator or not. If
 you want to check that a path is valid and does not end with a directory
 separator, then do
 
 assert(isValidPath(path) && !path.endsWith(dirSeparator));
Wrong. A forward slash is a valid directory separator on Windows, your assert will pass for this path even though it *can't* be a file path specifier on windows: import std.path; import std.algorithm; void main() { string path = r"C:\foo\bar/"; assert(isValidPath(path) && !path.endsWith(dirSeparator)); }
 I don't see why a new function is needed for this, _especially_ when it's
 perfectly legal to refer to directories _without_ a directory separator on the
 end of the path. 
The focus is on files, not directories. Read my post with the build example I've pasted before.
 I don't see how you can really be using paths alone to try and determine
 whether a path refers to a file or not when the path doesn't give you enough
 information to do that. It would make no sense for std.path to try and
 determine that IMHO precisely because the information isn't there. You need to
 actually check the disk for that - which is what std.file.isFile is for.
Of course I don't do it with *just* a syntactical check, but if I can possibly reject a string as syntactically invalid, I'd rather do that than wait for the OS to tell me if it's a file or directory. This could be a useful optimization technique, e.g. if you're working on a large list of paths and you need to quickly verify that they're all syntactically file paths. This could save you time compared to just blindly doing: try { foreach (path; possibleFilePaths) auto file = File(path, "w"); // could fail if it's a directory string } catch { } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002




PST ---
 Wrong. A forward slash is a valid directory separator on Windows, your assert
 will pass for this path even though it *can't* be a file path specifier on
 windows
Then use assert(isValidPath(path) && !isDirSeparator(path[$-]);
 Of course I don't do it with *just* a syntactical check, but if I can possibly
 reject a string as syntactically invalid, I'd rather do that than wait for the
 OS to tell me if it's a file or directory. 
 This could be a useful optimization technique, e.g. if you're working on a
 large list of paths and you need to quickly verify that they're all
 syntactically file paths.
I'd argue that std.file.isFile is fast enough given that you're going to be doing I/O anyway and that you're going to need to use it regardless if you want to verify that a path is a valid file, since the syntactic check is not enough. So, the extra benefit of verifying that the path doesn't end with a directory separator is minimal. And if it matters, then do the check that I suggested above. Personally, I just always use std.file.exists and std.file.isFile. If you want to avoid trying to open invalid files, you always have to call them before opening any file anyway. I seriously question that needing to verify whether a path ends in a directory separator deserves its own function - even when coupled with verifying that the path doesn't contain any invalid characters. It's simple enough to do with two checks if you actually need it, and in the general case, it should be completely unnecessary, since there are other checks that you need to done anyway which will catch it. It strikes me as a minor optimization which is completely unnecessary in the general case. There's nothing wrong with an enhancement request for it, but I don't think that such a function merits being added to std.path. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Nov 27 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002


Lars T. Kyllingstad <bugzilla kyllingen.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla kyllingen.net



14:27:36 PST ---
Sorry for coming late to the party, but I totally agree with Jonathan on this.

Such a function does not seem generic enough to warrant inclusion in the
standard library, and std.path already provides the building blocks making it a
one-liner:

bool isDefinitelyADir(string path)
{
    return isValidPath(path) && isDirSeparator(path[$-1]);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2012
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7002


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2012