
digitalmars.D.learn - DirEntry on Windows - wstring variant?

"dcrepid" <dcrepid none.com> writes:
As a Windows programmer using D, I find a number of questionable 
things with D's focus on using string everywhere. It's not a huge 
deal to add in UTF-8 to UTF-16 mapping in certain areas, but when 
it comes to working with a lot of data and Windows API calls, the 
fewer needless conversions, the better.
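The UTF-8 to UTF-16 mapping mentioned here can be sketched in D as follows, assuming the path ends up in a wide ("W") Windows function. `wideLength` and `callWithDString` are hypothetical names; `wideLength` only stands in for a real API call so the sketch also runs off Windows, while `std.utf.toUTF16z` is the Phobos function that builds the zero-terminated UTF-16 copy.

```d
import std.utf : toUTF16z;

/// Hypothetical stand-in for a wide Windows API (e.g. a ...W function):
/// it only ever sees a zero-terminated UTF-16 pointer, so it just measures it.
size_t wideLength(const(wchar)* p)
{
    size_t n = 0;
    while (p[n] != 0)
        ++n;
    return n;
}

/// Every call from string-based code into a W function pays for one
/// UTF-8 -> UTF-16 transcoding plus a GC allocation inside toUTF16z.
size_t callWithDString(string path)
{
    return wideLength(path.toUTF16z);
}
```

A path containing "é" is 17 bytes as a D `string` but 16 UTF-16 code units after conversion, so the two encodings cannot simply be reinterpreted in place.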

I like the dirEntries (std.file) approach to traversing files and 
folders in a directory (almost as nice as <filesystem> from the C++ Filesystem TS), 
but I think it's a bit odd that native-OS strings aren't used in D 
here. Sure, I get that having a fairly consistent programming 
interface may make using the language easier for certain 
programmers, but if you're using D with Windows, then you will be 
made well aware of the incompatibilities between D's strings and 
the Windows API (unless you always use ASCII, I suppose).

Anyway, I'm curious if proposing changes to those interfaces is 
worthwhile, or if I should just modify it for my own purposes and 
leave the standard library be.

P.S. It's a shame to keep running into Unicode issues with D and 
Windows, and sometimes it's a bit discouraging. Right before I 
peeked into DirEntry, I worked a bit on a workaround for 
stdio.File's Unicode problems (a documented bug that's 2+ years 
old). I remember trying D a while back and giving up because 
optlink was choking on paths. And just yesterday it choked on 
what the %PATH% environment variable was set to, so I had to 
clear that before running it.
Oct 24 2014
Jonathan M Davis via Digitalmars-d-learn writes:
On Friday, October 24, 2014 21:06:37 dcrepid via Digitalmars-d-learn wrote:
I don't know. The expectation is generally that programs will use string and that wstring will be used only in the rare cases that you have to interact directly with the Windows API. When it was suggested previously that the various functions in std.file be templatized on string type to support other string types, it was decided that that was unnecessary code bloat and not worth it.

Also, given how DirEntry works internally, I'd definitely be inclined to argue that it would be too much of a mess to support wstring unless it's by simply converting the name to a wstring when requested (which is kind of pointless, since you can just do to!wstring on the name if that's what you want). Making it support wstring directly would involve a lot of code duplication, and it would increase the memory footprint, because the structs involved would then have to hold the path and whatnot as both a string and wstring. So, I question that it's at all worth it to try and make dirEntries support wstring.

And we definitely don't want to encourage the use of wstring. It's there for when you need it (which is great), but programs really should be using string if they don't actually need to use wstring or dstring.

- Jonathan M Davis
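The to!wstring route described in this reply can be sketched as below, assuming a shallow (non-recursive) listing is enough. `wideNames` is a hypothetical helper; `dirEntries`, `SpanMode` and `std.conv.to` are Phobos.

```d
import std.conv : to;
import std.file : dirEntries, SpanMode;

/// Returns the paths of the entries directly inside `dir` as UTF-16 strings.
/// std.file hands each name back as a UTF-8 `string`, so every entry costs
/// one UTF-8 -> UTF-16 transcoding (and allocation) here.
wstring[] wideNames(string dir)
{
    wstring[] result;
    foreach (entry; dirEntries(dir, SpanMode.shallow))
        result ~= entry.name.to!wstring;
    return result;
}
```

This keeps std.file unchanged and pays the conversion only where UTF-16 is actually needed, which is the trade-off the reply argues for.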
Oct 24 2014
"dcrepid" <dcrepid none.com> writes:
On Friday, 24 October 2014 at 22:53:15 UTC, Jonathan M Davis via 
Digitalmars-d-learn wrote:
 Also, given how DirEntry works internally, I'd definitely be
 inclined to argue that it would be too much of a mess to support
 wstring unless it's by simply converting the name to a wstring
 when requested (which is kind of pointless, since you can just do
 to!wstring on the name if that's what you want). Making it support
 wstring directly would involve a lot of code duplication, and it
 would increase the memory footprint, because the structs involved
 would then have to hold the path and whatnot as both a string and
 wstring. So, I question that it's at all worth it to try and make
 dirEntries support wstring.
I would suggest that the string be kept as a wstring inside the DirEntry structure, rather than converting twice as you suggest. Then a decision can be made as to whether .name() returns a string or a wstring. If backwards compatibility is a concern, then it could be converted to a string on that call, though that would break the nothrow promise. Adding something like .wname() would work here for getting the native wstring, I suppose.

Another alternative is to have a union of string and wstring, plus a bool indicating how the string is stored internally. Of course, the .name and .wname properties would need to check it and convert depending on how it is stored. It's not pretty, but it's just another possibility.

The whole point is that a lot of time is wasted on UTF-16 to UTF-8 conversions when using these library functions.
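The first suggestion here (store UTF-16, convert only in .name, add .wname) might look roughly like this. `WideDirEntry` is a hypothetical struct for illustration, not Phobos' DirEntry, and it models only the name, none of the other attributes a real directory entry carries.

```d
import std.conv : to;

/// Hypothetical entry type (not Phobos' DirEntry) that keeps the path in
/// the native Windows encoding, UTF-16, and converts only when asked.
struct WideDirEntry
{
    private wstring _wname;

    this(wstring nativeName) pure nothrow @safe
    {
        _wname = nativeName;
    }

    /// Native UTF-16 path: no conversion, so it can stay nothrow.
    @property wstring wname() const pure nothrow @safe
    {
        return _wname;
    }

    /// UTF-8 path for code that expects `string`. This transcodes on every
    /// call and can throw on malformed UTF-16, so it cannot be nothrow.
    @property string name() const
    {
        wstring w = _wname; // plain slice copy so std.conv sees a mutable range
        return w.to!string;
    }
}
```

The union-plus-flag alternative would replace `_wname` with a tagged union, trading this fixed storage format for a branch in both properties.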
 And we definitely don't want to encourage the use of wstring. It's
 there for when you need it (which is great), but programs really
 should be using string if they don't actually need to use wstring
 or dstring.
I get that wstring on the whole is ugly, but it's the native Unicode string type in Windows. If someone is doing serious work on Windows, wstring will eventually need to be used. It'd be nice to keep the abstraction of string at every level of a program, but in Windows it's impossible. The standard library, even if it were comprehensive enough, will never cover every corner case where strings are needed. Whether using the Windows API, COM, or interfacing with other Windows libraries, wstring will still rear its ugly head.

But, idealism aside, there are good reasons for keeping the pathname in its native format on Windows:

- If a program is processing lots of files, there are going to be a lot of wasted cycles doing those wstring->string conversions.
- Doing anything more with the files besides listing them will probably result in a string->wstring conversion during a call to Windows to open or query information about the file, which means more wasted cycles.
- Additionally, Windows has a peculiar way of handling long pathnames that requires a "\\?\" prefix and only works with the Unicode versions of its functions. This also makes the pathname uniquely OS-specific.

Anyway, some things to think about.
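The long-path point can be sketched as below. `toExtendedLengthPath` is a hypothetical helper; it assumes its input is already absolute and normalized, because Windows neither resolves "." or ".." components nor translates '/' separators in paths that carry the \\?\ prefix. UNC paths take the \\?\UNC\server\share form.

```d
import std.algorithm.searching : startsWith;
import std.array : replace;
import std.conv : to;

/// Hypothetical helper: turn an absolute, normalized Windows path into the
/// extended-length form accepted by the wide (W) API functions, as UTF-16.
wstring toExtendedLengthPath(string path)
{
    enum prefix = `\\?\`;
    // The prefix disables '/' -> '\' translation, so do it up front.
    string p = path.replace("/", `\`);
    if (p.startsWith(prefix))
        return p.to!wstring;                              // already extended
    if (p.startsWith(`\\`))
        return (prefix ~ `UNC\` ~ p[2 .. $]).to!wstring;  // \\server\share\...
    return (prefix ~ p).to!wstring;                       // C:\dir\...
}
```

Because the result is a wstring that only the W functions understand, it also illustrates why such a path is inherently Windows-specific.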
Oct 24 2014
Jonathan M Davis via Digitalmars-d-learn writes:
On Saturday, October 25, 2014 01:11:26 dcrepid via Digitalmars-d-learn wrote:
 On Friday, 24 October 2014 at 22:53:15 UTC, Jonathan M Davis via
 Anyway, some things to think about.
DirEntry and all of the related functions and types would need quite a bit of rewriting to do what you're suggesting, and most folks aren't going to be using the Windows API enough for it to matter that std.file operates on string rather than wstring.

And much as the Windows API unfortunately uses wchar all over the place, the functions that are being used internally in dirEntries and company are using static arrays of wchar, so it's not like using wstring instead of string would avoid any allocations. All it would do would be to avoid decoding and re-encoding the Unicode from UTF-16 to UTF-8. Additionally, the cost of the file operations would dwarf the cost of any allocations anyway.

So, I'm not in the least bit convinced that altering anything in std.file to use wstring would be of much benefit, and all previous suggestions to support wstring anywhere in std.file have been shot down (and in at least one case, it was by Walter Bright, and he works on Windows primarily).

- Jonathan M Davis
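The point that wstring would save the transcoding but not the allocation can be illustrated with a fixed UTF-16 buffer of the kind described here. `NameBuffer`, `nameLength`, `asString` and `asWString` are hypothetical names; the 260-wchar size mirrors MAX_PATH as used by the cFileName member of WIN32_FIND_DATAW, but this sketch calls no Windows API.

```d
import std.conv : to;

/// Hypothetical stand-in for a fixed-size UTF-16 name buffer such as the
/// cFileName member of WIN32_FIND_DATAW (MAX_PATH = 260 wchars).
alias NameBuffer = wchar[260];

/// Number of code units before the terminating zero (or the whole buffer).
size_t nameLength(ref const NameBuffer buf) pure nothrow @nogc @safe
{
    foreach (i, c; buf)
        if (c == 0)
            return i;
    return buf.length;
}

/// string-based API: transcode UTF-16 -> UTF-8, which allocates.
string asString(ref const NameBuffer buf)
{
    return buf[0 .. nameLength(buf)].to!string;
}

/// wstring-based API: no transcoding, but still a GC allocation, because
/// the buffer is reused for the next entry and must be copied out.
wstring asWString(ref const NameBuffer buf) pure nothrow @safe
{
    return buf[0 .. nameLength(buf)].idup;
}
```

Both functions copy out of the buffer; the only difference is the UTF-16 to UTF-8 transcoding in `asString`.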
Oct 24 2014