digitalmars.D - better string
- Mike B Johnson (19/19) Jun 07 2017 Why not alias string so that one can easily switch from the old
- mk0w (3/6) Jun 07 2017 I'd suggest to avoid utf16 wherever possible.
- Mike B Johnson (12/34) Jun 07 2017 I should mention, that with such a design, strings can default to
- ag0aep6g (4/18) Jun 07 2017 I'm not sure what exactly you're asking for, but `string` is an alias
- Mike B Johnson (14/33) Jun 07 2017 But that isn't program/compiler wide. e.g., stringof won't return
- Stanislav Blinov (5/10) Jun 07 2017 It doesn't work that way and it can't work that way: you'd never
- Mike B Johnson (3/13) Jun 07 2017 um, because I'm god and I get to wear the big boy pants.
- Jonathan M Davis via Digitalmars-d (47/66) Jun 07 2017 The official solution for handling multiple string types is to templatiz...
Why not alias string so that one can easily switch between the old string, wstring, etc.? E.g., rename string internally to sstring or whatever, then globally define

    alias string = sstring;

which can be re-aliased to wstring to affect the whole program:

    alias string = wstring;

Or use a command-line switch to set it, or whatever makes you happy.

I'm in the process of converting a large source code base to use the above technique so we can move to wstring... it is not fun. Most code that works with a string should work with any string encoding, so the encoding shouldn't matter. That would make D string-agnostic (after all, the only real difference in 99% of programs is the space the strings take up).

If you are worried about it causing subtle bugs, don't be... those same bugs would occur if one had to switch manually. Designing code to use strings that are agnostic of their internal representation should save a lot of headache. For those few cases where it matters, simple static analysis works fine.
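A minimal sketch of the kind of switch being proposed, assuming stock D; the version identifier UseWideStrings and the alias name str are made up for illustration, this is not how D actually selects its string type:

    // app.d -- sketch of a build-time selectable string alias.
    module app;

    version (UseWideStrings)
        alias str = wstring;   // immutable(wchar)[], UTF-16
    else
        alias str = string;    // immutable(char)[], UTF-8

    import std.stdio;

    void main()
    {
        str greeting = "hello";   // the literal converts to whichever alias is active
        writeln(greeting);
    }

Built with `dmd app.d` the alias is the normal UTF-8 string; built with `dmd -version=UseWideStrings app.d` everything declared as str becomes a wstring. Code that only uses the alias (and not string directly) switches over, which is roughly the agnosticism the post is asking for.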
Jun 07 2017
On Wednesday, 7 June 2017 at 10:58:06 UTC, Mike B Johnson wrote:
> Why not alias string so that one can easily switch from the old
> string or wstring, etc? [...]

I'd suggest avoiding UTF-16 wherever possible. Why?

http://utf8everywhere.org/
Jun 07 2017
On Wednesday, 7 June 2017 at 10:58:06 UTC, Mike B Johnson wrote:
> Why not alias string so that one can easily switch from the old
> string or wstring, etc? [...]

I should mention that, with such a design, strings can default to the string type, whatever it would be. E.g., the type of "this is a string" depends on the alias: if it is sstring then it is an sstring, if it is wstring then it is a wstring. Anything that returns a string will return it depending on the alias, even templated code such as foreach(name; AliasSeq!(X.tupleof.stringof)), which generally makes name an sstring (I suppose due to stringof returning an sstring regardless).
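A small sketch of the .stringof behaviour being described, assuming current D semantics; the struct X here is just an illustration:

    // .stringof always yields a value of type string (immutable(char)[]),
    // independent of any user-level alias, because the compiler produces
    // it as a UTF-8 literal.
    struct X { int a; wstring b; }

    static assert(is(typeof(X.stringof) == string));
    static assert(is(typeof(int.stringof) == string));

    pragma(msg, typeof(X.stringof));   // prints: string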
Jun 07 2017
On 06/07/2017 12:58 PM, Mike B Johnson wrote:
> Why not alias string so that one can easily switch from the old
> string or wstring, etc? [...]

I'm not sure what exactly you're asking for, but `string` is an alias already (of `immutable(char)[]`). And you can define your own `string` as you like. `alias string = wstring;` works.
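A minimal sketch of what that shadowing looks like in practice, assuming a single module; the module and function names are made up:

    // mymod.d -- the shadowed alias is only visible inside this module.
    module mymod;

    alias string = wstring;   // shadows the built-in alias (immutable(char)[])

    void demo()
    {
        string s = "hello";   // with the shadow in place, s is a wstring
        static assert(is(typeof(s) == wstring));
        static assert(is(wstring == immutable(wchar)[]));
    }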
Jun 07 2017
On Wednesday, 7 June 2017 at 21:32:25 UTC, ag0aep6g wrote:
> On 06/07/2017 12:58 PM, Mike B Johnson wrote:
>> Why not alias string so that one can easily switch from the old
>> string or wstring, etc? [...]
>
> I'm not sure what exactly you're asking for, but `string` is an alias
> already (of `immutable(char)[]`). And you can define your own `string`
> as you like. `alias string = wstring;` works.

But that isn't program/compiler wide. E.g., stringof won't return a wstring if you do the alias, will it? Or will simply setting "alias string = wstring;" at the top of my program end up having the entire program, regardless of what it is, use wstrings instead of strings? E.g., when I write the string literal "This is a string" with your alias in place, is the literal a string or a wstring?

The reason I ask is that I converted my program to use wstrings but got many errors because string literals were interpreted as strings and no automatic conversion took place; I had to append w to turn them into wstrings.
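A small sketch of the literal behaviour being discussed, assuming stock D; the function and variable names are made up. In explicitly typed contexts a literal does convert to wstring, but wherever the type is inferred it defaults to string unless the w suffix is used:

    import std.stdio;

    void wide(wstring s) { writeln(s); }

    void infer(S)(S s) { static assert(is(S == wstring), "got " ~ S.stringof); }

    void main()
    {
        wide("typed context");        // fine: the literal converts to wstring here
        auto a = "inferred";          // inferred as string (immutable(char)[])
        auto b = "inferred"w;         // the w suffix makes it a wstring
        static assert(is(typeof(a) == string));
        static assert(is(typeof(b) == wstring));

        // infer("needs a suffix");   // would fail: S deduces to string
        infer("needs a suffix"w);     // S deduces to wstring
    }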
Jun 07 2017
On Wednesday, 7 June 2017 at 23:57:44 UTC, Mike B Johnson wrote:
> Or will simply setting "alias string = wstring;" at the top of my
> program end up having the entire program, regardless of what it is,
> use wstrings instead of strings?

It doesn't work that way and it can't work that way: you'd never be able to link against anything if it did.

> The reason I say this is because I converted my program to use
> wstrings...

Why? Why trade one variable-width encoding for another, especially a nasty one like UTF-16?
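A small sketch of the linking point, assuming standard D name mangling; the function names are made up. The parameter's character type is part of each mangled symbol, so a library compiled against string parameters could not be linked by code that silently retyped everything to wstring:

    // Print the mangled symbol names at compile time. The character type
    // of the parameter is baked into each one, so narrow(string) and
    // wide(wstring) are different symbols as far as the linker is concerned.
    void narrow(string s) {}
    void wide(wstring s) {}

    pragma(msg, narrow.mangleof);
    pragma(msg, wide.mangleof);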
Jun 07 2017
On Thursday, 8 June 2017 at 00:59:06 UTC, Stanislav Blinov wrote:
> On Wednesday, 7 June 2017 at 23:57:44 UTC, Mike B Johnson wrote:
>> Or will simply setting "alias string = wstring;" at the top of my
>> program end up having the entire program, regardless of what it is,
>> use wstrings instead of strings?
>
> It doesn't work that way and it can't work that way: you'd never be
> able to link against anything if it did.

Not true, way to overgeneralize!

>> The reason I say this is because I converted my program to use
>> wstrings...
>
> Why? Why trade one variable-width encoding for another, especially a
> nasty one like UTF-16?

um, because I'm god and I get to wear the big boy pants.
Jun 07 2017
On Wednesday, June 07, 2017 10:58:06 Mike B Johnson via Digitalmars-d wrote:
> Why not alias string so that one can easily switch from the old
> string or wstring, etc? [...]

The official solution for handling multiple string types is to templatize code and operate on ranges of characters. Regardless, all string is is an alias. All of the problems that you're running into relate to the fact that all built-in D facilities use UTF-8 when they have to choose a character type. Most would agree that if you have to pick, UTF-8 is the better choice.

And it doesn't make sense for something like .stringof or toString to vary in string type, because D doesn't overload based on return type, and making those change based on a compiler flag would make D libraries incompatible with one another if they're not built exactly the same way. In addition, we'd get yet more problems akin to what happens with size_t when someone always builds their code on 32-bit or always on 64-bit and never on the other. Not many types in D vary based on platform, but the ones that do tend to result in bugs due to folks not building and testing their code on enough platforms.

In D, it is generally considered best practice to use UTF-8 everywhere in your code except in places where you need to use UTF-16 or UTF-32. For a lot of programs, that means using UTF-8 everywhere and letting the standard library functions deal with system APIs for stuff like dealing with files, since Windows uses UTF-16 for many of its APIs. If you're using the Windows API directly, that then means doing the conversion yourself with functions like toUTFz, but most programs don't have to worry about that, and it's still considered best practice for those that do to convert to UTF-16 when they have to but to use UTF-8 as much as possible.

If you want to use UTF-16 everywhere throughout your program, then you certainly can, and many of the standard library facilities will work just fine that way, because they're templatized and deal with the differences in character types. But the language and runtime use UTF-8 where they had to make a choice, and most any library you're going to find for D is going to use UTF-8 in its API when it's not templated code.

I don't think that you're going to find much support for the idea that you can change all of the string types in a program with a compiler switch. D provides solid facilities for converting between different UTF character encodings, and templates allow you to write code that is encoding-agnostic, but doing something like Windows' TCHAR is a whole other kettle of fish.
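A minimal sketch of the templatized, encoding-agnostic style being described; the function name is made up. It compiles for string, wstring, and dstring alike by decoding to dchar in the loop:

    import std.traits : isSomeString;

    // Counts non-space code points in any of D's string types.
    size_t countNonSpace(S)(S s) if (isSomeString!S)
    {
        size_t n;
        foreach (dchar c; s)    // foreach over a string decodes each code point
            if (c != ' ')
                ++n;
        return n;
    }

    unittest
    {
        assert(countNonSpace("ab c") == 3);     // UTF-8
        assert(countNonSpace("ab c"w) == 3);    // UTF-16
        assert(countNonSpace("ab c"d) == 3);    // UTF-32
    }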
D's general approach is to make it so that the types do not vary from platform to platform. There are a few cases where it's done to get at the full address space (size_t) or to get full access to the hardware's capabilities (real) - or simply because there is no way around it (e.g. pointers are going to be 32-bit on 32-bit systems and 64-bit on 64-bit systems) - but in general, the idea has been to make the types vary based on the platform as little as reasonably possible, and nowhere do the built-in types vary based on compiler flags. And I would not expect that to change.

But if you feel strongly about it, you can certainly create a DIP and try to get your proposed changes into the language: https://github.com/dlang/DIPs

- Jonathan M Davis
Jun 07 2017