digitalmars.D - better string

reply Mike B Johnson <Mikey Ikes.com> writes:
Why not alias string so that one can easily switch from the old 
string or wstring, etc?

e.g., rename string internally to sstring or whatever.

then globally define

alias string = sstring;

Which can be re-aliased to wstring to affect the whole program

alias string = wstring;

Or use a command line to set it or whatever makes you happy.

I'm in the process of converting a large source code database to 
use the above technique so we can move to using wstring... it is 
not fun. Most code that works with a string should work with any 
string encoding, so it shouldn't matter. D's string should be 
encoding-agnostic (after all, the only main difference in 99% of 
programs is the space the strings take up).

If you are worried about it causing subtle bugs, then don't... 
because those same bugs would occur if one manually had to switch.

Designing techniques to use strings that are agnostic of their 
internal representation should save a lot of headaches. For those 
few cases where it matters, simple static analysis works fine.
Jun 07 2017
next sibling parent mk0w <no no.no> writes:
On Wednesday, 7 June 2017 at 10:58:06 UTC, Mike B Johnson wrote:
 Why not alias string so that one can easily switch from the old 
 string or wstring, etc?

 [...]
I'd suggest to avoid utf16 wherever possible. why? http://utf8everywhere.org/
Jun 07 2017
prev sibling next sibling parent Mike B Johnson <Mikey Ikes.com> writes:
On Wednesday, 7 June 2017 at 10:58:06 UTC, Mike B Johnson wrote:
 Why not alias string so that one can easily switch from the old 
 string or wstring, etc?

 [...]
I should mention that with such a design, string literals can default to the string type, whatever it would be. E.g., the type of "this is a string" depends on the alias: if the alias is sstring then the literal is an sstring, and if it is wstring then the literal is a wstring. Anything that returns a string will return it depending on the alias, even in templated code such as foreach(name; AliasSeq!(X.tupleof.stringof)), which generally makes name an sstring (I suppose because stringof returns an sstring regardless).
Jun 07 2017
prev sibling next sibling parent reply ag0aep6g <anonymous example.com> writes:
On 06/07/2017 12:58 PM, Mike B Johnson wrote:
 Why not alias string so that one can easily switch from the old string 
 or wstring, etc?
 
 e.g., rename string internally to sstring or whatever.
 
 then globally define
 
 alias string = sstring;
 
 Which can be over realiased to wstring to affect the whole program
 
 alias string = wstring;
 
 Or use a command line to set it or whatever makes you happy.
I'm not sure what exactly you're asking for, but `string` is an alias already (of `immutable(char)[]`). And you can define your own `string` as you like. `alias string = wstring;` works.
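A minimal sketch of what such an alias does and does not change, assuming it is declared at module scope. The helper `codeUnits` is hypothetical and exists only for illustration. The alias rebinds the name `string` inside this one module; a literal with no target type and `.stringof` keep their usual `immutable(char)[]` type.

```d
// Shadows the built-in `string` alias, but only for lookups in this module.
alias string = wstring;

// Hypothetical helper: counts UTF-16 code units, since `string` is wstring here.
size_t codeUnits(string s)
{
    return s.length;
}

void main()
{
    // A literal converts implicitly when the target type is known.
    string s = "hello";
    static assert(is(typeof(s) == immutable(wchar)[]));

    // With `auto` there is no target type, so the literal stays UTF-8.
    auto t = "hello";
    static assert(is(typeof(t) == immutable(char)[]));

    // .stringof is not affected by the alias.
    static assert(is(typeof(int.stringof) == immutable(char)[]));

    // The literal is converted to wstring at the call site.
    assert(codeUnits("hello") == 5);
}
```

Code in other modules, including the standard library, still sees the original `string`.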
Jun 07 2017
parent reply Mike B Johnson <Mikey Ikes.com> writes:
On Wednesday, 7 June 2017 at 21:32:25 UTC, ag0aep6g wrote:
 On 06/07/2017 12:58 PM, Mike B Johnson wrote:
 Why not alias string so that one can easily switch from the 
 old string or wstring, etc?
 
 e.g., rename string internally to sstring or whatever.
 
 then globally define
 
 alias string = sstring;
 
 Which can be over realiased to wstring to affect the whole 
 program
 
 alias string = wstring;
 
 Or use a command line to set it or whatever makes you happy.
I'm not sure what exactly you're asking for, but `string` is an alias already (of `immutable(char)[]`). And you can define your own `string` as you like. `alias string = wstring;` works.
But that isn't program/compiler wide. E.g., stringof won't return a wstring if you do the alias, will it? Or will simply setting "alias string = wstring;" at the top of my program end up having the entire program, regardless of what it is, use wstrings instead of strings? E.g., when I write a string literal "This is a string" but use your alias, is the literal a string or a wstring?

The reason I ask is that I converted my program to use wstrings but got many errors because string literals were interpreted as strings and no automatic conversion took place. I had to append w to turn them into wstrings.
Jun 07 2017
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Wednesday, 7 June 2017 at 23:57:44 UTC, Mike B Johnson wrote:

 Or will simply setting "alias string = wstring;" at the top of 
 my program end up having the entire program, regardless of what 
 it is, use wstring's instead of strings?
It doesn't work that way and it can't work that way: you'd never be able to link against anything if it did.
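One way to see the linking problem, as a rough sketch: parameter types are encoded in mangled names, so a library built with `string` meaning `immutable(char)[]` exports different symbols than a caller built with `string` meaning `wstring` would look for. The alias names `Utf8Api` and `Utf16Api` are hypothetical.

```d
import std.stdio : writeln;

// Two hypothetical function signatures that differ only in string type.
alias Utf8Api  = void function(immutable(char)[]);
alias Utf16Api = void function(immutable(wchar)[]);

// The character type is part of the type's mangling, so the two differ.
static assert(Utf8Api.mangleof != Utf16Api.mangleof);

void main()
{
    // Print both manglings to compare them by eye.
    writeln(Utf8Api.mangleof);
    writeln(Utf16Api.mangleof);
}
```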
 The reason I say this is because I converted my program to use 
 wstrings...
Why? Why trade one variable-width encoding for another, especially a nasty one like UTF-16?
Jun 07 2017
parent Mike B Johnson <Mikey Ikes.com> writes:
On Thursday, 8 June 2017 at 00:59:06 UTC, Stanislav Blinov wrote:
 On Wednesday, 7 June 2017 at 23:57:44 UTC, Mike B Johnson wrote:

 Or will simply setting "alias string = wstring;" at the top of 
 my program end up having the entire program, regardless of 
 what it is, use wstring's instead of strings?
It doesn't work that way and it can't work that way: you'd never be able to link against anything if it did.
Not true, way to overgeneralize!
 The reason I say this is because I converted my program to use 
 wstrings...
Why? Why trade one variable-width encoding for another, especially a nasty one like UTF-16?
um, because I'm god and I get to wear the big boy pants.
Jun 07 2017
prev sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wednesday, June 07, 2017 10:58:06 Mike B Johnson via Digitalmars-d wrote:
 Why not alias string so that one can easily switch from the old
 string or wstring, etc?

 [...]
The official solution for handling multiple string types is to templatize code and operate on ranges of characters. Regardless, string is just an alias. All of the problems that you're running into relate to the fact that all built-in D facilities use UTF-8 when they have to choose a character type. Most would agree that if you have to pick, UTF-8 is the better choice.

And it doesn't make sense for something like .stringof or toString to vary in string type, because D doesn't overload based on return type, and making those change based on a compiler flag would make D libraries incompatible with one another if they're not built exactly the same way. In addition, we'd get yet more problems akin to what happens with size_t when someone always builds their code on 32-bit or always on 64-bit and never on the other. Not many types in D vary based on platform, but the ones that do tend to result in bugs due to folks not building and testing their code on enough platforms.

In D, it is generally considered best practice to use UTF-8 everywhere in your code except in places where you need to use UTF-16 or UTF-32. For a lot of programs, that means using UTF-8 everywhere and letting the standard library functions deal with system APIs for stuff like files, since Windows uses UTF-16 for many of its APIs. If you're using the Windows API directly, that then means doing the conversion yourself with functions like toUTFz, but most programs don't have to worry about that, and it's still considered best practice for those that do to convert to UTF-16 when they have to but to use UTF-8 as much as possible.
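As a small sketch of both points, assuming a hypothetical function `countSpaces`: the template below is written once and accepts string, wstring, or dstring (simplified here to the built-in string types via `isSomeString` rather than arbitrary character ranges), and `main` converts to UTF-16 only at the point where a UTF-16 API would need it.

```d
import std.traits : isSomeString;
import std.utf : toUTF16, toUTFz;

// Encoding-agnostic: one definition for string, wstring and dstring.
size_t countSpaces(S)(S text)
    if (isSomeString!S)
{
    size_t n;
    // Iterating with dchar decodes code points whatever the encoding.
    foreach (dchar c; text)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    // Keep UTF-8 internally...
    string utf8 = "caf\u00E9 au lait";

    // ...and convert only where UTF-16 is required.
    wstring utf16 = toUTF16(utf8);

    assert(countSpaces(utf8) == 2);
    assert(countSpaces(utf16) == 2);

    // Same text, different number of code units.
    assert(utf8.length == 13 && utf16.length == 12);

    // Zero-terminated UTF-16 pointer, the form a C-style wide-character API expects.
    const(wchar)* p = toUTFz!(const(wchar)*)(utf8);
    assert(p[utf16.length] == 0);
}
```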
If you want to use UTF-16 everywhere throughout your program, then you certainly can, and many of the standard library facilities will work just fine that way, because they're templatized and deal with the differences in character types, but the language and runtime use UTF-8 when they had to make a choice, and most any library you're going to find for D is going to use UTF-8 in its API when it's not templated code.

I don't think that you're going to find much support for the idea that you can change all of the string types in a program with a compiler switch. D provides solid facilities for converting between different UTF character encodings, and templates allow you to write code that is encoding-agnostic, but doing something like Windows' TCHAR is a whole other kettle of fish.

D's general approach is to make it so that the types do not vary from platform to platform. There are a few cases where it's done to get at the full address space (size_t) or to get full access to the hardware's capabilities (real) - or simply because there is no way around it (e.g. pointers are going to be 32-bit on 32-bit systems and 64-bit on 64-bit systems) - but in general, the idea has been to make the types vary based on the platform as little as reasonably possible, and nowhere do the built-in types vary based on compiler flags. And I would not expect that to change.

But if you feel strongly about it, you can certainly create a DIP and try to get your proposed changes into the language: https://github.com/dlang/DIPs

- Jonathan M Davis
Jun 07 2017