D - Why is there no String?
- Helmut Leitner (25/25) Apr 04 2003 I tried something like:
- Luna Kid (9/33) Apr 04 2003 Huhh, now this is going to be a long thread... :)
- Jonathan Andrew (18/43) Apr 04 2003 Hello,
- Matthew Wilson (17/69) Apr 04 2003 Not long; not painful. Simply agree
- Benji Smith (6/12) Apr 04 2003 Would it be appropriate to think of a String class within a template. Yo...
- Luna Kid (17/29) Apr 04 2003 Sounds like quite a good idea to me.
- Helmut Leitner (32/35) Apr 05 2003 Sorry, I can't agree. In programming we deal with real world
- J. Daniel Smith (5/40) Apr 05 2003 For better or worse, UNICODE strings are no longer as simple as "16-bit
- Helmut Leitner (10/13) Apr 06 2003 I understand that, but the existence of certain encodings - e.g. in a w...
- Jonathan Andrew (9/21) Apr 04 2003 Howdy,
- Ilya Minkov (39/58) Apr 07 2003 Hello.
- Achilleas Margaritis (19/77) Apr 15 2003 Why complicate our lives?
- Ilya Minkov (12/18) Apr 16 2003 It doesn't yet make life easy. IIRC you only have less than 1/4 of this
- J. Daniel Smith (6/9) Apr 16 2003 As UNICODE 3.0 shows, 16 bits is not enough. It's looking like around 2...
I tried something like:

    alias char [] string;

    int main(string [] args)
    {
        printf("%s\n", (char *) args[0]);
        return 0;
    }

hey, and it worked! But only until I added "import string;" :-(

Ok, no problem. But the question is: why do you live without getting rid of this "string is a char array" type of thinking? Although it is the case, there is no need to reflect this in all interfaces and the daily work. E.g. the call

    char [][] files = DirFindFile("c:\\dmd", "*.d", DFF_FILES | DFF_RECURSIVE);

which I just posted in a separate thread would look more beautiful written as

    String [] files = DirFindFile("c:\\dmd", "*.d", DFF_FILES | DFF_RECURSIVE);

wouldn't it?

--
Helmut Leitner  leitner hls.via.at
Graz, Austria   www.hls-software.com
Apr 04 2003
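The failure Helmut hit is a name clash: the alias "string" collides with the "string" module as soon as that module is imported. A minimal sketch of the same idea under a non-colliding name, in present-day D ("Str" is an arbitrary choice; today's D already ships a built-in alias string for immutable(char)[], so this exact collision no longer arises):

    import std.stdio;

    // Any alias name that is not also a module name avoids the clash.
    alias Str = char[];

    void main(string[] args)
    {
        Str copy = args[0].dup;   // .dup: a mutable char[] copy
        writeln(copy);
    }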
Huhh, now this is going to be a long thread... :)

I think I completely agree with Helmut, and with Matthew W., and Mark E., and probably many others. And I let them talk -- they can do it better.

So the battle has begun... Good luck, folks! :)

Cheers,
Sab

"Helmut Leitner" <leitner hls.via.at> wrote in message news:3E8D5571.E6234E26 hls.via.at...
Apr 04 2003
Hello,

I think that an array of chars was probably appropriate, but now that Unicode is being considered for the language, I think a primitive string type might be necessary, because an array of uneven-sized chars would be very awkward when talking about indexing (i.e. is mystr[6] talking about the 6th byte, or the 6th character?) and declaring new strings ("char [40] mystr;" -- is this 40 bytes, or 40 characters long?). Basically, stuff that has already been talked about in here.

A dedicated string type might resolve some of this ambiguity by providing, for example, both .length (characters) and .size (byte-size) properties. Stuff that is important for strings, but not really appropriate for other array types.

I don't really care too much either way, and if we are stuck with good old ASCII, it really doesn't matter. But if Unicode is put in, then some mechanism should be put in place to take care of these issues, whether it's a string type or not.

And yes, this is probably going to be the start of a long, painful thread. =)

-Jon

In article <3E8D5571.E6234E26 hls.via.at>, Helmut Leitner says...
Apr 04 2003
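A quick illustration of the indexing ambiguity Jon describes, in present-day D (std.utf.count postdates this thread; the numbers are what make the point):

    import std.stdio;
    import std.utf : count;

    void main()
    {
        // "é" takes two UTF-8 code units, so byte length and
        // character count disagree.
        string s = "héllo";
        writeln(s.length);   // 6 -- code units (bytes)
        writeln(count(s));   // 5 -- Unicode code points
        // s[1] is the first byte of "é", not the character "é".
    }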
Not long; not painful. Simply agree.

"Jonathan Andrew" <Jonathan_member pathlink.com> wrote in message news:b6ktj0$2o0u$1 digitaldaemon.com...
Apr 04 2003
Would it be appropriate to think of a String class within a template? You could declare your string to be UTF-8 or UTF-16 or EBCDIC or whatever, simply by instantiating the String template.

I know that this isn't what templates are technically designed to do, but it seems like a good idea to me.

--Benji
Apr 04 2003
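A rough sketch of Benji's suggestion in present-day D -- one template, instantiated per code-unit type (char = UTF-8, wchar = UTF-16, dchar = UTF-32; D's character types are Unicode-only, so EBCDIC would need its own type). String, .size and .length here are illustrative names, not a real library API:

    import std.stdio;

    struct String(Char)
    {
        Char[] data;

        size_t size() const { return data.length * Char.sizeof; } // bytes

        size_t length() const // code points, not code units
        {
            size_t n = 0;
            foreach (dchar c; data) ++n;   // decodes as it walks
            return n;
        }
    }

    void main()
    {
        auto s8  = String!char("héllo".dup);
        auto s32 = String!dchar("héllo"d.dup);
        writeln(s8.length, " ", s8.size);   // 5 6
        writeln(s32.length, " ", s32.size); // 5 20
    }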
Sounds like quite a good idea to me.

Well, but first, it would be good to have that generic string "something". ;)

Currently, calling an array of bytes a string is like calling an Otto engine a car... (It's not even a class inheritance thing.)

Cheers,
Lunar Sab

"Benji Smith" <Benji_member pathlink.com> wrote in message news:b6l4vj$2t15$1 digitaldaemon.com...
Apr 04 2003
Luna Kid wrote:
> Currently, calling an array of bytes a string is like calling an
> Otto engine a car... (It's not even a class inheritance thing.)

Sorry, I can't agree. In programming we deal with real world objects and give them names (like File, String or System). These names may be thought to represent virtual objects, which may be handled
 - by handles (like MS Windows HWNDs, file handles)
 - by names (like a customer in a database, file delete)
 - by OO object references
 - implicitly, without a handle ( SystemRestart(); )
 - ...

I think the OO feeling that only a class is a good abstraction, and that anything of importance must become a class, is wrong. This feeling developed because OO is overhyped to a point where this hype effectively reduces the quality of resulting code.

There can be no better API functions than

    FileDelete("test.txt");
    String [] s = UrlGetStringArray("http://walterbright.com");

because they express what the programmer wants to do in a single "sentence" *and* do it. Notice that there is not a single formal object involved. So IMHO, whether something is a formal object or not should be considered an implementation detail.

If Unicode strings are a topic (I know little about them; I always thought we would just switch to 16-bit characters sometime in the future) then someone could create a class to wrap their complexities. I would vote for "class Struni" and suggest that a team forms to prototype it to the point where Walter can just take it and make it part of Phobos.

--
Helmut Leitner  leitner hls.via.at
Graz, Austria   www.hls-software.com
Apr 05 2003
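The plain-function style Helmut argues for, sketched in present-day D. FileDelete and FileReadText echo his examples and are purely illustrative wrappers over std.file, not a real API (the URL example would need a network library, so only the file half is shown):

    import std.file : read, remove;

    void FileDelete(string name)
    {
        remove(name);   // delete the file -- no object in sight
    }

    string FileReadText(string name)
    {
        return cast(string) read(name);   // whole file as one string
    }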
For better or worse, UNICODE strings are no longer as simple as "16-bit characters". That's just one example of why this is a rather thorny issue.

Dan

"Helmut Leitner" <leitner hls.via.at> wrote in message news:3E8E9AAF.865E0DFF hls.via.at...
Apr 05 2003
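Dan's point, made concrete: characters beyond U+FFFF exist, and UTF-16 spends two 16-bit code units (a surrogate pair) on each of them. In present-day D:

    import std.stdio;

    void main()
    {
        // U+1D11E (musical symbol G clef) lies outside the 16-bit range.
        wstring w = "\U0001D11E";
        dstring d = "\U0001D11E";
        writeln(w.length);   // 2 -- one character, two UTF-16 code units
        writeln(d.length);   // 1 -- UTF-32 is one unit per code point
    }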
"J. Daniel Smith" wrote:For better or worse, UNICODE strings are no long as simple as "16-bit characters". That's just one example of why this is a rather thorny issue.I understand that, but the existance of certain encodings - e. g. in a web page - doesn't immediately mean to me that there is the need for a native type or a class handling these encodings (maybe to have transforming functions is enough). I suppose that this comes from my lack of knowledge in this area. I hadn't yet time to read the existing older threads about this topic. Is there a summary of all the issues somewhere? -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Apr 06 2003
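What "transforming functions instead of a class" can look like, sketched with today's std.utf (a module the Phobos of 2003 did not have in this form):

    import std.stdio;
    import std.utf : toUTF16, toUTF32;

    void main()
    {
        // Plain conversion functions between encodings, no string class.
        string  s8  = "Grüße";
        wstring s16 = toUTF16(s8);
        dstring s32 = toUTF32(s8);
        // 7 5 5: code-unit counts differ per encoding, characters don't.
        writeln(s8.length, " ", s16.length, " ", s32.length);
    }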
Howdy,

I think that the templated way of doing strings is probably adding more abstraction and complication to the string concept, which at this point should be a primitive type in the language, whether it is a completely new type or still an array of chars with special properties.

That's just my opinion though, and I confess to being afraid of templates, coming mostly from a C/Java background.

-Jon

In article <b6l4vj$2t15$1 digitaldaemon.com>, Benji Smith says...
Apr 04 2003
Hello.

While char[] is a good and native thing for working with the console, simple text files, and such, it is not a solution for applications processing any data subject to internationalisation. And that is almost every piece of text currently out there. However, I keep thinking that one dedicated string class is not at all enough. I propose - not one - not two - but *at least* three of them. First and the basic one -

--- a String type - should be an array of 4-byte characters. It is used inside functions for processing strings. With modern processors, handling 4-byte values may be cheaper than 2-byte and not much costlier than 1-byte. As to space considerations - forget them, this type is for local chewing only. If you want to keep this string in memory or some database, consider the second one -

--- a CompactString type - should consist of 2 arrays, the first one for raw characters, the second one for mode changes. The second one is the key. It should store a list of events like "at character x change to codepage y, encoding z" or "at character x make an exceptional 4-byte value", which could be swizzled into a few bytes each. It should also be quite fast to handle, since unlike UTF-7/8/16, the raw string need not be scanned to determine its length; this can be done by scanning the mode changes, which have to be an order or two of magnitude shorter. And it can adapt itself to whatever takes the least space - 8-bit with explicit codepage for e.g. European and Russian, 16-bit for Japanese kanji and suchlike, or even 32-bit in the rare case you mix all languages evenly. But this type would not be directly standards-complying. There should obviously also be -

--- another type which corresponds to the underlying system's preferred encoding.

A set of functions also has to be provided to convert any of these types to and from any of the other standard Unicode types.

As to templates - I don't hold much of them for these purposes. There is a limited number of types - you don't want to create a string of floats, do you? And besides, their handling differs in some ways. But making them into classes could give further flexibility, at the price of an (8-byte IIRC) space overhead per instance.

-i.

PS. I've been away for a short while... 300 new messages! I wonder how Walter manages to read them AND maintain two complex compilers!
Apr 07 2003
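A rough sketch of Ilya's CompactString -- raw code units plus a short side table of "mode change" events. The field names and event layout here are invented for illustration; nothing like this existed in any library:

    struct ModeChange
    {
        size_t at;       // index into raw where the change takes effect
        ubyte  width;    // bytes per character from here on (1, 2 or 4)
        ushort codepage; // how 1-byte runs are to be interpreted
    }

    struct CompactString
    {
        ubyte[]      raw;    // packed character data
        ModeChange[] modes;  // sorted by .at; far shorter than raw

        // Character count falls out of the mode table alone, without
        // scanning raw -- the property Ilya emphasizes.
        size_t length() const
        {
            size_t n = 0;
            foreach (i, m; modes)
            {
                size_t end = (i + 1 < modes.length) ? modes[i + 1].at
                                                    : raw.length;
                n += (end - m.at) / m.width;
            }
            return n;
        }
    }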
"Ilya Minkov" <midiclub 8ung.at> wrote in message news:b6t1i8$nt4$1 digitaldaemon.com...Hello. While char[] is a good and native thing for working with console, simple textfiles, and such, it is not a solution for applications processing any data subject to internalisation. And that is almost every piece of text currently out there. However, i keep thinking that one dedicated string class is not at all enough. I propose - not one - not two - but *at least* three of them. First and the basic one - --- a String type - should be an array of 4-byte characters. It is used inside functions for processing strings. With modern processors, handling 4-byte values may be cheaper than 2-byte and not much costier than of 1-byte. As to space considerations - forget them, this type is for local chewing only. If you want to keep this string in memory or some database consider the second one - --- a CompactString type - should consist of 2 arrays, first one for raw characters, second one for mode changes. The second one is the key. It should store a list of events like "at character x change to codepage y, encoding z" "or at character x make an exceptional 4-byte value", which could be swizzled into a few bytes each. It should also be quite fast to handle, since unlike UTF7/8/16, the raw string need not be scanned to determine its length, this can be done by scanning mode changes, which has to be an order or two of magnitude shorter. And it can adapt itself to whatever takes the least space - 8-bit with explicit codepage for e.g. european and russian, 16-bit for japanese kanji and somesuch, or even 32-bit in rare case you mix all languages evenly. But this type would not be directly standards-complying. There should obviously alsobe ---- another type which corresponds to the underlying system's preferred encoding. A set of functions also has to be provided to convert any of these types to and from any of the other standard unicode types. As to templates - i don't hold much of them for these purposes. There is a limited number of types - you don't want to create a string of floats, do you? And besides, their handling differs in some ways. But making them into classes could give further flexibility, at the price of an (8-byte IIRC) space overhead per instance. -i. PS. i've been away for a short while... 300 new messages! i wonder how Walter manages to read them AND maintain two complex compilers! Jonathan Andrew wrote:Unicode isHello, I think that an array of chars was probably appropriate, but now thatbebeing considered for the language, I think a primitive string type mightwhennecessary, because an array of uneven sized chars would be very awkwardthe 6thtalking about indexing (ie. is mystr[6] talking about the 6th byte, orbytes, orcharacter?) and declaring new strings ("char [40] mystr;", is this 40about in40 characters long?). Basically, stuff that has already been talkedproviding, forhere. A dedicated string type might resolve some of this ambiguity byStuff thatexample, both .length (characters) and .size properties (byte-size).types. Iis important for strings, but not really appropriate for other arrayASCII,don't really care too much either way, and if we are stuck with good oldits ait really doesn't matter either way. But if Unicode is put in, then some mechanism should be put in place to take care of these issues, whetherlong,string type or not. And yes, this is probably going to be the start of aWhy complicate our lives ? 
D should use 16-bit unicode and provide implicit conversions in any I/O, according to environment. These conversions should be transparent to the user. 65536 characters are enough to represent most Earth languages...painful thread. =) -Jon
Apr 15 2003
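A sketch of the scheme Achilleas describes -- strings held as 16-bit code units internally, converted only at the I/O boundary (whether 16 bits per character suffices is exactly what the follow-ups dispute; printOut is an illustrative name):

    import std.stdio;
    import std.utf : toUTF8, toUTF16;

    void printOut(wstring s)
    {
        write(toUTF8(s));   // convert to the output encoding here
    }

    void main()
    {
        wstring name = toUTF16("Ελλάδα");   // convert once, on input
        printOut(name);
    }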
Achilleas Margaritis wrote:
> Why complicate our lives? D should use 16-bit Unicode and provide
> implicit conversions in any I/O, according to the environment. These
> conversions should be transparent to the user. 65536 characters are
> enough to represent most Earth languages...

It doesn't yet make life easy. IIRC you only have less than 1/4 of this set available. Besides, how do you treat separate accents, if they cannot be combined into the letters? This is really rare though. 32-bit is better for speed.

And besides, you have already let everyone down for years with that endless "we don't care, since *our* language fits in less than seven bits. Now the rest of the world may share the leftovers, if they like." Now, aren't we letting anyone down with 16-bit?

Besides, the second type I proposed would usually use 1 byte for European, Cyrillic, Arabic, Hebrew, Greek and such, unlike UTF-8 and UTF-16 which *both* require 2, so it's better for storage!

-i.
Apr 16 2003
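The separate-accents problem Ilya raises, shown in present-day D: the same visible character can be one code point or two, so even a 32-bit "one slot per character" string doesn't make indexing fully safe:

    import std.stdio;
    import std.utf : count;

    void main()
    {
        // "é" two ways: precomposed U+00E9, or "e" + combining acute
        // accent U+0301. Both render identically.
        string precomposed = "\u00E9";
        string combining   = "e\u0301";
        writeln(count(precomposed));   // 1 code point
        writeln(count(combining));     // 2 code points
    }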
As UNICODE 3.0 shows, 16 bits is not enough. It's looking like around 23 bits are needed, a value which of course isn't very practical on today's computers.

Dan

"Achilleas Margaritis" <axilmar in.gr> wrote in message news:b7h8tm$t5j$1 digitaldaemon.com...
> [...] be transparent to the user. 65536 characters are enough to
> represent most Earth languages...
Apr 16 2003
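For the record, the ceiling later settled a little lower than Dan's guess: Unicode's code space was fixed at U+10FFFF, which fits in 21 bits. A one-line check in D:

    void main()
    {
        static assert(0x10FFFF < (1 << 21));   // 21 bits suffice
        static assert(0x10FFFF >= (1 << 20));  // ...and 20 do not
    }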