D - Automatic Safe and Efficient Sz-ing
- Matthew Wilson (46/46) Mar 31 2003 An idea I had in my sleep, so please forgive if I've overlooked some hug...
- Bill Cox (9/12) Mar 31 2003 This is a rare occasion when I agree with Mark. The fact that a
- Matthew Wilson (26/38) Mar 31 2003 :)
- Matthew Wilson (8/51) Mar 31 2003 Correction: meant UCS-/UTF-16, not 32
- Bill Cox (9/17) Mar 31 2003 From a user point of view, I like the char null*. The single most commo...
- Matthew Wilson (5/8) Mar 31 2003 Good point. Maybe you've invented a new, and quite definitive, metric fo...
- Burton Radons (15/31) Apr 01 2003 The problems of newbies are eminently ignorable. It's the problems of
- Matthew Wilson (28/59) Apr 01 2003 I don't get your point.
- Mark Evans (3/3) Mar 31 2003 Matthew please post in the other thread if you want me to respond. That...
- Matthew Wilson (5/8) Mar 31 2003 Can't remember which bit of which post applies to which thread. Verbal
- Walter (3/7) Mar 31 2003 That's the direction D is going.
- Walter (3/7) May 24 2003 The next release will provide a module to do all the conversions.
- Mark Evans (4/5) May 25 2003 What it won't provide are manipulation routines for the results. Conver...
An idea I had in my sleep, so please forgive if I've overlooked some huge obvious beastie.

When interfacing a character array (btw, I'm with Mark in thinking we should have a separate string class, but have not amassed my ammunition so am not looking to engage in that debate yet) to a C API expecting a null-terminated string, we have the options of

- not terminating - crash!
- terminating in the array via ~= (char)0;
- using toStringz()

which seems from the implementation to contain most of my sleepytime ideas for an efficient placement of a terminating null. Gah!

Nonetheless, I was wondering whether there was some way of making this call implicit, perhaps in the declaration of the C function. For example, strlen is declared thus

    extern (C)
    {
        int strlen(char *);
    }

Would it be a nice thing to declare it

    extern (C)
    {
        int strlen(char null *);
    }

and the D compiler would insert a call to toStringz() automatically? Sure there is an efficiency argument against, but I suspect most of such C calls that expect ZTS have to involve some similar treatment.

And really, the null decorator would not mean that "the compiler must call toStringz", rather it could mean that "the compiler must ensure that the string is zero-terminated". Hence the compiler would be free to optimise out such a call where it is dealing with a literal, or static, or something that it's already established is null terminated. For example, the code

    void blah(char[] s)
    {
        int len1 = strlen(s);
        int len2 = strlen(s);
    }

could be translated to

    void blah(char[] s)
    {
        char* s_zt = toStringz(s);  // toStringz() yields a C-style pointer
        int len1 = strlen(s_zt);
        int len2 = strlen(s_zt);
    }

This would eradicate many of the problems that are likely to bite people interfacing to C code, without in any way adding a cost to "pure" D.

Any takers?

Matthew
Mar 31 2003
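For readers following along, here is a minimal sketch of what the caller has to write by hand without the proposed decorator, and roughly what a "char null *" declaration would ask the compiler to insert. It assumes a current Phobos (std.string.toStringz, core.stdc.string.strlen); the helper name cLength is hypothetical and not from the thread.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: the manual form of the conversion that the
// proposed "char null *" parameter would make implicit.
size_t cLength(const(char)[] s)
{
    // toStringz returns a pointer to zero-terminated data, copying the
    // array contents when that is needed to append the terminator.
    auto s_zt = toStringz(s);
    return strlen(s_zt);
}

void main()
{
    char[] buf = "hello".dup;           // heap copy, no terminator guaranteed
    assert(cLength(buf) == 5);
    assert(cLength(buf[1 .. 3]) == 2);  // slices are handled the same way
}
```

The conversion is done once and the resulting pointer could be reused for several C calls, which is the optimisation the post describes.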
Hi, Matthew.

> When interfacing a character array (btw, I'm with Mark in thinking we should have a separate string class, but have not amassed my ammunition so am not looking to engage in that debate yet)

This is a rare occasion when I agree with Mark. The fact that a minimalist like me, and a maximalist like Mark, and a pragmatist like yourself seem to agree is something Walter should consider.

I would want to hold built-in string support to just UTF-8. D could offer some support for the other formats through conversion routines in a standard library. Having a single string format would surely be simpler than supporting them all.

Bill
Mar 31 2003
:)

Pragmatist is a lot more of a compliment than what I usually get: pedant.

Yes, the string stuff is highly toxic in C, C++ and (it seems) D. I am also, however, wary of building in support for inefficient (in terms of speed, not size) variable character length encoding schemes.

Is there a reason why UCS-32 (or is that UTF-32 - I need to go and digest all that awful gunk again and get my terminology back up to speed), a la wchar_t, Java, .NET, is not sufficient? I know that 65536 doesn't cover all the bases of _all_ languages, but it is nevertheless used as a "complete" solution by so many languages, so is it a case of "near enough is good enough"? Dunno, seems Mark's much more of an expert, so hopefully he can enlighten me on that one.

Anyway, Bill, everyone, do you like the "char null *" idea?

- Doesn't introduce another keyword.
- Surely not hard to parse.
- Improves robustness.
- Doesn't add operations that would not have to be done anyway.
- Leaves it all to compiler's best discretion, so plenty of chances for being _faster_ than leaving it up to user, which seems to be a theme of D, where achievable.

Sure, fire away, but I think we should have it running for parliament. ;)

Percy the pragmatist
Mar 31 2003
Correction: meant UCS-/UTF-16, not 32
Mar 31 2003
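The concern about 65536 not covering everything can be made concrete with a small sketch. It assumes present-day D string types (string, wstring, dstring for UTF-8, UTF-16 and UTF-32); the character chosen is only an example of a code point above the 16-bit range.

```d
void main()
{
    // U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range.
    string  s8  = "\U0001D11E";   // UTF-8
    wstring s16 = "\U0001D11E"w;  // UTF-16
    dstring s32 = "\U0001D11E"d;  // UTF-32

    assert(s8.length  == 4);  // four 8-bit code units
    assert(s16.length == 2);  // a surrogate pair of 16-bit code units
    assert(s32.length == 1);  // a single 32-bit code unit
}
```

So a 16-bit encoding is itself variable-length once such characters appear; only the 32-bit form is one unit per code point.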
In article <b6aph7$1dbp$1 digitaldaemon.com>, Matthew Wilson says...

> Anyway, Bill, everyone, do you like the "char null *" idea?
>
> - Doesn't introduce another keyword.
> - Surely not hard to parse.
> - Improves robustness.
> - Doesn't add operations that would not have to be done anyway.
> - Leaves it all to compiler's best discretion, so plenty of chances for being _faster_ than leaving it up to user, which seems to be a theme of D, where achievable.

From a user point of view, I like the char null*. The single most common "Help!, I've crashed my simple D program" post on this newsgroup seems to have to do with the terminating null, and how it interacts with character array slicing.

It'd be nice to help clear that one up. I don't know how hard the support would be. It'd have to be pretty hard to amount to more of Walter's time than dealing with the confused D users.

Bill
Mar 31 2003
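The interaction between slicing and the terminating null can be sketched as follows. This assumes a current Phobos; rawLength and safeLength are hypothetical names used only for illustration. The raw version is well defined here only because D string literals are stored with a trailing zero; for arbitrary memory it would read past the end of the data.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hands the bare pointer to C: the slice length is lost.
size_t rawLength(string s)
{
    return strlen(s.ptr);
}

// Ensures zero termination first, as toStringz does.
size_t safeLength(string s)
{
    return strlen(toStringz(s));
}

void main()
{
    string full = "hello world";  // literal, stored with a trailing '\0'
    string part = full[0 .. 5];   // "hello": same memory, shorter length

    assert(part.length == 5);
    assert(rawLength(part) == 11);  // C reads on to the literal's terminator
    assert(safeLength(part) == 5);  // terminated copy of just the slice
}
```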
> It'd be nice to help clear that one up. I don't know how hard the support would be. It'd have to be pretty hard to amount to more of Walter's time than dealing with the confused D users.

Good point. Maybe you've invented a new, and quite definitive, metric for measuring the worth of D changes. :)

Walter?
Mar 31 2003
Bill Cox wrote:

> In article <b6aph7$1dbp$1 digitaldaemon.com>, Matthew Wilson says...
>
>> Anyway, Bill, everyone, do you like the "char null *" idea?
>>
>> - Doesn't introduce another keyword.
>> - Surely not hard to parse.
>> - Improves robustness.
>> - Doesn't add operations that would not have to be done anyway.
>> - Leaves it all to compiler's best discretion, so plenty of chances for being _faster_ than leaving it up to user, which seems to be a theme of D, where achievable.
>
> From a user point of view, I like the char null*. The single most common "Help!, I've crashed my simple D program" post on this newsgroup seems to have to do with the terminating null, and how it interacts with character array slicing.

The problems of newbies are eminently ignorable. It's the problems of people who are indoctrinated that are worth looking into; they're the ones who are going to be running into it in the years following.

About the issue itself, uh... it's a good match for D (as set out at the top of the Phobos page), it's not a good match for what I want D to be. I don't like referring to C functions directly, because of incompatible signatures, lack of exceptions, weird overloading, and extreme operating system variations in Unices - for example, sometimes errno is a symbol, sometimes it's a macro calling a function. Purifying this variability is the first task of cross-platform work, which I do quite a lot of, and char* is one small factor of the problem. So altogether there's no win in it for me.

toStringz shows up 38 times in the interface library dig, 0 times in the client program dedit. That's the way it should be.
Apr 01 2003
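The layering described in the post above, where toStringz lives only in an interface library and client code never sees char*, might look something like the sketch below. It is an assumption-laden illustration, not code from dig: the function name environmentValue is hypothetical, and current Phobos already provides std.process.environment for this particular job.

```d
import core.stdc.stdlib : getenv;
import std.exception : enforce;
import std.string : fromStringz, toStringz;

// Hypothetical interface-library wrapper: D types in, D types out,
// and failure reported as an exception rather than a null pointer.
string environmentValue(string name)
{
    const(char)* raw = getenv(toStringz(name));
    enforce(raw !is null, "environment variable not set: " ~ name);
    return fromStringz(raw).idup;  // copy into D-owned immutable memory
}

void main()
{
    import std.exception : assertThrown;

    // Client code deals only in D strings and exceptions.
    assertThrown(environmentValue("NO_SUCH_VARIABLE_FOR_THIS_DEMO"));
}
```

With this arrangement the null-termination step appears once per wrapped function, which matches the 38-versus-0 count given in the post.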
I don't get your point.

Without a DNI (i.e. D Native Interface) with which to approach D from the underside, we are forced to have C compatibility within D itself, touching C from the upperside, if you like. Frankly (and I guess this is because I'm a pragmatist, eh Bill?) I don't care which it is, but I do think it's important to maximise robustness wherever possible without causing any significant degradation of performance. As I've argued, this feature would most certainly increase robustness and would also likely increase performance (quality of compiler optimisations allowing).

You say that toStringz() shows up 38 times in your code, and then say there's no win in it for you. This seems contradictory. Have I misunderstood your post?

As for the "purity" of D, I'll have to leave that to those of a more philosophical bent. I'd offer this thought, though: I have a friend who works on the Solaris kernel team, and he tells me they're not thinking of going C++ or Java or anything else other than C, for the "foreseeable future" (which is a long time, I think). Walter's created a language to supersede C (among others), but has wisely put C compatibility into it. It being the case that C compatibility is built into D, I cannot see the sense in denying ourselves more robustness and efficiency for free, just because it's less pure?
Apr 01 2003
Matthew please post in the other thread if you want me to respond. That's why I started it. Mark
Mar 31 2003
Can't remember which bit of which post applies to which thread. Verbal diarrhoea, I'm afraid.
Mar 31 2003
"Bill Cox" <bill viasic.com> wrote in message news:3E88BE91.6010403 viasic.com...I would want to hold built-in string support to just UTF-8. D could offer some support for the other formats through conversion routines in a standard library. Having a single string format would surely be simpler than supporting them all.That's the direction D is going.
Mar 31 2003
"Bill Cox" <bill viasic.com> wrote in message news:3E88BE91.6010403 viasic.com...I would want to hold built-in string support to just UTF-8. D could offer some support for the other formats through conversion routines in a standard library. Having a single string format would surely be simpler than supporting them all.The next release will provide a module to do all the conversions.
May 24 2003
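As a rough idea of what such conversion routines look like in use, here is a sketch against the std.utf module of current Phobos (toUTF8, toUTF16, toUTF32). Whether the module in the release mentioned above had exactly this shape is not stated in the thread, so treat the names as an assumption about today's library.

```d
import std.utf : toUTF8, toUTF16, toUTF32;

void main()
{
    string  s8  = "h\u00E9llo";   // "héllo"; é takes two bytes in UTF-8
    wstring s16 = toUTF16(s8);
    dstring s32 = toUTF32(s8);

    assert(s8.length  == 6);      // code units differ per encoding...
    assert(s16.length == 5);
    assert(s32.length == 5);

    assert(toUTF8(s16) == s8);    // ...but the text round-trips unchanged
    assert(toUTF8(s32) == s8);
}
```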
> The next release will provide a module to do all the conversions.

What it won't provide are manipulation routines for the results. Conversions aren't enough; one wants a consistent design that treats strings the same no matter their encoding.

Mark
May 25 2003
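One way to picture "treating strings the same no matter their encoding" is a routine written once and usable with all three string types. The sketch below assumes current Phobos, where std.range.walkLength decodes narrow strings code point by code point; codePoints is a hypothetical name, and this is only one possible reading of the design being asked for.

```d
import std.range : walkLength;
import std.traits : isSomeString;

// Hypothetical encoding-independent routine: counts code points,
// whatever the width of the underlying code units.
size_t codePoints(S)(S text)
    if (isSomeString!S)
{
    return text.walkLength;
}

void main()
{
    assert(codePoints("h\u00E9llo")  == 5);  // UTF-8, 6 code units
    assert(codePoints("h\u00E9llo"w) == 5);  // UTF-16
    assert(codePoints("h\u00E9llo"d) == 5);  // UTF-32
}
```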