digitalmars.D.learn - why is string not implicit convertable to const(char*) ?
- mta`chrono (15/15) Jun 29 2012 does anyone know why string not implicit convertable to const(char*) ?
- Jonathan M Davis (26/45) Jun 29 2012 Because it's _not_ const char*. It's an array. And passing a string dire...
- mta`chrono (2/2) Jul 02 2012 Your answers are remarkable elaborated. Thanks for your great effort,
- dcoder (7/74) Jul 05 2012 Thanks for the thorough explanation, but it begs the question why
- Timon Gehr (6/11) Jul 05 2012 Because that is inefficient. It disables string slicing and is
- Jonathan M Davis (14/21) Jul 05 2012 Are you serious? I'm shocked to hear anyone suggest that. Zero-terminate...
- Wouter Verhelst (10/30) Jul 05 2012 To be fair, there are a _few_ areas in which zero-terminated strings may
- Timon Gehr (5/34) Jul 05 2012 It is impossible to know that the memory block is large enough unless
- Wouter Verhelst (15/25) Jul 05 2012 Sure it is, but not by looking at the string itself.
- Timon Gehr (8/31) Jul 05 2012 This incurs the cost of determining the original string's length, which
- Wouter Verhelst (17/18) Jul 06 2012 There are ways to know the original string's length without having to
- Jonathan M Davis (8/11) Jul 06 2012 Well, then we're going to have to agree to disagree on that one. While s...
- akaz (9/21) Jul 07 2012 I agree, despite the fact that it allows, in principle, creating
- Jonathan M Davis (22/55) Jul 05 2012 Actually, I'd expect a string that maintains its length to beat a zero-
- Wouter Verhelst (15/40) Jul 05 2012 Absolutely.
- Jonathan M Davis (6/15) Jul 05 2012 There are a number of things that we do now with programming languages t...
- dcoder (4/4) Jul 06 2012 Thanks for the lengthy threaded explanations. I just use the
does anyone know why string not implicit convertable to const(char*) ?

-------
import core.sys.posix.unistd;

void main()
{
    // ok
    unlink("foo.txt");

    // failed
    string file = "bar.txt";
    unlink(file);
}
-------

test.d(10): Error: function core.sys.posix.unistd.unlink (const(char*)) is not callable using argument types (string)
test.d(10): Error: cannot implicitly convert expression (file) of type string to const(char*)
Jun 29 2012
On Saturday, June 30, 2012 02:12:22 mta`chrono wrote:
> does anyone know why string not implicit convertable to const(char*) ?

Because it's _not_ const char*. It's an array. And passing a string directly to a C function (which is almost the only reason that you'd want a string to convert to a const char*) is generally _wrong_. Strings in D are _not_ zero-terminated. String _literals_ are (they have a '\0' one character past their end), so as it just so happens, if string implicitly converted to const char*, your code would work, but if your string had been created from anything other than a string literal, it would _not_ be zero-terminated. Even concatenating two string literals results in a string which isn't zero-terminated. So, implicitly converting string to const char* would just cause bugs (in fact, it _used_ to work, and it was changed so that it no longer does, precisely because that behavior just causes bugs).

What you need to do is use std.string.toStringz. It converts a string to a zero-terminated string. It appends '\0' to the end of the string if it has to (which could require reallocating the string to make room for it), but if it can determine that this is unnecessary (which it can do at least some of the time with string literals), it'll just return the string's ptr property without allocating anything. But since you _need_ that '\0', that's the best that you can do. Simply passing the string's ptr property to a C function would be wrong, since the string isn't zero-terminated. Your function call should look like

    unlink(toStringz(file));

Of course, you could just do std.file.remove(file), which ultimately does the same thing and works on all platforms rather than just POSIX, but that's a separate issue from converting a string to a const char*.

- Jonathan M Davis
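A minimal sketch of the pitfall described above, using only druntime's `core.stdc.string.strlen` and Phobos's `std.string.toStringz`. It relies on the guarantee that string literals are zero-terminated; the variable names are illustrative only.

```d
import core.stdc.string : strlen;
import std.stdio : writeln;
import std.string : toStringz;

void main()
{
    // String literals get a hidden '\0' just past their last character...
    string literal = "hello world";
    // ...but a slice of the literal is not followed by a '\0' of its own.
    string hello = literal[0 .. 5];

    // Wrong: C code only sees the pointer, so strlen keeps scanning
    // past "hello" until it reaches the literal's terminator.
    writeln(strlen(hello.ptr));          // 11, not 5

    // Right: toStringz returns a pointer to data that is terminated
    // right after the slice's last character (copying if necessary).
    writeln(strlen(toStringz(hello)));   // 5
}
```

The same reasoning applies to the original program: once `file` comes from anywhere other than a literal, only `unlink(toStringz(file))` is guaranteed to hand C a properly terminated name.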
Jun 29 2012
Your answers are remarkably elaborate. Thanks for your great effort, Jonathan!! ;-)
Jul 02 2012
On Saturday, 30 June 2012 at 00:27:46 UTC, Jonathan M Davis wrote:
> On Saturday, June 30, 2012 02:12:22 mta`chrono wrote:
>> does anyone know why string not implicit convertable to const(char*) ?
>
> Because it's _not_ const char*. It's an array. And passing a string directly to a C function (which is almost the only reason that you'd want a string to convert to a const char*) is generally _wrong_. Strings in D are _not_ zero-terminated.

Thanks for the thorough explanation, but it begs the question why not make strings be array of chars that have \0 at the end of it? Since, lots of D programmers were/are probably C/C++ programmers, why should D be different here? Wouldn't it facilitate more C/C++ programmers to come to D? Just curious.
Jul 05 2012
On 07/05/2012 09:32 PM, dcoder wrote:
> Thanks for the thorough explanation, but it begs the question why not make strings be array of chars that have \0 at the end of it?

Because that is inefficient. It disables string slicing and is completely redundant.

BTW: String literals are guaranteed to be zero-terminated.

> Since, lots of D programmers were/are probably C/C++ programmers, why should D be different here?

Because it is a superior model.

> Wouldn't it facilitate more C/C++ programmers to come to D?

Why would that matter?
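To make the slicing point concrete, here is a small sketch (the path string is just an example value). Slicing a D string only creates a new pointer/length pair into the same memory; a zero-terminated representation would need a '\0' right where the slice ends, which would mean either overwriting the original data or copying.

```d
import std.stdio : writeln;

void main()
{
    string path = "/usr/local/bin";

    // Slicing builds a new (length, pointer) pair; no characters are copied.
    string dir = path[0 .. 4];    // "/usr"
    string rest = path[5 .. $];   // "local/bin"

    // Both slices point straight into the original buffer.
    assert(dir.ptr == path.ptr);
    assert(rest.ptr == path.ptr + 5);

    // The character right after "/usr" is '/', not '\0', so a
    // zero-terminated "/usr" could not share storage with path.
    writeln(dir, " ", rest);      // /usr local/bin
}
```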
Jul 05 2012
On Thursday, July 05, 2012 21:32:11 dcoder wrote:
> Thanks for the thorough explanation, but it begs the question why not make strings be array of chars that have \0 at the end of it? Since, lots of D programmers were/are probably C/C++ programmers, why should D be different here? Wouldn't it facilitate more C/C++ programmers to come to D? Just curious.

Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated strings are one of the largest mistakes in programming history. They're insanely inefficient. In fact, IIRC, Walter Bright has stated that he thinks that having arrays without a length property was C's greatest mistake (and if they'd had that, they wouldn't have created zero-terminated strings). C++ tried to fix it with std::string, but C compatibility bites you everywhere with that, so it only halfway works. C++ programmers in general would probably have thought that the designers of D were idiots if they had gone with zero-terminated strings.

You don't do what another language did just to match. You do it because what they did works and you have no reason to change it. Zero-terminated strings were a horrible idea, and we're not about to copy it.

- Jonathan M Davis
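The core difference being argued about is that a D array carries its length next to its pointer. A minimal sketch of what that buys; the `static assert` assumes a typical platform where `size_t` is as wide as a pointer.

```d
import core.stdc.string : strlen;
import std.stdio : writeln;

void main()
{
    // A D string is a (length, pointer) pair: two machine words.
    static assert(string.sizeof == 2 * size_t.sizeof);

    string s = "abc\0def";

    // .length is stored, so reading it is O(1), and it counts every
    // character, including the embedded '\0'.
    writeln(s.length);       // 7

    // strlen has to scan for the terminator (O(n)) and stops at the
    // first '\0', so a C string simply cannot contain one.
    writeln(strlen(s.ptr));  // 3
}
```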
Jul 05 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
> On Thursday, July 05, 2012 21:32:11 dcoder wrote:
>> Thanks for the thorough explanation, but it begs the question why not make strings be array of chars that have \0 at the end of it?
>
> Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated strings are one of the largest mistakes in programming history. They're insanely inefficient.
> [...]
> Zero-terminated strings were a horrible idea, and we're not about to copy it.

To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance). But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.

-- 
The volume of a pizza of thickness a and radius z can be described by the following formula: pi zz a
Jul 05 2012
On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
> To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance).

It is impossible to know that the memory block is large enough unless the length of the string is known. But it isn't.

> But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.

There is no string manipulation that is significantly faster with zero-terminated strings.
Jul 05 2012
Timon Gehr <timon.gehr gmx.ch> writes:
> On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
>> To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance).
>
> It is impossible to know that the memory block is large enough unless the length of the string is known. But it isn't.

Sure it is, but not by looking at the string itself.

Say you have a string that contains some data you need, and some other data you don't. I.e., you want to throw out parts of the string. You could allocate a memory block that's as large as the original string (so you're sure you've got enough space), and then start memcpy'ing stuff into the new memory block from the old string. This way you're sure you won't overrun your zero-terminated string, and you'll be slightly faster than you would be with a bounded string.

I'll readily admit I haven't done this all that often, though :-)

>> But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.
>
> There is no string manipulation that is significantly faster with zero-terminated strings.

Correct -- but only because you said "significantly".
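For comparison, the compaction scenario above can be sketched with D's length-carrying slices. `keepOnly` is a hypothetical helper written only for this illustration, and it copies character by character rather than memcpy'ing whole chunks to keep it short. The point is that the worst-case buffer size is `s.length`, which is stored in the slice rather than found by scanning for a '\0'.

```d
import std.stdio : writeln;

/// Hypothetical helper: copy only the characters for which `keep` is true.
/// The output can never be longer than the input, so the buffer is sized
/// from s.length, which costs O(1) to read.
char[] keepOnly(alias keep)(const(char)[] s)
{
    auto buf = new char[s.length];   // upper bound known without scanning
    size_t used = 0;
    foreach (c; s)
        if (keep(c))
            buf[used++] = c;
    return buf[0 .. used];           // the result's length is just `used`
}

void main()
{
    writeln(keepOnly!(c => c != ' ')("a b c"));   // abc
}
```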
Jul 05 2012
On 07/06/2012 03:40 AM, Wouter Verhelst wrote:
> Timon Gehr <timon.gehr gmx.ch> writes:
>> On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
>>> To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance).
>>
>> It is impossible to know that the memory block is large enough unless the length of the string is known. But it isn't.
>
> Sure it is, but not by looking at the string itself.
>
> Say you have a string that contains some data you need, and some other data you don't. I.e., you want to throw out parts of the string. You could allocate a memory block that's as large as the original string (so you're sure you've got enough space), and then start memcpy'ing stuff into the new memory block from the old string.

This incurs the cost of determining the original string's length, which is higher than computing the new string length for the data&length representation.

> This way you're sure you won't overrun your zero-terminated string, and you'll be slightly faster than you would be with a bounded string.

Are you talking about differences of a few operations that are completely hidden on a modern out-of-order CPU? I don't think the zero-terminated string method will even perform fewer operations.

> I'll readily admit I haven't done this all that often, though :-)
>
>>> But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.
>>
>> There is no string manipulation that is significantly faster with zero-terminated strings.
>
> Correct -- but only because you said "significantly".

I meant to say, 'measurably'.
Jul 05 2012
Timon Gehr <timon.gehr gmx.ch> writes:
> This incurs the cost of determining the original string's length,

There are ways to know the original string's length without having to calculate it. E.g., if you do a read() with some length and you don't get an indication that fewer characters were available, you can be pretty sure of the string's length. If the "original" string is a string you read in from a file, there'll be no need to count characters.

Anyway, this is all beside the point. I think it's safe to say we agree that bounded strings and arrays are superior to zero-terminated strings (at least in all but a few corner cases). However, I also happen to think that saying "zero-terminated strings were a horrendous design decision" is a bit short-sighted. That's all.
Jul 06 2012
On Friday, July 06, 2012 12:56:36 Wouter Verhelst wrote:
> However, I also happen to think that saying "zero-terminated strings were a horrendous design decision" is a bit short-sighted.

Well, then we're going to have to agree to disagree on that one. While some design decisions may have made more sense at the time they were made or the ultimate pros and cons may not have been clear at the time, I think that zero-terminated strings are one of the design decisions which was truly short-sighted and an enormous mistake all around, and all C/C++ programmers have had to pay for it ever since.

- Jonathan M Davis
Jul 06 2012
> Well, then we're going to have to agree to disagree on that one. While some design decisions may have made more sense at the time they were made or the ultimate pros and cons may not have been clear at the time, I think that zero-terminated strings are one of the design decisions which was truly short-sighted and an enormous mistake all around, and all C/C++ programmers have had to pay for it ever since.

I agree, despite the fact that it allows, in principle, creating strings as long as desired at a constant cost (just one byte is sacrificed, instead of the one, two, three, four, etc. bytes required to represent the length). Besides, using zero-terminated strings did not impose, in principle (forget about machine addressing issues), any upper bound on the length of a string.

But, OTOH, it was also the only way to do it once the decision not to incorporate the length into arrays (basically, under the assumption that an array is a pointer and nothing more) was made.
Jul 07 2012
On Thursday, July 05, 2012 18:57:05 Wouter Verhelst wrote:
> Jonathan M Davis <jmdavisProg gmx.com> writes:
>> Zero-terminated strings were a horrible idea, and we're not about to copy it.
>
> To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance). But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.

Actually, I'd expect a string that maintains its length to beat a zero-terminated string at that, especially because you'd have to already know the string's length to pull that off, which is O(n) for zero-terminated strings. The _only_ time that the zero-terminated string might outperform the one which maintains its length when appending is if you already happen to know the lengths of both the string being appended to and the string being appended (which you normally wouldn't with zero-terminated strings), because then the zero-terminated string would have one more byte to copy as part of its memcpy than the other string would, but the other string would have to adjust its length, making it cost _slightly_ more. But really, given the overall costs of zero-terminated strings, it would be ridiculous to even count that extra bit of performance given the _huge_ performance losses everywhere else.

The _only_ valid excuse that I'm aware of for picking such a horrid design is the fact that it costs extra memory to maintain the length of an array along with the array, and when C was created, they cared a _lot_ more about memory usage than we do today. So, regardless of what the pros or cons were in the short run, in the long run, their decision was a very poor one that pretty much no one has duplicated. I really see no reason to cut them any slack for such a horrible design decision.

- Jonathan M Davis
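A rough sketch of the two append strategies being compared. `appendZ` and `appendLen` are hypothetical helpers, not library functions, and both assume the destination buffer already has enough room; the point is only where each one has to scan.

```d
import core.stdc.string : memcpy, strlen;
import std.stdio : writeln;

/// Hypothetical C-style append. dst must be zero-terminated and point into a
/// buffer with enough room. Both strings are scanned for their '\0' before
/// anything is copied: O(len(dst) + len(src)).
void appendZ(char* dst, const(char)* src)
{
    size_t dlen = strlen(dst);           // walk dst just to find its end
    size_t slen = strlen(src);           // walk src to learn how much to copy
    memcpy(dst + dlen, src, slen + 1);   // copy src plus its '\0'
}

/// Hypothetical length-carrying append. The end of the used part is already
/// known, so only src's characters are touched; the copy is bounds-checked.
size_t appendLen(char[] buf, size_t used, const(char)[] src)
{
    buf[used .. used + src.length] = src[];
    return used + src.length;            // new length, no scanning needed
}

void main()
{
    char[16] z = '\0';                   // all zeros: an empty C string
    appendZ(z.ptr, "foo");
    appendZ(z.ptr, "bar");
    writeln(z[0 .. strlen(z.ptr)]);      // foobar

    char[16] b;
    size_t used = appendLen(b[], 0, "foo");
    used = appendLen(b[], used, "bar");
    writeln(b[0 .. used]);               // foobar
}
```

In everyday D code you would simply write `s ~= "bar";` and let the runtime manage capacity; the helpers above only isolate the cost of finding the end of the string.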
Jul 05 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
> On Thursday, July 05, 2012 18:57:05 Wouter Verhelst wrote:
>> To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance). But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.
>
> Actually, I'd expect a string that maintains its length to beat a zero-terminated string at that, especially because you'd have to already know the string's length to pull that off, which is O(n) for zero-terminated strings. The _only_ time that the zero-terminated string might outperform the one which maintains its length when appending is if you already happen to know the lengths of both the string being appended to and the string being appended (which you normally wouldn't with zero-terminated strings), because then the zero-terminated string would have one more byte to copy as part of its memcpy than the other string would, but the other string would have to adjust its length, making it cost _slightly_ more.

That's what I meant, yes.

> But really, given the overall costs of zero-terminated strings, it would be ridiculous to even count that extra bit of performance given the _huge_ performance losses everywhere else.

Absolutely.

> The _only_ valid excuse that I'm aware of for picking such a horrid design is the fact that it costs extra memory to maintain the length of an array along with the array, and when C was created, they cared a _lot_ more about memory usage than we do today. So, regardless of what the pros or cons were in the short run, in the long run, their decision was a very poor one that pretty much no one has duplicated.

Well, really, strings in C are just a special case of arrays (as is true in D as well), and arrays in C are just a special case of pointers (which isn't true in D). That means the language is fairly compact, which also means the compiler has much lower resource requirements. I think that, much more than any requirements at runtime, has driven the choice for zero-terminated strings.

Just for comparison, what happens to DMD's memory usage when you do extensive templating wouldn't have been possible back in 1969 ;-)
Jul 05 2012
On Thursday, July 05, 2012 19:59:26 Wouter Verhelst wrote:
> Well, really, strings in C are just a special case of arrays (as is true in D as well), and arrays in C are just a special case of pointers (which isn't true in D). That means the language is fairly compact, which also means the compiler has much lower resource requirements. I think that, much more than any requirements at runtime, has driven the choice for zero-terminated strings.
>
> Just for comparison, what happens to DMD's memory usage when you do extensive templating wouldn't have been possible back in 1969 ;-)

There are a number of things that we do now with programming languages that you couldn't do when C was created. Having arrays that know their length is not one of them. Other languages in that time frame did it. C made the horrendous mistake of not doing it.

- Jonathan M Davis
Jul 05 2012
Thanks for the lengthy threaded explanations. I just use the language to write applications; I have no idea of the challenges that you must face to implement/design a language. Hence the stupid questions. :) Anyways, fascinating stuff.
Jul 06 2012