digitalmars.D.bugs - toUTFxx returns null references
- Derek Parnell (30/30) Feb 10 2005 I do not know if this is a bug or not.
- Anders F Björklund (18/42) Feb 10 2005 Confusing, but I don't really think it's a bug...
- Derek (52/104) Feb 10 2005 I discovered this behaviour when I used an 'in' contract in a function ...
- Anders F Björklund (39/51) Feb 10 2005 No,
- Derek (17/19) Feb 10 2005 I'm testing for this ...
- Anders F Björklund (17/34) Feb 10 2005 There is nothing wrong with using an unassigned string,
- Derek (15/59) Feb 10 2005 Yes, I understand the technical aspect of this. However, I was attemptin...
- Regan Heath (19/20) Feb 10 2005 There is a difference, internally, but D treats them the same. Which is ...
- Derek Parnell (6/18) Feb 10 2005 Exactly! Well said.
- Anders F Björklund (28/59) Feb 11 2005 More or less, yes. But that's more of an Implementation Quirk™.
- Regan Heath (33/88) Feb 13 2005 Which worries me because I believe there is a real need to tell them apa...
I do not know if this is a bug or not.

The toUTF32(), toUTF16(), and toUTF8() routines return a null reference if the input parameter is an empty string. I would have thought that they should return an empty string instead. The only exception is when the parameter is the same type as the return value's type; in that case they return an empty string.

Example code...

<code>
import std.utf;
import std.stdio;

void main()
{
    char[] s = "";
    dchar[] d;

    if (s is null)
        writefln("s is null");
    else
        writefln("s length is %d", s.length);

    d = toUTF32(s);
    if (d is null)
        writefln("d is null");
    else
        writefln("d length is %d", d.length);
}
</code>

--
Derek
Melbourne, Australia
10/02/2005 7:28:31 PM
Feb 10 2005
Derek Parnell wrote:
> I do not know if this is a bug or not.

Confusing, but I don't really think it's a bug... (maybe the std routines need to be more similar to each other, either all returning null or all returning "", but both types of return values are OK to use, see below:)

> The toUTF32(), toUTF16(), and toUTF8() routines return a null reference if the input parameter is an empty string. I would have thought that they should return an empty string instead. The only exception is when the parameter is the same type as the return value's type, in that case they return an empty string.

I believe that in D, the empty string is "equal" to null.

http://www.digitalmars.com/d/cppstrings.html:
> In D, an empty string is just null:
>   char[] str;
>   if (!str)  // string is empty

That works the same with either null or "", and this too:

import std.stdio;

void main()
{
    char[] s = "";
    char[] d = null;

    writefln("s is %snull", s is null ? "" : "not ");
    writefln("s length is %d", s.length);
    writefln("d is %snull", d is null ? "" : "not ");
    writefln("d length is %d", d.length);
}

Output:

s is not null
s length is 0
d is null
d length is 0

Which means that whether it is "" or null, it'll compare and work the same in the rest of the code? Unless C is involved, since s.ptr will point to a '\0', but d.ptr points to null. But that will work itself out in the toStringz process... (since D strings have to be zero-terminated for C anyway)

--anders
Feb 10 2005
On Thu, 10 Feb 2005 09:59:39 +0100, Anders F Björklund wrote:
> I believe that in D, the empty string is "equal" to null.

I discovered this behaviour when I used an 'in' contract in a function ...

bool foo(dchar[] X, dchar[] Y)
in
{
    assert( ! (X is null) );
    assert( ! (Y is null) );
}
body
{
    . . .
}

So what you seem to be saying is that I shouldn't bother checking whether a dynamic array reference is null or not. Instead I can just check the length. However, I was trying to trap the case in which the function was called with an uninitialized array. Calling it with an empty array is OK though.

A fuller example in which it tripped me up ...

<code>
import std.utf;
import std.stdio;

bool foo(dchar[] X, dchar[] Y)
in
{
    assert( ! (X is null) );
    assert( ! (Y is null) );
}
body
{
    return true;
}

bool foo(char[] X, char[] Y)
{
    return foo( toUTF32(X), toUTF32(Y) );
}

bool foo(wchar[] X, wchar[] Y)
{
    return foo( toUTF32(X), toUTF32(Y) );
}

unittest
{
    dchar[] a;
    dchar[] b;
    a = "";
    b = "123";

    debug(1) writefln("UT1");
    assert( foo(toUTF32(a), toUTF32(b) ) );
    debug(1) writefln("UT2");
    assert( foo(toUTF16(a), toUTF16(b) ) );
    debug(1) writefln("UT3");
    assert( foo(toUTF8(a), toUTF8(b) ) );
}

// Needed to link; with -unittest the unittest blocks run before main().
void main() {}
</code>

Compiled with

  dmd test -debug -unittest

--
Derek
Melbourne, Australia
Feb 10 2005
Derek wrote:
> So what you seem to be saying is that I shouldn't bother checking that a dynamic array reference is null or not. Instead I can just check the length. However, I was trying to trap the case in which the function was called with an uninitialized array. Calling it with an empty array is OK though.

No, I don't think you should bother to distinguish between null and .length == 0.

> bool foo(dchar[] X, dchar[] Y)
> in
> {
>     assert( ! (X is null) );
>     assert( ! (Y is null) );
> }
> body
> {
>     return true;
> }

The "recommended" way to write that is:

  assert(X);
  assert(Y);

Since D doesn't have booleans, that is? (and since the long form is an eyesore)

I'm not sure what you are trying to test, but:

int main()
{
    char[] nullstr = null;
    assert(nullstr == "");
    assert("" == nullstr);
    return 0;
}

This test does not fail, and does not segfault... (like it would have done if nullstr was an Object:)

int main()
{
    Object nullobj = null;
    assert(nullobj == null); // <-- KABOOM
    assert(null == nullobj); // <-- KABOOM
    return 0;
}

This second program *must* be rewritten with "is" (since using '==' with class objects calls opEquals).

Pointers are OK too:

int main()
{
    void* nullptr = null;
    assert(nullptr == null);
    assert(null == nullptr);
    return 0;
}

To be on the safe side, one can always use "is"... (i.e. with pointers/objects, but *not* with strings, since that only compares the references, like in Java)

--anders
Feb 10 2005
On Thu, 10 Feb 2005 14:10:47 +0100, Anders F Björklund wrote:
> I'm not sure what you are trying to test, but:

I'm testing for this ...

void main()
{
    char[] nullstr;
    assert( ! (nullstr is null) );
}

Namely, the attempted use of a string that has never had any assignment yet. But as toUTFxx() returns something that looks like an unassigned string, I can't test for unassigned strings.

I still think that the toUTFxx() functions should return an empty string if an empty string was passed to them.

--
Derek
Melbourne, Australia
Feb 10 2005
Derek wrote:
>> I'm not sure what you are trying to test, but:
> I'm testing for this ...
>
> void main()
> {
>     char[] nullstr;
>     assert( ! (nullstr is null) );
> }
>
> Namely, the attempted use of a string that has never had any assignment yet.

There is nothing wrong with using an unassigned string, since all arrays (including char[]) default to length 0... You can pass "nullstr" to writefln and friends just fine.

> But as toUTFxx() returns something that looks like an unassigned string, I can't test for unassigned strings.

If you really, really want to test for "unassigned" strings, use .ptr:

void main()
{
    char[] s = "";
    char[] d = null;
    assert(s.ptr != null);
    assert(d.ptr == null);
}

This is because the ptr of a string literal will point to a '\0' char.

> I still think that the toUTFxx() functions should return an empty string if an empty string was passed to them.

There is *no* difference in D between null and the empty string. They both have the length property set to 0, and they're equal. (Not identical, though, so using "is" between them will fail.)

--anders
Feb 10 2005
On Thu, 10 Feb 2005 15:21:21 +0100, Anders F Björklund wrote:
> There is *no* difference in D between null and the empty string. They both have the length property set to 0, and they're equal. (Not identical, though, so using "is" between them will fail.)

Yes, I understand the technical aspect of this. However, I was attempting to help the coder trap mistakes, namely the use of unassigned strings. The assumption is that if a coder declares a string and uses it before assigning anything to it, then there might be a logic error in the code. This is slightly different from the use of numbers, as most people expect numbers to be zero upon declaration.

But still, it's just a philosophy question really. Walter has decided for us that unassigned variables are an acceptable practice, whereas pedantic people such as myself think that they might indicate errors in coding. I will, no doubt, have to adjust to the given situation as it ain't gonna change ;-)

--
Derek
Melbourne, Australia
Feb 10 2005
On Thu, 10 Feb 2005 15:21:21 +0100, Anders F Björklund <afb algonet.se> wrote:
> There is *no* difference in D, between null and the empty string.

There is a difference internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)

A null string has ptr == null; an empty string has ptr == "". In some instances it is crucial to be able to tell these cases apart:

1- value does not exist (null)
2- value is blank (empty string)

To check for case 1, we can go "if (s is null)".
To check for case 2, we can go "if (s.length == 0)", once case 1 has been ruled out, since a null string also has length 0.

E.g., a simple example where it is important:

A user enters data into a text field (A) on a web page and leaves text field (B) blank; the code saves the values of these two fields somewhere, e.g. in a database containing 3 settings A, B and C.

The presence of the empty field (B) on the page indicates that any previous value for that setting should be overwritten with the empty value. The absence of the field (C) indicates that any previous value of the setting should not be overwritten but kept.

Regan
Feb 10 2005
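A minimal sketch of Regan's web-form example, written in present-day D2 syntax (where `string` is `immutable(char)[]`) rather than the D1 used in this thread. The `settings` table, the `applyField` function, and the convention that an absent field is passed in as `null` are hypothetical, chosen only to show why the `is null` check (case 1) has to come before anything that looks at the length.

```d
/// Hypothetical settings store (illustration only, not from the thread).
string[string] settings;

/// Hypothetical handler for one submitted form field.
/// Assumes the caller passes null when the field was absent from the page.
void applyField(string key, string submitted)
{
    if (submitted is null)
        return;                 // case 1: value does not exist, keep the old setting
    settings[key] = submitted;  // case 2 (blank "") or an ordinary value: overwrite
}
```

With A = "new", B = "" and C absent (null), A and B are overwritten and C keeps its previous value. Testing only `submitted.length == 0` would treat the absent C like the blank B and wipe it.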
On Fri, 11 Feb 2005 10:05:06 +1300, Regan Heath wrote:
> On Thu, 10 Feb 2005 15:21:21 +0100, Anders F Björklund <afb algonet.se> wrote:
>> There is *no* difference in D, between null and the empty string.
> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)
> A null string has ptr == null, an empty string has ptr == "". In some instances it is crucial to be able to tell these cases apart:
> 1- value does not exist (null)
> 2- value is blank (empty string)

Exactly! Well said.

--
Derek
Melbourne, Australia
11/02/2005 9:49:04 AM
Feb 10 2005
Derek Parnell wrote:
>>> There is *no* difference in D, between null and the empty string.
>> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)

More or less, yes. But that's more of an Implementation Quirk™. The D specification explicitly says:

http://www.digitalmars.com/d/arrays.html
> Array Initialization
> * Dynamic arrays are initialized to having 0 elements.

http://www.digitalmars.com/d/cppstrings.html
> Checking For Empty Strings
> In D, an empty string is just null:
>   char[] str;
>   if (!str)  // string is empty

But in practice, they do differ - in the ptr to the '\0' (for C). (But both have a length property of 0, as mentioned earlier.) And when you copy the char[], this ptr setting follows as well... This means that there is a way to trace if it has been set to "".

>> A null string has ptr == null, an empty string has ptr == "". In some instances it is crucial to be able to tell these cases apart:
>> 1- value does not exist (null)
>> 2- value is blank (empty string)
> Exactly! Well said.

But strings in D are not objects or pointers, they are arrays... And arrays are initialized to have the length zero, in the spec. Thus, that makes them similar to e.g. an integer that is initialized with a zero? You will have to check if they are modified in some other way. Or just rely on the "string.ptr" value, since that will work as long as D supports calling C functions with string literals...

But technically, there is no difference in D between "" and null. Which is probably why the standard library mixes them freely? To recap:

  ""    .length = 0   .ptr = &'\0'
  null  .length = 0   .ptr = null

void main()
{
    char[] emptystr = "";
    char[] nullstr = null;

    assert(emptystr == nullstr);              // equal...
    assert(!(emptystr is nullstr));           // ...but not identical
    assert(emptystr.length == nullstr.length);
    assert(!(emptystr.ptr is nullstr.ptr));   // the ptr is what differs
}

And the D standard library should probably be "fixed" to return null for null and "" for "" anyway, even if it isn't in the spec? Care to write a full unittest for it? (at least for all of std.utf)

--anders
Feb 11 2005
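A sketch of the behaviour proposed here (null in, null out; "" in, "" out), again in present-day D2 syntax. `toUTF32KeepEmpty` is a hypothetical wrapper, not a Phobos function; it relies only on `std.utf.toUTF32` and on the point made earlier in the thread that a string literal's `.ptr` is non-null. The `unittest` block is a small starting point for the kind of test asked for, not a full test of std.utf.

```d
import std.utf : toUTF32;

/// Hypothetical wrapper (not part of Phobos): converts UTF-8 to UTF-32
/// but keeps the difference between a null input and an empty one.
dstring toUTF32KeepEmpty(const(char)[] s)
{
    if (s is null)
        return null;    // "value does not exist" stays null
    if (s.length == 0)
        return ""d;     // "value is blank": a literal's .ptr is non-null
    return toUTF32(s);  // ordinary conversion for non-empty input
}

unittest
{
    const(char)[] nothing = null;
    const(char)[] blank = "";

    assert(toUTF32KeepEmpty(nothing) is null);
    assert(toUTF32KeepEmpty(blank) !is null);
    assert(toUTF32KeepEmpty(blank).length == 0);
    // Still equal under ==; only `is` tells the two results apart.
    assert(toUTF32KeepEmpty(blank) == toUTF32KeepEmpty(nothing));
    assert(toUTF32KeepEmpty("abc") == "abc"d);
}
```

The same pattern would have to be repeated for each toUTFxx() variant; whether the library itself should behave this way is exactly what the thread is debating.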
On Fri, 11 Feb 2005 17:54:45 +0100, Anders F Björklund <afb algonet.se> wrote:
> Derek Parnell wrote:
>>>> There is *no* difference in D, between null and the empty string.
>>> There is a difference, internally, but D treats them the same. Which is probably what you meant, but I'm just being thorough. :)
> More or less, yes. But that's more of an Implementation Quirk™.

Which worries me, because I believe there is a real need to tell them apart. So I ask that this behaviour be specified, or that another method to achieve the same thing be specified.

> The D specification explicitly says:
> http://www.digitalmars.com/d/arrays.html
>> Array Initialization
>> * Dynamic arrays are initialized to having 0 elements.
> http://www.digitalmars.com/d/cppstrings.html
>> Checking For Empty Strings
>> In D, an empty string is just null:
>>   char[] str;
>>   if (!str)  // string is empty
> But in practice, they do differ - in the ptr to the '\0' (for C). (But both have a length property of 0, as mentioned earlier.)

Sure, exactly what I said.

> And when you copy the char[], this ptr setting follows as well... This means that there is a way to trace if it has been set to "".

Yep, I want this behaviour to be specified. (or some other method to achieve what I want)

> But strings in D are not objects or pointers, they are arrays...

And arrays appear to be value types containing a 'reference'. As in, arrays themselves cannot be null, but the reference in them can be.

> And arrays are initialized to have the length zero, in the spec. Thus, that makes them similar to e.g. an integer that is initialized with a zero?

I agree arrays are value types, as integers are. For a null string, the length is initialised to 0. For a "" string, the length is initialised to the length of "", which happens to be 0. For an "abc" string, the length is initialised to the length of "abc", which happens to be 3.

> You will have to check if they are modified in some other way. Or just rely on the "string.ptr" value, since that will work as long as D supports calling C functions with string literals...

In C, strings are pointers, and pointers can be null or point to a piece of memory which may contain a \0, so in C there is a way to tell the two cases apart. In D, arrays are value types containing a pointer/reference and a length. I firmly believe that losing this ability for char[] would become a weakness in D; it would force me and others to resort to other methods to achieve it. I like the current behaviour; I just want to make sure it doesn't change.

> But technically, there is no difference in D between "" and null. Which is probably why the standard library mixes them freely? To recap:
>   ""    .length = 0   .ptr = &'\0'
>   null  .length = 0   .ptr = null

Yep, like I said.

> void main()
> {
>     char[] emptystr = "";
>     char[] nullstr = null;
>     assert(emptystr == nullstr);
>     assert(!(emptystr is nullstr));
>     assert(emptystr.length == nullstr.length);
>     assert(!(emptystr.ptr is nullstr.ptr));
> }
> And the D standard library should probably be "fixed" to return null for null and "" for "" anyway, even if it isn't in the spec?

Definitely. I've been saying that null and "" can mean different things depending on the context, and you seem to be agreeing, so why are we arguing? :)

> Care to write a full unittest for it? (at least for all of std.utf)

First we have to decide (on a per-function basis) whether returning null or "" makes sense, or if indeed both make sense (for different reasons, of course), e.g.

  null == failed, cannot convert, malformed?
  ""   == success, result really is ""

Regan
Feb 13 2005
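One way the per-function convention suggested at the end of the thread (null means failure, "" means a genuinely empty result) could look, as a hedged D2 sketch. `convertOrNull` is hypothetical, and treating a null input the same as an empty one is an arbitrary choice made here. It calls `std.utf.validate`, which throws a `UTFException` on malformed UTF-8, so that a null result can only mean "failed".

```d
import std.utf : toUTF32, validate, UTFException;

/// Hypothetical converter (not in Phobos) following the proposed convention:
///   null -> failed: the input is not well-formed UTF-8
///   ""d  -> success, and the result really is empty
dstring convertOrNull(const(char)[] s)
{
    try
    {
        validate(s);    // throws UTFException on malformed input
    }
    catch (UTFException)
    {
        return null;    // failure is reported as null
    }
    if (s.length == 0)
        return ""d;     // non-null, zero-length: a genuine empty result
    return toUTF32(s);
}
```

An alternative design is to let the exception propagate and never return null at all; returning null only pays off if every caller agrees to check for it, which is why the decision has to be made function by function.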