digitalmars.D - toString issue

Johan Granberg <lijat.meREM OVEgmail.com> writes:
As a result of the discussion about char[] above, I have been converting 
some of my code from dchar[] to char[], and that reminded me of an issue 
I have with the current state of Phobos. Object has a method toString 
that happens to have the same name as the COMMONLY used function 
std.string.toString. This causes Object's toString to shadow 
std.string's toString inside class methods. I know that a fully 
qualified name (FQN) can be used as a workaround, but it makes the code 
unnecessarily hard to read, and I think that name clashes such as this 
should be avoided in the standard library.

PROPOSAL: change the toString methods in Object to use some other prefix. 
I suggest opString, as the op prefix is already in use.
Sep 29 2006
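
The clash described in this post can be sketched as follows. This is a minimal example written against the D and Phobos of the time (2006), assuming std.string.toString has an int overload and Object.toString returns char[]; the class Counter and its count field are made up for illustration.

```d
import std.string;

// Hypothetical class, used only to show the name clash.
class Counter
{
    int count;

    char[] toString()
    {
        // return "count = " ~ toString(count);
        // The unqualified call above is the problem case: inside the class,
        // toString resolves to the member inherited from Object, which takes
        // no arguments, so the imported std.string.toString is never considered.

        // Workaround from the post: spell out the fully qualified name.
        return "count = " ~ std.string.toString(count);
    }
}

void main()
{
    Counter c = new Counter;
    c.count = 3;
    assert(c.toString() == "count = 3");
}
```

Uncommenting the unqualified call is expected to fail with an argument-mismatch error rather than silently picking the library function, which is why the fully qualified name is needed.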
Vladimir Kulev <me lightoze.net> writes:
Johan Granberg wrote:
> PROPOSAL: change the toString methods in Object to use some other prefix.
> I suggest opString, as the op prefix is already in use.
I agree, and the same goes for toHash. Naming consistency is the right thing.
Sep 30 2006
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Vladimir Kulev wrote:
> Johan Granberg wrote:
> > PROPOSAL: change the toString methods in Object to use some other prefix.
> > I suggest opString, as the op prefix is already in use.
> I agree, and the same goes for toHash. Naming consistency is the right thing.
I totally disagree. What consistency are you talking about? toString and toHash are *not* operators, so prefixing them with op is misleading and inconsistent.
Sep 30 2006
Johan Granberg <lijat.meREM OVEgmail.com> writes:
Hasan Aljudy wrote:
> Vladimir Kulev wrote:
> > Johan Granberg wrote:
> > > PROPOSAL: change the toString methods in Object to use some other prefix.
> > > I suggest opString, as the op prefix is already in use.
> > I agree, and the same goes for toHash. Naming consistency is the right thing.
> I totally disagree. What consistency are you talking about? toString and toHash are *not* operators, so prefixing them with op is misleading and inconsistent.
The prefix is not important; the name collision is. The problem is that two commonly used identifiers collide, and using an op prefix is one way to solve that (it would also open the way for making them operators at some later time, if desired).
Sep 30 2006
Vladimir Kulev <me lightoze.net> writes:
Hasan Aljudy wrote:
> I totally disagree. What consistency are you talking about?
> toString and toHash are *not* operators, so prefixing them with op is
> misleading and inconsistent.
These methods are implicitly available on all objects, so you can use them just like unary operators such as ~, except that there are no special symbols for them. Anyway, the collision between Object.toString and std.string.toString should be resolved, and renaming the latter would also suit me.
Sep 30 2006
Sean Kelly <sean f4.ca> writes:
Johan Granberg wrote:
> [...]
> PROPOSAL: change the toString methods in Object to use some other prefix.
> I suggest opString, as the op prefix is already in use.

How about toUtf8() for classes and structs :-)

Sean
Sep 30 2006
Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Sean Kelly wrote:
> Johan Granberg wrote:
> > [...]
> > PROPOSAL: change the toString methods in Object to use some other prefix.
> > I suggest opString, as the op prefix is already in use.
>
> How about toUtf8() for classes and structs :-)
>
> Sean

Gets my vote. Note that Mango classes typically already do this (with toString just calling toUtf8 in most cases), and provide toUtf16/toUtf32 counterparts. It is indeed effective. :)

-- Chris Nicholson-Sauls
Sep 30 2006
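
The naming pattern described here could look roughly like the sketch below. It is not Mango's actual code: the class Greeting is hypothetical, and the conversions assume the Phobos functions std.utf.toUTF16 and std.utf.toUTF32 as they existed in 2006-era D, where Object.toString returned char[].

```d
import std.utf;

// Hypothetical class illustrating the toUtf8 / toUtf16 / toUtf32 convention.
class Greeting
{
    char[] text;

    this(char[] text) { this.text = text; }

    // Primary representation: UTF-8.
    char[] toUtf8() { return text; }

    // Counterparts for the other two encodings.
    wchar[] toUtf16() { return std.utf.toUTF16(text); }
    dchar[] toUtf32() { return std.utf.toUTF32(text); }

    // toString simply forwards to toUtf8.
    char[] toString() { return toUtf8(); }
}

void main()
{
    Greeting g = new Greeting("hello".dup);
    assert(g.toString() == "hello");
    assert(g.toUtf16().length == 5);
    assert(g.toUtf32().length == 5);
}
```

Because the method name states the encoding, the caller always knows which array type comes back, and toString stays available for code that expects it.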
Charlie <charlies nowhere.com> writes:
Gets my vote too; it's also more descriptive than 'toString'.

Chris Nicholson-Sauls wrote:
> Sean Kelly wrote:
> > How about toUtf8() for classes and structs :-)
> >
> > Sean
>
> Gets my vote. Note that Mango classes typically already do this (with toString just calling toUtf8 in most cases), and provide toUtf16/toUtf32 counterparts. It is indeed effective. :)
>
> -- Chris Nicholson-Sauls
Oct 01 2006
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Sean Kelly wrote:
> How about toUtf8() for classes and structs :-)
>
> Sean

I think there's a fundamental problem with the way D deals with strings. The spec claims that D natively supports strings through char[] and, at the same time, claims that D fully supports Unicode. The fundamental issue is that UTF-8 is one encoding for Unicode strings, but it's not always the best choice. Phobos mostly deals only with char[], and mixing code that uses wchar[] with code that uses char[] isn't very straightforward.

Consider the simple case of reading a text file and detecting "words". To detect a word, you must first recognize letters; no, not English letters: letters of any language, and for that purpose we have the isUniAlpha function. Now, if you encode the string as char[], then how are you gonna determine whether or not the next character is a Unicode alpha? The following definitely shouldn't work:

    //assuming text is char[]
    for( int i = 0; i < text.length; i++ )
    {
        bool isLetter = isUniAlpha( text[i] );
        ....
    }

because isUniAlpha takes a dchar parameter, and of course, because a single char doesn't necessarily encode a Unicode character just by itself; if you're dealing with non-English text, then most likely a single char will hold only part of the encoding for that letter. Surprisingly, the compiler allows this kind of code, but that's not the point. The point is, this code will never work, because char[] is not a very good way to hold a Unicode string. Of course there are ways around this, but they are still just "workarounds".

Should you choose wchar[] (or dchar[]) to represent strings, you will get into all kinds of trouble dealing with Phobos. The standard library always deals with strings using char[]; this includes std.string and std.regexp, and even the Exception class. So, if you're using wchar[] to represent strings and you want to throw an exception, you can't just pass your string to it, because the compiler will complain (can't cast wchar[] to char[]). You'll need toUtf8( myString ), and your code can quickly become full of calls to toUtf* functions.

Personally, I think D needs a proper String class built into the language and the standard library. Or at least, casting between the different encodings should be seamless to the coder: just let the compiler call the appropriate toUtf* function and allow implicit casting.
Oct 01 2006
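
The explicit conversion mentioned at the end of this post might look like the following sketch. It assumes the Phobos function std.utf.toUTF8 (written toUtf8 in the post); the fail function and its message are hypothetical.

```d
import std.utf;

// Hypothetical helper: the caller keeps its text as UTF-16 (wchar[]).
void fail(wchar[] reason)
{
    // Exception takes UTF-8 text, so the UTF-16 string has to be
    // converted explicitly before it can be passed along.
    throw new Exception(toUTF8(reason));
}

void main()
{
    bool caught = false;
    try
    {
        fail("disk full"w.dup);
    }
    catch (Exception e)
    {
        // The message arrives as UTF-8.
        caught = (e.msg == "disk full");
    }
    assert(caught);
}
```

Every boundary between wchar[] code and a char[]-only API needs a call like this, which is the clutter the post is complaining about.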
Derek Parnell <derek nomail.afraid.org> writes:
On Mon, 02 Oct 2006 00:52:44 -0600, Hasan Aljudy wrote:

> [...]
> Now, if you encode the string as char[], then how are you gonna determine
> whether or not the next character is a Unicode alpha? The following
> definitely shouldn't work:
>
>     //assuming text is char[]
>     for( int i = 0; i < text.length; i++ )
>     {
>         bool isLetter = isUniAlpha( text[i] );
>         ....
>     }

    foreach(int i, dchar c; text)
    {
        bool isLetter = isUniAlpha( c );
        ...
    }

--
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
2/10/2006 5:10:26 PM
Oct 02 2006
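
To see why the index-based loop and the foreach loop behave differently, here is a small sketch that counts both UTF-8 code units and decoded code points. The sample string is an assumption, chosen so that one letter needs two code units.

```d
void main()
{
    // 'a', U+00F1 (n with tilde), 'b': the middle letter takes two UTF-8 code units.
    char[] text = "a\u00F1b".dup;

    // Indexing and .length work on code units.
    size_t units = text.length;

    // foreach with a dchar loop variable decodes whole code points.
    size_t points = 0;
    foreach (dchar c; text)
        points++;

    assert(units == 4);
    assert(points == 3);
}
```

An index-based loop over this string would hand isUniAlpha four separate values, two of which are only fragments of one letter, while the foreach form sees three characters.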
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Derek Parnell wrote:
> On Mon, 02 Oct 2006 00:52:44 -0600, Hasan Aljudy wrote:
> > [...] The following definitely shouldn't work:
> >
> >     //assuming text is char[]
> >     for( int i = 0; i < text.length; i++ )
> >     {
> >         bool isLetter = isUniAlpha( text[i] );
> >         ....
> >     }
>
>     foreach(int i, dchar c; text)
>     {
>         bool isLetter = isUniAlpha( c );
>         ...
>     }

I know, but that's still a workaround. What if you need to iterate back and forth? You're gonna need to convert it to dchar[] (or wchar[]).

However, that brings up a good point: notice how foreach lets you iterate over a string by Unicode characters (a.k.a. code points)? Shouldn't this kind of iteration be supported outside of foreach as well? Sure, I know, you can write your own String class and even an iterator, but that just proves that string support isn't really/fully built-in.
Oct 02 2006
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Hasan Aljudy wrote:
> Derek Parnell wrote:
> >   foreach(int i, dchar c; text)
> >   {
> >        bool isLetter = isUniAlpha( c );
> >        ...
> >   }
> I know, but that's still a workaround. What if you need to iterate back and forth? You're gonna need to convert it to dchar[] (or wchar[]). However, that brings up a good point: notice how foreach lets you iterate over a string by Unicode characters (a.k.a. code points)? Shouldn't this kind of iteration be supported outside of foreach as well?

See std.utf.decode and std.utf.stride.

/Oskar
Oct 02 2006
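
For readers who have not used the two functions named here, the sketch below walks a char[] forward with std.utf.decode and std.utf.stride, then steps back one character by hand. The helper prevStart is hypothetical (it is not part of Phobos) and assumes well-formed UTF-8 and an index greater than zero.

```d
import std.utf;

// Hypothetical helper: given an index i > 0 that sits on a character
// boundary, return the index where the previous character starts.
// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
size_t prevStart(char[] s, size_t i)
{
    do
    {
        --i;
    } while (i > 0 && (s[i] & 0xC0) == 0x80);
    return i;
}

void main()
{
    // 'a', U+00F1 (n with tilde), 'b'
    char[] text = "a\u00F1b".dup;
    size_t i = 0;

    // decode returns the code point at i and advances i past it.
    dchar first = decode(text, i);
    assert(first == 'a' && i == 1);

    // stride reports how many code units the character at i occupies.
    assert(stride(text, i) == 2);

    dchar second = decode(text, i);
    assert(second == '\u00F1' && i == 3);

    // Step back to the start of the character just decoded.
    i = prevStart(text, i);
    assert(i == 1);
}
```

Forward movement is covered by the library; backward movement is the part that still has to be written by hand in this sketch, which is close to the complaint raised earlier in the thread.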
Hasan Aljudy <hasan.aljudy gmail.com> writes:
Oskar Linde wrote:
> Hasan Aljudy wrote:
> > [...] Shouldn't this kind of iteration be supported outside of foreach as well?
>
> See std.utf.decode and std.utf.stride.
>
> /Oskar

I have, and I know the functions are all there. But hey, the C standard library also has all sorts of string processing functions. I'm talking about the "built-in" string type, which doesn't really exist, even though the spec claims it does.
Oct 02 2006