digitalmars.D - A char is also not an int
- Arcane Jill (18/21) May 27 2004 While we're on the subject of disunifying one type from another, may I p...
- Matthew (7/9) May 27 2004 Can one implicitly convert char to int?
- Benji Smith (8/12) May 27 2004 I don't even like the notion of being able to explicitly cast from a
- Kevin Bealer (14/27) May 27 2004 I think the opposite is true; with Unicode, the semantics CAN be solid. ...
- Stewart Gordon (10/21) May 27 2004 Not even in cryptography and the like?
- Walter (30/50) May 27 2004 point out
- James McComb (13/18) May 27 2004 I agree with you about chars Walter, but this is because I think chars
- Roberto Mariottini (16/37) May 28 2004 That's strange, because this is one of the reasons that makes me *like* P...
- Phill (2/2) May 28 2004 Roberto:
- Roberto Mariottini (3/5) May 31 2004 "French", "Italian" ? ;-)
- Matthew (2/21) Jun 04 2004 But yet we cannot overload on single-b...
- Derek Parnell (19/46) May 27 2004 Maybe... Another way of looking at it is that a character has (at least) tw...
While we're on the subject of disunifying one type from another, may I point out that a char is also not an int.

Back in the old days of C, there was no 8-bit wide type other than char, so if you wanted an 8-bit wide numeric type, you used a char. Similarly, in Java, there is no UNSIGNED 16-bit wide type other than char, so if that's what you need, you use char. D has no such problems, so maybe it's about time to make the distinction clear.

Logically, it makes no sense to try to do addition and subtraction with the at-sign or the square-right-bracket symbol. We all KNOW that the zero glyph is *NOT* the same thing as the number 48. This was true even back in the days of ASCII, but it's even more true in Unicode. A char in D stores, not a character, but a fragment of UTF-8, an encoding of a Unicode character - and even a Unicode character is /itself/ an encoding. There is no longer a one-to-one correspondence between character and glyph. (There IS such a one-to-one correspondence in the old ASCII range of \u0020 to \u007E, of course, since Unicode is a superset of ASCII.)

Perhaps it's time to change this one too?

    int a = 'X';            // wrong
    char a = 'X';           // right
    int a = cast(int) 'X';  // right

Arcane Jill
May 27 2004
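A minimal D sketch of the point above - that a char[] holds UTF-8 code units rather than whole characters - assuming a present-day D compiler; the sample string and variable names are illustrative, not from the post:

    void main()
    {
        string s = "é";            // one character, stored as two UTF-8 code units
        assert(s.length == 2);     // .length counts char code units, not characters

        size_t characters = 0;
        foreach (dchar d; s)       // foreach with a dchar loop variable decodes whole code points
            ++characters;
        assert(characters == 1);
    }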
> While we're on the subject of disunifying one type from another, may I point out that a char is also not an int.

Can one implicitly convert char to int?

Man, that sucks!

Pardon my indignance by crediting my claim never to have tried it because I have a long-standing aversion to such things from C/C++.

If it's true it needs to be made untrue ASAP.

(Was that strong enough? I hope so ...)
May 27 2004
On Thu, 27 May 2004 07:16:19 +0000 (UTC), Arcane Jill <Arcane_member pathlink.com> wrote:

> Perhaps it's time to change this one too?
>
>     int a = 'X';            // wrong
>     char a = 'X';           // right
>     int a = cast(int) 'X';  // right

I don't even like the notion of being able to explicitly cast from a char to an int. Especially in the case of unicode characters, the semantics of a cast (even an explicit cast) are not very well defined. Getting the int value of a character should, in my opinion, be the province of a static method from a specific string class.

--Benji
May 27 2004
In article <fd8cb0dfge0cm85o781a2rjpp9ait6fskq 4ax.com>, Benji Smith says...

> On Thu, 27 May 2004 07:16:19 +0000 (UTC), Arcane Jill <Arcane_member pathlink.com> wrote:
>
>> Perhaps it's time to change this one too?
>>
>>     int a = 'X';            // wrong
>>     char a = 'X';           // right
>>     int a = cast(int) 'X';  // right
>
> I don't even like the notion of being able to explicitly cast from a char to an int. Especially in the case of unicode characters, the semantics of a cast (even an explicit cast) are not very well defined. Getting the int value of a character should, in my opinion, be the province of a static method from a specific string class.
>
> --Benji

I think the opposite is true; with Unicode, the semantics CAN be solid. In a normal C program, this is not the case. Consider:

    int chA = 'A';
    int chZ = 'Z';

    if ((chZ - chA) == 25) {
        // Is this true for EBCDIC?  I dunno.
    }

In C, the encoding is assumed to be the default system architecture encoding, which is not necessarily Unicode or ASCII. But, if the language DEFINES unicode as the operative representation, then the value 'A' should always be the same integer value.

In any case, sometimes you need the integer value.

Kevin
May 27 2004
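A small D sketch of Kevin's point (not from the original post): because D defines char literals as Unicode, the distance between 'A' and 'Z' is fixed by the language rather than by the platform's code page. The variable names follow Kevin's C example.

    void main()
    {
        int chA = 'A';             // the implicit char-to-int conversion under discussion
        int chZ = 'Z';
        assert(chZ - chA == 25);   // guaranteed in D; not guaranteed in C on an EBCDIC system
    }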
Arcane Jill wrote:
<snip>
> D has no such problems, so maybe it's about time to make the distinction clear. Logically, it makes no sense to try to do addition and subtraction with the at-sign or the square-right-bracket symbol.

Not even in cryptography and the like?

> We all KNOW that the zero glyph is *NOT* the same thing as the number 48. This was true even back in the days of ASCII, but it's even more true in Unicode. A char in D stores, not a character, but a fragment of UTF-8, an encoding of a Unicode character - and even a Unicode character is /itself/ an encoding. There is no longer a one-to-one correspondence between character and glyph.
<snip>

By 'character' do you mean 'character' or 'char value'?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 27 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:c944k3$1o53$1 digitaldaemon.com...While we're on the subject of disunifying one type from another, may Ipoint outthat a char is also not an int. Back in the old days of C, there was no 8-bit wide type other than char,so ifyou wanted an 8-bit wide numeric type, you used a char. Similarly, in Java, there is no UNSIGNED 16-bit wide type other than char,so ifthat's what you need, you use char. D has no such problems, so maybe it's about time to make the distinctionclear.Logically, it makes no sense to try to do addition and subtraction withtheat-sign or the square-right-bracket symbol. We all KNOW that the zeroglyph is*NOT* the same thing as the number 48. This was true even back in the days of ASCII, but it's even more true in Unicode. A char in D stores, not a character, but a fragment of UTF-8, an encoding of Unicode character - and even a Unicode character is /itself/anencoding. There is no longer a one-to-one correspondance between characterandglyph. (There IS such a one-to-one correspondence in the old ASCII rangeof\u0020 to \u007E, of course, since Unicode is a superset of ASCII). Perhaps it's time to change this one too?I understand where you're coming from, and this is a compelling idea, but this idea has been tried out before in Pascal. And I can say from personal experience it is one reason I hate Pascal <g>. Chars do want to be integral data types, and requiring a cast for it leads to execrably ugly expressions filled with casts. In moving to C, one of the breaths of fresh air was to not need all those %^&*^^% casts any more. Let me enumerate a few ways that chars are used as integral types: 1) converting case 2) using char as index into a translation table 3) encoding/decoding UTF strings 4) encryption/decryption software 5) compression code 6) hashing 7) regex internal implementation 8) char value as input to a state machine like a lexer 9) encoding/decoding strings to/from integers in other words, routine system programming tasks. The improvement D has, however, is to have chars be a separate type from byte, which makes for better self-documenting code, and one can have different overloads for them.int a = 'X'; // wrong char a = 'X'; // right int a = cast(int) 'X' // right
May 27 2004
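A hypothetical D sketch of items 1, 2 and 6 from Walter's list above (case conversion, a char-indexed translation table, and hashing); the function names and the toy hash are illustrative, not anything Walter posted:

    // item 2: a char value used directly as an index into a translation table
    char translate(char c, const(char)[] table)
    {
        return table[c];          // the char implicitly becomes an array index
    }

    // item 6: a toy hash that treats each char as a small integer
    uint toyHash(const(char)[] s)
    {
        uint h = 0;
        foreach (char c; s)
            h = h * 31 + c;       // the char promotes to uint in the arithmetic
        return h;
    }

    void main()
    {
        char[256] upper;
        foreach (i, ref ch; upper)
            ch = cast(char) i;                        // start from the identity mapping
        for (char c = 'a'; c <= 'z'; ++c)
            upper[c] = cast(char)(c - ('a' - 'A'));   // item 1: case conversion by arithmetic

        assert(translate('d', upper[]) == 'D');
        assert(translate('!', upper[]) == '!');
        assert(toyHash("abc") != toyHash("abd"));
    }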
Walter wrote:

> I understand where you're coming from, and this is a compelling idea, but this idea has been tried out before in Pascal. And I can say from personal experience it is one reason I hate Pascal <g>. Chars do want to be integral data types, and requiring a cast for it leads to execrably ugly expressions filled with casts.

I agree with you about chars Walter, but this is because I think chars are different from bools.

The way I see it, bools can be either TRUE or FALSE, and these values are not numeric. TRUE + 32 is not defined. (Of course, bools will be *implemented* as numeric values, but I'm talking about syntax.)

But character standards, such as ASCII and Unicode, *define* characters as numeric quantities. ASCII *defines* A to be 65. So characters really are numeric. 'A' + 32 equals 'a'. This behaviour is well-defined.

So I'd like to have a proper bool type, but I'd prefer D chars to remain as they are.

James McComb
May 27 2004
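James's 'A' + 32 example, spelled out as a small D sketch (not from the post; variable names are illustrative). The arithmetic is well defined because Unicode pins down the numeric values, and the addition yields an int that needs an explicit cast to narrow back to char when the operands are not compile-time constants:

    void main()
    {
        // Unicode fixes 'A' at 65 and 'a' at 97, so the arithmetic is well defined
        assert('A' == 65 && 'a' == 97);
        assert('A' + 32 == 'a');

        char c = 'A';
        char lower = cast(char)(c + 32);   // the addition yields an int; the cast narrows it back to char
        assert(lower == 'a');
    }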
In article <c95al5$19mr$1 digitaldaemon.com>, Walter says...

> I understand where you're coming from, and this is a compelling idea, but this idea has been tried out before in Pascal. And I can say from personal experience it is one reason I hate Pascal <g>.

That's strange, because this is one of the reasons that makes me *like* Pascal :-)

> Chars do want to be integral data types, and requiring a cast for it leads to execrably ugly expressions filled with casts. In moving to C, one of the breaths of fresh air was to not need all those %^&*^^% casts any more.

In my experience, only poor programming practice leads to many int <-> char casts.

> Let me enumerate a few ways that chars are used as integral types:
>
> 1) converting case

This is true only for English. Real natural languages are more complex than this, needing collating tables. I don't know about non-latin alphabets.

> 2) using char as index into a translation table

type
  a = array['a'..'z'] of 'A'..'Z';
  b = array[char] of char;

> 3) encoding/decoding UTF strings
> 4) encryption/decryption software
> 5) compression code
> 6) hashing
> 7) regex internal implementation

This is something you just won't do frequently, once they are in a library. Simply converting all input to integers and reconverting the final output to chars should work.

> 8) char value as input to a state machine like a lexer
> 9) encoding/decoding strings to/from integers

I don't see the point here.

> in other words, routine system programming tasks. The improvement D has, however, is to have chars be a separate type from byte, which makes for better self-documenting code, and one can have different overloads for them.

This is better than nothing :-)

Ciao
May 28 2004
Roberto: Can you explain what you mean by "Real natural languages"?
May 28 2004
In article <c99c0u$12gr$1 digitaldaemon.com>, Phill says...

> Roberto:
> Can you explain what you mean by "Real natural languages"?

"French", "Italian" ? ;-)

Ciao
May 31 2004
> I understand where you're coming from, and this is a compelling idea, but this idea has been tried out before in Pascal. And I can say from personal experience it is one reason I hate Pascal <g>. Chars do want to be integral data types, and requiring a cast for it leads to execrably ugly expressions filled with casts. In moving to C, one of the breaths of fresh air was to not need all those %^&*^^% casts any more.
>
> Let me enumerate a few ways that chars are used as integral types:
>
> 1) converting case
> 2) using char as index into a translation table
> 3) encoding/decoding UTF strings
> 4) encryption/decryption software
> 5) compression code
> 6) hashing
> 7) regex internal implementation
> 8) char value as input to a state machine like a lexer
> 9) encoding/decoding strings to/from integers
>
> in other words, routine system programming tasks. The improvement D has, however, is to have chars be a separate type from byte, which makes for better self-documenting code, and one can have different overloads for them.

<Horse state="dead" action="flog">But yet we cannot overload on single-bit integrals and boolean values!</Horse>
Jun 04 2004
On Thu, 27 May 2004 07:16:19 +0000 (UTC), Arcane Jill wrote:

> While we're on the subject of disunifying one type from another, may I point out that a char is also not an int.
>
> Back in the old days of C, there was no 8-bit wide type other than char, so if you wanted an 8-bit wide numeric type, you used a char. Similarly, in Java, there is no UNSIGNED 16-bit wide type other than char, so if that's what you need, you use char. D has no such problems, so maybe it's about time to make the distinction clear.
>
> Logically, it makes no sense to try to do addition and subtraction with the at-sign or the square-right-bracket symbol. We all KNOW that the zero glyph is *NOT* the same thing as the number 48. This was true even back in the days of ASCII, but it's even more true in Unicode. A char in D stores, not a character, but a fragment of UTF-8, an encoding of a Unicode character - and even a Unicode character is /itself/ an encoding. There is no longer a one-to-one correspondence between character and glyph. (There IS such a one-to-one correspondence in the old ASCII range of \u0020 to \u007E, of course, since Unicode is a superset of ASCII.)
>
> Perhaps it's time to change this one too?
>
>     int a = 'X';            // wrong
>     char a = 'X';           // right
>     int a = cast(int) 'X';  // right

Maybe... Another way of looking at it is that a character has (at least) two properties: a Glyph and an Identifier. Within an encoding set (eg. Unicode, ASCII, EBCDIC, ...), no two characters have the same identifier even though they may have the same glyph (eg. Space and Non-Breaking Space).

One may then argue that an efficient datatype for the identifier is an unsigned integer value. This makes it simple to be used as an index into a glyph table. In fact, an encoding set is likely to have multiple glyph tables for various font representations, but that is another issue altogether.

So, an implicit cast from char to int would be just getting the character's identifier value, which is not such a bad thing. What is a bad thing is making assumptions about the relationships between character identifiers. There is no necessary correlation between a character set's collation sequence and the characters' identifiers.

I frequently work with encryption algorithms, and integer character identifiers are a *very* handy thing indeed.

-- 
Derek
28/May/04 10:50:16 AM
May 27 2004
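A toy D sketch of the kind of use Derek describes, XORing the integer value of each char against a repeating key; the function name and key are made up for the example, and the output is raw 8-bit data rather than valid UTF-8:

    char[] xorCrypt(const(char)[] text, const(ubyte)[] key)
    {
        auto result = new char[text.length];
        foreach (i, char c; text)
            result[i] = cast(char)(c ^ key[i % key.length]);  // XOR the integer value of each char
        return result;   // in general no longer valid UTF-8; treat it as plain bytes
    }

    void main()
    {
        ubyte[] key = [0x5A, 0xA5, 0x3C];
        auto secret = xorCrypt("attack at dawn", key);
        assert(secret != "attack at dawn");
        assert(xorCrypt(secret, key) == "attack at dawn");    // XOR with the same key round-trips
    }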