D - D is too english centric
- Martin M. Pedersen (20/20) May 27 2003 Hi,
- Walter (6/13) May 27 2003 No, only characters that fall into certain unicode ranges.
- Martin M. Pedersen (7/12) May 28 2003 I haven't found that, but you are the expert, so I believe you. It ma...
- Walter (6/13) May 28 2003 makes
- Burton Radons (20/41) May 28 2003 This could be more easily done by encoding into UTF-8 and assuming any
- Walter (15/38) May 28 2003 character
- Bill Cox (4/56) May 29 2003 I'll put in a vote for UTF-8 support. It seems to have the best chance
- Benji Smith (3/59) May 29 2003 I agree. Source should be UTF-8.
- Martin M. Pedersen (13/18) May 29 2003 Another way of resolving this would be to give the programmer control of...
- Ilya Minkov (32/49) May 28 2003 Hello, i believe there was a flamewar to this topic a few months ago, st...
- Martin M. Pedersen (39/45) May 28 2003 starting
- Bill Cox (6/8) May 28 2003 The latest version of Vim supports UTF-8. However, it requires a kernel...
- Bill Cox (3/17) May 28 2003 Err.... I read your post a little more carefully... I don't know of any...
- Georg Wrede (22/30) May 28 2003 Back in the bad old days, before MSDOS, we all used CP/M.
- Martin M. Pedersen (5/9) May 28 2003 So do we. Yet there are exceptions. If the customer pays us to develop a...
- Walter (3/5) May 28 2003 Yup. Listen to the customers, not the marketing department <g>.
- Mark Evans (4/4) May 28 2003 I agree that D is too English-centric (even ASCII-centric).
- Mark Evans (3/3) May 28 2003 Actually I still think that link compatibility with Digital Mars C++ wou...
- Mark T (3/7) May 29 2003 I don't think there is a full implementation of C99 yet. It was adopted ...
- Martin M. Pedersen (7/10) May 30 2003 late
Hi,

I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z']. I find this unfortunate, and in contrast to one of the main goals of D: link compatibility with C.

It has previously been argued that only English should be used for identifiers in order to better support reuse across language boundaries. But that argument isn't always valid. For example, half a decade ago, I was involved in building the IT infrastructure for a nation-wide real estate network. One of the requirements was that *everything* was in Danish. It involved lots of developers nation-wide, but no one outside Denmark. Of course, identifiers couldn't be fully Danish, and that introduced inconsistency in how things were named. But that was only a limitation of C back then, which might not be an issue a few years from now. If D has this limitation, it might be a valid reason to deselect D in favor of other languages. After all, English is the native language of only a minority.

Regards,
Martin M. Pedersen
May 27 2003
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message news:bb0sqs$1t1k$1 digitaldaemon.com...
> I have noted that C99 allows *any* unicode character to be used in identifiers using \u.

No, only characters that fall into certain unicode ranges.

> The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z']. I find this unfortunate, and in contrast to one of the main goals of D: link compatibility with C.

It's a good idea to change it to match C for the reasons you state.
May 27 2003
"Walter" <walter digitalmars.com> wrote in message news:bb1c8v$2e2l$1 digitaldaemon.com...
>> I have noted that C99 allows *any* unicode character to be used in identifiers using \u.
> No, only characters that fall into certain unicode ranges.

I haven't found that, but you are the expert, so I believe you. It makes sense too.

>> Link compatibility with C.
> It's a good idea to change it to match C for the reasons you state.

I'm glad we are in line here :-)

Regards,
Martin M. Pedersen
May 28 2003
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message news:bb2hou$oas$1 digitaldaemon.com...
> "Walter" <walter digitalmars.com> wrote in message news:bb1c8v$2e2l$1 digitaldaemon.com...
>>> I have noted that C99 allows *any* unicode character to be used in identifiers using \u.
>> No, only characters that fall into certain unicode ranges.
> I haven't found that, but you are the expert, so I believe you. It makes sense too.

"Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in annex D." C99 6.4.2.1-3
May 28 2003
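The rule quoted from C99 6.4.2.1 amounts to a range lookup on the code point. The following is only a minimal sketch in C: the table holds a handful of Latin ranges recalled from C99 annex D and is deliberately incomplete, so it should be checked against the standard before being relied on. The names `ucn_range`, `ucn_ranges` and `ucn_allowed_in_identifier` are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One inclusive range of code points. */
typedef struct { uint32_t lo, hi; } ucn_range;

/* ASSUMPTION: an illustrative subset of the Latin ranges from C99 annex D,
   written from memory. Not the full table; verify against the standard. */
static const ucn_range ucn_ranges[] = {
    { 0x00AA, 0x00AA }, { 0x00BA, 0x00BA },
    { 0x00C0, 0x00D6 }, { 0x00D8, 0x00F6 }, { 0x00F8, 0x01F5 },
};

/* Returns true if cp lies inside one of the ranges listed above. */
bool ucn_allowed_in_identifier(uint32_t cp)
{
    for (size_t i = 0; i < sizeof ucn_ranges / sizeof ucn_ranges[0]; i++) {
        if (cp >= ucn_ranges[i].lo && cp <= ucn_ranges[i].hi)
            return true;
    }
    return false;
}
```

With this table, U+00E6 (æ) and U+00F8 (ø) are accepted, while U+00D7 (the multiplication sign) and U+2003 (em space) are rejected.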
Walter wrote:
> "Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message news:bb2hou$oas$1 digitaldaemon.com...
>> "Walter" <walter digitalmars.com> wrote in message news:bb1c8v$2e2l$1 digitaldaemon.com...
>>>> I have noted that C99 allows *any* unicode character to be used in identifiers using \u.
>>> No, only characters that fall into certain unicode ranges.
>> I haven't found that, but you are the expert, so I believe you. It makes sense too.
> "Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in annex D." C99 6.4.2.1-3

This could be more easily done by encoding into UTF-8 and assuming any byte with the eighth bit set is part of an identifier. It allows weird obfuscations, yes, but why care about that? I won't write code that uses one of UNICODE's whitespace characters, and anyone whose code would be worth use by me would also not abuse it. At worst it'd be one of those features that kids get into abusing before they smarten up.

C99's decision itself looks pretty bad. I'd use \u escapes for codes which I don't WANT rendered, either because they have no rendering (whitespaces), because they would screw up rendering (controls), because they don't have a rendering in my code-writing font, or because they have special numeric significance.

Whether this feature is implemented by any compilers and editors is certainly important to Martin's stated requirements. If his clients can't read the code he's written, he hasn't fulfilled his contract. Much more successful would be to use an encoding like UTF-8 or one of the BOM'd encodings D supports; all programs developed for Finns will surely render that.

If it develops that C gets a link standard for UNICODE identifiers, then that can be emulated when mangling extern (C). There's no cause for following C99 exactly in the code itself.
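The high-bit rule is easy to express in a byte-oriented lexer. What follows is a rough sketch in C of that idea, not DMD's actual lexer; the function names are hypothetical. It treats every byte of 0x80 or above as an identifier character, so whole UTF-8 sequences pass through without being decoded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Identifier start: ASCII letter, '_', or any byte with the eighth bit set
   (that is, any byte belonging to a multi-byte UTF-8 sequence). */
bool is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c >= 0x80;
}

/* Identifier continuation: anything allowed at the start, plus digits. */
bool is_ident_char(unsigned char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Returns the length in bytes of the identifier at the start of the
   NUL-terminated string s, or 0 if s does not begin with one. */
size_t scan_identifier(const char *s)
{
    size_t n = 0;
    if (!is_ident_start((unsigned char)s[0]))
        return 0;
    while (is_ident_char((unsigned char)s[n]))
        n++;
    return n;
}
```

The trade-off is the one conceded in the post: because nothing is decoded, the scanner also swallows sequences such as C2 A0 (U+00A0, no-break space) as part of a name.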
May 28 2003
"Burton Radons" <loth users.sourceforge.net> wrote in message news:bb3s9f$29qv$1 digitaldaemon.com...
> Walter wrote:
>> "Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in annex D." C99 6.4.2.1-3
>
> This could be more easily done by encoding into UTF-8 and assuming any byte with the eighth bit set is an identifier. It allows weird obfuscations, yes, but why care about that? I won't write code that uses one of UNICODE's whitespace characters, and anyone whose code would be worth use by me would also not abuse it. At worst it'd be one of those features that kids get into abusing before they smarten up.
>
> C99's decision itself looks pretty bad. I'd use \u escapes for codes which I don't WANT rendered because either they have no rendering (whitespaces), because they would screw up rendering (controls), don't have a rendering in my code-writing font, or have special numeric significance.
>
> Whether this feature is implemented by any compilers and editors is certainly important to Martin's stated requirements. If his clients can't read the code he's written, he hasn't fulfilled his contract. Much more successful would be to use an encoding like UTF-8 or one of the BOM'd encodings D supports; all programs developed for Finns will surely render that.
>
> If it develops that C gets a link standard for UNICODE identifiers, then that can be emulated when mangling extern (C). There's no cause for following C99 exactly in the code itself.

This is C's third attempt at internationalizing C source code. In 15 years I have yet to see any C source outside of a test suite that used trigraphs or digraphs. I'm skeptical the \u scheme will catch on, either.

I think the best way is to simply declare that the source text is UTF-8, UTF-16, or UTF-32. D already recognizes and automatically handles all three. Then, it is simply a matter of deciding which unicode characters to allow as identifiers and whitespace. The advantage of that is you can edit the source in any text editor that supports unicode if you want to use more than ascii. There is no need for any special editors that recognize trigraphs, digraphs, or on-the-fly \u translation.
May 28 2003
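One common way for a compiler to recognize UTF-8, UTF-16 and UTF-32 source automatically is to look for a byte order mark at the start of the file. The sketch below shows that technique in C under stated assumptions: it is not DMD's code, the names `src_encoding` and `detect_encoding` are invented here, and a file without a BOM is simply assumed to be UTF-8.

```c
#include <stddef.h>

typedef enum {
    ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_UTF32LE, ENC_UTF32BE
} src_encoding;

/* Guesses the encoding of a source buffer from its leading BOM.
   ASSUMPTION: with no BOM present, the text is treated as UTF-8. */
src_encoding detect_encoding(const unsigned char *buf, size_t len)
{
    /* UTF-32LE (FF FE 00 00) must be tested before UTF-16LE (FF FE),
       because the shorter BOM is a prefix of the longer one. */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return ENC_UTF32LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return ENC_UTF32BE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    return ENC_UTF8;
}
```

Once the encoding is known, the lexer can decode to code points and apply whatever identifier and whitespace rules the language settles on, which is the remaining decision named in the post.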
I'll put in a vote for UTF-8 support. It seems to have the best chance of getting support from Linux IDEs and debuggers.

Bill

Walter wrote:
> I think the best way is to simply declare that the source text is UTF-8, UTF-16, or UTF-32. D already recognizes and automatically handles all three.
May 29 2003
I agree. Source should be UTF-8.

--Benji

In article <3ED5FFE7.3040100 viasic.com>, Bill Cox says...
> I'll put in a vote for UTF-8 support. It seems to have the best chance of getting support from Linux IDEs and debuggers.
>
> Bill
May 29 2003
"Walter" <walter digitalmars.com> wrote in message news:bb1c8v$2e2l$1 digitaldaemon.com...
>> DMD implementation defines a letter to be ['A'..'Z', 'a'..'z']. I find this unfortunate, and in contrast to one of the main goals of D: link compatibility with C.
> It's a good idea to change it to match C for the reasons you state.

Another way of resolving this would be to give the programmer control of the external identifier. Something like this:

    extern (C)
    {
        extern("foo\u4444") void foo() { bar(); }
        extern("bar\u4444") void bar();
    }

That would also allow us to access mangled C++ identifiers, and identifiers containing '$'. It would not be easy, but that is not what I ask for. I only want it to be possible.

Regards,
Martin M. Pedersen
May 29 2003
In article <bb0sqs$1t1k$1 digitaldaemon.com>, Martin M. Pedersen says...
> It has previously been argued that only English should be used for identifiers in order to better support reuse across language boundaries. But that argument isn't always valid. For example, half a decade ago, I was involved in building the IT infrastructure for a nation-wide real estate network. One of the requirements was that *everything* was in Danish. It involved lots of developers nation-wide, but no one outside Denmark. Of course, identifiers couldn't be fully Danish, and that introduced inconsistency in how things were named. But that was only a limitation of C back then, which might not be an issue a few years from now. If D has this limitation, it might be a valid reason to deselect D in favor of other languages. After all, English is the native language of only a minority.

Hello, i believe there was a flamewar on this topic a few months ago, starting from an old 1st April joke article from Bjarne Stroustrup about adding unicode identifiers to C++. I believe that most people on this newsgroup are not native English speakers. And nonetheless, the idea has found very little support, since:
- for almost any language, a transliteration scheme exists which approximates the language in terms of the Latin alphabet;
- keywords are English anyway, and in D there is no preprocessor to un-English them. :) Using any language other than English would lead to inconsistency anyway.
- i know quite a number of languages, but i have tremendous problems switching between them. It may take minutes every time. And having seen a single English keyword, i start thinking in English and you can be sure of all my subsequent comments being in English. Then, i also can't read both code and comments simultaneously. So i have to translate the comments into English to get going. I even refuse to use any code with comments in my native language. I believe there are plenty of people experiencing the same problem.

So, if you *really* want to mix your native language into a project, why don't you write a scanner, which would:
- translate keywords from your language into D;
- transliterate all other identifiers into Latin letters.

This would basically be an extended version of a lexer, and lexing D is really simple. Besides, there's a good readymade lexer to borrow. :)

> I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].

It is defined in the library. :>

> I find this unfortunate, and in contrast to one of the main goals of D: link compatibility with C.

I have not seen a single piece of code using this silly feature. Is there any programmer's editor which has \u unicode support as of yet? And any IDE? I would also like to see how many compilers implement that - and in what manner. Even if some do, it would probably be incompatible with that of other compilers. So would you say, C violates the requirement of link compatibility with itself as well? :>

-i.
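The transliteration half of the suggested scanner can be sketched in a few lines of C. This is only an illustration under assumptions: the mapping (æ to ae, ø to oe, å to aa) is one conventional ASCII spelling for Danish, the names `translit_table` and `transliterate` are hypothetical, and the routine rewrites the whole input rather than identifiers only, so a real preprocessor would need to skip string literals and comments.

```c
#include <stddef.h>
#include <string.h>

/* One two-byte UTF-8 sequence and its two-letter ASCII replacement. */
typedef struct { const char *utf8; const char *ascii; } translit_entry;

/* ASSUMPTION: a Danish-only table; other languages need other entries. */
static const translit_entry translit_table[] = {
    { "\xC3\xA6", "ae" }, { "\xC3\xB8", "oe" }, { "\xC3\xA5", "aa" },
    { "\xC3\x86", "Ae" }, { "\xC3\x98", "Oe" }, { "\xC3\x85", "Aa" },
};

/* Copies the NUL-terminated UTF-8 string in to out, replacing every table
   entry by its ASCII spelling. Returns 0 on success, or -1 if out (which
   holds cap bytes) is too small for the result and its terminator. */
int transliterate(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    if (cap == 0)
        return -1;
    while (*in) {
        const char *rep = NULL;
        size_t adv = 1, rlen = 1;
        for (size_t i = 0; i < sizeof translit_table / sizeof translit_table[0]; i++) {
            if (strncmp(in, translit_table[i].utf8, 2) == 0) {
                rep = translit_table[i].ascii;
                adv = 2;
                rlen = 2;
                break;
            }
        }
        if (o + rlen >= cap)          /* keep one byte for the terminator */
            return -1;
        if (rep)
            memcpy(out + o, rep, rlen);
        else
            out[o] = *in;
        o += rlen;
        in += adv;
    }
    out[o] = '\0';
    return 0;
}
```

Translating keywords would work the same way one level up: tokenize first, then look each identifier up in a keyword table before falling back to transliteration.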
May 28 2003
"Ilya Minkov" <Ilya_member pathlink.com> wrote in message news:bb2cup$in4$1 digitaldaemon.com...
> Hello, i believe there was a flamewar on this topic a few months ago, starting from an old 1st April joke article from Bjarne Stroustrup about adding unicode identifiers to C++.

I don't want to get into a flamewar, and I don't want to argue against your preferences for using English. My point is simply that sometimes it is not a choice one can make. For example, you may be supplied with libraries using unicode identifiers that you are required to use. If it is necessary to wrap such functions in other C code, D cannot be said to be link compatible with C (C99). Likewise, you might also be required to implement an interface using such identifiers.

> I have not seen a single piece of code using this silly feature.

That is not really an argument. The feature exists, and will get support from compilers as time goes by. Silly or not, compilers cannot be said to be C99 compliant if they do not support it. Any serious compiler vendor will go in that direction. And some will use this feature - there must have been a reason for its introduction.

> Is there any programmer's editor which has \u unicode support as of yet? And any IDE?

They don't have to, as I read the document. They only need to support editing unicode. Translation phase 1 is:

"Physical source file multibyte characters are mapped to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representation."

I believe this is also how DMD does things (except the trigraph stuff) - maps unicode chars to \u-sequences, that is.

> I would also like to see how many compilers implement that - and in what manner.

I don't know if the ABI is completely standardized, but the translation limits chapter gives me a clue how it is to be done:

"31 significant initial characters in an external identifier (each universal character name specifying a character short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a character short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)"

The numbers 6 and 10 indicate to me that they will be encoded using "\uXXXX" and "\uXXXXXXXX" or something very similar. But that is only a guess.

Regards,
Martin M. Pedersen
May 28 2003
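The translation limit quoted above can be turned into a small calculation: each character of an external identifier has a weight of 6 or 10 when it is a universal character name or extended character, and the weights must fit within 31. The C sketch below illustrates that arithmetic. It assumes that characters below 0x80 count as one each and that every other code point is weighted like its universal character name; the names `ident_char_weight` and `significant_prefix` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimum number of significant initial characters in an external
   identifier, per the C99 translation limit quoted in the post. */
enum { EXTERNAL_SIGNIFICANT_CHARS = 31 };

/* Weight of one identifier character under that rule.
   ASSUMPTION: ASCII counts as 1; anything else is weighted like \uXXXX
   (6) or \UXXXXXXXX (10), depending on the size of the code point. */
unsigned ident_char_weight(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    return cp <= 0xFFFF ? 6 : 10;
}

/* Returns how many leading code points of the identifier cps[0..n-1]
   fit within the 31-character limit. */
size_t significant_prefix(const uint32_t *cps, size_t n)
{
    unsigned used = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned w = ident_char_weight(cps[i]);
        if (used + w > EXTERNAL_SIGNIFICANT_CHARS)
            break;
        used += w;
    }
    return i;
}
```

Under these assumptions an identifier made only of characters such as ø is guaranteed just five significant characters (5 x 6 = 30), against 31 for a plain ASCII name, which is one practical cost of the \u scheme.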
Hi, Ilya.

> I have not seen a single piece of code using this silly feature. Is there any programmer's editor which has \u unicode support as of yet? And any IDE?

The latest version of Vim supports UTF-8. However, it requires a kernel patch that isn't in RedHat 7.3. It is supposed to be in 8.0 on. It also doesn't work in the last version of Cygwin I installed. Anyone know how UTF support is coming along in emacs?

Bill
May 28 2003
Err.... I read your post a little more carefully... I don't know of any programming editors directly supporting the \u and \U features of C.

Bill Cox wrote:
> Hi, Ilya.
>
>> I have not seen a single piece of code using this silly feature. Is there any programmer's editor which has \u unicode support as of yet? And any IDE?
>
> The latest version of Vim supports UTF-8. However, it requires a kernel patch that isn't in RedHat 7.3. It is supposed to be in 8.0 on. It also doesn't work in the last version of Cygwin I installed. Anyone know how UTF support is coming along in emacs?
>
> Bill
May 28 2003
In article <bb0sqs$1t1k$1 digitaldaemon.com>, Martin M. Pedersen says...
> It has previously been argued that only English should be used for identifiers in order to better support reuse across language boundaries. But that argument isn't always valid. For example, half a decade ago, I was involved in building the IT infrastructure for a nation-wide real estate network. One of the requirements was that *everything* was in Danish. It involved lots of developers nation-wide, but no one outside Denmark. Of course, identifiers couldn't be fully Danish, and that introduced inconsistency in how things were named.

Back in the bad old days, before MSDOS, we all used CP/M. There was this Nationalist project in Finland, with the goal of translating all operating system commands to Finnish, or Finnish abbreviations. Ostensibly this would be easier on people.

Turned out nobody wanted to use or learn the Finnish version. Their explanation: since these commands are "new words" to you anyway, the least of your troubles is the spelling. Compared with trying to grasp the meaning of these new concepts, the spelling is a non-issue. And if you then have to use a non-Finnish version, you're totally lost.

Sure, D code written in Chinese would be more compact, maybe even more legible (in an absolute sense), with its one-character variable names and method names. Maybe even parentheses and plus signs could be in Chinese equivalents. But I don't believe they'd want it.

Most Finnish companies have a policy where all program code and comments have to be in English. Even in those companies where the programmers and staff speak hardly any English at all.
May 28 2003
"Georg Wrede" <Georg_member pathlink.com> wrote in message
> Most Finnish companies have a policy where all program code and comments have to be in English. Even in those companies where the programmers and staff speak hardly any English at all.

So do we. Yet there are exceptions. If the customer pays us to develop and deliver source code, it is his requirements that count, not our policy.

Regards,
Martin M. Pedersen
May 28 2003
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message news:bb2l7u$s2i$1 digitaldaemon.com...
> So do we. Yet there are exceptions. If the customer pays us to develop and deliver source code, it is his requirements that count, not our policy.

Yup. Listen to the customers, not the marketing department <g>.
May 28 2003
I agree that D is too English-centric (even ASCII-centric). Concern about C99 link compatibility leads me to reflect on C99's boolean type: http://www.uic.edu/classes/mcs/mcs494/f01/transparencies/sec8.4.pdf Mark
May 28 2003
Actually I still think that link compatibility with Digital Mars C++ would be a huge win for D. C++ also has a bool type. Mark
May 28 2003
> I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].

I don't think there is a full implementation of C99 yet. It was adopted in late 1999. Maybe some of this stuff will disappear due to lack of use. Did ISO sack the trigraph crap from C89/C90?
May 29 2003
"Mark T" <Mark_member pathlink.com> wrote in message news:bb6710$1v5d$1 digitaldaemon.com...
> I don't think there is a full implementation of C99 yet. It was adopted in late 1999. Maybe some of this stuff will disappear due to lack of use. Did ISO sack the trigraph crap from C89/C90?

No, trigraphs are still there.

Regards,
Martin M. Pedersen
May 30 2003