D - UNICODE operators
- Mark Brudnak (37/37) Dec 02 2003 When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16...
- Georg Wrede (6/12) Dec 03 2003 I see it as a problem for code maintainers and debugging people.
- Walter (3/3) Dec 03 2003 These ideas have merit. Something useful ought to be done with unicode! ...
- Sean L. Palmer (9/12) Dec 03 2003 That really doesn't matter. That's what Character Map or BabelMap are f...
- Mark Brudnak (22/34) Dec 03 2003 for!
- Sean L. Palmer (8/45) Dec 03 2003 I want more operators. I am with you. I want to take advantage of unic...
- Walter (9/13) Dec 03 2003 of
- Sean L. Palmer (10/23) Dec 03 2003 of
- Mark J. Brudnak (33/120) Dec 03 2003 The UNICODE spec has a lot of mathematical symbols already defined (~100...
- Ilya Minkov (9/12) Dec 03 2003 You shall have a big, no, really HUGE parser handling these...
- Mark Brudnak (19/31) Dec 03 2003 No, the parser would have to detect three tokens <[, identifier, ]>. ...
- Hauke Duden (15/29) Dec 03 2003 My email client shows '?' for all your suggestions. I expect most
- Sean L. Palmer (11/40) Dec 03 2003 Win95 is dying, if not dead, for development purposes.
- Hauke Duden (6/7) Dec 03 2003 Win95 is close to dead: about 2% of our customers. But we still have 30%...
- Roald Ribe (9/15) Dec 03 2003 UNICODE support files for Win95 -> Me
- Hauke Duden (8/24) Dec 04 2003 The MSLU is just a layer above the normal ANSI API. It converts all
- Roald Ribe (11/25) Dec 04 2003 Yes, that is true. But it also means that if the user/admin has set
- Hauke Duden (8/25) Dec 04 2003 That was not the topic of this discussion. My point was that we
- Walter (8/10) Dec 19 2003 I agree. D should fully support developing unicode apps. I should point ...
- Elias Martenson (13/21) Dec 04 2003 Unix has pretty much settled on using UTF-8 for external representation
- Sean L. Palmer (6/27) Dec 04 2003 Right. And the OS should provide at least one font that has every singl...
- Elias Martenson (19/22) Dec 04 2003 Yes it certainly should. Now, my Linux installation lacks fonts for a lar...
- Sean L. Palmer (15/37) Dec 04 2003 That's fine with me, so long as they are not expressly prohibited, I can...
- J C Calvarese (6/21) Dec 04 2003 Actually, since DMD 0.74 non-ASCII characters (as long they are "unicode...
- Sean L. Palmer (27/63) Dec 05 2003 Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8
- J C Calvarese (12/118) Dec 05 2003 OK, so I didn't send it right. (That's what a WASP like me gets for
- Elias Martenson (5/8) Dec 05 2003 Neat. Although your newsreader didn't include a proper encoding header.
- Mark J. Brudnak (11/34) Dec 05 2003 I think only "letter-like" unicode characters should be allowed in D
- Sean L. Palmer (5/11) Dec 05 2003 Agreed, though I would like to use symbols as operators.
- J C Calvarese (8/29) Dec 05 2003 My mail program garbled the UTF-8 file that I was trying to use as an
- Walter (4/6) Dec 19 2003 You're right, and that's the way it works now. I'm going by the C98
- Andy Friesen (5/17) Dec 03 2003 Bjarne suggested something similar to this for C++ once:
- Antti Sykäri (8/11) Dec 03 2003 This is also a problem that the language designer cannot fix by fixing
- Sean L. Palmer (4/15) Dec 03 2003 So someone can make a killing selling D Programmers' Keyboards!! ;)
- Elias Martenson (11/12) Dec 04 2003 Remember APL? Let's not go there again. :-)
When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16, and UTF-32 source code formats. I propose that D extend its available set of operators (and maintain the current set) and draw from the Unicode extensions for additional operators. For example:

LOGICAL OPERATORS
==================
≤ (unicode 2264) may be used instead of <=
≥ (unicode 2265) may be used instead of >=
≠ (unicode 2260) may be used instead of !=
≟ (unicode 225F) may be used instead of ==
∧ (unicode 2227) may be used instead of &&
∨ (unicode 2228) may be used instead of ||

INFIX OPERATORS (may only be overloaded)
================
∘ (unicode 2218) may be introduced as the Schur product
⋅ (unicode 22C5) may be introduced as the dot product
× (unicode 00D7) may be introduced as the cross product
∪ (unicode 222A) may be introduced as the union of two sets
etc...

UNARY OPERATORS (may only be overloaded)
============
√ (unicode 221A) may be introduced as the square root

These were just chosen to provide some examples. There are a slew of symbols, most of which do not make sense in a programming environment. However, some of these symbols may be useful to those who wish to overload them for a particular class they are developing, i.e.

    a = b × c ;

is cleaner than

    a = cross(b, c) ;

or worse yet

    a = b.cross(c) ;

The largest difficulty with such a scheme is that our keyboards are not UNICODE friendly. This I see as a problem for the editor and operating system and not so much for the D language itself.

Anyway ... your thoughts??

Mark.
Dec 02 2003
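For comparison, here is a minimal sketch of the two ASCII spellings from the post, plus an overload of an existing operator, written in present-day D rather than the 2003 dialect. The `Vec3` type, the free function `cross`, and the choice of `^` as the stand-in operator are all assumptions made for illustration; nothing here implements the proposed `×` operator.

```d
import std.stdio;

// Hypothetical 3-component vector type, used only for this illustration.
struct Vec3
{
    double x, y, z;

    // Overloads an existing ASCII operator; '^' is an arbitrary stand-in
    // for the proposed '×', which D does not accept as an operator.
    Vec3 opBinary(string op : "^")(Vec3 rhs)
    {
        return cross(this, rhs);
    }
}

// Free-function spelling: cross(b, c). Because of uniform function call
// syntax, the same function can also be written as b.cross(c).
Vec3 cross(Vec3 a, Vec3 b)
{
    return Vec3(a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x);
}

void main()
{
    auto b = Vec3(1, 0, 0);
    auto c = Vec3(0, 1, 0);
    writeln(cross(b, c)); // function-call form
    writeln(b.cross(c));  // method-like form
    writeln(b ^ c);       // overloaded ASCII operator
}
```

All three calls compute the same vector; the difference is purely in how the expression reads, which is the point of the proposal.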
In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak says...
> When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16, UTF-32 source code formats. I propose that D extend its available set of operators (and maintain the current set) and draw from the unicode..
> The largest difficulty with such a scheme is that our keyboards are not UNICODE friendly. This I see as a problem for the editor and operating system and not so much for the D language itself.

I see it as a problem for code maintainers and debugging people. _They_ are not guaranteed to have the latest and most international OS version at hand, and even if they do, they still might not be able to see or even type such characters.
Dec 03 2003
These ideas have merit. Something useful ought to be done with unicode! The lack of a decent unicode keyboard is a problem, though, as it will be hard for anyone to type in the unicode operators.
Dec 03 2003
That really doesn't matter. That's what Character Map or BabelMap are for! Besides, you'd likely be able to cut and paste them either from the header or the documentation.

If someone makes some code that uses weird unicode operators, you don't have to use it (or you can wrap it in ugly function call syntax).

Sean

"Walter" <walter digitalmars.com> wrote in message news:bql9s5$bkg$1 digitaldaemon.com...
> These ideas have merit. Something useful ought to be done with unicode! The lack of a decent unicode keyboard is a problem, though, as it will be hard for anyone to type in the unicode operators.
Dec 03 2003
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message news:bqlo7i$111q$1 digitaldaemon.com...
> That really doesn't matter. That's what Character Map or BabelMap are for!

Yes, this will work; however, it is not optimal. Too much keyboard-mouse switching is difficult. The thing that will make this work is the editor. I use VIM but am not familiar with its macro or shortcut features. EMACS must have similar features. In either of these editors (or others....LEDS....DIDE...) there will be a hassle/benefit tradeoff to the macro approach. The tipping point will be (I think) when the following happens:

1) The symbols are rendered in the editor (I can see the typeface, unlike my original post :^) ).
2) A symbol can be entered from a QWERTY keyboard using an escape/control key plus 3-5 other key strokes. This would be editor-specific.

> Besides you'd likely be able to cut and paste them either from the header or the documentation.

Too much hassle.

> If someone makes some code that uses wierd unicode operators, you don't have to use it (or you can wrap it in ugly function call syntax).

It makes sense to reserve all UNICODE "ARROWS" and "MATH OPERATORS" as symbols that cannot be used in identifiers. We should then choose a handful to serve as valid operators to start out with.

Mark.
Dec 03 2003
I want more operators. I am with you. I want to take advantage of unicode.

I really see no reason why we should not be able to take any combination of characters that Unicode classifies as symbols, and make an operator out of it. The designers of D cannot possibly predict all the operators people are going to need or want.

Sean
Dec 03 2003
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message news:bqlasv$d7e$1 digitaldaemon.com...
> I really see no reason why we should not be able to take any combination of characters that Unicode classifies as symbols, and make an operator out of it. The designers of D cannot possibly predict all the operators people are going to need or want.

Some problems:

1) The precedence level of those operators.
2) What this implies is user-definable tokens, which is a big problem with a language that has as a design goal the ability to tokenize it without needing to do parsing or semantic analysis.
Dec 03 2003
"Walter" <walter digitalmars.com> wrote in message news:bqlc2t$f0p$2 digitaldaemon.com...
> Some problems:
> 1) the precedence level of those operators.

Yah, that's a biggie. But I'd be OK with them defaulting to the lowest precedence and my being forced to use parentheses.

> 2) what this implies is user-definable tokens, which is a big problem with a language that has as a design goal the ability to tokenize it without needing to do parse or semantic analysis.

So require whitespace between operator tokens. It's easy to distinguish the boundary between brackets and symbols, or alphanumeric and symbols. Maybe limit the user-defined operators to no more than two symbols.

Sean
Dec 03 2003
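The tokenization concern raised in this exchange can be made concrete with a small sketch. It assumes one possible rule, not one anyone in the thread settled on: every single code point that Unicode classifies as a symbol becomes its own operator token, so the lexer never needs a table of user-defined operators. The `Kind`, `Token`, and `tokenize` names are hypothetical, and the classification relies on `std.uni.isSymbol` from the current Phobos library.

```d
import std.stdio;
import std.uni : isAlpha, isNumber, isSymbol, isWhite;

// Hypothetical token categories for this sketch only.
enum Kind { identifier, number, op, other }

struct Token
{
    Kind kind;
    dstring text;
}

// Splits source text into tokens using only per-character Unicode
// properties; no parsing or semantic analysis is involved.
Token[] tokenize(dstring src)
{
    Token[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        dchar c = src[i];
        if (isWhite(c))
        {
            ++i;
            continue;
        }
        size_t start = i;
        Kind kind;
        if (isAlpha(c) || c == '_')
        {
            kind = Kind.identifier;
            while (i < src.length
                   && (isAlpha(src[i]) || isNumber(src[i]) || src[i] == '_'))
                ++i;
        }
        else if (isNumber(c))
        {
            kind = Kind.number;
            while (i < src.length && isNumber(src[i]))
                ++i;
        }
        else if (isSymbol(c))
        {
            // Assumed rule: one symbol code point is one operator token.
            kind = Kind.op;
            ++i;
        }
        else
        {
            kind = Kind.other; // punctuation such as ';' or brackets
            ++i;
        }
        toks ~= Token(kind, src[start .. i]);
    }
    return toks;
}

void main()
{
    foreach (t; tokenize("a = b × c ;"d))
        writeln(t.kind, " ", t.text);
}
```

Under this rule the lexer stays context-free, but precedence still has to be decided somewhere else, which is the first problem on the list above.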
The UNICODE spec has a lot of mathematical symbols already defined (hundreds of them). In my view, combining ASCII symbols to form more operators is *not* the way to go. It would make the syntax even more difficult to parse and probably lead to ambiguous syntax. A UNICODE character is one text symbol which can map to an operation (easy to parse).

In "ASCII-land" the best approach to arbitrary operators is to define them with strings along with some yet-to-be-defined "bracket operator" to delimit them. For example, say I wanted to define some obtuse binary operator like the vector-exterior-product; then my operator would be defined as a string like 'extprod' and some language-defined bracket, say <[ and ]>. To call this operator the code would look like this:

    myBivector = oneVector <[extprod]> anotherVector ; /* traditional infix notation w/ bulky operator */

The operator would be defined as:

    class ga {
        float [] vector ;
        int size ;
        ga = <[extprod]>( ga vectorB) { /* compute the exterior product of 'this' and vectorB */ }
    }

It is bulky, but it would allow the definition of arbitrary operators in ASCII! As was said earlier, UNICODE is the way to go; it has a defined symbol for the exterior product :^).

mark.

"Sean L. Palmer" <palmer.sean verizon.net> wrote in message news:bqlasv$d7e$1 digitaldaemon.com...
> I want more operators. I am with you. I want to take advantage of unicode.
> I really see no reason why we should not be able to take any combination of characters that Unicode classifies as symbols, and make an operator out of it. The designers of D cannot possibly predict all the operators people are going to need or want.
Dec 03 2003
Mark J. Brudnak wrote:
> It is bulky, but it would allow the definition of arbitrary operators in ASCII! As was said earlier, UNICODE is the way to go, it has a defined symbol for the exterior product :^).

You shall have a big, no, really HUGE parser handling these... Because the parsing manner is not generic and you need to set operator precedence by constructing a big... mess! Or have all these operators have the same precedence? Or even make it an error to rely on precedence of these operators, like lint does?

Another idea: 'blabla' should be enough for the ascii infix notation.

-eye
Dec 03 2003
"Ilya Minkov" <minkov cs.tum.edu> wrote in message news:bqljrj$q74$1 digitaldaemon.com...
> You shall have a big, no, really HUGE parser handling these...

No, the parser would have to detect three tokens: <[, identifier, ]>. It would have to take care of right/left matching. 'identifier' could be any valid D identifier like 'foo', 'bar'. For example:

    moo = foo <[ goo ]> zoo ;

This statement parses to the following tokens:

    moo
    =
    foo
    <[
    goo
    ]>
    zoo
    ;

The compiler then knows that this is equivalent to:

    moo = foo.goo(zoo) ;

> Because the parsing manner is not generic and you need to set operator precedence by constructing a big... mess!
> or have all these operators have the same precedence?

They would be higher than assignment. Otherwise precedence would have to be made explicit with parentheses.

> or even make it an error to rely on precedence of these operators like lint does? Another idea: 'blabla' should be enough for the ascii infix notation. -eye
Dec 03 2003
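The rewrite described in the previous post, from `foo <[ goo ]> zoo` to `foo.goo(zoo)`, can be sketched as a token-level transformation. This is only an illustration under strong simplifying assumptions: tokens are separated by whitespace and both operands are single tokens. The function name `rewriteBracketOps` is hypothetical, and the `<[ ]>` syntax itself exists only in this thread, not in D.

```d
import std.array : join, split;
import std.stdio;

// Rewrites every "lhs <[ name ]> rhs" token sequence into "lhs.name(rhs)".
// Assumes whitespace-separated tokens and single-token operands.
string rewriteBracketOps(string stmt)
{
    auto toks = stmt.split(); // split on whitespace
    string[] result;
    size_t i = 0;
    while (i < toks.length)
    {
        // Look for the five-token pattern: lhs <[ name ]> rhs
        if (i + 4 < toks.length && toks[i + 1] == "<[" && toks[i + 3] == "]>")
        {
            result ~= toks[i] ~ "." ~ toks[i + 2] ~ "(" ~ toks[i + 4] ~ ")";
            i += 5;
        }
        else
        {
            result ~= toks[i];
            ++i;
        }
    }
    return result.join(" ");
}

void main()
{
    writeln(rewriteBracketOps("moo = foo <[ goo ]> zoo ;"));
}
```

A real compiler would do this on the parse tree rather than on strings, and would need the precedence rule discussed above to handle operands that are themselves expressions.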
Mark Brudnak wrote:
> When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16, UTF-32 source code formats. I propose that D extend its available set of operators (and maintain the current set) and draw from the unicode extensions for additional operators. For example:
> LOGICAL OPERATORS
> ==================
> ? (unicode 2264) may be used instead of <=
> ? (unicode 2265) may be used instead of >=
> ? (unicode 2260) may be used instead of !=
> ? (unicode 225F) may be used instead of ==
> ? (unicode 2227) may be used instead of &&
> ? (unicode 2228) may be used instead of ||

My email client shows '?' for all your suggestions. I expect most current code editors will do the same, since most programming languages use ASCII encoding for their source code. It would be quite some task to figure out what another programmer meant when he wrote:

    x = ((a ? b) ? c ? d ) ? e;

Some operating systems (e.g. Win9x) don't even have support for printing unicode text on the screen, unless the characters used happen to also be available in the current code page. So it would be close to impossible to write a proper Unicode code editor on those OSs.

And then, of course, there's the problem of entering such operators. My keyboard doesn't have any keys for (unicode 2264), (unicode 2265),... .

It's a great idea, but currently I fear it is not practical.

Hauke
Dec 03 2003
Win95 is dying, if not dead, for development purposes.

You should look forward; it won't be long before all operating systems and all applications support unicode fully. Unless you think we're all gonna give up on this unicode nonsense in the near future, and go back to ascii. ;)

It is a feature that doesn't have to be 100% implemented right away, and it is a feature that you are not forced to use.

Sean
Dec 03 2003
> Win95 is dying, if not dead, for development purposes.

Win95 is close to dead: about 2% of our customers. But we still have 30% of our customers using Win98 or WinME. And I'm sure there are lots of Unix systems that would also have their problems with this, having been invented when ASCII ruled the world and Unicode didn't even exist.

Hauke
Dec 03 2003
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:bqlunf$1ag4$2 digitaldaemon.com...
> Win95 is close to dead: about 2% of our customers. But we still have 30% customers using Win98 or WinME. And I'm sure there are lots of Unix systems that would also have their problems with this - having been invented when ASCII ruled the world and Unicode didn't even exist.

UNICODE support files for Win95 -> Me:
Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU) version 1.0 (http://tinyurl.com/qynq)

The question at hand is: is D going to be a language of the future, for all languages, all over the globe, or will it be a conservative, backward-looking effort?

Roald
Dec 03 2003
Roald Ribe wrote:
> UNICODE support files for Win95 -> Me Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU) version 1.0 (http://tinyurl.com/qynq)
> The question at hand is: is D going to be a language of the future, for all languages, all over the globe, or will it be a conservative backward looking effort?

The MSLU is just a layer above the normal ANSI API. It converts all Unicode strings to ANSI before passing them to functions and converts the results back to Unicode afterwards. That means that Unicode characters that cannot be represented in the current (ANSI) code page will just be replaced with '?', or whatever the conversion routines use in such a case.

Hauke
Dec 04 2003
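The lossy round trip described above is easy to simulate. The sketch below is not the MSLU API and calls no Windows function; it simply assumes a Latin-1-like code page (code points up to U+00FF representable) and replaces everything else with '?'. The function name `simulateCodePage` is hypothetical.

```d
import std.stdio;

// Simulates converting text to an 8-bit code page and back.
// Assumption: the code page covers U+0000..U+00FF (Latin-1-like);
// anything outside that range is replaced with '?'.
dstring simulateCodePage(dstring text)
{
    dchar[] result;
    foreach (dchar c; text)
        result ~= (c <= 0xFF) ? c : '?';
    return result.idup;
}

void main()
{
    // '≤' (U+2264) is lost, while '×' (U+00D7) survives in this code page.
    writeln(simulateCodePage("a ≤ b × c"d));
}
```

Once several different operators collapse into the same '?', the original expression can no longer be recovered, which is the readability problem raised earlier in the thread.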
> The MSLU is just a layer above the normal ANSI API. It converts all Unicode strings to ANSI before passing it to functions and converts the results back to Unicode afterwards. That means that Unicode characters that cannot be represented in the current (ANSI) code page will just be replaced with '?', or whatever the conversion routines use in such a case.

Yes, that is true. But it also means that if the user/admin has set up the correct codepage/fonts for the language they work in, the application using the API will not need to know what codepage that is; it will just work with UNICODE. (openoffice.org uses this system on older Win9X platforms.)

It is a stopgap measure to allow modern programs to run on older platforms, not the greatest invention since sliced bread ;-) It would allow a full UNICODE D app to run unmodified on any of those systems, get full use of UNICODE on newer systems, and still just use one API.

Roald
Dec 04 2003
Roald Ribe wrote:
> Yes, that is true. But it also means that if the user/admin has set up the correct codepage/fonts for the language they work in, the application using the API will not need to know what codepage that is, it will just work with UNICODE. (openoffice.org uses this system on older Win9X platforms) It is a stop gap measure to allow modern programs run on older platforms, not the greatest invention since sliced bread ;-) It would allow a full UNICODE D app to run unmodified on any of those systems, get full use of UNICODE on newer systems, and still just use one API.

That was not the topic of this discussion. My point was that we shouldn't use Unicode characters for something as essential to the language as operators, because then the code will only be readable if your editor/OS uses a code page that happens to contain these symbols.

Creating Unicode applications in D is a completely different thing (and it was/is already discussed in a different thread).

Hauke
Dec 04 2003
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:bqnr2q$1240$1 digitaldaemon.com...
> Creating Unicode applications in D is a completely different thing (and it was/is already discussed in a different thread).

I agree. D should fully support developing unicode apps. I should point out, though, that right now D supports unicode source text (UTF-8, UTF-16, and UTF-32), unicode characters in comments and strings, and unicode alpha characters in identifiers. I'm not sure, though, if the world is quite ready yet for unicode operators. We'll see.
Dec 19 2003
On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:
> Win95 is close to dead: about 2% of our customers. But we still have 30% customers using Win98 or WinME. And I'm sure there are lots of Unix systems that would also have their problems with this - having been invented when ASCII ruled the world and Unicode didn't even exist.

Unix has pretty much settled on using UTF-8 for external representation, and before long all text files in Unix will be UTF-8 instead of some local encoding.

Here's a quote from the excellent UTF-8 for Unix FAQ (http://www.cl.cam.ac.uk/~mgk25/unicode.html):

"With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar with it and that your software supports UTF-8 smoothly."

Regards
Elias
Dec 04 2003
Right. And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.

Sean
Dec 04 2003
On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:
> Right. And the OS should provide at least one font that has every single unicode character, for use as fallback for fonts that are missing such characters.

Yes, it certainly should. Now, my Linux installation lacks fonts for a large set of the Unihan code points, but other than that I have most of them.

In fact, I think that almost all existing installed operating systems today would be able to handle unicode operators. However, I think the problem with them is more related to the fact that you more than likely will need a special editor for the code (at least if you don't want to try to remember all the \u-codes for the operators).

Unicode is very important, as I have pointed out several times in the other unicode thread, but it deals with strings in the language. Not the source code itself.

Do I think the designers of Java made a mistake when they supported unicode in its symbols? A few years ago I would have said yes. Now, I say that it really didn't matter. People don't use unicode symbols anyway.

Therefore, I believe that this discussion is a non-issue. Even if unicode operators were supported, I doubt people would use them, in the name of interoperability.

Regards
Elias
Dec 04 2003
That's fine with me; so long as they are not expressly prohibited, I can use them for my own personal projects. Support for them would then grow grassroots-style. I have text editors that support Unicode, and I don't mind cutting and pasting. Ease of entry is a minor issue to me.

The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck. I can do nothing but invest in a Unicode-aware preprocessor. I want the option of moving forward.

What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII? (except embedded in string literals)

Sean
Dec 04 2003
Sean L. Palmer wrote:
> That's fine with me, so long as they are not expressly prohibited, I can use them for my own personal projects. Support for them would then grow grassroots-style. I have text editors that support Unicode, and I don't mind cutting and pasting. Ease of entry is a minor issue to me. The problem is, if we can't define new operators in D, and it doesn't provide enough overloadable builtin operators, I'm stuck. I can do nothing but invest in a Unicode-aware preprocessor. I want the option of moving forward. What good is being able to compile D source encoded in UTF-8 if you aren't allowed to use any symbols that aren't in ASCII? (except embedded in string literals)

Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode alpha") are allowed in identifier names. (See the attached example.) Also, comments can contain any non-ASCII character.

I do think Unicode operators are an interesting idea.

Justin
Dec 04 2003
Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

That's pretty cool. Pretty cool indeed.

I bet you if I cut and paste some D program made by someone in a far-away land into some web-based translator engine, it would probably not do that bad of a job of translating the identifiers back into English again ;)

Most likely, I'll rarely if ever see any source written in some other language, and if I did, I'd just consider it obfuscation. It's not a sin punishable by death. I think it's cool that finally people can more or less program in their own language, once they learn the English keywords. A preprocessor would allow even those to be replaced.

In fact, whose idea was it to allow infix notation for regular identifiers? We could use a preprocessor to translate our D + Unicode Symbols into D that will actually compile. ;) Right now it would only work with prefix (lisp-like) notation, however.

They have some really interesting brackets in Unicode, as well. Surely there's one just begging to be used for template syntax.

Sean

"J C Calvarese" <jcc7 cox.net> wrote in message news:bqpbqo$8no$1 digitaldaemon.com...
> Actually, since DMD 0.74 non-ASCII characters (as long they are "unicode alpha") are allowed as identifier names. (See the attached example.) Also, comments can contain any non-ASCII character. I do think Unicode operators is an interesting idea. Justin

    const char[] Sí = "yes";
    const char[] Año = "year";

    /+ These don't work (it might be because they are iconic symbols rather than part of any actual language)
    const char[] ???? = "box drawing";
    const char[] ???? = "cards";
    +/

    int main()
    {
        int AñoNúmero = 2003;
        int Cyrillic???? = 1;
        int Hebrew?????;
        printf("%d", AñoNúmero);
        return 0;
    }
Dec 05 2003
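A minimal sketch of the preprocessor idea Sean describes above, assuming nothing more than a textual substitution of each Unicode symbol by an ASCII function name (prefix notation only, as he notes). The symbol table and the names `preprocess`, `sum`, `sqrt`, and `intersect` are hypothetical, and the code is written against modern Phobos (`std.array.replace`) rather than the DMD of the time. A naive replacement like this would also rewrite symbols inside string literals and comments, so it is an illustration and not a usable tool.

```d
import std.array : replace;
import std.stdio : writeln;

// Rewrites Unicode operator symbols into plain ASCII identifiers so that the
// result can be handed to an ordinary D compiler.
string preprocess(string source)
{
    // Hypothetical symbol table: these mappings are illustrative only.
    string[2][] table = [
        ["∑", "sum"],
        ["√", "sqrt"],
        ["∩", "intersect"],
    ];

    // Replace every occurrence of each symbol with its ASCII name.
    foreach (pair; table)
        source = replace(source, pair[0], pair[1]);
    return source;
}

void main()
{
    // Prefix (function-call) notation, since the compiler knows no new infix operators.
    writeln(preprocess("auto r = √(∑(a, b));"));
}
```

Given the input `auto r = √(∑(a, b));`, the function returns `auto r = sqrt(sum(a, b));`. Supporting infix use of the symbols would need a real tokenizer and some precedence rules, which is beyond what a substitution pass can do.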
Sean L. Palmer wrote:
> Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

OK, so I didn't send it right. (That's what a WASP like me gets for belittling ASCII.) Unicode isn't very friendly to novices.

I think putting it in a .zip will help out. Maybe it will work if I turn it into an .html file. I'm sure there's a setting in Thunderbird that will take care of this stuff automatically; I'm just not sure how much time I want to spend looking for it.

(By the way, I used WinXP's Notepad to create the original document because I was lazy and didn't want to hunt down another Unicode-capable editor.)

Justin
Dec 05 2003
On Fri, 05 Dec 2003 01:34:19 -0600, J C Calvarese wrote:
> Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode alpha") are allowed as identifier names. (See the attached example.) Also, comments can contain any non-ASCII character.

Neat. Although your newsreader didn't include a proper encoding header. Not your fault, but rather the broken software. :-)

Regards
Elias
Dec 05 2003
"J C Calvarese" <jcc7 cox.net> wrote
<snip>
> Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode alpha") are allowed as identifier names. (See the attached example.) Also, comments can contain any non-ASCII character.

I think only "letter-like" Unicode characters should be allowed in D identifiers. Having variables like

int  = 42;
float ±×§ = 3.14159;

will really confuse things. Punctuation, shapes, box drawing, dingbats, and math symbols should be prohibited from being used in identifiers.

> I do think Unicode operators is an interesting idea.
> Justin
Dec 05 2003
Agreed, though I would like to use symbols as operators.

Sean

"Mark J. Brudnak" <mjbrudna oakland.edu> wrote in message news:bqq0pe$183n$1 digitaldaemon.com...
> I think only "letter-like" unicode characters should be allowed in D identifiers. Having variables like
> int  = 42 ; float ±×§ =3.14159 ;
> will really confuse things. Punctuation, shapes, boxdrawing, dingbats, math symbols, should be prohibited from being used in identifiers.
Dec 05 2003
Mark J. Brudnak wrote:
> "J C Calvarese" <jcc7 cox.net> wrote
> <snip>
> > Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode alpha") are allowed as identifier names. (See the attached example.) Also, comments can contain any non-ASCII character.
> I think only "letter-like" unicode characters should be allowed in D identifiers. Having variables like
> int  = 42 ; float ±×§ =3.14159 ;
> will really confuse things. Punctuation, shapes, boxdrawing, dingbats, math symbols, should be prohibited from being used in identifiers.

My mail program garbled the UTF-8 file that I was trying to use as an example. D only allows unicode alphas (A - Z, alpha - omega, aleph - taw, accented letters, etc.).

For example, the card symbols (♠♥♣♦) and box elements (╠╢╦╬) can't be used as identifiers (I'm sure because I tried them and it wouldn't compile).

Justin
Dec 05 2003
"Mark J. Brudnak" <mjbrudna oakland.edu> wrote in message news:bqq0pe$183n$1 digitaldaemon.com...
> I think only "letter-like" unicode characters should be allowed in D identifiers.

You're right, and that's the way it works now. I'm going by the C98 "Appendix D" list of allowed alpha characters.
Dec 19 2003
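To make the "letter-like" rule discussed above concrete, here is a rough sketch in modern D. It assumes that Phobos' `std.uni.isAlpha` is a reasonable stand-in for the "unicode alpha" test; the compiler itself works from the C standard's appendix list Walter mentions, so the two may disagree on individual code points. The helper name `looksLikeIdentifierChar` is hypothetical.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

// Hypothetical helper: approximates the identifier rule with the general
// Unicode alphabetic test. This is NOT the table the compiler actually uses.
bool looksLikeIdentifierChar(dchar c)
{
    return c == '_' || isAlpha(c);
}

void main()
{
    // Sample characters from the thread: accented and Greek letters,
    // card suits, and box-drawing elements.
    foreach (dchar c; "ñúα♠♦╠╬"d)
    {
        writefln("U+%04X %s -> %s", cast(uint) c, c,
                 looksLikeIdentifierChar(c) ? "letter-like" : "rejected");
    }
}
```

Under this approximation the letters are accepted while the card suits and box-drawing characters are rejected, which matches what Justin reports from trying them in the compiler.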
Mark Brudnak wrote:
> When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16, UTF-32 source code formats. I propose that D extend its available set of operators (and maintain the current set) and draw from the unicode extensions for additional operators. For example: [...]
> The largest difficulty with such a scheme is that our keyboards are not UNICODE friendly. This I see as a problem for the editor and operating system and not so much for the D language itself. Any way ... your thoughts??
> Mark.

Bjarne suggested something similar to this for C++ once: http://www.research.att.com/~bs/whitespace98.pdf

(yes, this is a joke)

-- andy
Dec 03 2003
In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak wrote:
> The largest difficulty with such a scheme is that our keyboards are not UNICODE friendly. This I see as a problem for the editor and operating system and not so much for the D language itself.

This is also a problem that the language designer cannot fix by fixing the language. It continues to be a problem as long as QWERTY is the only universally available keyboard, or as long as some of the current major operating systems do not offer a universally available, easy way to input those Unicode characters that are commonly used in mathematics but rarely seen on the computer screen.

-Antti
Dec 03 2003
So someone can make a killing selling D Programmers' Keyboards!! ;)

Sean

"Antti Sykäri" <jsykari gamma.hut.fi> wrote in message news:slrnbssvc9.i3r.jsykari pulu.hut.fi...
> In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak wrote:
> > The largest difficulty with such a scheme is that our keyboards are not UNICODE friendly. This I see as a problem for the editor and operating system and not so much for the D language itself.
> This is also a problem that the language designer cannot fix by fixing the language. It continues to be a problem as long as QWERTY is the only universally available keyboard or as long as some of the current major operating systems do not offer a universally available easy way to input those unicode characters that are commonly used in mathematics but rarely seen on the computer screen.
> -Antti
Dec 03 2003
On Wed, 03 Dec 2003 23:34:55 -0800, Sean L. Palmer wrote:
> So someone can make a killing selling D Programmers' Keyboards!! ;)

Remember APL? Let's not go there again. :-)

I agree that Unicode operators could be useful in maths applications, but other than that the advantages are pretty limited. Java has support for Unicode symbols, and that can be a mess unless you encode all non-ASCII symbols using the \u notation, which makes the code pretty hard to read.

I'm currently implementing a BASIC interpreter with full Unicode support. It's not a serious project though. :-)

Regards
Elias
Dec 04 2003
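Elias's remark is about Java, but D string literals accept the same `\uXXXX` escapes, so the readability cost he describes can be shown in D as well. The sketch below is only an illustration in modern D; it reuses the `AñoNúmero` name from Justin's example as string content and assumes the source file is saved as UTF-8.

```d
import std.stdio : writeln;

void main()
{
    // The same text written directly in UTF-8 and with \u escapes.
    string readable = "AñoNúmero";
    string escaped  = "A\u00F1oN\u00FAmero"; // U+00F1 is ñ, U+00FA is ú

    // Both literals denote the same sequence of UTF-8 code units.
    assert(readable == escaped);
    writeln(escaped);
}
```

The escaped form survives any ASCII-only tool chain, but as the thread notes, it is much harder to read once more than a few characters are involved.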