digitalmars.D - Unicode character module (unichar)
- Hauke Duden (7/7) Jun 04 2004 As promised in another thread, here's the unichar module that I've
- Walter (10/16) Jun 04 2004 Great! Some quick comments:
- Hauke Duden (10/22) Jun 04 2004 Hmmm. I don't know that much about the inner workings of foreach but
- Walter (9/16) Jun 04 2004 Yes.
- Hauke Duden (31/44) Jun 05 2004 Here's an example of what I mean:
- Walter (27/60) Jun 05 2004 tighter
- Hauke Duden (11/29) Jun 05 2004 My apologies. I now get the same error.
- Walter (23/36) Jun 05 2004 That's possible. I don't remember.
- hellcatv hotmail.com (17/53) Jun 05 2004 I think these features to enable catching errors at compile time are nec...
- Walter (7/23) Jun 05 2004 necessary.
- Hauke Duden (5/53) Jun 05 2004 Heh, the other reason being the implicit conversion between char and
- Carlos Santander B. (13/13) Jun 06 2004 "Walter" wrote in message
- J C Calvarese (9/24) Jun 07 2004 Yes, in fact I think it'd be ideal to provide the 3 locations involved:
- Sean Kelly (9/18) Jun 05 2004 I prefer to think of modules as C++ namespaces. And in C++ I rarely
- Antti Sykäri (13/29) Jun 06 2004 In C++, that's a good habit.
- Hauke Duden (6/6) Jun 05 2004 I have updated the unichar module incorporating (most of) Walter's
As promised in another thread, here's the unichar module that I've written. It provides basic Unicode character property functions (like charIsDigit, charToLower, etc). It is documented in doxygen style and the compiled docs are included in the zip file. Let me know what you think! Hauke
Jun 04 2004
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:c9qlqe$1me1$1 digitaldaemon.com...
> As promised in another thread, here's the unichar module that I've written. It provides basic Unicode character property functions (like charIsDigit, charToLower, etc). It is documented in doxygen style and the compiled docs are included in the zip file. Let me know what you think!

Great! Some quick comments:

o Can the enum be changed to enum CHARCATEGORY, and then replace CHARCATEGORY_LETTER, etc., to CHARCATEGORY.LETTER?
o change inout in the foreach in charToTitle to nothing.
o "Descimal" should be "Decimal"
o no need for 'char' prefix on functions, the module name should suffice.

The 2Mb ram at runtime is a little costly, so I think it should remain a separate package from std.ctype.
Jun 04 2004
Walter wrote:
>> Let me know what you think!
> Great! Some quick comments:
> o Can the enum be changed to enum CHARCATEGORY, and then replace CHARCATEGORY_LETTER, etc., to CHARCATEGORY.LETTER?

Yes. I guess that's just a C++-ism I got used to.

> o change inout in the foreach in charToTitle to nothing.

Hmmm. I don't know that much about the inner workings of foreach but won't that create a copy of the referenced element?

> o "Descimal" should be "Decimal"

Whoops ;).

> o no need for 'char' prefix on functions, the module name should suffice.

As I said in another post, I'm reluctant to change this. Mostly because I want the functions to look different from the ctype ones but also because of D's overloading issue.

> The 2Mb ram at runtime is a little costly, so I think it should remain a separate package from std.ctype.

I agree.

Hauke
Jun 04 2004
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:c9r0ri$26gi$1 digitaldaemon.com...
>> o change inout in the foreach in charToTitle to nothing.
> Hmmm. I don't know that much about the inner workings of foreach but won't that create a copy of the referenced element?

Yes.

>> o no need for 'char' prefix on functions, the module name should suffice.
> As I said in another post, I'm reluctant to change this. Mostly because I want the functions to look different from the ctype ones but also because of D's overloading issue.

I just don't understand what the D overloading issue is. D has much tighter control over overloading than C++ has, overloads from one module aren't going to be mistaken for another one if both are imported. One reason for the package/module system in D is to pitch the C-ism of decorating names with a pseudo-package name into the ash heap of history <g>.
Jun 04 2004
Walter wrote:
>>> o no need for 'char' prefix on functions, the module name should suffice.
>> As I said in another post, I'm reluctant to change this. Mostly because I want the functions to look different from the ctype ones but also because of D's overloading issue.
> I just don't understand what the D overloading issue is. D has much tighter control over overloading than C++ has, overloads from one module aren't going to be mistaken for another one if both are imported. One reason for the package/module system in D is to pitch the C-ism of decorating names with a pseudo-package name into the ash heap of history <g>.

Here's an example of what I mean:

    module unichar:
    bool isSeparator(dchar chr);

    module funkyMenu:
    bool isSeparator(MenuItem item);

    module myApp:
    import unichar;
    import funkyMenu;

    void foo(MenuItem item)
    {
        if(isSeparator(item))
            ....
    }

This will cause a compiler error because D stops looking for more overloads as soon as it finds unichar.isSeparator and never finds funkyMenu.isSeparator.

And to make matters worse, the error message will not even tell you that there is some kind of conflict. No, the compiler will tell you that there is no isSeparator(MenuItem) even though there most certainly is.

In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called.

That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.

Hauke
Jun 05 2004
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:c9t8un$2jfc$1 digitaldaemon.com...
>> I just don't understand what the D overloading issue is. D has much tighter control over overloading than C++ has, overloads from one module aren't going to be mistaken for another one if both are imported. One reason for the package/module system in D is to pitch the C-ism of decorating names with a pseudo-package name into the ash heap of history <g>.
> Here's an example of what I mean:
>
>     module unichar:
>     bool isSeparator(dchar chr);
>
>     module funkyMenu:
>     bool isSeparator(MenuItem item);
>
>     module myApp:
>     import unichar;
>     import funkyMenu;
>
>     void foo(MenuItem item)
>     {
>         if(isSeparator(item))
>             ....
>     }
>
> This will cause a compiler error because D stops looking for more overloads as soon as it finds unichar.isSeparator and never finds funkyMenu.isSeparator.

No, that isn't what happens. What happens is that isSeparator appears in multiple modules, and the compiler doesn't know which one to use, so issues an error. You'll find the same error if you use a dchar argument for isSeparator.

Next, overloading does NOT happen across modules. Overloading happens AFTER the symbol lookup. Only functions in the same scope are overloadable.

> And to make matters worse, the error message will not even tell you that there is some kind of conflict. No, the compiler will tell you that there is no isSeparator(MenuItem) even though there most certainly is.

The error message I get is:

    unichar.d(2): function isSeparator conflicts with funkyMenu.isSeparator at funkyMenu.d(3)

> In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called. That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.

There is an ambiguity, and the compiler issues an error for it. The reason it behaves this way is to avoid the C++ global namespace pollution problem, where two completely unrelated functions in two unrelated source files happen to have the same name, and inadvertently overload against each other causing some very strange errors. This doesn't happen in D, if you want two names in different modules to overload against each other, a specific action is required to make it happen (an alias declaration). It will NOT happen by default. You'll get the "conflicts" error above.

Next, instead of the C++ 'fix' for this problem by adding a pseudo-package name to each global symbol, in D you can just use the module name for it, i.e.:

    unichar.isSeparator()
    funkyMenu.isSeparator()

which is better than the C++ unichar_isSeparator(), isn't it?
Jun 05 2004
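For comparison, the C++ behaviour Hauke refers to can be sketched in a single file. This is only an illustration under stated assumptions: the two namespaces stand in for the D modules, `MenuItem` is an empty placeholder type, and the functions return a label instead of `bool` so that the overload picked by the compiler is observable. `callWithMenuItem` and `callWithChar` are hypothetical helper names. It says nothing about how D resolves the call; it only shows why the C++ call is not ambiguous.

```cpp
#include <string>

// Each namespace plays the role of one module from the thread.
namespace unichar {
    // Hypothetical signature: returns a label rather than bool.
    std::string isSeparator(char32_t) { return "unichar::isSeparator"; }
}

namespace funkyMenu {
    struct MenuItem {};  // placeholder type, not from the original post
    std::string isSeparator(const MenuItem&) { return "funkyMenu::isSeparator"; }
}

// Rough C++ analogue of "import unichar; import funkyMenu;".
using namespace unichar;
using namespace funkyMenu;

std::string callWithMenuItem() {
    MenuItem item;
    // Both functions are candidates, but only the MenuItem one is viable.
    return isSeparator(item);
}

std::string callWithChar() {
    // U' ' has type char32_t, an exact match for unichar's overload.
    return isSeparator(U' ');
}
```

Both calls compile because C++ merges the candidates from both namespaces into one overload set and then picks by argument type, which is exactly the behaviour Walter's reply says D avoids by default.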
Walter wrote:
> Next, overloading does NOT happen across modules. Overloading happens AFTER the symbol lookup. Only functions in the same scope are overloadable.
>> And to make matters worse, the error message will not even tell you that there is some kind of conflict. No, the compiler will tell you that there is no isSeparator(MenuItem) even though there most certainly is.
> The error message I get is: unichar.d(2): function isSeparator conflicts with funkyMenu.isSeparator at funkyMenu.d(3)

My apologies. I now get the same error. I distinctly remember getting a much more misleading error when I experimented with overloads some time ago, though. Was there a related compiler error in earlier DMD versions?

> There is an ambiguity, and the compiler issues an error for it. The reason it behaves this way is to avoid the C++ global namespace pollution problem, where two completely unrelated functions in two unrelated source files happen to have the same name, and inadvertently overload against each other causing some very strange errors.

What kind of strange errors are these? It seems to me that overloads with different argument types are unproblematic. You can sometimes have ambiguous calls, for example, if one function takes the base class type of the other function's parameter, but that'd simply cause a compiler error. Nothing that I would call "strange".

Hauke
Jun 05 2004
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:c9thhq$2urf$1 digitaldaemon.com...
> I distinctly remember getting a much more misleading error when I experimented with overloads some time ago, though. Was there a related compiler error in earlier DMD versions?

That's possible. I don't remember.

>> There is an ambiguity, and the compiler issues an error for it. The reason it behaves this way is to avoid the C++ global namespace pollution problem, where two completely unrelated functions in two unrelated source files happen to have the same name, and inadvertently overload against each other causing some very strange errors.
> What kind of strange errors are these? It seems to me that overloads with different argument types are unproblematic. You can sometimes have ambiguous calls, for example, if one function takes the base class type of the other function's parameter, but that'd simply cause a compiler error. Nothing that I would call "strange".

Suppose, in file 'a.h', you have:

    void output(int);
    void output(long);

which sends its argument to stdout. You download 'b.h' off the net, which has:

    void output(char);

buried in it somewhere which writes its argument out to the serial port. Now,

    #include "a.h"
    output('c');

and all is fine. Now,

    #include "a.h"
    #include "b.h"
    output('c');

and your program breaks at runtime, possibly in invisible ways. In D, this would break in an obvious manner at compile time. Much more reliable.
Jun 05 2004
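Walter's header scenario can be reproduced in one file as a minimal sketch. The assumptions here are mine, not his: namespaces `a` and `b` stand in for the headers 'a.h' and 'b.h', a function-local `using namespace` stands in for `#include`, and each `output` returns a label instead of writing to stdout or a serial port so the selected overload can be checked. `withOnlyA` and `withAAndB` are hypothetical helper names.

```cpp
#include <string>

// Stand-in for a.h: both overloads "write to stdout".
namespace a {
    std::string output(int)  { return "a: stdout (int)"; }
    std::string output(long) { return "a: stdout (long)"; }
}

// Stand-in for b.h: an unrelated function that "writes to the serial port".
namespace b {
    std::string output(char) { return "b: serial port (char)"; }
}

// Only a.h is "included": 'c' is promoted to int, so a::output(int) is chosen.
std::string withOnlyA() {
    using namespace a;
    return output('c');
}

// Both are "included": b::output(char) is an exact match and silently wins.
std::string withAAndB() {
    using namespace a;
    using namespace b;
    return output('c');
}
```

The call expression `output('c')` is identical in both functions; only the set of visible declarations changes, and with it the function that runs. No diagnostic is required from the compiler in either case, which is the silent behaviour change the post warns about.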
I think these features to enable catching errors at compile time are necessary. However, I was wondering if there are a few shortcuts. I'm running into two situations:

First (and simplest), I have a file ftoa with a single class or function... let's say

    char[] ftoa(real);

In my other file I say

    import ftoa;
    alias ftoa.ftoa ftoa;

Nope! Can't do it. Is there any way I can call it ftoa without having to name my file different than my function (or especially class, as the case may be)?

Secondly: it would be nice if I could alias everything in a module;

    alias ftoa.* *;

or something :-)

--Daniel

In article <c9ti5d$2vk4$1 digitaldaemon.com>, Walter says...
> "Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:c9thhq$2urf$1 digitaldaemon.com...
>> I distinctly remember getting a much more misleading error when I experimented with overloads some time ago, though. Was there a related compiler error in earlier DMD versions?
> That's possible. I don't remember.
>>> There is an ambiguity, and the compiler issues an error for it. The reason it behaves this way is to avoid the C++ global namespace pollution problem, where two completely unrelated functions in two unrelated source files happen to have the same name, and inadvertently overload against each other causing some very strange errors.
>> What kind of strange errors are these? It seems to me that overloads with different argument types are unproblematic. You can sometimes have ambiguous calls, for example, if one function takes the base class type of the other function's parameter, but that'd simply cause a compiler error. Nothing that I would call "strange".
> Suppose, in file 'a.h', you have: void output(int); void output(long); which sends its argument to stdout. You download 'b.h' off the net, which has: void output(char); buried in it somewhere which writes its argument out to the serial port. Now, #include "a.h" output('c'); and all is fine. Now, #include "a.h" #include "b.h" output('c'); and your program breaks at runtime, possibly in invisible ways. In D, this would break in an obvious manner at compile time. Much more reliable.
Jun 05 2004
<hellcatv hotmail.com> wrote in message news:c9tjiv$3e$1 digitaldaemon.com...
> I think these features to enable catching errors at compile time are necessary. however I was wondering if there are a few shortcuts. I'm running into two situations: first (and simplest) I have a file ftoa with a single class or function ... lets say char[] ftoa(real); in my other file I say import ftoa; alias ftoa.ftoa ftoa; nope! can't do it. Is there any way I can call it ftoa without having to name my file different than my function (or especially class as the case may be)

There's no way to distinguish the names if you don't name them something different.

> secondly: it would be nice if I could alias everything in a module; alias ftoa.* *; or something :-)

Seems a little too easy <g>.
Jun 05 2004
Walter wrote:
> "Hauke Duden" <H.NS.Duden gmx.net> wrote in message news:c9thhq$2urf$1 digitaldaemon.com...
>> I distinctly remember getting a much more misleading error when I experimented with overloads some time ago, though. Was there a related compiler error in earlier DMD versions?
> That's possible. I don't remember.
>>> There is an ambiguity, and the compiler issues an error for it. The reason it behaves this way is to avoid the C++ global namespace pollution problem, where two completely unrelated functions in two unrelated source files happen to have the same name, and inadvertently overload against each other causing some very strange errors.
>> What kind of strange errors are these? It seems to me that overloads with different argument types are unproblematic. You can sometimes have ambiguous calls, for example, if one function takes the base class type of the other function's parameter, but that'd simply cause a compiler error. Nothing that I would call "strange".
> Suppose, in file 'a.h', you have: void output(int); void output(long); which sends its argument to stdout. You download 'b.h' off the net, which has: void output(char); buried in it somewhere which writes its argument out to the serial port. Now, #include "a.h" output('c'); and all is fine. Now, #include "a.h" #include "b.h" output('c'); and your program breaks at runtime, possibly in invisible ways.

Heh, the other reason being the implicit conversion between char and int, of course (hint,hint) ;). But I get your point. Thanks for the example.

Hauke
Jun 05 2004
"Walter" <newshound digitalmars.com> wrote in message news:c9tf74$2rrl$1 digitaldaemon.com
| The error message I get is:
| unichar.d(2): function isSeparator conflicts with funkyMenu.isSeparator at
| funkyMenu.d(3)
|

I believe there's a problem with this message. Suppose you're compiling a whole bunch of files and you get messages like this one (like what happened to me trying to compile mango beta 7), how could you possibly know where the conflict is? It should rather say something like "myApp.d(9): do you mean unichar.isSeparator or funkyMenu.isSeparator?".

-----------------------
Carlos Santander Bernal
Jun 06 2004
Carlos Santander B. wrote:
> "Walter" <newshound digitalmars.com> wrote in message news:c9tf74$2rrl$1 digitaldaemon.com
> | The error message I get is:
> | unichar.d(2): function isSeparator conflicts with funkyMenu.isSeparator at
> | funkyMenu.d(3)
> |
> I believe there's a problem with this message. Suppose you're compiling a whole bunch of files and you get messages like this one (like what happened to me trying to compile mango beta 7), how could you possibly know where the conflict is? It should rather say something like "myApp.d(9): do you mean unichar.isSeparator or funkyMenu.isSeparator?".

Yes, in fact I think it'd be ideal to provide the 3 locations involved: caller and 2 definitions...

    myApp.d(9): ambiguous "isSeparator" = unichar.d(946) or funkyMenu.d(86).

This could be tremendously helpful! As libraries get more complicated, these issues get harder and harder for the code maintainer to track down.

> -----------------------
> Carlos Santander Bernal

--
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Jun 07 2004
Hauke Duden wrote:
> In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called. That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.

I prefer to think of modules as C++ namespaces. And in C++ I rarely import symbols with a "using" declaration, but rather fully qualify them: std::cout, etc. So why not the same thing here? unichar.toLower, etc. Or come up with a shorter module name if that one is too long.

One thing I haven't tried... is it possible to import a package and still be required to provide module names when referring to symbols stored within each module in that package? That would be ideal.

Sean
Jun 05 2004
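The C++ habit Sean describes might look like the sketch below. Everything in it is assumed for illustration: `asciiOnly` and `unichar` are hypothetical namespaces, their `toLower` functions are toy implementations (the `unichar` one covers only ASCII and the Latin-1 capitals, not real Unicode case mapping, and is not the module from this thread), and `lowersDiffer` is a made-up helper. The point is only that two functions with identical names and signatures coexist as long as every call is qualified.

```cpp
// Toy lowercase: handles A-Z only.
namespace asciiOnly {
    char32_t toLower(char32_t c) {
        return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + 32) : c;
    }
}

// Toy lowercase: A-Z plus Latin-1 capitals U+00C0..U+00DE.
namespace unichar {
    char32_t toLower(char32_t c) {
        if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + 32);
        // U+00D7 is the multiplication sign, which has no lowercase form.
        if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return static_cast<char32_t>(c + 32);
        return c;
    }
}

// Qualified calls name the namespace explicitly, so the identical
// signatures never compete in overload resolution.
bool lowersDiffer(char32_t c) {
    return asciiOnly::toLower(c) != unichar::toLower(c);
}
```

With qualified names the choice of function is spelled out at each call site, at the cost of more typing; a shorter namespace name, as Sean suggests for the module, reduces that cost.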
In article <c9tghe$2tec$1 digitaldaemon.com>, Sean Kelly wrote:
> Hauke Duden wrote:
>> In C++ there'd be no such problem because the call is not actually ambiguous! It is perfectly clear that the MenuItem version is the one that should be called. That's what I mean and that's the reason why I don't want to define any global functions with names that may also occur in other contexts. Otherwise there may be weird effects for the library's user like working code failing to compile once another import is added - even though there is no ambiguity.
> I prefer to think of modules as C++ namespaces. And in C++ I rarely import symbols with a "using" declaration, but rather fully qualify them: std::cout, etc. So why not the same thing here? unichar.toLower, etc. Or come up with a shorter module name if that one is too long.

In C++, that's a good habit. But in D you don't have to, because the potential conflicts between modules/namespaces will be detected automatically by the compiler.

In short: If you want to be absolutely sure, C++ forces you to specify everything. In D you can use unqualified names as much as you like, and only when it is necessary to resolve the conflict you have to fully qualify them. Productive (because you're not likely to bump into a conflict too often) and safe (no surprises when you do).

-Antti

--
I will not be using Plan 9 in the creation of weapons of mass destruction to be used by nations other than the US.
Jun 06 2004
I have updated the unichar module incorporating (most of) Walter's suggestions and also written a utype module as a drop-in replacement for ctype. Available here: http://www.hazardarea.com/unichar.zip Hauke
Jun 05 2004