D - char vs ascii
- Walter (4/4) Aug 14 2001 What do people think about using the keyword:
- Jan Knepper (3/7) Aug 14 2001 I guess ascii makes more sense than char and unicode makes more
- Erik Funkenbusch (26/33) Aug 14 2001 Just some suggestions that come to mind, in no particular order or
- Walter (32/56) Aug 15 2001 set.
- Ivan Frohne (9/9) Aug 15 2001 There's something clean and neat about calling things
- Walter (3/12) Aug 15 2001 I suspect that ascii and unicode are trademarked names!
- Jan Knepper (4/5) Aug 15 2001 <g> I had not thought of that one!
- Sheldon Simms (9/15) Aug 16 2001 Well, it seems to me that you already have standard sized integral
- Walter (2/15) Aug 17 2001 It seems useful to be able to overload char and byte separately.
- c. keith ray (25/35) Apr 29 2002 Perhaps some consideration of an existing long-lived internationalized c...
- weingart cs.ualberta.ca (Tobias Weingartner) (13/17) Aug 16 2001 Ascii makes little sense. In most cases where it is used (other than
- Jeff Frohwein (22/27) Aug 16 2001 I personally think C might have started a bad habit by using
- Charles Hixson (6/29) Aug 17 2001 That's a good idea. These could be the basic language defined types,
- Walter (4/5) Aug 17 2001 Oh, I am reading all of this stuff. It's a lot of fun, and people have g...
- Russell Bornschlegel (5/11) Aug 16 2001 My votes would be "char" and "unicode". Erik makes a good case
- Walter (3/14) Aug 17 2001 Oh, I hate the "_t" suffix too. I'd love to name it unicode, but since t...
- Walter (7/9) Aug 17 2001 I checked. Unicode is a registered trademark of Unicode, Inc. They
- Kent Sandvik (4/7) Aug 17 2001 XML uses UTF, so you could think about using 'utf' as one possible
- Russ Lewis (3/11) Aug 17 2001 Any clarification what UTF might mean? It's not necessarily obvious. N...
- Kent Sandvik (9/22) Aug 17 2001 Google is our friend. UTF or actually UTF-8 is one encoding scheme, stan...
- weingart cs.ualberta.ca (Tobias Weingartner) (11/17) Aug 20 2001 Please don't. I say, make form follow function. wchar is a throwback
- Walter (3/6) Aug 20 2001 I frequently want to overload characters differently than bytes, so usin...
- weingart cs.ualberta.ca (Tobias Weingartner) (12/21) Aug 22 2001 That's exactly what I'm saying. For characters, use the character type.
What do people think about using the keyword: ascii or char? unicode or wchar? -Walter
Aug 14 2001
I guess ascii makes more sense than char and unicode makes more sense than wchar or wchar_t...

Walter wrote:
> What do people think about using the keyword: ascii or char? unicode or
> wchar? -Walter
Aug 14 2001
Just some suggestions that come to mind, in no particular order or coherence:

No, ascii makes little sense. ascii refers explicitly to one character set. There are many 8 bit character sets or locales or code pages or whatever you want to call them.

Also, unicode can be 8 bit or 16 bit, and there is talk of a 32 bit as well in the future. I think any language that expects to stick around for any length of time needs to address the forward compatibility of new code sets.

I'd much rather see a way to define your character type and use it throughout your program. Also remember that you might be creating an application that needs to display multiple character sets simultaneously (for instance, both English and Japanese). Now, while much of this will be OS specific, and doesn't belong in a language, you at least need some way to deal with such things cleanly in that language. char_t and wchar_t do not provide specific sizes, but can be implementation defined.

I'd say define the types: char8 and char16. This allows char32 or char64 (or char12 for that matter, remember that some CPUs have non-standard word sizes). An alternative would be a syntax like char(8) or char(16), perhaps even a simple "char" and a modifier like "unicode(16) char".

Finally, I might suggest doing away with char altogether and making the entire language unicode. On platforms that don't support it, provide a seamless mapping mechanism to downconvert 16 bit chars to 8 bit.

"Jan Knepper" <jan smartsoft.cc> wrote in message news:3B79CF33.94F71602 smartsoft.cc...
> I guess ascii makes more sense than char and unicode makes more sense than
> wchar or wchar_t...
> Walter wrote:
> > What do people think about using the keyword: ascii or char? unicode or
> > wchar?
Aug 14 2001
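For illustration only, here is a minimal sketch of Erik's "define your character type once and use it throughout your program" idea, written with a present-day D alias. The appchar name and greet function are invented for this example and are not part of D or of any proposal in the thread; the point is that flipping the one alias between char, wchar, and dchar retargets the whole program.

    import std.conv : to;
    import std.stdio : writeln;

    // Hypothetical single point of definition for the program's character width.
    alias appchar = wchar;   // flip to char or dchar to change the whole program

    appchar[] greet(const(appchar)[] name)
    {
        // Literals are transcoded to the chosen width via std.conv.to,
        // so no code below cares which width was picked above.
        return "hello, ".to!(appchar[]) ~ name;
    }

    void main()
    {
        appchar[] who = "world".to!(appchar[]);
        writeln(greet(who));
    }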
"Erik Funkenbusch" <erikf seahorsesoftware.com> wrote in message news:9lcsqr$2s9p$1 digitaldaemon.com...Just some suggestions that come to mind, in no particular order or coherance: No, ascii makes little sense. ascii refers explicitly to one characterset.There are many 8 bit character sets or locales or code pages or whateveryouwant to call them.Yes, I think it should just be called "char" and it will be an unsigned 8 bit type.Also, unicode can be 8 bit or 16 bit, and there is talk of a 32 bit aswellin the future. I think any language that expects to stick around for any length of time needs to address the forward compatibility of new codesets. 32 bit wchar_t's are a reality on linux now. I think it will work out best to just make a wchar type and it will map to whatever the wchar_t is for the local native C compiler.I'd much rather see a way to define your character type and use it throughout your program. Also remember that you might be creating an application that needs to display multiple character sets simultaneously (for instance, both English and Japanese).I've found I've wanted to support both ascii and unicode simultaneously in programs, hence I thought two different types was appropriate. I was constantly irritated by having to go through and either subtract or add L's in front of the strings. The macros to do it automatically are ugly. Hence, the idea that the string literals should be implicitly convertible to either char[] or wchar[]. Next, there is the D typedef facility, which actually does introduce a new, overloadable type. So, you could: typedef char mychar; or typedef wchar mychar; and through the magic of overloading <g> the rest of the code should not need changing.Now, while much of this will be OS specific, and doesn't belong in a language, you at least need some way to deal with such things cleanly in that language. char_t and wchar_t do not provide specific sizes, but canbeimplementation defined. I'd say define the types. char8 and char16, this allows char32 or char64 (or char12 for that matter, remember that some CPU's have non-standardwordsizes). An alternative would be a syntax like char(8) or char(16), perhaps even a simple "char" and a modifier like "unicode(16) char" Finally, I might suggest doing away with char all together and making the entire language unicode. On platforms that don't support it, provide a seamless mapping mechanism to downconvert 16 bit chars to 8 bit.Java went the way of chucking ascii entirely. While that makes sense for a web language, I think for systems languages ascii is going to be around for a long time, so might as well make it easy to deal with! Ascii is really never going to be anything but an 8 bit type - it is unicode with the varying size. Hence I think having a wchar type of a varying size is the way to go.
Aug 15 2001
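As it happens, D as it later shipped behaves roughly the way Walter sketches here: the same literal text can be taken at either width, selected by a suffix or the destination type, and overloading keeps the call sites unchanged. A small sketch of that behaviour (the show function is a made-up name; D1's typedef, which created distinct overloadable types, is not shown since later D dropped it):

    import std.stdio;

    // One literal, two widths: the c/w suffixes (or the destination type)
    // pick the representation, so no L"" prefixes or conversion macros.
    void show(const(char)[] s)  { writeln("char[]  (", s.length, " code units): ", s); }
    void show(const(wchar)[] s) { writeln("wchar[] (", s.length, " code units): ", s); }

    void main()
    {
        show("straße"c);   // 7 UTF-8 code units
        show("straße"w);   // 6 UTF-16 code units
    }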
There's something clean and neat about calling things what they are. Instead of larding up your code with

    typedef char ascii
    typedef wchar unicode

why not just use 'ascii' and 'unicode' in the first place? Save the typedefs for

    typedef ascii ebcdic

Now, about that cast notation ....

--Ivan Frohne
Aug 15 2001
I suspect that ascii and unicode are trademarked names!

"Ivan Frohne" <frohne gci.net> wrote in message news:9lf20l$11og$1 digitaldaemon.com...
> There's something clean and neat about calling things what they are.
> Instead of larding up your code with
>     typedef char ascii
>     typedef wchar unicode
> why not just use 'ascii' and 'unicode' in the first place? Save the
> typedefs for
>     typedef ascii ebcdic
> Now, about that cast notation ....
> --Ivan Frohne
Aug 15 2001
<g> I had not thought of that one!

Jan

Walter wrote:
> I suspect that ascii and unicode are trademarked names!
Aug 15 2001
In article <9levtq$10ji$1 digitaldaemon.com>, "Walter" <walter digitalmars.com> wrote:
> I've found I've wanted to support both ascii and unicode simultaneously
> in programs, hence I thought two different types was appropriate. I was
> constantly irritated by having to go through and either subtract or add
> L's in front of the strings. The macros to do it automatically are ugly.
> Hence, the idea that the string literals should be implicitly convertible
> to either char[] or wchar[].

Well, it seems to me that you already have standard sized integral types: byte, short, int, long. Why not make char be a 2 or 4-byte unicode char and use the syntax

    byte[] str = "My ASCII string";

for ascii?

--
Sheldon Simms / sheldon semanticedge.com
Aug 16 2001
Sheldon Simms wrote in message <9lgvsh$2jb7$1 digitaldaemon.com>...
> In article <9levtq$10ji$1 digitaldaemon.com>, "Walter" <walter digitalmars.com> wrote:
> > I've found I've wanted to support both ascii and unicode simultaneously
> > in programs, hence I thought two different types was appropriate. I was
> > constantly irritated by having to go through and either subtract or add
> > L's in front of the strings. The macros to do it automatically are ugly.
> > Hence, the idea that the string literals should be implicitly convertible
> > to either char[] or wchar[].
> Well, it seems to me that you already have standard sized integral types:
> byte, short, int, long. Why not make char be a 2 or 4-byte unicode char
> and use the syntax
>     byte[] str = "My ASCII string";
> for ascii?

It seems useful to be able to overload char and byte separately.
Aug 17 2001
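The separation Walter wants is easy to demonstrate in D as it later shipped: char[] and byte[] are distinct types, so text and raw octets resolve to different overloads. A minimal sketch (dump is an invented name):

    import std.stdio;

    // Distinct char and byte types let text and raw data take different paths.
    void dump(const(char)[] text) { writeln("text : ", text); }
    void dump(const(byte)[] raw)  { writeln("bytes: ", raw); }

    void main()
    {
        byte[] raw = [104, 105];   // same bit patterns as "hi", different type
        dump("hi");                // picks the char[] overload
        dump(raw);                 // picks the byte[] overload
    }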
In article <9lih2u$10ca$2 digitaldaemon.com>, Walter says...
> Sheldon Simms wrote in message <9lgvsh$2jb7$1 digitaldaemon.com>...
> > In article <9levtq$10ji$1 digitaldaemon.com>, "Walter" <walter digitalmars.com> wrote:
> > > I've found I've wanted to support both ascii and unicode simultaneously
> > > in programs, hence I thought two different types was appropriate. I was
> > > constantly irritated by having to go through and either subtract or add
> > > L's in front of the strings. The macros to do it automatically are ugly.
> > > Hence, the idea that the string literals should be implicitly convertible
> > > to either char[] or wchar[].

Perhaps some consideration of an existing long-lived internationalized class library would be appropriate...

[Cocoa] Representing strings as objects allows you to use strings wherever you use other objects. It also provides the benefits of encapsulation, so that string objects can use whatever encoding and storage [single-byte, multi-byte, or unicode] is needed for efficiency while simply appearing as arrays of characters. The class cluster's two public classes, NSString and NSMutableString, declare the programmatic interface for noneditable and editable strings, respectively. Even though a string presents itself as an array of Unicode characters (Unicode is a registered trademark of Unicode, Inc.), its internal representation could be otherwise...

A class cluster is one public class, whose visible 'constructors' (aka 'factory methods') instantiate appropriate hidden subclasses. So UnicodeString subclass, JapaneseShiftJISString subclass, ChineseBigFiveString subclass, and AsciiString subclass are hidden, but their parent classes visible. [I made up those names. Since they are hidden, it doesn't matter how many subclasses of NSString and NSMutableString there are - they all conform to the same public interface.]

I believe the objective-C compiler translates "some string" into an NSString (I'm not sure if the compiler supports unicode string constants yet.)

---
C. Keith Ray
<http://homepage.mac.com/keithray/resume2.html>
<http://homepage.mac.com/keithray/xpminifaq.html>
Apr 29 2002
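To make the class-cluster idea concrete outside Objective-C, here is a rough D transliteration. Everything in it (Str, AsciiStr, Utf8Str, the place where the factory splits) is invented for illustration and is not Apple's API; the point is only that callers see one public type while the factory picks a hidden concrete representation.

    import std.algorithm : all;
    import std.stdio;

    abstract class Str
    {
        // Public factory: callers see only Str, the concrete storage is hidden.
        static Str create(const(char)[] text)
        {
            if (text.all!(c => c < 0x80))
                return new AsciiStr(text);
            return new Utf8Str(text);
        }

        abstract size_t length() const;
    }

    private class AsciiStr : Str
    {
        private const(char)[] data;
        this(const(char)[] d) { data = d; }
        override size_t length() const { return data.length; }
    }

    private class Utf8Str : Str
    {
        private const(char)[] data;
        this(const(char)[] d) { data = d; }
        override size_t length() const
        {
            import std.utf : count;
            return data.count;   // length in code points, not bytes
        }
    }

    void main()
    {
        writeln(Str.create("plain").length);   // 5
        writeln(Str.create("héllo").length);   // 5 code points, 6 bytes stored
    }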
"c. keith ray" <c._member pathlink.com> wrote in message news:aak0n0$14gi$1 digitaldaemon.com... <http://developer.apple.com/techpubs/macosx/Cocoa/Reference/Foundation/ObjC_ It's a great idea, but it appears to be copyrighted by Apple.
Apr 29 2002
In article <aal1h7$23cc$1 digitaldaemon.com>, "Walter" <walter digitalmars.com> wrote:
> "c. keith ray" <c._member pathlink.com> wrote in message
> news:aak0n0$14gi$1 digitaldaemon.com...
> > <http://developer.apple.com/techpubs/macosx/Cocoa/Reference/Foundation/ObjC_
> It's a great idea, but it appears to be copyrighted by Apple.

See also: <http://www.gnustep.org/>

The objective-C version of Apple's Foundation library (which defines the String classes) is not open-source. The C version is open-source and has equivalent functionality.

Apple's open-source license is at: <http://www.opensource.apple.com/apsl/>

The c version of the Foundation library is at:
<http://www.opensource.apple.com/projects/darwin/1.4/projects.html>
look for "CoreFoundation 226-14.1 Core Foundation tool kit"

--
C. Keith Ray <http://homepage.mac.com/keithray/xpminifaq.html>
Apr 30 2002
Ok, Apple's open source license looks like it can be used. Do you want to take the lead in converting it to D?

"Keith Ray" <k1e2i3t4h5r6a7y 1m2a3c4.5c6o7m> wrote in message news:k1e2i3t4h5r6a7y-9BED6F.07530430042002 digitalmars.com...
> In article <aal1h7$23cc$1 digitaldaemon.com>, "Walter" <walter digitalmars.com> wrote:
> > "c. keith ray" <c._member pathlink.com> wrote in message
> > news:aak0n0$14gi$1 digitaldaemon.com...
> > > <http://developer.apple.com/techpubs/macosx/Cocoa/Reference/Foundation/ObjC_
> > It's a great idea, but it appears to be copyrighted by Apple.
> See also: <http://www.gnustep.org/>
> The objective-C version of Apple's Foundation library (which defines the
> String classes) is not open-source. The C version is open-source and has
> equivalent functionality.
> Apple's open-source license is at: <http://www.opensource.apple.com/apsl/>
> The c version of the Foundation library is at:
> <http://www.opensource.apple.com/projects/darwin/1.4/projects.html>
> look for "CoreFoundation 226-14.1 Core Foundation tool kit"
> --
> C. Keith Ray <http://homepage.mac.com/keithray/xpminifaq.html>
Apr 30 2002
In article <aamfkp$2p6o$1 digitaldaemon.com>, "Walter" <walter digitalmars.com> wrote:
> Ok, Apple's open source license looks like it can be used. Do you want to
> take the lead in converting it to D?

... in my extensive free time? I wish I did have time for that... I have the desire to implement an OO language very similar to Smalltalk [objects all the way down] but with syntax more like JavaScript or Java without type declarations, using techniques from Threaded Interpreted Languages (kind of like Forth or Postscript). I do plan to look at D in more detail real soon now.

PS: I'm a Macintosh user by choice (I spend most of my day-job time programming on Windows), so I can't use your D compiler yet.

--
C. Keith Ray <http://homepage.mac.com/keithray/xpminifaq.html>
Apr 30 2002
"Keith Ray" <k1e2i3t4h5r6a7y 1m2a3c4.5c6o7m> wrote in message news:k1e2i3t4h5r6a7y-35A6F7.20384830042002 digitalmars.com...PS: I'm a Macintosh user by choice (I spend most of day-job time programming on Windows), so I can't use your D compiler yet.If you want, you can also do a Mac port starting with the gnu compiler sources for the Mac.
May 03 2002
In article <9lchvd$2miu$1 digitaldaemon.com>, Walter wrote:
> What do people think about using the keyword: ascii or char? unicode or
> wchar?

Ascii makes little sense. In most cases where it is used (other than for strings), it is to get a "byte". Since you have a byte type, char is sort of redundant.

IMHO it would be better to extend the string type (unicode, etc) to be able to specify a restricted subset. Unicode would be the superset (for strings, and the default if not constrained), and some other things (unicode.byte[10] string_of_10_byte_sized_positions) for restricting the type of "string" you have.

--
Tobias Weingartner | Unix Guru, Admin, Systems-Dude
Apt B 7707-110 St. | http://www.tepid.org/~weingart/
Edmonton, AB       |-------------------------------------------------
Canada, T6G 1G3    | %SYSTEM-F-ANARCHISM, The OS has been overthrown
Aug 16 2001
Walter wrote:
> What do people think about using the keyword: ascii or char? unicode or
> wchar?

I personally think C might have started a bad habit by using types that were generally vague in nature. All I ask is that simplicity be given impartial consideration. Since we are all used to seeing types such as short, long, and int in code, perhaps it would be better for all of us to spend some time thinking about the following types rather than form an immediate opinion. I can easily identify with the fact that any unfamiliar looking types can look highly offensive to the newly or barely acquainted, as they did to me at one time:

    u8, s8, u16, s16, u32, s32, ...

Some will be adamantly opposed because they don't use these, or know anyone that does. SGI, for one, has used these types for Nintendo 64 development and now Nintendo is using them for GameBoy Advance development. There are probably others...

As 128 bit and 256 bit systems are released, adding new types would be as easy as u128, s128, u256, s256... rather than have to consider something like "long long long long", or a new name in general. Those that want to use vague types can always typedef their own types.

Thanks for listening, :)

Jeff
Aug 16 2001
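Jeff's naming scheme maps directly onto fixed-size built-ins, so a handful of aliases is enough to try it out in D as it later shipped. The u8/s8/... names below are his proposal, not anything the language defines:

    import std.stdio : writefln;

    // Explicit-width names expressed as aliases over D's built-in sized types.
    alias u8  = ubyte;   alias s8  = byte;
    alias u16 = ushort;  alias s16 = short;
    alias u32 = uint;    alias s32 = int;
    alias u64 = ulong;   alias s64 = long;

    void main()
    {
        u16 tile  = 0x1234;    // the width is visible at the point of use
        s32 score = -42;
        static assert(u16.sizeof == 2 && s64.sizeof == 8);
        writefln("tile=%04X score=%d", tile, score);
    }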
Jeff Frohwein wrote:
> Walter wrote:
> > What do people think about using the keyword: ascii or char? unicode or
> > wchar?
> ...
> u8, s8, u16, s16, u32, s32, ...
> ...
> As 128 bit and 256 bit systems are released, adding new types would be as
> easy as u128, s128, u256, s256... rather than have to consider something
> like "long long long long", or a new name in general. Those that want to
> use vague types can always typedef their own types.
> Thanks for listening, :) Jeff

That's a good idea. These could be the basic language defined types, and then a "standard library" could include typedefs for the types that people are more familiar with. This would allow code to be written that could either easily adapt to changing word sizes, be fixed for particular sizes, or both. And still have it be fairly portable.
Aug 17 2001
Jeff Frohwein <"jeff " SPAMLESSdevrs.com>Thanks for listening, :)Oh, I am reading all of this stuff. It's a lot of fun, and people have great ideas. I'm a little surprised at the sheer volume of replies and comments! -Walter
Aug 17 2001
Walter wrote:
> What do people think about using the keyword: ascii or char? unicode or
> wchar?

My votes would be "char" and "unicode". Erik makes a good case against "ascii". "wchar" and "wchar_t" are ugly C-committeeisms, IMO.

-Russell B
Aug 16 2001
Oh, I hate the "_t" suffix too. I'd love to name it unicode, but since there is a Unicode, Inc., I don't think I can. Russell Bornschlegel wrote in message <3B7C4455.ADFB4496 estarcion.com>...Walter wrote:What do people think about using the keyword: ascii or char? unicode or wchar?My votes would be "char" and "unicode". Erik makes a good case against "ascii". "wchar" and "wchar_t" are ugly C-committeeisms, IMO. -Russell B
Aug 17 2001
Walter wrote in message <9lk4ij$2d7a$2 digitaldaemon.com>...
> Oh, I hate the "_t" suffix too. I'd love to name it unicode, but since
> there is a Unicode, Inc., I don't think I can.

I checked. Unicode is a registered trademark of Unicode, Inc. They specifically say that "unicode" can't be included in a product. Oh well. I guess that's why the ANSI committee picked "wchar_t". Looks like "wchar" is what D will use. -Walter
Aug 17 2001
"Walter" <walter digitalmars.com> wrote in message news:9lk4vh$2dj3$1 digitaldaemon.com...I checked. Unicode is a registered trademark of Unicode, Inc. They specifically say that "unicode" can't be included in a product. Oh well. I guess that's why the ANSI committee picked "wchar_t".XML uses UTF, so you could think about using 'utf' as one possible keyword. --Kent
Aug 17 2001
Kent Sandvik wrote:
> "Walter" <walter digitalmars.com> wrote in message
> news:9lk4vh$2dj3$1 digitaldaemon.com...
> > I checked. Unicode is a registered trademark of Unicode, Inc. They
> > specifically say that "unicode" can't be included in a product. Oh well. I
> > guess that's why the ANSI committee picked "wchar_t".
> XML uses UTF, so you could think about using 'utf' as one possible
> keyword. --Kent

Any clarification what UTF might mean? It's not necessarily obvious. Neither is wchar...but it's closer.
Aug 17 2001
Google is our friend. UTF, or actually UTF-8, is one encoding scheme; it stands for UCS Transformation Format, and UCS is more in line with the Unicode definition, or Universal Character Set. Anyway, if those buzz words are too unknown, then wchar_t maybe is the way to go. --Kent

"Russ Lewis" <russ deming-os.org> wrote in message news:3B7D9CA1.6A25385C deming-os.org...
> Kent Sandvik wrote:
> > "Walter" <walter digitalmars.com> wrote in message
> > news:9lk4vh$2dj3$1 digitaldaemon.com...
> > > I checked. Unicode is a registered trademark of Unicode, Inc. They
> > > specifically say that "unicode" can't be included in a product. Oh well.
> > > I guess that's why the ANSI committee picked "wchar_t".
> > XML uses UTF, so you could think about using 'utf' as one possible
> > keyword. --Kent
> Any clarification what UTF might mean? It's not necessarily obvious. Neither
> is wchar...but it's closer.
Aug 17 2001
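Since the acronym keeps coming up: UTF-8 is just a byte-level encoding of Unicode code points, one to four bytes each. The helper below is a hand-rolled illustration of the scheme in D (the library's std.utf.encode does the real work); it is a sketch only and skips validation of surrogates and out-of-range values.

    import std.stdio;

    // Encode one code point into its UTF-8 byte sequence (1 to 4 bytes).
    ubyte[] toUtf8(dchar c)
    {
        if (c < 0x80)    return [cast(ubyte)c];
        if (c < 0x800)   return [cast(ubyte)(0xC0 | (c >> 6)),
                                 cast(ubyte)(0x80 | (c & 0x3F))];
        if (c < 0x10000) return [cast(ubyte)(0xE0 | (c >> 12)),
                                 cast(ubyte)(0x80 | ((c >> 6) & 0x3F)),
                                 cast(ubyte)(0x80 | (c & 0x3F))];
        return [cast(ubyte)(0xF0 | (c >> 18)),
                cast(ubyte)(0x80 | ((c >> 12) & 0x3F)),
                cast(ubyte)(0x80 | ((c >> 6) & 0x3F)),
                cast(ubyte)(0x80 | (c & 0x3F))];
    }

    void main()
    {
        writefln("%(%02X %)", toUtf8('A'));        // 41
        writefln("%(%02X %)", toUtf8('\u00E9'));   // C3 A9      ('é')
        writefln("%(%02X %)", toUtf8('\u20AC'));   // E2 82 AC   ('€')
    }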
In article <9lk4vh$2dj3$1 digitaldaemon.com>, Walter wrote:
> I checked. Unicode is a registered trademark of Unicode, Inc. They
> specifically say that "unicode" can't be included in a product. Oh well. I
> guess that's why the ANSI committee picked "wchar_t". Looks like "wchar" is
> what D will use.

Please don't. I say, make form follow function. wchar is a throwback to some weird ansi'ism, having "wide char's". That's stupid. If you want to have D handle strings natively, *and* you want it to be some sort of internationalized version of a string, make it be a string, or even a "char", or "character". Make it sufficiently different from C, such that people will know.

For 1-byte things, use the type "byte". Say what you mean, mean what you say. wchar? If you use UTF, it could be vchar (variable length), etc...

--Toby.
Aug 20 2001
Tobias Weingartner wrote in message ...
> For 1-byte things, use the type "byte". Say what you mean, mean what you
> say. wchar? If you use UTF, it could be vchar (variable length), etc...

I frequently want to overload characters differently than bytes, so using "byte" for ascii doesn't work well for me.
Aug 20 2001
In article <9lsc7e$1di3$2 digitaldaemon.com>, Walter wrote:
> Tobias Weingartner wrote in message ...
> > For 1-byte things, use the type "byte". Say what you mean, mean what you
> > say. wchar? If you use UTF, it could be vchar (variable length), etc...
> I frequently want to overload characters differently than bytes, so using
> "byte" for ascii doesn't work well for me.

That's exactly what I'm saying. For characters, use the character type. An array of these could be a string. Could be that the base library (or the language if necessary) could define a string class as well (index entries are of type character).

What I'm saying is that wchar is a bad name. They are not "wide" chars, but what you really want is a "character". So name it as such. A char can be anything, even variable length (UTF-8 for example). If you need byte-sized quantities in your program, use "byte". If you need a character (possibly byte, word, qword, or variable length), use character.

--Toby.
Aug 22 2001