digitalmars.D.learn - Read a unicode character from the terminal
- Jacob Carlborg (6/6) Mar 31 2012 How would I read a unicode character from the terminal? I've tried using...
- Ali Çehreli (65/70) Mar 31 2012 I recommend using stdin. The destiny of std.cstream is uncertain and
- Jordi Sayol (4/4) Mar 31 2012 Many thanks to be so educational.
- Jordi Sayol (7/7) Mar 31 2012 BTW, for those who do not know, Ali Çehreli is writing a book to le...
- Ali Çehreli (8/12) Mar 31 2012 Thank you very much for the free plug! :)
- Ali Çehreli (67/70) Mar 31 2012 Here is a Unicode character range, which is unfortunately pretty
- Jacob Carlborg (11/18) Apr 01 2012 Ok, what's the differences compared to the example in your first post:
- Ali Çehreli (14/32) Apr 01 2012 No difference in that example because it consumes the entire input as
- Jacob Carlborg (4/17) Apr 01 2012 Ok, I see, thanks.
- Jacob Carlborg (9/73) Apr 01 2012 Yeah, exactly. When I think about it, I don't know why I thought "getc"
- Stewart Gordon (7/10) Mar 31 2012 What OS are you using?
- Jacob Carlborg (5/16) Apr 01 2012 I'll have a look, thanks.
- Stewart Gordon (8/11) Apr 04 2012 The D2 version is now up on the site.
- Jacob Carlborg (10/17) Apr 04 2012 Sure I can help you with testing. I have a lot on my own table so I
- Stewart Gordon (7/11) Apr 04 2012 Just to hold some miscellaneous utility classes/structs/functions.
- Jacob Carlborg (6/21) Apr 04 2012 Ok, I see. The functions that need a Posix implementation are mostly in
- Stewart Gordon (8/10) Apr 05 2012 Maybe it contains the code I need to finish datetime off. Though I can'...
- Jacob Carlborg (5/18) Apr 05 2012 http://dlang.org/phobos/std_getopt.html
- Stewart Gordon (5/7) Apr 07 2012 Where is the code in std.getopt that has any relevance whatsoever to
- Jacob Carlborg (5/13) Apr 07 2012 Both std.getopt and mjg.libs.util.commandline handle command line
- Stewart Gordon (10/12) Apr 07 2012 What's that to do with anything?
- Jacob Carlborg (4/16) Apr 07 2012 I don't know what your module is supposed to do.
- Stewart Gordon (8/9) Apr 07 2012 Then how about reading its documentation?
- Jacob Carlborg (14/18) Apr 04 2012 I solved it like this:
How would I read a Unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ASCII characters. If I try to read and print something that isn't ASCII, it just prints a question mark.

--
/Jacob Carlborg
Mar 31 2012
On 03/31/2012 08:56 AM, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc"

I recommend using stdin. The destiny of std.cstream is uncertain and stdin is sufficient. (I know that it lacks support for BOM but I don't need them.)

> but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.

The word 'character' used to mean characters of the Latin-based alphabets but with Unicode support that's not the case anymore. In D, 'character' means UTF code unit, nothing else. Unfortunately, although 'Unicode character' is just the correct term to use, it conflicts with D's characters which are not Unicode characters. 'Unicode code point' is the non-conflicting term that matches what we mean with 'Unicode character.' Only dchar can hold code points.

That's the part about characters. The other side is what is being fed into the program through its standard input. On my Linux consoles, the text comes as a stream of chars, i.e. a UTF-8 encoded text. You must ensure that your terminal is capable of supporting Unicode through its settings. On Windows terminals, one must enter 'chcp 65001' to set the terminal to UTF-8.

Then, it is the program that must know what the data represents. If you are expecting a Unicode code point, then you may think that it should be as simple as reading into a dchar:

    import std.stdio;

    void main()
    {
        dchar letter;
        readf("%s", &letter); // <-- does not work!
        writeln(letter);
    }

The output:

    $ ./deneme
    ç
    Ã   <-- will be different on different consoles

The problem is, char can implicitly be converted to dchar. Since the letter ç consists of two chars (two UTF-8 code units), dchar gets the first one converted as a dchar.

To see this, read and write two chars in a loop without a newline in between:

    import std.stdio;

    void main()
    {
        foreach (i; 0 .. 2) {
            char code;
            readf("%s", &code);
            write(code);
        }

        writeln();
    }

This time two code units are read and then output to form a Unicode character on the console:

    $ ./deneme
    ç
    ç   <-- result of two write(code) expressions

The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).

For now, this is a way of getting Unicode characters from the input:

    import std.stdio;

    void main()
    {
        string line = readln();

        foreach (dchar c; line) {
            writeln(c);
        }
    }

Once you have the input as a string, std.utf.decode can also be used.

Ali
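The distinction between code units and code points described above can be checked on a string literal, without involving the terminal at all. The following is only a minimal sketch: the helper name codePoints is made up for this illustration, and it relies on std.utf.decode, which the post mentions as an alternative to the foreach loop.

```d
import std.stdio;
import std.utf : decode;

/// Hypothetical helper (not from the thread): decodes every code point
/// in `text` and returns them as an array of dchar.
dchar[] codePoints(string text)
{
    dchar[] result;
    size_t index = 0;

    while (index < text.length)
    {
        // decode() reads one code point starting at `index` and
        // advances `index` past all of its code units.
        result ~= decode(text, index);
    }

    return result;
}

void main()
{
    string text = "ç";

    writeln(text.length);             // UTF-8 code units: 2
    writeln(codePoints(text).length); // code points: 1
}
```

The length of a string counts chars (code units), which is why reading a single char from the input yields only half of ç.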
Mar 31 2012
Many thanks for being so educational.

Best regards,
--
Jordi Sayol
Mar 31 2012
BTW, for those who do not know, Ali Çehreli is writing a book to learn "D" from scratch. It's very educational. There are two formats: HTML (on-line) and PDF.

http://ddili.org/ders/d.en/index.html

Best regards,
--
Jordi Sayol
Mar 31 2012
On 03/31/2012 02:31 PM, Jordi Sayol wrote:
> BTW, for those who do not know, Ali Çehreli is writing a book to learn "D" from scratch. It's very educational. There are two formats: HTML (on-line) and PDF.
>
> http://ddili.org/ders/d.en/index.html
>
> Best regards,

Thank you very much for the free plug! :)

I have translated eleven more chapters since the last announcement. I am on the assert chapter as we speak. It is taking longer than I had expected because I constantly make improvements to the original: corrections, consistency improvements, additions, adapting code samples to the current state of D, etc.

Ali
Mar 31 2012
On 03/31/2012 11:53 AM, Ali Çehreli wrote:
> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).

Here is a Unicode character range, which is unfortunately pretty inefficient because it relies on an exception that is thrown from isValidDchar! :p

    import std.stdio;
    import std.utf;
    import std.array;

    struct UnicodeRange
    {
        File file;
        char[4] codes;
        bool ready;

        this(File file)
        {
            this.file = file;
            this.ready = false;
        }

        bool empty() const @property
        {
            return file.eof();
        }

        dchar front() const @property
        {
            if (!ready) {
                // Sorry, no 'mutable' in D! :p
                UnicodeRange * mutable_this = cast(UnicodeRange*)&this;
                mutable_this.readNext();
            }

            return codes.front;
        }

        void popFront()
        {
            codes = codes.init;
            ready = false;
        }

        void readNext()
        {
            foreach (ref code; codes) {
                file.readf("%s", &code);

                if (file.eof()) {
                    codes[] = '\0';
                    ready = false;
                    break;
                }

                // Expensive way of determining "ready"!
                try {
                    if (isValidDchar(codes.front)) {
                        ready = true;
                        break;
                    }

                } catch (Exception) {
                    // not ready
                }
            }
        }
    }

    UnicodeRange byUnicode(File file = stdin)
    {
        return UnicodeRange(file);
    }

    void main()
    {
        foreach(c; byUnicode()) {
            writeln(c);
        }
    }

Ali
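The try/catch in readNext is there to find out when enough code units have been collected. In UTF-8 the first code unit of a sequence already encodes how long the sequence is, so the same decision can in principle be made without exceptions. The sketch below is not part of the posted range; sequenceLength is a hypothetical helper, and it deliberately ignores the finer validity rules (overlong lead bytes 0xC0 and 0xC1, and lead bytes above 0xF4).

```d
import std.stdio;

/// Hypothetical helper: returns how many UTF-8 code units the sequence
/// starting with `lead` occupies, or 0 if `lead` cannot start a sequence.
size_t sequenceLength(char lead)
{
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // continuation or invalid byte
}

void main()
{
    string text = "aç€";
    size_t index = 0;

    while (index < text.length)
    {
        immutable len = sequenceLength(text[index]);

        if (len == 0)
            break; // simplistic error handling for this sketch

        writeln(text[index .. index + len], ": ", len, " code unit(s)");
        index += len;
    }
}
```

A range built on this idea would read one code unit, ask for the sequence length, and then read exactly the remaining code units before decoding.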
Mar 31 2012
On 2012-04-01 01:17, Ali Çehreli wrote:
> On 03/31/2012 11:53 AM, Ali Çehreli wrote:
>> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).
>
> Here is a Unicode character range, which is unfortunately pretty inefficient because it relies on an exception that is thrown from isValidDchar! :p

Ok, what's the difference compared to the example in your first post:

    void main()
    {
        string line = readln();

        foreach (dchar c; line) {
            writeln(c);
        }
    }

--
/Jacob Carlborg
Apr 01 2012
On 04/01/2012 05:00 AM, Jacob Carlborg wrote:
> On 2012-04-01 01:17, Ali Çehreli wrote:
>> On 03/31/2012 11:53 AM, Ali Çehreli wrote:
>>> The solution is to use ranges when pulling Unicode characters out of strings. std.stdio does not provide this yet, but it will eventually happen (so I've heard :)).
>>
>> Here is a Unicode character range, which is unfortunately pretty inefficient because it relies on an exception that is thrown from isValidDchar! :p
>
> Ok, what's the difference compared to the example in your first post:
>
>     void main()
>     {
>         string line = readln();
>
>         foreach (dchar c; line) {
>             writeln(c);
>         }
>     }

No difference in that example because it consumes the entire input as dchars. But in general, with that inefficient range, it is possible to pull just one dchar from the input and leave the rest of the stream untouched. For example, it would be possible to readf() an int right after that:

    auto u = byUnicode();
    dchar d = u.front; // <-- reads just one dchar from the range

    int i;
    readf("%s", &i);   // <-- continues with std.stdio functions
    writeln(i);

With the readln() method, the int must be looked for in the line first, and then in the rest of the input.

Ali
Apr 01 2012
On 2012-04-01 16:02, Ali Çehreli wrote:
> No difference in that example because it consumes the entire input as dchars. But in general, with that inefficient range, it is possible to pull just one dchar from the input and leave the rest of the stream untouched. For example, it would be possible to readf() an int right after that:
>
>     auto u = byUnicode();
>     dchar d = u.front; // <-- reads just one dchar from the range
>
>     int i;
>     readf("%s", &i);   // <-- continues with std.stdio functions
>     writeln(i);
>
> With the getline() method, the int must be looked up in the line first, then from the input.
>
> Ali

Ok, I see, thanks.

--
/Jacob Carlborg
Apr 01 2012
On 2012-03-31 20:53, Ali Çehreli wrote:
> I recommend using stdin. The destiny of std.cstream is uncertain and stdin is sufficient. (I know that it lacks support for BOM but I don't need them.)

I thought std.cstream was a stream wrapper around stdin.

> The word 'character' used to mean characters of the Latin-based alphabets but with Unicode support that's not the case anymore. In D, 'character' means UTF code unit, nothing else. Unfortunately, although 'Unicode character' is just the correct term to use, it conflicts with D's characters which are not Unicode characters. 'Unicode code point' is the non-conflicting term that matches what we mean with 'Unicode character.' Only dchar can hold code points.
>
> That's the part about characters.

Yeah, exactly. When I think about it, I don't know why I thought "getc" would work since it only returns a "char" and not a "dchar".

> The other side is what is being fed into the program through its standard input. On my Linux consoles, the text comes as a stream of chars, i.e. a UTF-8 encoded text. You must ensure that your terminal is capable of supporting Unicode through its settings. On Windows terminals, one must enter 'chcp 65001' to set the terminal to UTF-8.

I'm on Mac OS X, the terminal is capable of handling Unicode.

> Then, it is the program that must know what the data represents. If you are expecting a Unicode code point, then you may think that it should be as simple as reading into a dchar:
>
>     import std.stdio;
>
>     void main()
>     {
>         dchar letter;
>         readf("%s", &letter); // <-- does not work!
>         writeln(letter);
>     }
>
> The output:
>
>     $ ./deneme
>     ç
>     Ã   <-- will be different on different consoles

I tried that as well.

> The problem is, char can implicitly be converted to dchar. Since the letter ç consists of two chars (two UTF-8 code units), dchar gets the first one converted as a dchar.
>
> To see this, read and write two chars in a loop without a newline in between:
>
>     import std.stdio;
>
>     void main()
>     {
>         foreach (i; 0 .. 2) {
>             char code;
>             readf("%s", &code);
>             write(code);
>         }
>
>         writeln();
>     }
>
> This time two code units are read and then outputted to form a Unicode character on the console:
>
>     $ ./deneme
>     ç
>     ç   <-- result of two write(code) expressions
>
> The solution is to use ranges when pulling Unicode characters out of strings. std.stdin does not provide this yet, but it will eventually happen (so I've heard :)).
>
> For now, this is a way of getting Unicode characters from the input:
>
>     import std.stdio;
>
>     void main()
>     {
>         string line = readln();
>
>         foreach (dchar c; line) {
>             writeln(c);
>         }
>     }
>
> Once you have the input as a string, std.utf.decode can also be used.
>
> Ali

I'll give that a try, thanks.

--
/Jacob Carlborg
Apr 01 2012
On 31/03/2012 16:56, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.

What OS are you using? And what codepage is the console set to?

You might want to try the console module in my utility library:
http://pr.stewartsplace.org.uk/d/sutil/

(For D1 at the moment, but a D2 version will be available any day now!)

Stewart.
Mar 31 2012
On 2012-04-01 00:14, Stewart Gordon wrote:
> On 31/03/2012 16:56, Jacob Carlborg wrote:
>> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.
>
> What OS are you using? And what codepage is the console set to?

I'm using Mac OS X and the terminal is set to handle UTF-8.

> You might want to try the console module in my utility library:
> http://pr.stewartsplace.org.uk/d/sutil/
>
> (For D1 at the moment, but a D2 version will be available any day now!)
>
> Stewart.

I'll have a look, thanks.

--
/Jacob Carlborg
Apr 01 2012
On 31/03/2012 23:14, Stewart Gordon wrote:
<snip>
> You might want to try the console module in my utility library:
> http://pr.stewartsplace.org.uk/d/sutil/
>
> (For D1 at the moment, but a D2 version will be available any day now!)

The D2 version is now up on the site.

Jacob - would you be up for helping me with testing/implementation of my library on Mac OS? If you do a search for "todo" you'll see what needs to be done. Some of it will benefit Unix-type systems generally.

If perchance you have a big-endian CPU, testing the bit arrays on it would also be of value.

Stewart.
Apr 04 2012
On 2012-04-04 18:06, Stewart Gordon wrote:
> The D2 version is now up on the site.
>
> Jacob - would you be up for helping me with testing/implementation of my library on Mac OS? If you do a search for "todo" you'll see what needs to be done. Some of it will benefit Unix-type systems generally.
>
> If perchance you have a big-endian CPU, testing the bit arrays on it would also be of value.
>
> Stewart.

Sure I can help you with testing. I have a lot on my own table so I don't have any time for implementing things (maybe some small things). If I may ask, what is the point of this library? Doesn't it duplicate functionality that's already available in Phobos and/or Tango?

For Mac OS X, if you just follow the Posix standard you'll get very far.

I have an x86 CPU; it has been a couple of years since Apple last had a PPC based computer.

--
/Jacob Carlborg
Apr 04 2012
On 04/04/2012 17:37, Jacob Carlborg wrote:
<snip>
> Sure I can help you with testing. I have a lot on my own table so I don't have any time for implementing things (maybe some small things). If I may ask, what is the point of this library?

Just to hold some miscellaneous utility classes/structs/functions.

> Doesn't it duplicate functionality that's already available in Phobos and/or Tango?
<snip>

It certainly does in places. But what matters is that it contains functionality that isn't present in Phobos (or wasn't present in Phobos at the time I wrote it).

Stewart.
Apr 04 2012
On 2012-04-05 01:21, Stewart Gordon wrote:
> On 04/04/2012 17:37, Jacob Carlborg wrote:
> <snip>
>> Sure I can help you with testing. I have a lot on my own table so I don't have any time for implementing things (maybe some small things). If I may ask, what is the point of this library?
>
> Just to hold some miscellaneous utility classes/structs/functions.
>
>> Doesn't it duplicate functionality that's already available in Phobos and/or Tango?
> <snip>
>
> It certainly does in places. But what matters is that it contains functionality that isn't present in Phobos (or wasn't present in Phobos at the time I wrote it).
>
> Stewart.

Ok, I see. The functions that need a Posix implementation are mostly in datetime and commandline, if I recall correctly. These are already present in Phobos?

--
/Jacob Carlborg
Apr 04 2012
On 05/04/2012 07:18, Jacob Carlborg wrote:
<snip>
> Ok, I see. The functions that need a Posix implementation are mostly in datetime and commandline, if I recall correctly. These are already present in Phobos?

Maybe it contains the code I need to finish datetime off. Though I can't really just copy someone else's code, I suppose I can at least see what functions it uses.

I haven't noticed much along the lines of command line manipulation in Phobos - only the code (now in druntime) to populate the args argument to main (which under Posix just uses argc/argv from the C main). Or is there something I haven't found?

Stewart.
Apr 05 2012
On 2012-04-05 12:55, Stewart Gordon wrote:
> On 05/04/2012 07:18, Jacob Carlborg wrote:
> <snip>
>> Ok, I see. The functions that need a Posix implementation are mostly in datetime and commandline, if I recall correctly. These are already present in Phobos?
>
> Maybe it contains the code I need to finish datetime off. Though I can't really just copy someone else's code, I suppose I can at least see what functions it uses.
>
> I haven't noticed much along the lines of command line manipulation in Phobos - only the code (now in druntime) to populate the args argument to main (which under Posix it just uses argc/argv from the C main). Or is there something I haven't found?
>
> Stewart.

http://dlang.org/phobos/std_getopt.html

But it might not do what you want.

--
/Jacob Carlborg
Apr 05 2012
On 05/04/2012 14:51, Jacob Carlborg wrote:
<snip>
> http://dlang.org/phobos/std_getopt.html
>
> But it might not do what you want.

Where is the code in std.getopt that has any relevance whatsoever to what smjg.libs.util.datetime or smjg.libs.util.commandline is for?

Stewart.
Apr 07 2012
On 2012-04-07 14:36, Stewart Gordon wrote:
> On 05/04/2012 14:51, Jacob Carlborg wrote:
> <snip>
>> http://dlang.org/phobos/std_getopt.html
>>
>> But it might not do what you want.
>
> Where is the code in std.getopt that has any relevance whatsoever to what smjg.libs.util.datetime or smjg.libs.util.commandline is for?
>
> Stewart.

Both std.getopt and smjg.libs.util.commandline handle command line arguments?

--
/Jacob Carlborg
Apr 07 2012
On 07/04/2012 17:54, Jacob Carlborg wrote:
<snip>
> Both std.getopt and smjg.libs.util.commandline handle command line arguments?

What's that to do with anything?

If the code I need to finish smjg.libs.util.commandline is somewhere in std.getopt, please tell me where exactly it is. If it isn't, then why did you refer me to it?

That's like telling someone who's writing a bigint library and struggling to implement multiplication to just look in std.math. After all, they both handle numbers.

Stewart.
Apr 07 2012
On 2012-04-07 19:57, Stewart Gordon wrote:
> On 07/04/2012 17:54, Jacob Carlborg wrote:
> <snip>
>> Both std.getopt and smjg.libs.util.commandline handle command line arguments?
>
> What's that to do with anything?
>
> If the code I need to finish smjg.libs.util.commandline is somewhere in std.getopt, please tell me where exactly it is. If it isn't, then why did you refer me to it?
>
> That's like telling someone who's writing a bigint library and struggling to implement multiplication to just look in std.math. After all, they both handle numbers.
>
> Stewart.

I don't know what your module is supposed to do.

--
/Jacob Carlborg
Apr 07 2012
On 07/04/2012 20:16, Jacob Carlborg wrote:
<snip>
> I don't know what your module is supposed to do.

Then how about reading its documentation?
http://pr.stewartsplace.org.uk/d/sutil/doc/commandline.html

If there's something you don't understand about it, this is the issue that needs to be addressed, rather than wildly guessing that some Phobos module provides the answer.

Stewart.
Apr 07 2012
On 2012-03-31 17:56, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using "std.cstream.din.getc" but it seems to only work for ascii characters. If I try to read and print something that isn't ascii, it just prints a question mark.

I solved it like this:

    import std.cstream; // din
    import std.utf;     // codeLength, decode

    dchar readChar ()
    {
        char[4] buffer;

        // Read the first code unit, then as many more as it calls for.
        buffer[0] = din.getc();
        auto len = codeLength!(char)(buffer[0]);

        foreach (i ; 1 .. len)
            buffer[i] = din.getc();

        size_t i;
        return decode(buffer, i);
    }

--
/Jacob Carlborg
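One caveat about the readChar function above: std.utf.codeLength takes a code point and reports how many code units are needed to encode it. Applied to a lead byte, whose value never exceeds 0xFF, it can only return 1 or 2, so the function appears to work for ç but would stop short on three- and four-byte sequences such as €. std.utf.stride looks at the lead code unit itself and returns the sequence length. What follows is only a sketch of that variant, not the poster's code: the nextUnit delegate is a hypothetical stand-in for din.getc so that the example does not depend on std.cstream.

```d
import std.stdio;
import std.utf : decode, stride;

/// Reads one code point, pulling UTF-8 code units one at a time from
/// `nextUnit` (a hypothetical callback standing in for din.getc).
dchar readChar(scope char delegate() nextUnit)
{
    char[4] buffer;
    buffer[0] = nextUnit();

    // stride() inspects only the code unit at the given index and
    // returns the length of the whole sequence (1 to 4). It throws a
    // UTFException if that code unit cannot start a sequence.
    const(char)[] lead = buffer[0 .. 1];
    immutable len = stride(lead, 0);

    foreach (i; 1 .. len)
        buffer[i] = nextUnit();

    const(char)[] sequence = buffer[0 .. len];
    size_t index = 0;
    return decode(sequence, index);
}

void main()
{
    // Feed the function from a string here; a real program would
    // supply a delegate that reads one byte from standard input.
    string text = "€uro";
    size_t pos = 0;
    char fromText() { return text[pos++]; }

    writeln(readChar(&fromText)); // €
    writeln(pos);                 // 3
}
```

Taking a delegate keeps the decoding logic independent of where the bytes come from, which also makes it easy to exercise with fixed input.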
Apr 04 2012