digitalmars.D.learn - Best way to count character spaces.
- Taylor Hillegeist (34/34) Jun 30 2015 So I am aware that Unicode is not simple... I have been working
- Rikki Cattermole (10/43) Jun 30 2015 Well I would personally use isWhite[0].
- H. S. Teoh via Digitalmars-d-learn (13/27) Jun 30 2015 [...]
- Steven Schveighoffer (5/29) Jul 01 2015 BTW, this exercise would make an EXCELLENT blog post highlighting both
So I am aware that Unicode is not simple... I have been working on a boxes-like project (http://boxes.thomasjensen.com/). It basically puts a pretty border around stdin characters, like so:

     ________________________
    /\                       \
    \_|Different all twisty a|
      |of in maze are you,   |
      |passages little.      |
      |   ___________________|_
       \_/_____________________/

but I find that I need to know a bit more than the length of the string because of encoding differences. I had a thought at one point to do this:

    MyString.splitLines.map!(a => a.toUTF32.length).reduce!max();

That should get me the longest line, but it has a problem too, because control characters might not take up space (backspace?), leaving an unwanted nasty space :( or take a weird amount of space (\t). See https://en.wikipedia.org/wiki/Unicode_control_characters. And perhaps the first isn't really something to worry about.

Or should I do something like:

    MyString.splitLines
        .map!(a => a
            .map!(a => a.isGraphical)
            .map!(a => cast(int) a ? 1 : 0)
            .array
            .reduce!((a, b) => a + b))
        .reduce!max

Mostly I am just curious about best practice in this situation. Both of the above fail with the input:

    "hello \n People \nP\u0008ofEARTH"

on my command prompt at least.
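To make the "encoding differences" concrete, here is a minimal sketch of what the first attempt actually measures. The helper name codePointCount and the sample strings are assumptions made for illustration; they are not from the original post.

```d
import std.range : walkLength;
import std.utf : toUTF32;

/// Number of code points in s (the same figure a.toUTF32.length gives).
size_t codePointCount(string s)
{
    return s.walkLength;   // walkLength decodes a string code point by code point
}

void main()
{
    import std.stdio : writeln;

    string accented = "na\u00EFve";      // U+00EF needs two UTF-8 code units
    writeln(accented.length);            // code units: 6
    writeln(codePointCount(accented));   // code points: 5

    string ctrl = "P\u0008ofEARTH";      // contains a backspace (U+0008)
    writeln(codePointCount(ctrl));       // 9: the backspace counts like any other code point
    writeln(ctrl.toUTF32.length);        // 9 as well
}
```

So counting code points fixes the multi-byte case, but it still counts the backspace as one column, which would explain the stray space in the box.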
Jun 30 2015
On 1/07/2015 6:33 a.m., Taylor Hillegeist wrote:
> So I am aware that Unicode is not simple... I have been working
> on a boxes like project http://boxes.thomasjensen.com/
> [...]
> Mostly I am just curious of best practice in this situation.
> Both of the above fail with the input:
> "hello \n People \nP\u0008ofEARTH"
> on my command prompt at least.

Well, I would personally use isWhite. I would also use filter and count along with it, which gives the number of whitespace characters on each line. So something like this:

    // needs std.algorithm, std.array, std.string and std.uni
    size_t[] lengths = MyString.splitLines
        .map!(line => line.filter!isWhite.count)
        .array;

Untested of course, but may give you ideas :)
Jun 30 2015
On Tue, Jun 30, 2015 at 06:33:32PM +0000, Taylor Hillegeist via Digitalmars-d-learn wrote:
> So I am aware that Unicode is not simple... I have been working on a
> boxes like project http://boxes.thomasjensen.com/
> [...]
> but I find that I need to know a bit more than the length of the
> string because of encoding differences
[...]

Use std.uni.byGrapheme. That's the only reliable way to count anything remotely resembling the display length of the string, which is not to be confused with the number of code points, which is also different from the length of the string in bytes or the number of code units.

Note that even with byGrapheme, you may still need some post-processing, because certain terminals may output Asian block characters in double width, meaning that 1 grapheme takes up two columns on the screen. But byGrapheme should get you started on the right footing.

T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
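A minimal sketch of the byGrapheme suggestion, applied to the "longest line" problem from the original post. The function name maxGraphemeWidth and the sample text are assumptions for illustration only, and the result is a grapheme count, not yet a true column count.

```d
import std.algorithm : map, max, reduce;
import std.range : walkLength;
import std.string : splitLines;
import std.uni : byGrapheme;

/// Widest line of text, measured in grapheme clusters
/// rather than bytes or code points.
size_t maxGraphemeWidth(string text)
{
    // The seed of 0 keeps reduce well-defined for empty input.
    return reduce!max(size_t(0),
        text.splitLines.map!(line => line.byGrapheme.walkLength));
}

void main()
{
    import std.stdio : writeln;

    // "e" followed by U+0301 (combining acute accent) is two code
    // points but a single grapheme, so the middle line counts as 6.
    auto text = "hello\nPe\u0301ople\nEARTH";
    writeln(maxGraphemeWidth(text));   // prints 6
}
```

Control characters such as backspace still form a grapheme of their own here, so they would need separate handling.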
Jun 30 2015
On 7/1/15 1:25 AM, H. S. Teoh via Digitalmars-d-learn wrote:
> On Tue, Jun 30, 2015 at 06:33:32PM +0000, Taylor Hillegeist via Digitalmars-d-learn wrote:
>> So I am aware that Unicode is not simple... I have been working on a
>> boxes like project http://boxes.thomasjensen.com/
>> [...]
>> but I find that I need to know a bit more than the length of the
>> string because of encoding differences
> [...]
> Use std.uni.byGrapheme. That's the only reliable way to count anything
> remotely resembling the display length of the string, which is not to
> be confused with the number of code points, which is also different
> from the length of the string in bytes or the number of code units.
>
> Note that even with byGrapheme, you may still need some
> post-processing, because certain terminals may output Asian block
> characters in double width, meaning that 1 grapheme takes up two
> columns on the screen. But byGrapheme should get you started on the
> right footing.

BTW, this exercise would make an EXCELLENT blog post highlighting both the power of D's Unicode support and the hairy issues of Unicode.

I like the ascii er... unicode art concept :)

-Steve
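On the post-processing caveat quoted above, one possible shape for it is sketched here. Everything in it is an assumption rather than something proposed in the thread: the helpers columnsOf and displayWidth are hypothetical, the wide-character ranges are a small, incomplete sample rather than the full Unicode East_Asian_Width data, and treating every control character (including tab and backspace) as zero columns is just one policy a real tool might choose differently.

```d
import std.algorithm : map, sum;
import std.uni : byGrapheme, isControl;

/// Hypothetical helper: columns a grapheme is assumed to occupy,
/// judged by its first code point only.
size_t columnsOf(dchar first)
{
    // Assumption: control characters (backspace, tab, ...) print nothing.
    if (isControl(first))
        return 0;

    // Incomplete sample of ranges that terminals usually draw double-width.
    if ((first >= 0x1100 && first <= 0x115F)     // Hangul Jamo initial consonants
     || (first >= 0x3040 && first <= 0x30FF)     // Hiragana and Katakana
     || (first >= 0x4E00 && first <= 0x9FFF)     // CJK Unified Ideographs
     || (first >= 0xAC00 && first <= 0xD7A3)     // Hangul syllables
     || (first >= 0xFF01 && first <= 0xFF60))    // fullwidth forms
        return 2;

    return 1;
}

/// Estimated number of terminal columns for one line of text.
size_t displayWidth(string line)
{
    return line.byGrapheme.map!(g => columnsOf(g[0])).sum;
}

void main()
{
    import std.stdio : writeln;

    writeln(displayWidth("P\u0008ofEARTH"));        // backspace contributes 0
    writeln(displayWidth("\u65E5\u672C\u8A9E"));    // three CJK ideographs, 2 columns each
}
```

The per-line maximum of displayWidth could then replace the code point or grapheme count when sizing the box.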
Jul 01 2015