digitalmars.D.learn - Is º a Unicode alphabetic character?
- AsmMan (11/11) Sep 11 2014 What's a Unicode alphabetic character? I misunderstood
- Ali Çehreli (11/22) Sep 11 2014 Alphabetic is defined as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic,...
- AsmMan (3/30) Sep 11 2014 If I want only ASCII and Latin letters, which range should I use?
- Ali Çehreli (12/14) Sep 12 2014 This seems to be it:
What's a Unicode alphabetic character? I misunderstood isAlpha(): I used to think it validated letters like a, b, è, é .. z etc., but isAlpha('º') from the std.uni module returns true. How can I validate only the letters of a Unicode alphabet in D, or should I write my own function? I know I can do:

    // ASCII letters, plus every code point from U+00C0 upward
    bool is_id(dchar c)
    {
        return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
    }

but I'm looking for a native function, if there is one.
Sep 11 2014
On 09/11/2014 08:04 PM, AsmMan wrote:
> What's a Unicode alphabetic character?

Alphabetic is defined as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic, all of which are explained here: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values

> I misunderstood isAlpha(): I used to think it validated letters like a, b, è, é .. z etc., but isAlpha('º') from the std.uni module returns true.

º is in a Letter category ("Letter, Lowercase" in the Unicode 5.1 data linked above, "Letter, Other" since Unicode 6.1), so yes, isAlpha('º') is true.

> How can I validate only the letters of a Unicode alphabet in D, or should I write my own function?

There are so many alphabets in the world. It is likely that a given Unicode letter is part of one of them.

> I know I can do:
>
>     bool is_id(dchar c)
>     {
>         return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
>     }

There is a misunderstanding: many Unicode characters are >= 0xc0 but are not in the Alphabetic category, for example ← (U+2190 LEFTWARDS ARROW).

Ali
Sep 11 2014
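To make the point above concrete, here is a small sketch (not from the thread) that compares std.uni.isAlpha with the range check from the question. The name naiveIsId is hypothetical: it is the question's is_id() with the 'Z' typo fixed.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

// Hypothetical name for the question's is_id(), with the 'Z' typo fixed:
// ASCII letters plus every code point from U+00C0 upward.
bool naiveIsId(dchar c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c >= 0xC0;
}

void main()
{
    // º (U+00BA), é (U+00E9), × (U+00D7) and ← (U+2190)
    foreach (dchar c; "ºé×←"d)
    {
        writefln("U+%04X %s  isAlpha: %s  naiveIsId: %s",
                 cast(uint) c, c, isAlpha(c), naiveIsId(c));
    }
}
```

isAlpha accepts º and é and rejects × and ←, while naiveIsId rejects º (it lies below U+00C0) but accepts é, × and ←. Neither check matches "letters of one particular alphabet".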
On Friday, 12 September 2014 at 04:04:22 UTC, Ali Çehreli wrote:
> There is a misunderstanding: many Unicode characters are >= 0xc0 but are not in the Alphabetic category, for example ← (U+2190 LEFTWARDS ARROW).

If I want only ASCII and Latin letters, which range should I use? In other words, how should I rewrite the is_id() function?
Sep 11 2014
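On 09/11/2014 11:38 PM, AsmMan wrote:
> If I want only ASCII and Latin letters, which range should I use? In other words, how should I rewrite the is_id() function?

This seems to be it:

    import std.stdio;
    import std.uni;

    void main()
    {
        // All code points whose Script property is Latin
        alias latin = unicode.script.latin;

        assert('ç' in latin);   // a Latin letter
        assert('7' !in latin);  // digits belong to the Common script
        writeln(latin);         // prints the set's code point intervals
    }

Ali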
Sep 12 2014
On Friday, 12 September 2014 at 07:57:43 UTC, Ali Çehreli wrote:
>     alias latin = unicode.script.latin;
>
>     assert('ç' in latin);
>     assert('7' !in latin);

Sorry, I shouldn't have asked for Latin but for an alphabet like French instead: http://www.importanceoflanguages.com/Images/French/FrenchAlphabet.jpg (including the diacritics, of course).

As you mentioned, º happens to be a letter, so it still passes:

    assert('º' in latin);

so in that respect it isn't different from isAlpha(). Is the Unicode table organized so that I can use a range (as we do for ASCII: ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z'), or should I put these alphabetic characters in a table myself and then do a lookup?
Sep 12 2014
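One possible shape for the table-lookup option asked about above, sketched under assumptions: ASCII letters are still checked by range, and the remaining letters are looked up in a short explicit list. The helper isFrenchLetter is hypothetical, and the list is the usual set of French accented letters and ligatures, not copied from the linked chart; adjust it to the alphabet you actually need.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Assumed list of French letters beyond A-Z, in both cases.
immutable dstring frenchExtras = "àâæçéèêëîïôœùûüÿÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ"d;

// Hypothetical helper: ASCII letters by range, the rest by table lookup.
bool isFrenchLetter(dchar c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return true;
    return frenchExtras.canFind(c);
}

void main()
{
    foreach (dchar c; "aZéÉœŸçº×ß7"d)
        writeln(c, " -> ", isFrenchLetter(c));
}
```

Because the list is explicit, º, × and ß are rejected even though they sit in the same Latin-1 block as é, and œ/Œ/Ÿ are accepted even though they lie outside Latin-1. A linear canFind is adequate for a list this short.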
Thanks Ali, I think I'm close:

    bool is_id(dchar c)
    {
        return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z'
            || c >= 0xc0 && c <= 0xd6   // À .. Ö
            || c >= 0xd8 && c <= 0xf6   // Ø .. ö (skips × U+00D7)
            || c >= 0xf8 && c <= 0xff;  // ø .. ÿ (skips ÷ U+00F7)
    }

Unlike c >= 0xc0, this excludes some math symbols (× and ÷).
Sep 12 2014
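A quick way to sanity-check these ranges is to compare is_id() with std.uni.isAlpha over every code point below U+0100. This sketch is not from the thread; it ends the first Latin-1 range at 0xd6 (Ö), which is what the gap before 0xd8 implies, so that only × (U+00D7) is skipped there.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

// is_id() from the post, with the first Latin-1 range ending at U+00D6 (Ö).
// The gaps at U+00D7 (×) and U+00F7 (÷) exclude the two math symbols.
bool is_id(dchar c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= 0xC0 && c <= 0xD6)
        || (c >= 0xD8 && c <= 0xF6)
        || (c >= 0xF8 && c <= 0xFF);
}

void main()
{
    // Report every code point below U+0100 where is_id() and isAlpha() disagree.
    foreach (i; 0 .. 0x100)
    {
        immutable c = cast(dchar) i;
        if (is_id(c) != isAlpha(c))
            writefln("U+%04X %s  is_id: %s  isAlpha: %s",
                     i, c, is_id(c), isAlpha(c));
    }
}
```

With current Unicode data, the only disagreements expected are ª (U+00AA), µ (U+00B5) and º (U+00BA): letters that isAlpha accepts but that lie below U+00C0. These ranges cover every Latin-1 letter, including non-French ones such as ß, þ and å, but not œ/Œ (U+0153/U+0152) or Ÿ (U+0178), which are outside Latin-1.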