digitalmars.D - unicode combining mark / std.uni question

ikod <geller.garry gmail.com> writes:
Hello,

I have to create a very basic IDNA (Internationalized Domain Names 
in Applications) library. IDNA has two parts: user input checks 
and Punycode encoding/decoding.

The Punycode part is already complete, and now I have to add the 
checks, but I'm weak in Unicode and can't find the proper way to 
express these tests using std.uni.

Here is the list of prohibited domain labels 
(https://tools.ietf.org/html/rfc5891):

    o  Labels whose first character is a combining mark (see The Unicode
       Standard, Section 2.11 [Unicode]).

    o  Labels containing prohibited code points, i.e., those that are
       assigned to the "DISALLOWED" category of the Tables document
       [RFC5892].

    o  Labels containing code points that are identified in the Tables
       document as "CONTEXTJ", i.e., requiring exceptional contextual
       rule processing on lookup, but that do not conform to those rules.
       Note that this implies that a rule must be defined, not null: a
       character that requires a contextual rule but for which the rule
       is null is treated in this step as having failed to conform to the
       rule.

    o  Labels containing code points that are identified in the Tables
       document as "CONTEXTO", but for which no such rule appears in the
       table of rules.  Applications resolving DNS names or carrying out
       equivalent operations are not required to test contextual rules
       for "CONTEXTO" characters, only to verify that a rule is defined
       (although they MAY make such tests to provide better protection or
       give better information to the user).

    o  Labels containing code points that are unassigned in the version
       of Unicode being used by the application, i.e., in the UNASSIGNED
       category of the Tables document.

Can anybody help with this task?

Thanks!
Dec 05 2017
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 5 December 2017 at 20:04:29 UTC, ikod wrote:
 Hello,

 I have to create a very basic IDNA (Internationalized Domain 
 Names in Applications) library. IDNA has two parts: user input 
 checks and Punycode encoding/decoding.

 The Punycode part is already complete, and now I have to add the 
 checks, but I'm weak in Unicode and can't find the proper way to 
 express these tests using std.uni.

 Here is the list of prohibited domain labels 
 (https://tools.ietf.org/html/rfc5891):
Well, std.uni gives you the ability to build fast lookup tables, plus a type for sets of code points. I don’t think any of the sets you listed come prepared in std. Combining marks may be the exception; check the docs.
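
A minimal sketch of how the first two checks might look, assuming that "combining mark" here means general category Mn, Mc or Me (which is what std.uni.isMark tests). The helper names startsWithCombiningMark, containsAny and sampleDisallowed are made up for illustration, and the ranges in sampleDisallowed are placeholders, not the real RFC 5892 table; the actual DISALLOWED set would have to be generated from that document.

```d
import std.range.primitives : empty, front;
import std.uni : CodepointSet, isMark;

// Hypothetical helper: true if the first code point of the label is a
// combining mark (general category Mn, Mc or Me).
bool startsWithCombiningMark(const(char)[] label)
{
    // front on a char array decodes the first code point to a dchar.
    return !label.empty && isMark(label.front);
}

// Hypothetical helper: true if any code point of the label is in the set.
bool containsAny(const(char)[] label, CodepointSet set)
{
    foreach (dchar c; label) // foreach with dchar decodes UTF-8
        if (set[c])          // opIndex tests membership
            return true;
    return false;
}

// Placeholder set for illustration only, NOT the RFC 5892 DISALLOWED table.
CodepointSet sampleDisallowed()
{
    // The constructor takes pairs of half-open intervals [a, b).
    return CodepointSet('A', 'Z' + 1,     // ASCII uppercase letters
                        0x200B, 0x200C);  // U+200B ZERO WIDTH SPACE
}
```

With these, startsWithCombiningMark("\u0301abc") should be true because U+0301 is a nonspacing mark, and containsAny(label, sampleDisallowed()) flags any label containing an uppercase ASCII letter. The CONTEXTJ and CONTEXTO checks need per-character rules and are not covered by a plain set lookup.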
Dec 08 2017