digitalmars.D - std.string.inPattern()
- Janice Caron (7/7) Oct 22 2007 I noticed in the docs that the pattern parameter to inPattern is
- Lutger (4/12) Oct 22 2007 Put it as the first or last character of the pattern argument and it'll
- Lutger (5/8) Oct 22 2007 By the way, this is an example of why including unittests as well as
- Alexander Panek (8/13) Oct 23 2007 char[] is not ASCII, but UTF-8, so any UTF-8 sequence (may it be
- davidl (6/17) Oct 23 2007 I think char[] is just an array of char. Just some stdlib APIs treat it ...
- Lutger (6/27) Oct 23 2007 It's more than that, you can do this with char[] and it works with
- Oskar Linde (9/29) Oct 23 2007 char[] is UTF8 by specification, see
I noticed in the docs that the pattern parameter to inPattern is specified as an array of chars, not an array of dchars. I realise that one can easily be converted to the other, but it leaves me wondering ... does inPattern() work with non-ASCII characters? Can I, for example, specify "\u0100-\u0200" as a range and expect it to work? Also, it's not clear how to match a minus sign.
Oct 22 2007
Janice Caron wrote:
> I noticed in the docs that the pattern parameter to inPattern is specified as an array of chars, not an array of dchars. I realise that one can easily be converted to the other, but it leaves me wondering ... does inPattern() work with non-ASCII characters? Can I, for example, specify "\u0100-\u0200" as a range and expect it to work?

Should work; inPattern converts the pattern's chars to dchars internally.

> Also, it's not clear how to match a minus sign.

Put it as the first or last character of the pattern argument and it'll work.
Oct 22 2007
Lutger wrote:
> Janice Caron wrote:
>> ...
> Put it as the first or last character of the pattern argument and it'll work.

By the way, this is an example of why including unittests as well as contracts in the ddoc system would be useful, imo: this behavior of inPattern is 'documented' in its unittests.
Oct 22 2007
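A minimal sketch of the two behaviours described above (ranges over non-ASCII characters, and a '-' treated literally when it is first or last in the pattern). The function name matchesPattern is hypothetical, and this is not the Phobos source; it is a simplified stand-in that assumes a D2 compiler (under D1 the parameter would be plain char[]) and ignores any other pattern syntax inPattern may support.

```d
// Hypothetical helper, NOT the Phobos implementation of inPattern.
// Returns true if c equals a literal character in pattern or falls
// inside an "x-y" range. A '-' that is the first or last character
// of the pattern is treated as a literal minus sign.
bool matchesPattern(dchar c, const(char)[] pattern)
{
    bool inRange = false;  // true right after a range-forming '-'
    dchar last;            // previous pattern character (start of a range)

    // A dchar loop variable makes foreach decode the UTF-8 in pattern;
    // i is the byte index at which the current character starts.
    foreach (size_t i, dchar p; pattern)
    {
        if (inRange)
        {
            inRange = false;
            if (last <= c && c <= p)
                return true;
        }
        else if (p == '-' && i > 0 && i + 1 < pattern.length)
        {
            inRange = true;  // '-' between two characters starts a range
            continue;        // keep 'last' pointing at the range start
        }
        else if (c == p)
            return true;
        last = p;
    }
    return false;
}

unittest
{
    // Non-ASCII range, as in the original question.
    assert(matchesPattern('\u0150', "\u0100-\u0200"));
    assert(!matchesPattern('\u0250', "\u0100-\u0200"));

    // '-' first or last is a literal; in the middle it forms a range.
    assert(matchesPattern('-', "-az"));
    assert(matchesPattern('-', "az-"));
    assert(!matchesPattern('-', "a-z"));
    assert(matchesPattern('m', "a-z"));
}
```

The checks are written as a unittest block, so compiling with -unittest runs them; they record the intended behaviour in the same way the thread says the Phobos unittests do.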
On Mon, 22 Oct 2007 10:22:48 +0100 "Janice Caron" <caron800 googlemail.com> wrote:
> I noticed in the docs that the pattern parameter to inPattern is specified as an array of chars, not an array of dchars. I realise that one can easily be converted to the other, but it leaves me wondering ... does inPattern() work with non-ASCII characters?

char[] is not ASCII, but UTF-8, so any UTF-8 sequence (may it be multibyte, or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question - just wanted to clarify this.)

--
Alexander Panek <alexander.panek brainsware.org>
Oct 23 2007
On Tue, 23 Oct 2007 20:07:53 +0800, Alexander Panek <alexander.panek brainsware.org> wrote:
> On Mon, 22 Oct 2007 10:22:48 +0100 "Janice Caron" <caron800 googlemail.com> wrote:
>> I noticed in the docs that the pattern parameter to inPattern is specified as an array of chars, not an array of dchars. I realise that one can easily be converted to the other, but it leaves me wondering ... does inPattern() work with non-ASCII characters?
> char[] is not ASCII, but UTF-8, so any UTF-8 sequence (may it be multibyte, or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question - just wanted to clarify this.)

I think char[] is just an array of char. It's just that some stdlib APIs treat it as if it's UTF8 encoded.

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Oct 23 2007
davidl wrote:
> On Tue, 23 Oct 2007 20:07:53 +0800, Alexander Panek <alexander.panek brainsware.org> wrote:
>> On Mon, 22 Oct 2007 10:22:48 +0100 "Janice Caron" <caron800 googlemail.com> wrote:
>>> I noticed in the docs that the pattern parameter to inPattern is specified as an array of chars, not an array of dchars. I realise that one can easily be converted to the other, but it leaves me wondering ... does inPattern() work with non-ASCII characters?
>> char[] is not ASCII, but UTF-8, so any UTF-8 sequence (may it be multibyte, or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question - just wanted to clarify this.)
> I think char[] is just an array of char. It's just that some stdlib APIs treat it as if it's UTF8 encoded.

It's more than that: you can do this with char[] and it works with multibyte characters in UTF-8:

    foreach (dchar ch; pattern) /* stuff */

This is what inPattern does.
Oct 23 2007
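To make the foreach point concrete, here is a small sketch contrasting iteration by char (raw UTF-8 code units) with iteration by dchar (decoded code points) over the same char[] data. The helper names countCodeUnits and countCodePoints are hypothetical and exist only for illustration; the code assumes a D2 compiler.

```d
// Hypothetical helpers for illustration only.

// Visits each UTF-8 code unit (byte) of s.
size_t countCodeUnits(const(char)[] s)
{
    size_t n;
    foreach (char ch; s)
        ++n;
    return n;
}

// Asking for dchar makes foreach decode each UTF-8 sequence
// into a single code point.
size_t countCodePoints(const(char)[] s)
{
    size_t n;
    foreach (dchar ch; s)
        ++n;
    return n;
}

unittest
{
    // U+0100 and U+0200 each take two bytes in UTF-8; '-' takes one.
    const(char)[] pattern = "\u0100-\u0200";
    assert(countCodeUnits(pattern) == 5);
    assert(countCodePoints(pattern) == 3);
    assert(pattern.length == 5);  // .length counts code units, not characters
}
```

The decoding is done by the language's foreach rather than by a library call, which is why a function taking char[] can still compare whole characters.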
davidl wrote:
> On Tue, 23 Oct 2007 20:07:53 +0800, Alexander Panek <alexander.panek brainsware.org> wrote:
>> On Mon, 22 Oct 2007 10:22:48 +0100 "Janice Caron" <caron800 googlemail.com> wrote:
>>> I noticed in the docs that the pattern parameter to inPattern is specified as an array of chars, not an array of dchars. I realise that one can easily be converted to the other, but it leaves me wondering ... does inPattern() work with non-ASCII characters?
>> char[] is not ASCII, but UTF-8, so any UTF-8 sequence (may it be multibyte, or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question - just wanted to clarify this.)
> I think char[] is just an array of char. It's just that some stdlib APIs treat it as if it's UTF8 encoded.

char[] is UTF8 by specification, see http://www.digitalmars.com/d/type.html. String constants are also UTF8/16/32, and putting non-UTF data into them will not compile. So it is a bit more than convention.

To answer the original question: Yes, inPattern works with non-ASCII characters.

--
Oskar
Oct 23 2007