digitalmars.D - arrays and strings

Berin Loritsch (34/34) Aug 31 2004 All this talk about unicode made it clear that using a straight array

Sebastian Beschke (9/14) Aug 31 2004 Not really. From what my limited Japanese abilities, this should

Sebastian Beschke (2/4) Aug 31 2004 Whoops, the link doesn't work. Nevermind.
Berin Loritsch (2/21) Aug 31 2004 Blasted electronic translators...

Ben Hinkle (11/45) Aug 31 2004 This is what dchar[] is for. With dchar[] array indexing === character

Ben Hinkle (8/63) Aug 31 2004 actually now that I think about it another way to slice from character a...

Ben Hinkle (26/92) Aug 31 2004 OK - enough replying to myself, I know, I know. Here's the code implemen...
Nick (4/10) Aug 31 2004 It's more flexible, but it is slightly slower. The two calls to characte...

Nick (4/6) Aug 31 2004 ^^^^^^
Ben Hinkle (29/40) Aug 31 2004 to

Berin Loritsch (18/47) Aug 31 2004 Considering the code is not as straight forward as I am used to,

Regan Heath (41/88) Aug 31 2004 Clever optimisation.

Regan Heath (5/93) Aug 31 2004 This sort of useful code should go into the standard library, the
Nick (9/25) Sep 01 2004 Nice. Except now you have to add a !(char[]) for every slice operation, ...

Regan Heath (20/48) Sep 01 2004 Does that work? (I haven't tried it, but I'd expect the second to

Sean Kelly (5/11) Sep 01 2004 Yes, it works because the prototypes are different. I used this trick a...
Nick (6/13) Sep 02 2004 Yep, it works. The second does not over-rule the first, it over-*loads* ...

Walter (1/1) Aug 31 2004 Nice work! Can I add it to std.string? Or should it go in std.utf?

Ben Hinkle (4/5) Sep 01 2004 cool, thanks. I think most people would look in std.string since the tar...

Arcane Jill (10/17) Sep 01 2004 ICU has the class UnicodeString to encapsulate strings, as well as the a...

Berin Loritsch <bloritsch d-haven.org> writes:

All this talk about unicode made it clear that using a straight array
may not be the right tool for string handling.  Sure the most common
operations can be done on an array (concatenation, sub-arrays, etc.).
However, if we are to assume any kind of encoding support other than
ASCII, it is simply not safe unless we are talking about "dchar" arrays.

For example, logically speaking I may want to get the second and third
characters of this string (UTF8): 彼は来る (only four characters).  It
is the Japanese text for "kyo kimasu" (he comes).  I'm into martial
arts, so I can't get away from the Japanese language (it is tied to
what I study)--even though I can't really speak a lick.

Now, tell me what I would get in a UTF8 environment:

char[] kyokimasu = "彼は来る";
char[] test = kyokimasu[1..3];

assert("は来" == test);

I guarantee you the assertion would fail.  Why?  Because strict array
slicing does not take multibyte encoding into account.  Essentially I
will get only part of the first character's encoding.
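A minimal sketch of the failure, assuming a current D2 compiler (where string literals are immutable, so `string` stands in for the `char[]` above). `std.utf.validate` is used here only to show that the index-based slice is no longer well-formed UTF-8:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    string kyokimasu = "彼は来る";   // 4 characters, 3 UTF-8 code units each
    assert(kyokimasu.length == 12);  // .length counts code units, not characters

    // Index-based slicing takes code units 1 and 2: the two trailing
    // bytes of 彼, which do not form a character on their own.
    string test = kyokimasu[1 .. 3];
    assert(test != "は来");
    assertThrown!UTFException(validate(test));

    // The second and third characters occupy code units 3 through 8.
    assert(kyokimasu[3 .. 9] == "は来");
}
```

The last assertion only works because the offsets were computed by hand for this particular string; the rest of the thread is about computing them in general.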

Any UTF-aware system would either need to build this knowledge into the
language (bad idea IMO), or have a string class take care of that info
for you.  Things are a bit better with wchar[] (I'm not sure, but I
think the above will pass)--but there are still some cases where one
character takes more than one wchar.

Not to mention that the UTF8 string listed above would be 12 bytes long,
more than the 8 bytes of the wchar[] version.
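A small sketch of both points, again assuming a D2 compiler (`wstring` in place of `wchar[]`). The character U+1D11E is an arbitrary choice for illustration and is not from the original post:

```d
void main()
{
    wstring w = "彼は来る"w;
    assert(w.length == 4);                 // one UTF-16 code unit per character here
    assert(w[1 .. 3] == "は来"w);          // so the index-based slice happens to work
    assert(w.length * wchar.sizeof == 8);  // 8 bytes, against 12 in UTF-8

    // U+1D11E (musical G clef) lies outside the Basic Multilingual Plane
    // and is stored as a surrogate pair, so one character spans two wchars.
    wstring clef = "\U0001D11E"w;
    assert(clef.length == 2);
}
```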

The only way to make it work seamlessly is to have a string class that
would make the proper adjustments.  Of course this would also affect
the speed demons here.

I think having something generally useful for internationalization is
very important, or we shoot ourselves in the foot ("we want D to succeed,
as long as you speak English" does not make sense).  General purpose i18n
and l10n is not easy to do by any stretch--but I think it is generally
agreed that it would have to be done in libraries.

I just don't think we can rely on D's native (up to now) way of dealing
with string manipulation.

Aug 31 2004

Sebastian Beschke <s.beschke gmx.de> writes:

Berin Loritsch wrote:
> For example, logically speaking I may want to get the second and third
> characters of this string (UTF8): 彼は来る (only four characters).  It
> is the Japanese text for "kyo kimasu" (he comes).  I'm into martial
> arts, so I can't get away from the Japanese language (it is tied to
> what I study)--even though I can't really speak a lick.

Not really. From what my limited Japanese abilities tell me, this should
actually be "kare wa kuru", which means the same thing (he comes). I
don't think the kanji 彼 can be pronounced "kyo", if you look at this
page: http://www.csse.monash.edu.au/cgi-bin/cgiwrap/jwb/wwwjdic?1D

Of course this is nitpicking (I'm sorry :D ) and doesn't make your point 
invalid. I agree that a string should be somewhat more "intelligent" 
than an array.

-Sebastian

Aug 31 2004

Sebastian Beschke <s.beschke gmx.de> writes:

Sebastian Beschke wrote:
> don't think the kanji 彼 can be pronounced "kyo", if you look at this
> page: http://www.csse.monash.edu.au/cgi-bin/cgiwrap/jwb/wwwjdic?1D

Whoops, the link doesn't work. Never mind.

Aug 31 2004

Berin Loritsch <bloritsch d-haven.org> writes:

Sebastian Beschke wrote:
> Berin Loritsch wrote:
>> For example, logically speaking I may want to get the second and third
>> characters of this string (UTF8): 彼は来る (only four characters).  It
>> is the Japanese text for "kyo kimasu" (he comes).  I'm into martial
>> arts, so I can't get away from the Japanese language (it is tied to
>> what I study)--even though I can't really speak a lick.
>
> Not really. From what my limited Japanese abilities tell me, this should
> actually be "kare wa kuru", which means the same thing (he comes). I
> don't think the kanji 彼 can be pronounced "kyo", if you look at this
> page: http://www.csse.monash.edu.au/cgi-bin/cgiwrap/jwb/wwwjdic?1D
>
> Of course this is nitpicking (I'm sorry :D ) and doesn't make your point
> invalid. I agree that a string should be somewhat more "intelligent"
> than an array.
>
> -Sebastian

Blasted electronic translators...

Aug 31 2004

"Ben Hinkle" <bhinkle mathworks.com> writes:

This is what dchar[] is for. With dchar[], array indexing === character
indexing.
A couple of helper functions in std.string
 char[] slice(char[] str, int a, int b);   // slice characters a to b, not indices a to b
 wchar[] slice(wchar[] str, int a, int b);
would also be nice for those cases when one doesn't want to convert to
dchar[]. Maybe such functions are already in Phobos somewhere? I haven't
looked too hard.
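A minimal sketch of the dchar[] route, assuming a current D2 compiler (`dstring` in place of `dchar[]`, and `std.conv.to` for the transcoding):

```d
import std.conv : to;

void main()
{
    string utf8 = "彼は来る";
    dstring utf32 = utf8.to!dstring;     // transcode: one dchar per code point

    assert(utf32.length == 4);
    assert(utf32[1 .. 3] == "は来"d);    // array index == character position

    // Transcoding the slice back yields well-formed UTF-8.
    assert(utf32[1 .. 3].to!string == "は来");
}
```

The trade-off is a copy of the whole string at four bytes per character, which is why helpers that work directly on char[] and wchar[] are still attractive.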

Aug 31 2004

"Ben Hinkle" <bhinkle mathworks.com> writes:

Actually, now that I think about it, another way to slice from character a to
b is to have a function that returns the index of the nth character:
 int character(char[] str, int n);
and then slicing is
 str[character(str, a) .. character(str, b)];
That is probably better than special slicing functions.
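For comparison, current Phobos has a function of this shape, `std.utf.toUTFindex`, which maps a character count to a code-unit index. A small sketch, assuming a D2 compiler (so `string` rather than `char[]`):

```d
import std.utf : toUTFindex;

void main()
{
    string str = "彼は来る";

    assert(toUTFindex(str, 1) == 3);   // character 1 starts at code unit 3
    assert(toUTFindex(str, 3) == 9);   // character 3 starts at code unit 9
    assert(str[toUTFindex(str, 1) .. toUTFindex(str, 3)] == "は来");

    wstring w = "彼は来る"w;
    assert(toUTFindex(w, 3) == 3);     // BMP characters take one wchar each
}
```

As with the two `character()` calls above, each lookup scans the string from the start.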

Aug 31 2004

"Ben Hinkle" <bhinkle mathworks.com> writes:

OK - enough replying to myself, I know, I know. Here's the code implementing
what I'm talking about:

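import std.utf;

// Return the array index at which the n-th character of str starts.
size_t character(char[] str, size_t n) {
  size_t i = 0;
  while (n--) {
    decode(str,i);  // decode() advances i past one character
  }
  return i;
}

// Same for UTF-16; decode() is overloaded on the element type.
size_t character(wchar[] str, size_t n) {
  size_t i = 0;
  while (n--) {
    decode(str,i);
  }
  return i;
}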

Aug 31 2004

Nick <Nick_member pathlink.com> writes:

In article <ch26je$sq4$1 digitaldaemon.com>, Ben Hinkle says...
> actually now that I think about it another way to slice from character a to
> b is to have a function that returns the index of the nth character:
>  int character(char[] str, int n);
> and then slicing is
>  str[character(str, a) .. character(str, b)];
> That is probably better than special slicing functions.

It's more flexible, but it is slightly slower. The two calls to character() will
parse the string once each, while a splice() function could do it in one run.

Nick

Aug 31 2004

Nick <Nick_member pathlink.com> writes:

In article <ch2i6t$13ma$1 digitaldaemon.com>, Nick says...
> It's more flexible, but it is slightly slower. The two calls to character() will
> parse the string once each, while a splice() function could do it in one run.
                                      ^^^^^^
Err, that should be slice() :-)

Nick

Aug 31 2004

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Nick" <Nick_member pathlink.com> wrote in message
news:ch2i6t$13ma$1 digitaldaemon.com...
> In article <ch26je$sq4$1 digitaldaemon.com>, Ben Hinkle says...
>> actually now that I think about it another way to slice from character a to
>> b is to have a function that returns the index of the nth character:
>>  int character(char[] str, int n);
>> and then slicing is
>>  str[character(str, a) .. character(str, b)];
>> That is probably better than special slicing functions.
>
> It's more flexible, but it is slightly slower. The two calls to character() will
> parse the string once each, while a splice() function could do it in one run.
>
> Nick

Good point. Plus it is less typing. So here's version 2:

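import std.utf;

// Return the index reached after decoding n characters of str,
// starting at index i (which must be the start of a character).
size_t character(char[] str, size_t n, size_t i = 0) {
  while (n--) {
    decode(str,i);
  }
  return i;
}

size_t character(wchar[] str, size_t n, size_t i = 0) {
  while (n--) {
    decode(str,i);
  }
  return i;
}

// Slice characters a to b (not indices a to b) in a single pass:
// the second call resumes decoding at ai instead of restarting at 0.
char[] slice(char[] str, size_t a, size_t b) {
  size_t ai = character(str,a);
  size_t bi = character(str,b-a,ai);
  return str[ai .. bi];
}

wchar[] slice(wchar[] str, size_t a, size_t b) {
  size_t ai = character(str,a);
  size_t bi = character(str,b-a,ai);
  return str[ai .. bi];
}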

Aug 31 2004

Berin Loritsch <bloritsch d-haven.org> writes:

Considering the code is not as straightforward as I am used to:
what the character() method is doing is decoding the string one
character at a time, starting from the passed-in index.  The index (i)
is only used to resume where you may have left off.  Ok.  So we have a
little optimization here so that we don't double-decode something...

It seemed a bit odd to me to do the b-a subtraction in the slice
method, but then I realized what you were doing (resuming from the
last point).

Of course this also assumes that someone didn't put in bad data
like:

slice(mystr, 5, 4);

Not to mention you could genericise the functions since they are
identical except for the element type of the array.

I suppose that is why C++ string object is templated (so you can
use wchar instead of char).

The decode method would actually be different though based on the
type.


Aug 31 2004

Regan Heath <regan netwin.co.nz> writes:

On Tue, 31 Aug 2004 17:10:50 -0400, Berin Loritsch <bloritsch d-haven.org>
wrote:
> Considering the code is not as straightforward as I am used to:
> what the character() method is doing is decoding the string one
> character at a time, starting from the passed-in index.  The index (i)
> is only used to resume where you may have left off.  Ok.  So we have a
> little optimization here so that we don't double-decode something...

Clever optimisation.

> It seemed a bit odd to me to do the b-a subtraction in the slice
> method, but then I realized what you were doing (resuming from the
> last point).

Yeah.. it took me a while too.

> Of course this also assumes that someone didn't put in bad data
> like:
>
> slice(mystr, 5, 4);

A perfect opportunity for DbC, e.g.

char[] slice(char[] str, size_t a, size_t b)
in {
   assert(b > a); // b >= a?
}
body {
   size_t ai = character(str,a);
   size_t bi = character(str,b-a,ai);
   return str[ai .. bi];
}
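On the open question in the comment (`b > a` or `b >= a`), a small sketch may help. It assumes a D2 compiler and restates `character()` with a `const(char)[]` parameter so that it accepts string literals; that signature is an adaptation for illustration, not the code posted above:

```d
import std.utf : decode;

// Index reached after decoding n characters of str, starting at index i.
size_t character(const(char)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i);
    return i;
}

void main()
{
    string s = "彼は来る";

    // b == a: zero characters are decoded after ai, giving an empty slice.
    immutable ai = character(s, 2);
    immutable bi = character(s, 2 - 2, ai);
    assert(ai == 6 && bi == 6);
    assert(s[ai .. bi] == "");

    // b < a: size_t is unsigned, so b - a wraps around to a huge count
    // and character() would try to decode far past the end of the string.
    size_t a = 5, b = 4;
    assert(b - a == size_t.max);
}
```

Under these assumptions the case `b == a` is harmless, so `b >= a` looks like the contract that rejects only the genuinely bad input.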

> Not to mention you could genericise the functions since they are
> identical except for the element type of the array.

Yep.

template character(Type : Type[]) {
   size_t character(Type[] str, size_t n, size_t i = 0) {
     while (n--) {
       decode(str,i);
     }
     return i;
   }
}

template slice(Type : Type[])
{
   Type[] slice(Type[] str, size_t a, size_t b)
   in {
     assert(b > a); // b >= a?
   }
   body {
     size_t ai = character(str,a);
     size_t bi = character(str,b-a,ai);
     return str[ai .. bi];
   }
}

or something like that.

> I suppose that is why C++ string object is templated (so you can
> use wchar instead of char).

Probably.

> The decode method would actually be different though based on the
> type.

True.

Regan


Aug 31 2004

Regan Heath <regan netwin.co.nz> writes:

This sort of useful code should go into the standard library; the
'phoenix' (or whatever we call it) library should include this.


Aug 31 2004

Nick <Nick_member pathlink.com> writes:

In article <opsdmdnblz5a2sq9 digitalmars.com>, Regan Heath says...
> [...]
>
> template slice(Type : Type[])
> {
>    Type[] slice(Type[] str, size_t a, size_t b)
>    in {
>      assert(b > a); // b >= a?
>    }
>    body {
>      size_t ai = character(str,a);
>      size_t bi = character(str,b-a,ai);
>      return str[ai .. bi];
>    }
> }
>
> or something like that.
>
>> I suppose that is why C++ string object is templated (so you can
>> use wchar instead of char).


Nice. Except now you have to add a !(char[]) for every slice operation, since D
doesn't auto detect types :-(

A workaround could be something like:

template slice_template(Type: Type[])
{...}

alias slice_template!(char[]) slice;
alias slice_template!(wchar[]) slice;
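For readers on a current D2 compiler, this workaround should no longer be necessary: function templates now deduce their type arguments from the call (implicit function template instantiation). A hedged sketch of the same two helpers in that style follows; the `const(C)[]` parameter and the expression contract `in (a <= b)` are modern syntax chosen for illustration, not what was posted in this thread:

```d
import std.utf : decode;

// Index reached after decoding n characters of str, starting at index i.
size_t character(C)(const(C)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i);   // advances i past one character
    return i;
}

// Characters a to b (not indices a to b) of str, found in a single pass.
C[] slice(C)(C[] str, size_t a, size_t b)
in (a <= b)
{
    immutable ai = character(str, a);
    immutable bi = character(str, b - a, ai);   // resume decoding at ai
    return str[ai .. bi];
}

void main()
{
    // No !(...) needed: the element type is deduced from the argument.
    assert(slice("彼は来る", 1, 3) == "は来");
    assert(slice("彼は来る"w, 1, 3) == "は来"w);
    assert(slice("彼は来る"d, 1, 3) == "は来"d);
}
```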

Nick

Sep 01 2004

Regan Heath <regan netwin.co.nz> writes:

On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member pathlink.com>
wrote:
> In article <opsdmdnblz5a2sq9 digitalmars.com>, Regan Heath says...
>> [...]
>>
>> template slice(Type : Type[])
>> {
>>    Type[] slice(Type[] str, size_t a, size_t b)
>>    in {
>>      assert(b > a); // b >= a?
>>    }
>>    body {
>>      size_t ai = character(str,a);
>>      size_t bi = character(str,b-a,ai);
>>      return str[ai .. bi];
>>    }
>> }
>>
>> or something like that.
>>
>>> I suppose that is why C++ string object is templated (so you can
>>> use wchar instead of char).
>
> Nice. Except now you have to add a !(char[]) for every slice operation,
> since D doesn't auto detect types :-(
>
> A workaround could be something like:
>
> template slice_template(Type: Type[])
> {...}
>
> alias slice_template!(char[]) slice;
> alias slice_template!(wchar[]) slice;

Does that work? (I haven't tried it, but I'd expect the second to 
over-rule the first?)

The other option is to then write wrapper functions eg.

char[] slice(char[] str, size_t a, size_t b)
{
   return slice!(char[])(str,a,b);
}

wchar[] slice(wchar[] str, size_t a, size_t b)
{
   return slice!(wchar[])(str,a,b);
}

dchar[] slice(dchar[] str, size_t a, size_t b)
{
   return slice!(dchar[])(str,a,b);
}

Regan


Sep 01 2004

Sean Kelly <sean f4.ca> writes:

In article <opsdn8ouhn5a2sq9 digitalmars.com>, Regan Heath says...
> On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member pathlink.com>
> wrote:
>> alias slice_template!(char[]) slice;
>> alias slice_template!(wchar[]) slice;
>
> Does that work? (I haven't tried it, but I'd expect the second to
> over-rule the first?)

Yes, it works because the prototypes are different.  I used this trick at some
point in my std.stream rewrite, though I think I tossed all the template code
before I posted the version that's available now.


Sean

Sep 01 2004

Nick <Nick_member pathlink.com> writes:

In article <opsdn8ouhn5a2sq9 digitalmars.com>, Regan Heath says...
> On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member pathlink.com>
> wrote:
>> alias slice_template!(char[]) slice;
>> alias slice_template!(wchar[]) slice;
>
> Does that work? (I haven't tried it, but I'd expect the second to
> over-rule the first?)

Yep, it works. The second does not over-rule the first, it over-*loads* it,
meaning slice() is subject to normal function overloading rules. I use this on
almost all my templates, I find it makes the code less rough on the eyes and
means less typing as well.

Nick

Sep 02 2004

"Walter" <newshound digitalmars.com> writes:

Nice work! Can I add it to std.string? Or should it go in std.utf?

Aug 31 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Walter wrote:
> Nice work! Can I add it to std.string? Or should it go in std.utf?

Cool, thanks. I think most people would look in std.string since the purpose
of the operations is to index and slice strings - the encoding is somewhat
secondary.

Sep 01 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <ch24jt$rs0$1 digitaldaemon.com>, Berin Loritsch says...
> I think having something generally useful for internationalization is
> very important, or we shoot ourselves in the foot (we want D to succeed,
> as long as you speak English does not make sense).  General purpose i18n
> and l10n is not easy to do by any stretch--but I think it is generally
> agreed that it would have to be done in libraries.

ICU has the class UnicodeString to encapsulate strings, as well as the abstract
class CharacterIterator for iterating over characters, with concrete
implementations UCharCharacterIterator and StringCharacterIterator.

It also has a lot more besides. Check out the API guide at
http://oss.software.ibm.com/icu/apiref/classes.html.

All of this will be a part of D (yes, via a library) in the not-too-distant
future.


> I just don't think we can rely on D's native (up to now) way of dealing
> with String manipulation.

That's why I'm wrapping ICU as we speak.

Arcane Jill

Sep 01 2004
