digitalmars.D - YASQ - Proper way to convert byte[] <--> string
- Steve Teale (7/7) Jul 11 2007 I have a byte[] A that contains an AJP13 packet, presumably including UT...
- Frits van Bommel (9/19) Jul 12 2007 That should work, and be optimal unless you can be sure the A array
- Steve Teale (2/24) Jul 12 2007 Can I use n+s.length? In my experimentation i noticed that a UTF8 strin...
- Frits van Bommel (18/25) Jul 12 2007 You noticed wrong...
- Steve Teale (2/31) Jul 12 2007 You are correct, I had misinterpreted my own test program.
- 0ffh (3/4) Jul 13 2007 Hah, Null-A! Reading A.E. van Vogt?
- Frits van Bommel (3/7) Jul 13 2007 No, never heard of him. I just picked \u0100 because it was a round
I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings. I need to extract such strings and to place strings in such a buffer. I'm using:

    string s = A[n .. m].dup; // n and m from prefixed string length/position
    return s;

to get strings, and

    byte[] ba = cast(byte[]) s;
    A[n .. n+ba.length] = ba[0 .. $].dup;

to put them. Are these a) sensible, b) optimal?
Jul 11 2007
Steve Teale wrote:
> I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings. I need to extract such strings and to place strings in such a buffer. I'm using:
>
>     string s = A[n .. m].dup; // n and m from prefixed string length/position
>     return s;
>
> to get strings, and

That should work, and be optimal unless you can be sure the A array doesn't change while you still need the string (in which case the .dup is unnecessary).

>     byte[] ba = cast(byte[]) s;
>     A[n .. n+ba.length] = ba[0 .. $].dup;
>
> to put them. Are these a) sensible, b) optimal?

This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
---
A[n .. n+s.length] = cast(byte[]) s;
---
Jul 12 2007
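For readers on a current D2 compiler, the two operations discussed above can be sketched roughly as below. This is a minimal sketch, not the posters' code: `getString` and `putString` are hypothetical helper names, the offsets `n` and `m` are assumed to have been read from the packet already, and the explicit casts and `.idup` are there because D2's `string` is `immutable(char)[]`, which a `byte[]` slice does not convert to implicitly. No UTF-8 validation is attempted.

```d
import std.stdio : writeln;

/// Hypothetical helper: copy the bytes A[n .. m] out of the packet as a string.
string getString(const(byte)[] A, size_t n, size_t m)
{
    // .idup makes an immutable copy, so later writes to A cannot alter the result.
    return cast(string) A[n .. m].idup;
}

/// Hypothetical helper: write the UTF-8 code units of s into A starting at n.
/// Returns the index just past the last byte written.
size_t putString(byte[] A, size_t n, string s)
{
    // s.length is already the number of bytes, so no intermediate .dup is needed.
    A[n .. n + s.length] = cast(const(byte)[]) s;
    return n + s.length;
}

void main()
{
    auto A = new byte[16];
    string s = "caf\u00E9";            // the last character takes two UTF-8 code units
    size_t end = putString(A, 2, s);   // occupies A[2 .. 7]
    assert(getString(A, 2, end) == s);
    writeln(getString(A, 2, end));
}
```

Skipping the copy in `getString` would only be safe under the condition given in the reply above: the buffer must not change while the string is still in use.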
Frits van Bommel wrote:
> This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
> ---
> A[n .. n+s.length] = cast(byte[]) s;
> ---

Can I use n+s.length? In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.
Jul 12 2007
Steve Teale wrote:
> Frits van Bommel wrote:
>> ---
>> A[n .. n+s.length] = cast(byte[]) s;
>> ---
>
> Can I use n+s.length? In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.

You noticed wrong...
char[]s in D aren't very special, they're just specific array types that happen to be handled specially by some functions (such as writef*)[1]. The .length is the number of elements, and each element is a fixed size. A char is just a type representing a byte from UTF-8 text.
---
import std.stdio;

void main()
{
    auto s = "\u0100";
    writefln(s);
    writefln(s.length);
    writefln((cast(byte[])s).length);
}
---
Outputs a weird character (an A with a - on top) and two times the number 2.

[1]: and by foreach statements as well; they can automagically extract char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.
Jul 12 2007
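The distinction drawn above, that `.length` counts array elements rather than characters, might be checked with a sketch like the following, written for a current D2 compiler (where `writeln` is used because `writefln` expects a format string as its first argument). `codePoints` is a hypothetical helper, not a library function; it relies on the `foreach` decoding mentioned in footnote [1].

```d
import std.stdio : writeln;

/// Hypothetical helper: count code points by letting foreach decode
/// the UTF-8 code units of s into dchar values.
size_t codePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "\u0100";                        // A with macron, two bytes in UTF-8
    writeln(s.length);                          // prints 2: number of char elements
    writeln((cast(const(byte)[]) s).length);    // prints 2: the cast keeps the length
    writeln(codePoints(s));                     // prints 1: one decoded character
}
```

Since the element count and the byte count agree, `n + s.length` is the right end index when copying a string into the buffer.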
Frits van Bommel wrote:
> You noticed wrong...
> char[]s in D aren't very special, they're just specific array types that happen to be handled specially by some functions (such as writef*)[1]. The .length is the number of elements, and each element is a fixed size. [...]

You are correct, I had misinterpreted my own test program.
Jul 12 2007
Frits van Bommel wrote:
> Outputs a weird character (an A with a - on top) [...]

Hah, Null-A! Reading A.E. van Vogt?

Regards, Frank
Jul 13 2007
0ffh wrote:
> Frits van Bommel wrote:
>> Outputs a weird character (an A with a - on top) [...]
>
> Hah, Null-A! Reading A.E. van Vogt?

No, never heard of him. I just picked \u0100 because it was a round character code and it happened to be that character...
Jul 13 2007