digitalmars.D.learn - how to write a string to a c pointer?

"zhmt" <zhmtzhmt qq.com> writes:

I am writing an asio binding. Objects need to be serialized into a 
buffer (void *),

for example: write a UTF-8 string into the buffer,
write an int into the buffer,
write a long into the buffer.

Here is my class:

import core.stdc.stdlib : malloc, free;

class Buffer
{
	private void *ptr;
	private int size;
	private int _cap;

	public this(int cap)
	{
		ptr = malloc(cap);
		this._cap = cap;
	}

	public ~this()
	{
		free(ptr);
	}

	public ubyte[] asArray()
	{
		ubyte[] ret = (cast(ubyte*)ptr)[0..cap];
		return ret;
	}

	public void* getPtr()
	{
		return ptr;
	}

	public int cap()
	{
		return _cap;
	}
}

How can I write a UTF-8 string into the buffer?

Mar 04 2015

"Kagamin" <spam here.lot> writes:

string s;
char[] b = cast(char[])asArray();
b[0..s.length] = s[];
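
The three lines above assume the string fits into the buffer. A slightly fuller sketch, which also covers the int and long cases from the question, might look like the following. The names putString and putValue are hypothetical helpers, not part of the original class, and the little-endian byte order is an assumption; use whatever the receiving side expects.

```d
import std.bitmanip : nativeToLittleEndian, littleEndianToNative;

/// Copies the UTF-8 bytes of s into buf starting at pos.
/// Returns the position just past the written bytes.
size_t putString(ubyte[] buf, size_t pos, string s)
{
    assert(pos + s.length <= buf.length, "buffer too small");
    // A D string is already UTF-8, so this is a plain byte copy.
    buf[pos .. pos + s.length] = cast(const(ubyte)[]) s;
    return pos + s.length;
}

/// Writes an int or a long into buf at pos in little-endian order.
/// Returns the position just past the written bytes.
size_t putValue(T)(ubyte[] buf, size_t pos, T value)
    if (is(T == int) || is(T == long))
{
    assert(pos + T.sizeof <= buf.length, "buffer too small");
    ubyte[T.sizeof] bytes = nativeToLittleEndian(value);
    buf[pos .. pos + T.sizeof] = bytes[];
    return pos + T.sizeof;
}

unittest
{
    auto buf = new ubyte[32];
    size_t pos = 0;
    pos = putString(buf, pos, "ąć"); // 4 bytes of UTF-8
    pos = putValue(buf, pos, 42);    // int, 4 bytes
    pos = putValue(buf, pos, 42L);   // long, 8 bytes
    assert(pos == 16);
}
```

With the Buffer class from the question, the slice returned by asArray() could be passed as buf.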

Mar 05 2015

"zhmt" <zhmtzhmt qq.com> writes:

On Thursday, 5 March 2015 at 09:42:53 UTC, Kagamin wrote:
 string s;
 char[] b = cast(char[])asArray();
 b[0..s.length] = s[];

Thank you very much. I should stop developing and read the 
D tutorial again.

Mar 05 2015

FG <home fgda.pl> writes:

On 2015-03-05 at 10:42, Kagamin wrote:
 string s;
 char[] b = cast(char[])asArray();
 b[0..s.length] = s[];

It's a bit more complicated than that if you also want to truncate the string
for buffers with a smaller capacity, do so without splitting a UTF-8 sequence,
and add a '\0' sentinel, since you may want to use the string in C (if I assume
correctly). The setString function does all that:



import std.algorithm : min;
import core.stdc.stdio : printf;
import core.stdc.stdlib : malloc, free;

class Buffer {
    private void* ptr;
    private int size;
    private int _cap;

    public this(int cap) { ptr = malloc(cap); this._cap = cap; }
    public ~this() { free(ptr); }
    public ubyte[] asArray() { return (cast(ubyte*)ptr)[0 .. cap]; }
    public void* getPtr() { return ptr; }
    public int cap() { return _cap; }
}

int setString(Buffer buffer, string s)
{
    assert(buffer.cap > 0);
    char[] b = cast(char[])buffer.asArray();
    // Leave one byte for the '\0' sentinel.
    size_t len = min(s.length, cast(size_t)(buffer.cap - 1));
    if (len < s.length) {
        // The string has to be cut. The dchar is essential in walking over
        // UTF-8 code points: i is the byte offset at which each one starts.
        // break_at will hold the last position at which the string can be
        // cleanly cut.
        size_t break_at = 0;
        foreach (size_t i, dchar v; s) {
            if (i > len) break;
            break_at = i;
        }
        len = break_at;
    }
    b[0 .. len] = s[0 .. len];

    // add a sentinel if you want to use the string in C
    b[len] = '\0';
    // you could at this point set buffer.size to len in order to use the
    // string in D
    return cast(int)len;
}

void main()
{
    string s = "ąćęłńóśźż";
    foreach (i; 1 .. 24) {
        Buffer buffer = new Buffer(i);
        int len = setString(buffer, s);
        printf("bufsize %2d -- strlen %2d -- %s --\n", i, len,
               cast(char*)buffer.getPtr);
    }
}



Output of the program:

bufsize  1 -- strlen  0 --  --
bufsize  2 -- strlen  0 --  --
bufsize  3 -- strlen  2 -- ą --
bufsize  4 -- strlen  2 -- ą --
bufsize  5 -- strlen  4 -- ąć --
bufsize  6 -- strlen  4 -- ąć --
bufsize  7 -- strlen  6 -- ąćę --
bufsize  8 -- strlen  6 -- ąćę --
bufsize  9 -- strlen  8 -- ąćęł --
bufsize 10 -- strlen  8 -- ąćęł --
bufsize 11 -- strlen 10 -- ąćęłń --
bufsize 12 -- strlen 10 -- ąćęłń --
bufsize 13 -- strlen 12 -- ąćęłńó --
bufsize 14 -- strlen 12 -- ąćęłńó --
bufsize 15 -- strlen 14 -- ąćęłńóś --
bufsize 16 -- strlen 14 -- ąćęłńóś --
bufsize 17 -- strlen 16 -- ąćęłńóśź --
bufsize 18 -- strlen 16 -- ąćęłńóśź --
bufsize 19 -- strlen 18 -- ąćęłńóśźż --
bufsize 20 -- strlen 18 -- ąćęłńóśźż --
bufsize 21 -- strlen 18 -- ąćęłńóśźż --
bufsize 22 -- strlen 18 -- ąćęłńóśźż --
bufsize 23 -- strlen 18 -- ąćęłńóśźż --
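
An alternative to walking the whole string is to start at the desired cut position and step back while it lands on a UTF-8 continuation byte (bit pattern 10xxxxxx). The sketch below shows that idea. The name utf8Cut is a hypothetical helper, not something from this thread, and the function assumes its input is valid UTF-8.

```d
/// Returns the largest n <= maxBytes such that s[0 .. n] does not end
/// in the middle of a UTF-8 sequence. Assumes s is valid UTF-8.
size_t utf8Cut(const(char)[] s, size_t maxBytes)
{
    if (maxBytes >= s.length)
        return s.length; // everything fits, nothing to cut

    size_t n = maxBytes;
    // s[n] would be the first byte left out. If it is a continuation
    // byte (10xxxxxx), the cut falls inside a code point: step back.
    while (n > 0 && (s[n] & 0xC0) == 0x80)
        --n;
    return n;
}

unittest
{
    string s = "ąćęłńóśźż"; // nine letters, two bytes each
    assert(utf8Cut(s, 3) == 2);
    assert(utf8Cut(s, 18) == 18);
}
```

For a buffer of capacity cap that also has to hold the '\0' sentinel, the value passed as maxBytes would be cap - 1.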

Mar 05 2015

"Kagamin" <spam here.lot> writes:

On Thursday, 5 March 2015 at 13:57:45 UTC, FG wrote:
 void main()
 {
     string s = "ąćęłńóśźż";

Try with string s = "ąc\u0301ęłńóśźż";

Mar 05 2015

FG <home fgda.pl> writes:

On 2015-03-05 at 15:18, Kagamin wrote:
 On Thursday, 5 March 2015 at 13:57:45 UTC, FG wrote:
 void main()
 {
     string s = "ąćęłńóśźż";

 Try with string s = "ąc\u0301ęłńóśźż";

Yeah, I see your point: ą, ąc (missing diacritic), ąć, ąćę, ...
Damn those composite characters!
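
To keep a base letter and its combining marks together, the cut has to fall on a grapheme cluster boundary rather than on a code point boundary. A minimal sketch using std.uni.graphemeStride from Phobos follows. The name graphemeCut is a hypothetical helper, and this only addresses combining sequences, not the other cases mentioned later in the thread.

```d
import std.uni : graphemeStride;

/// Returns the largest n <= maxBytes such that s[0 .. n] ends on a
/// grapheme cluster boundary. Assumes s is valid UTF-8.
size_t graphemeCut(string s, size_t maxBytes)
{
    size_t pos = 0;
    while (pos < s.length)
    {
        // Number of code units in the cluster that starts at pos.
        immutable next = pos + graphemeStride(s, pos);
        if (next > maxBytes)
            break; // this cluster would not fit completely
        pos = next;
    }
    return pos;
}

unittest
{
    // ą (2 bytes), then c + U+0301 combining acute (1 + 2 bytes), then ę (2 bytes)
    string s = "\u0105c\u0301\u0119";
    assert(graphemeCut(s, 3) == 2); // the lone 'c' is not kept
    assert(graphemeCut(s, 5) == 5); // c and its accent stay together
}
```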

Mar 05 2015

ketmar <ketmar ketmar.no-ip.org> writes:

On Thu, 05 Mar 2015 16:36:35 +0100, FG wrote:

 Damn those composite characters!

or invisible ones. or RTL switch.

unicode sux[1].

[1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf

Mar 05 2015

Ali Çehreli <acehreli yahoo.com> writes:

On 03/05/2015 03:25 PM, ketmar wrote:

 unicode sux[1].

 [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf

Thanks. I enjoyed the article and I agree with everything said in there.

It made me happy that I was not the only person who has been ruminating 
over "alphabet" as the crucial piece in this whole Unicode story. The 
example I've been giving is a company name stored as the string "ali & 
jim": its uppercase should be "ALİ & JIM", because the two letter 'i's 
belong to different alphabets. Anyway...

Here is how I attempted to define an alphabet with its implied collation 
orders. For example, for the Turkish alphabet:

   https://code.google.com/p/trileri/source/browse/trunk/tr/alfabe.d#796

Unfortunately, the code itself is in Turkish, was never finished, is 
bad and older D code, and is abandoned at this point. :-/

Ali
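
The point about alphabets can be illustrated with a small sketch. std.uni.toUpper has no notion of alphabet or locale, so it maps 'i' to 'I'. The functions toUpperTurkish and upperTurkish below are hypothetical helpers written only for this illustration; they are unrelated to the code linked above, and they handle just the dotted and dotless i.

```d
import std.uni : toUpper;

/// Uppercases one code point using the Turkish rules for i and ı,
/// and std.uni.toUpper for everything else.
dchar toUpperTurkish(dchar c)
{
    switch (c)
    {
        case 'i':      return '\u0130'; // i -> İ (capital I with dot above)
        case '\u0131': return 'I';      // ı (dotless i) -> I
        default:       return toUpper(c);
    }
}

/// Applies toUpperTurkish to every code point of s.
string upperTurkish(string s)
{
    string result;
    foreach (dchar c; s)
        result ~= toUpperTurkish(c); // appending a dchar encodes it as UTF-8
    return result;
}

unittest
{
    // Alphabet-agnostic mapping: both words come out with a plain I.
    assert(toUpper("ali & jim") == "ALI & JIM");
    // Each word uppercased according to its own alphabet.
    assert(upperTurkish("ali") ~ " & " ~ toUpper("jim") == "ALİ & JIM");
}
```

The string alone does not say which rule applies to which word; that information has to come from somewhere else.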

Mar 05 2015

"Kagamin" <spam here.lot> writes:

On Friday, 6 March 2015 at 00:53:49 UTC, Ali Çehreli wrote:
 It made me happy that I was not the only person who has been 
 ruminating over "alphabet" as the crucial piece in this whole 
 Unicode story. The example I've been giving is a company name 
 stored as the string "ali & jim": its uppercase should be 
 "ALİ & JIM", because the two letter 'i's belong to different 
 alphabets.

I'd say a company name should be processed verbatim; the need to 
uppercase it should never arise.

Mar 05 2015

FG <home fgda.pl> writes:

On 2015-03-06 at 00:25, ketmar wrote:
 unicode sux[1].

 [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf


Great article. Thanks, Кетмар

   ⚠     ∑ ♫ ⚽ ☀ ☕ ☺  ≡  ♛

Mar 06 2015
