
digitalmars.D.learn - how to write a string to a c pointer?

reply "zhmt" <zhmtzhmt qq.com> writes:
I am writing an asio binding. Objects need to be serialized into a
buffer (void *),

for example, write utf8 string into buffer,
write int into buffer,
write long into buffer,

Here is my class

import core.stdc.stdlib : malloc, free;

class Buffer
{
	private void *ptr;
	private int size;
	private int _cap;

	public this(int cap)
	{
		ptr = malloc(cap);
		this._cap = cap;
	}

	public ~this()
	{
		free(ptr);
	}

	public ubyte[] asArray()
	{
		ubyte[] ret = (cast(ubyte*)ptr)[0..cap];
		return ret;
	}

	public void* getPtr()
	{
		return ptr;
	}

	public int cap()
	{
		return _cap;
	}
}

How can I write a UTF-8 string into the buffer?
Mar 04 2015
parent reply "Kagamin" <spam here.lot> writes:
string s;
char[] b = cast(char[])asArray();
b[0..s.length] = s[];
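
The same slice-copy idea extends to the other values mentioned in the question (int, long). Below is a minimal sketch, not the poster's code: putString and putInt are hypothetical helper names, the little-endian byte order is an assumption (the thread does not say which order the binding expects), and a plain malloc'ed block stands in for the Buffer class.

```d
import core.stdc.stdlib : malloc, free;
import std.bitmanip : nativeToLittleEndian, littleEndianToNative;

// Hypothetical helper (not from the thread): copies the UTF-8 bytes of s
// into buf starting at pos and returns the position after the last byte.
size_t putString(ubyte[] buf, size_t pos, string s)
{
    assert(pos + s.length <= buf.length, "buffer too small");
    buf[pos .. pos + s.length] = cast(const(ubyte)[]) s;
    return pos + s.length;
}

// Hypothetical helper: writes an integer in little-endian byte order
// (the byte order is an assumption, pick what the receiver expects).
size_t putInt(T)(ubyte[] buf, size_t pos, T value)
{
    ubyte[T.sizeof] bytes = nativeToLittleEndian(value);
    assert(pos + bytes.length <= buf.length, "buffer too small");
    buf[pos .. pos + bytes.length] = bytes[];
    return pos + bytes.length;
}

void main()
{
    enum cap = 32;
    void* ptr = malloc(cap);
    assert(ptr !is null);
    scope (exit) free(ptr);

    // View the C memory as a bounds-checked D slice.
    ubyte[] buf = (cast(ubyte*) ptr)[0 .. cap];

    size_t pos = 0;
    pos = putString(buf, pos, "h\u00E9llo");   // 6 bytes in UTF-8
    pos = putInt(buf, pos, 42);                // int, 4 bytes
    pos = putInt(buf, pos, 1234567890123L);    // long, 8 bytes
    assert(pos == 6 + 4 + 8);

    // Read the int back to check the round trip.
    ubyte[4] raw = buf[6 .. 10];
    assert(littleEndianToNative!int(raw) == 42);
}
```

Returning the new position from each helper keeps the write offset explicit; it could just as well be stored in the size field of the Buffer class.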
Mar 05 2015
next sibling parent "zhmt" <zhmtzhmt qq.com> writes:
On Thursday, 5 March 2015 at 09:42:53 UTC, Kagamin wrote:
 string s;
 char[] b = cast(char[])asArray();
 b[0..s.length] = s[];
Thank you very much. I should stop developing and read the D tutorial again.
Mar 05 2015
prev sibling parent reply FG <home fgda.pl> writes:
On 2015-03-05 at 10:42, Kagamin wrote:
 string s;
 char[] b = cast(char[])asArray();
 b[0..s.length] = s[];
It's a bit more complicated than that if you include cutting string for buffers with smaller capacity, doing so respecting UTF-8, and adding a '\0' sentinel, since you may want to use the string in C (if I assume correctly). The setString function does all that:

import std.stdio, std.range, std.c.stdlib;

class Buffer
{
	private void *ptr;
	private int size;
	private int _cap;

	public this(int cap)
	{
		ptr = malloc(cap);
		this._cap = cap;
	}

	public ~this()
	{
		free(ptr);
	}

	public ubyte[] asArray()
	{
		ubyte[] ret = (cast(ubyte*)ptr)[0..cap];
		return ret;
	}

	public void* getPtr()
	{
		return ptr;
	}

	public int cap()
	{
		return _cap;
	}
}

int setString(Buffer buffer, string s)
{
	assert(buffer.cap > 0);
	char[] b = cast(char[])buffer.asArray();
	int len = min(s.length, buffer.cap - 1);
	int break_at;
	// The dchar is essential in walking over UTF-8 code points.
	// break_at will hold the last position at which the string can be cleanly cut
	foreach (int i, dchar v; s) {
		if (i == len) { break_at = i; break; }
		if (i > len) break;
		break_at = i;
	}
	len = break_at;
	b[0..len] = s[0..len];
	// add a sentinel if you want to use the string in C
	b[len] = '\0';
	// you could at this point set buffer.size to len in order to use the string in D
	return len;
}

void main()
{
	string s = "ąćęłńóśźż";
	foreach (i; 1..24) {
		Buffer buffer = new Buffer(i);
		int len = setString(buffer, s);
		printf("bufsize %2d -- strlen %2d -- %s --\n", i, len, buffer.getPtr);
	}
}

Output of the program:

bufsize  1 -- strlen  0 --  --
bufsize  2 -- strlen  0 --  --
bufsize  3 -- strlen  2 -- ą --
bufsize  4 -- strlen  2 -- ą --
bufsize  5 -- strlen  4 -- ąć --
bufsize  6 -- strlen  4 -- ąć --
bufsize  7 -- strlen  6 -- ąćę --
bufsize  8 -- strlen  6 -- ąćę --
bufsize  9 -- strlen  8 -- ąćęł --
bufsize 10 -- strlen  8 -- ąćęł --
bufsize 11 -- strlen 10 -- ąćęłń --
bufsize 12 -- strlen 10 -- ąćęłń --
bufsize 13 -- strlen 12 -- ąćęłńó --
bufsize 14 -- strlen 12 -- ąćęłńó --
bufsize 15 -- strlen 14 -- ąćęłńóś --
bufsize 16 -- strlen 14 -- ąćęłńóś --
bufsize 17 -- strlen 16 -- ąćęłńóśź --
bufsize 18 -- strlen 16 -- ąćęłńóśź --
bufsize 19 -- strlen 16 -- ąćęłńóśź --
bufsize 20 -- strlen 16 -- ąćęłńóśź --
bufsize 21 -- strlen 16 -- ąćęłńóśź --
bufsize 22 -- strlen 16 -- ąćęłńóśź --
bufsize 23 -- strlen 16 -- ąćęłńóśź --
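
Note that the output above stays at 16 bytes even for buffers of 19 bytes and more, where all 18 bytes of the string plus the sentinel would fit: the loop only ever records the start of the last code point. A possible variant is sketched below. It is an alternative technique, not the code from the post: copyTruncated is a hypothetical name, it works on a plain char[] slice instead of the Buffer class, and it finds the cut point by stepping back over UTF-8 continuation bytes (bit pattern 10xxxxxx) rather than decoding with dchar.

```d
import std.algorithm.comparison : min;

// Hypothetical variant of setString: copies as much of s as fits into dst,
// never splits a UTF-8 code point, and appends a '\0' sentinel.
// Returns the number of bytes copied (not counting the sentinel).
size_t copyTruncated(char[] dst, string s)
{
    assert(dst.length > 0);
    size_t len = min(s.length, dst.length - 1);

    // Only when the string has to be cut: back up until s[len] is the
    // first byte of a code point (i.e. not a 10xxxxxx continuation byte).
    if (len < s.length)
        while (len > 0 && (s[len] & 0xC0) == 0x80)
            --len;

    dst[0 .. len] = s[0 .. len];
    dst[len] = '\0';
    return len;
}

void main()
{
    // Same string as in the post, written with escapes:
    // 9 characters, each 2 bytes in UTF-8, 18 bytes in total.
    string s = "\u0105\u0107\u0119\u0142\u0144\u00F3\u015B\u017A\u017C";
    char[32] storage;

    assert(copyTruncated(storage[0 .. 1], s) == 0);
    assert(copyTruncated(storage[0 .. 4], s) == 2);
    assert(copyTruncated(storage[0 .. 18], s) == 16);
    assert(copyTruncated(storage[0 .. 19], s) == 18); // whole string fits
}
```

Like the original, this only respects code point boundaries, so a base letter can still be separated from a following combining mark.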
Mar 05 2015
parent reply "Kagamin" <spam here.lot> writes:
On Thursday, 5 March 2015 at 13:57:45 UTC, FG wrote:
 void main()
 {
     string s = "ąćęłńóśźż";
Try with string s = "ąc\u0301ęłńóśźż";
Mar 05 2015
parent reply FG <home fgda.pl> writes:
On 2015-03-05 at 15:18, Kagamin wrote:
 On Thursday, 5 March 2015 at 13:57:45 UTC, FG wrote:
 void main()
 {
     string s = "ąćęłńóśźż";
Try with string s = "ąc\u0301ęłńóśźż";
Yeah, I see your point: ą, ąc (missing diacritic), ąć, ąćę, ... Damn those composite characters!
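
One way to keep a base letter together with its combining marks is to cut on grapheme cluster boundaries instead of code point boundaries. The following is only a sketch under that assumption: graphemePrefix is a hypothetical helper name, and it relies on std.uni.graphemeStride from Phobos to measure each cluster. It does not normalize the text, and it does nothing about invisible characters or directional marks.

```d
import std.uni : graphemeStride;

// Hypothetical helper: returns the length in bytes of the longest prefix
// of s that is at most maxBytes long and ends on a grapheme boundary.
size_t graphemePrefix(string s, size_t maxBytes)
{
    size_t pos = 0;
    while (pos < s.length)
    {
        // Number of code units (bytes) in the cluster starting at pos.
        immutable step = graphemeStride(s, pos);
        if (pos + step > maxBytes)
            break;
        pos += step;
    }
    return pos;
}

void main()
{
    // U+0105 (2 bytes), then 'c' + combining acute U+0301 (1 + 2 bytes),
    // then U+0119 (2 bytes): 7 bytes, 3 grapheme clusters.
    string s = "\u0105c\u0301\u0119";

    assert(graphemePrefix(s, 2) == 2);
    assert(graphemePrefix(s, 3) == 2); // does not leave a bare "c"
    assert(graphemePrefix(s, 4) == 2);
    assert(graphemePrefix(s, 5) == 5); // "c" and its accent stay together
    assert(graphemePrefix(s, 7) == 7);
}
```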
Mar 05 2015
parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 05 Mar 2015 16:36:35 +0100, FG wrote:

 Damn those composite characters!
or invisible ones. or RTL switch. unicode sux[1]. [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf
Mar 05 2015
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 03/05/2015 03:25 PM, ketmar wrote:

 unicode sux[1].

 [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf
Thanks. I enjoyed the article and I agree with everything said in there.

It made me happy that I was not the only person who has been ruminating over "alphabet" as the crucial piece in this whole Unicode story. I've been giving the example of if I have a company name as the string "ali & jim", the uppercase of it should be "ALİ & JIM" because the different letter 'i's belong to different alphabets.

Anyway... Here is how I attempted to define an alphabet with its implied collation orders. For example, for the Turkish alphabet:

https://code.google.com/p/trileri/source/browse/trunk/tr/alfabe.d#796

Unfortunately, the code itself is in Turkish, has never been finished, is bad and older D code, and is abandoned at this point. :-/

Ali
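
To make the "ali & jim" example concrete, here is a small sketch. It is not the code from the linked project: toUpperTurkish is a hypothetical helper that only special-cases the dotted and dotless i, and the caller has to know which part of the string belongs to which alphabet, which is exactly the information a plain string does not carry.

```d
import std.uni : toUpper;

// Hypothetical helper: uppercases text with the Turkish rules for the
// dotted and dotless i, and std.uni.toUpper for every other character.
string toUpperTurkish(string text)
{
    string result;
    foreach (dchar c; text)
    {
        switch (c)
        {
            case 'i':      result ~= "\u0130";   break; // i -> İ (dotted capital)
            case '\u0131': result ~= "I";        break; // ı -> I (dotless)
            default:       result ~= toUpper(c); break;
        }
    }
    return result;
}

void main()
{
    // The alphabet-agnostic conversion loses the dot on the first name.
    assert(toUpper("ali & jim") == "ALI & JIM");

    // Treating "ali" as Turkish and "jim" as English gives the wanted result.
    assert(toUpperTurkish("ali") ~ " & " ~ toUpper("jim") == "AL\u0130 & JIM");
}
```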
Mar 05 2015
parent "Kagamin" <spam here.lot> writes:
On Friday, 6 March 2015 at 00:53:49 UTC, Ali Çehreli wrote:
 It made me happy that I was not the only person who has been 
 ruminating over "alphabet" as the crucial piece in this whole 
 Unicode story. I've been giving the example of if I have a 
 company name as the string "ali & jim", the uppercase of it 
 should be "ALİ & JIM" because the different letter 'i's belong 
 to different alphabets.
I'd say a company name should be processed verbatim; no need for uppercasing should arise.
Mar 05 2015
prev sibling parent FG <home fgda.pl> writes:
On 2015-03-06 at 00:25, ketmar wrote:
 unicode sux[1].

 [1] http://file.bestmx.net/ee/articles/uni_vs_code.pdf
Great article. Thanks, Кетмар ⚠ ∑ ♫ ⚽ ☀ ☕ ☺ ≡ ♛
Mar 06 2015