digitalmars.D.learn - Handling arbitrary char ranges
- Matt Kline (78/78) Apr 20 2016 I'm doing some work with a REST API, and I wrote a simple utility
- Alex Parrill (10/14) Apr 20 2016 IO functions usually work with octets, not characters, so an
- ag0aep6g (35/56) Apr 20 2016 Well, string is not a char range. If you want to accept string, you have...
- Matt Kline (8/11) Apr 20 2016 I don't have an option here, do I? I assume HTTP.onSend doesn't
- ag0aep6g (8/14) Apr 20 2016 Maybe I've missed it, but you didn't say where the HTTP type comes from,...
- Matt Kline (11/13) Apr 20 2016 std.net.curl: https://dlang.org/phobos/std_net_curl.html#.HTTP
- ag0aep6g (9/16) Apr 20 2016 I don't know if a PR would be worthwhile. What you say makes sense to
- Alex Parrill (36/37) Apr 20 2016 First, you can't assign anything to a void[], for the same reason
- ag0aep6g (18/40) Apr 20 2016 Not true. You can assign any dynamic array to a void[].
- Alex Parrill (10/59) Apr 20 2016 That's not assigning the elements of a void[]; it's just changing
- ag0aep6g (29/43) Apr 20 2016 True, but assigning elements is possible via slices as shown.
I'm doing some work with a REST API, and I wrote a simple utility function that sets an HTTP's onSend callback to send a string:

----
@property outgoingString(ref HTTP request, const(void)[] sendData)
{
    import std.algorithm : min;

    request.contentLength = sendData.length;
    auto remainingData = sendData;
    request.onSend = delegate size_t(void[] buf)
    {
        size_t minLen = min(buf.length, remainingData.length);
        if (minLen == 0) return 0;
        buf[0..minLen] = remainingData[0..minLen];
        remainingData = remainingData[minLen..$];
        return minLen;
    };
}
----

I then wrote a function that lazily strips some whitespace from strings I send. To accommodate this change, I need to modify the function above so it takes arbitrary ranges of char elements. I assumed this would be a modest task, but it's been an exercise in frustration. The closest I got was:

----
@property void outgoingString(T)(ref HTTP request, T sendData)
{
    static if (isArray!T) {
        import std.algorithm : min;

        request.contentLength = sendData.length;
        request.onSend = delegate size_t(void[] buf)
        {
            size_t minLen = min(buf.length, sendData.length);
            if (minLen == 0) return 0;
            // 2 (line lost in the archive; reconstructed from the error message quoted below)
            buf[0..minLen] = sendData[0..minLen];
            sendData = sendData[minLen..$];
            return minLen;
        };
    }
    else {
        // 3 (line lost in the archive; reconstructed, exact form assumed)
        auto bcu = byCodeUnit(sendData);
        // 4 (opening of the static assert lost in the archive; condition assumed)
        static assert(is(typeof(bcu.front) : char),
            __FUNCTION__ ~ " only takes char ranges, not " ~ typeof(bcu.front).stringof);

        // Length unknown; chunked transfer
        request.contentLength = ulong.max;
        request.onSend = delegate size_t(void[] buf)
        {
            for (size_t i = 0; i < buf.length; ++i) {
                if (bcu.empty) return i;
                // 5 (line lost in the archive; reconstructed from the question below)
                buf[i] = bcu.front;
                bcu.popFront();
            }
            return buf.length;
        };
    }
}
----

To each of the commented lines above,

1. What is the idiomatic way to constrain the function to only take char ranges? One might naïvely add `is(ElementType!T : char)`, but that falls on its face due to strings "auto-decoding" their elements to dchar. (More on that later.)

2. The function fails to compile, issuing, "cannot implicitly convert expression (sendData[0..minLen]) of type string to void[]" on this line. I assume this has to do with the immutability of string elements. Specifying a non-const array of const elements is as simple as `const(void)[]`, but how does one do this here, with a template argument?

3. Is this needed, or is auto-decoding behavior specific to char arrays and not other char ranges?

4. Is this a sane approach to make sure I'm dealing with ranges of chars? Do I need to use `Unqual` to deal with const or immutable elements?

5. This fails, claiming the right hand side can't be converted to type void. Casting to ubyte doesn't help - so how *does* one write to an element of a void array?

Am I making things harder than they have to be? Or is dealing with arbitrary ranges of chars this complex? I've lost count of times templated code wouldn't compile because dchar was sneaking in somewhere... at least I'm in good company. (http://forum.dlang.org/post/m01r3d$1frl$1@digitalmars.com)
Apr 20 2016
On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
> I'm doing some work with a REST API, and I wrote a simple utility function that sets an HTTP's onSend callback to send a string: [...]

IO functions usually work with octets, not characters, so an extra encoding step is needed. For encoding character arrays to have something for arbitrary ranges.

Avoid slicing ranges; not all ranges support it. If you absolutely need it (you don't here) then add hasSlicing to the constraint.

isSomeChar can tell you if a type (like the range's element type) is a character.
Apr 20 2016
On 20.04.2016 19:09, Matt Kline wrote:
> 1. What is the idiomatic way to constrain the function to only take char ranges? One might naïvely add `is(ElementType!T : char)`, but that falls on its face due to strings "auto-decoding" their elements to dchar. (More on that later.)

Well, string is not a char range. If you want to accept string, you have to special case it.

Rejecting string is an option, though. The caller would then have to make a char range from the string. There's std.utf.byCodeUnit for that.

> 2. The function fails to compile, issuing, "cannot implicitly convert expression (sendData[0..minLen]) of type string to void[]" on this line. I assume this has to do with the immutability of string elements. Specifying a non-const array of const elements is as simple as `const(void)[]`, but how does one do this here, with a template argument?

Looks like a compiler bug to me. It works when you do it in two steps:

----
string sendData = "foo";
void[] buf = new void[3];
immutable(void)[] voidSendData = sendData;
buf[] = voidSendData[];
----

I've filed an issue: https://issues.dlang.org/show_bug.cgi?id=15942

> 3. Is this needed, or is auto-decoding behavior specific to char arrays and not other char ranges?

Auto-decoding is specific to arrays.

> 4. Is this a sane approach to make sure I'm dealing with ranges of chars? Do I need to use `Unqual` to deal with const or immutable elements?

is(Foo : char) also accepts byte, ubyte, bool, and user-defined types with an alias this to a char.

You don't need Unqual with `: char`. Since immutable(char) is a value type still, it implicitly converts to char.

However, if you want to reject those other types, and only accept char and its qualified variants, then you need Unqual:

----
is(Unqual!(ElementType!bcu) == char)
----

> 5. This fails, claiming the right hand side can't be converted to type void. Casting to ubyte doesn't help - so how *does* one write to an element of a void array?

void[] is a bit of a special case. A single value of type void isn't really a thing. You can't write `void v = 1;`.

Maybe use ubyte[] for the buffer type instead.

To do it with void[], I guess you'd have to slice things:

----
char c = 'x';
void[] buf = new void[1];
buf[0 .. 1] = (&c)[0 .. 1];
----

> Am I making things harder than they have to be? Or is dealing with an arbitrary ranges of chars this complex? I've lost count of times templated code wouldn't compile because dchar was sneaking in somewhere... at least I'm in good company. (http://forum.dlang.org/post/m01r3d$1frl$1@digitalmars.com)

I think your problems come more from wanting to accept string, which simply isn't a char range, and from using void[] as the buffer type.
Apr 20 2016
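The difference between the two constraints discussed above can be checked at compile time. This is only a minimal sketch: the trait names `isLooseCharRange` and `isStrictCharRange` are made up for illustration and are not part of Phobos.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical trait names, for illustration only.
// "Loose": anything whose elements implicitly convert to char.
enum isLooseCharRange(R) = isInputRange!R && is(ElementType!R : char);
// "Strict": only char and its const/immutable variants.
enum isStrictCharRange(R) = isInputRange!R && is(Unqual!(ElementType!R) == char);

// The range type that byCodeUnit produces for a string.
alias CodeUnits = typeof("".byCodeUnit);

static assert( isLooseCharRange!CodeUnits);
static assert( isStrictCharRange!CodeUnits);

// ubyte implicitly converts to char, so only the strict check rejects it.
static assert( isLooseCharRange!(ubyte[]));
static assert(!isStrictCharRange!(ubyte[]));

// A plain string is auto-decoded to dchar and fails both checks.
static assert(!isLooseCharRange!string);
static assert(!isStrictCharRange!string);

void main() {}
```

With the strict form as a template constraint, a caller holding a string would have to pass `str.byCodeUnit` explicitly, which is the "rejecting string" option mentioned above.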
On Wednesday, 20 April 2016 at 19:29:22 UTC, ag0aep6g wrote:
> Maybe use ubyte[] for the buffer type instead.

I don't have an option here, do I? I assume HTTP.onSend doesn't take a `delegate size_t(ubyte[])` instead of a `delegate size_t(void[])`, and that the former isn't implicitly convertible to the latter.

> I think your problems come more from wanting to accept string, which simply isn't a char range

Is this due solely to the "auto-decode" behavior? Generally, (except apparently in this case) don't arrays of type T qualify as InputRanges of type T?
Apr 20 2016
On 20.04.2016 21:48, Matt Kline wrote:
> I don't have an option here, do I? I assume HTTP.onSend doesn't take a `delegate size_t(ubyte[])` instead of a `delegate size_t(void[])`, and that the former isn't implicitly convertible to the latter.

Maybe I've missed it, but you didn't say where the HTTP type comes from, did you? If it's not under your control, then yeah, I guess you have to deal with void[].

[...]
> Is this due solely to the "auto-decode" behavior? Generally, (except apparently in this case) don't arrays of type T qualify as InputRanges of type T?

Yep. Generally, T[] is a range with element type T. char[], wchar[], and their qualified variants are the exception. And the reason is auto-decoding to dchar, yes.
Apr 20 2016
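The exception described above can be seen directly in the range element types. A small sketch using only compile-time checks, assuming a Phobos build with auto-decoding enabled (the default):

```d
import std.range.primitives : ElementType;
import std.utf : byCodeUnit;

// Ordinary arrays are ranges of their own element type.
static assert(is(ElementType!(int[]) == int));

// Narrow strings are the exception: as ranges they yield decoded dchars...
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!wstring == dchar));

// ...even though indexing the very same array yields code units.
static assert(is(typeof(string.init[0]) == immutable(char)));

// byCodeUnit wraps the array so that range access yields code units too.
static assert(is(ElementType!(typeof("".byCodeUnit)) == immutable(char)));

void main() {}
```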
On Wednesday, 20 April 2016 at 20:00:58 UTC, ag0aep6g wrote:
> Maybe I've missed it, but you didn't say where the HTTP type comes from, did you?

std.net.curl: https://dlang.org/phobos/std_net_curl.html#.HTTP

(Sorry, I assumed that was a given since it's a standard library type. Poor assumption, perhaps.)

I'd rather not write my own cURL wrapper. Do you think it would be worthwhile starting a PR for Phobos to get it changed to ubyte[]? A reading of https://dlang.org/spec/arrays.html indicates the main difference is that the GC crawls void[], but I would think that wouldn't matter for a short-lived buffer being shoveled into libcurl, which is, by nature, a copy of the same data somewhere else in your program...
Apr 20 2016
On 20.04.2016 22:09, Matt Kline wrote:
> I'd rather not write my own cURL wrapper. Do you think it would be worthwhile starting a PR for Phobos to get it changed to ubyte[]? A reading of https://dlang.org/spec/arrays.html indicates the main difference is that the GC crawls void[], but I would think that wouldn't matter for a short-lived buffer being shoveled into libcurl, which is, by nature, a copy of the same data somewhere else in your program...

I don't know if a PR would be worthwhile. What you say makes sense to me, but I am by no means an expert here.

As you say, void[] is the safer default with regards to the GC.

It's also simpler to get a void[] from an arbitrary array, as any array implicitly converts to void[] (given compatible qualifiers). Getting a void[] from an arbitrary range isn't that simple, but getting a ubyte[] from an int[] requires some work, too.

void[] is possibly the better option all around.
Apr 20 2016
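The implicit conversion to void[] mentioned above can be illustrated with a short sketch. The variable names are arbitrary; the point is that the conversion makes no copy and that the length of the void[] view is counted in bytes.

```d
void main()
{
    int[] ints = [1, 2, 3];

    // Any mutable array converts implicitly; no data is copied.
    void[] raw = ints;
    assert(raw.ptr is cast(void*) ints.ptr);
    assert(raw.length == 3 * int.sizeof); // length is in bytes now

    // Qualifiers must be compatible: a string needs a const or immutable target.
    const(void)[] view = "abc";
    assert(view.length == 3);

    // Going back to a typed view requires an explicit cast.
    auto bytes = cast(const(ubyte)[]) view;
    assert(bytes[0] == 'a');
}
```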
On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
> [...]

First, you can't assign anything to a void[], for the same reason you can't dereference a void*. This includes the slice assignment that you are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.

Cast the buffer to a `ubyte[]` buffer first, then you can assign bytes to it.

----
auto bytebuf = cast(ubyte[]) buf;
bytebuf[0] = 123;
----

Second, don't use slicing on ranges (unless you need it). Not all ranges support it...

----
auto buf = [1,2,3];
auto rng = filter!(x => x != 1)(buf);
pragma(msg, hasSlicing!(typeof(rng))); // false
----

... and even ranges that support it don't support assigning to an array by slice:

----
auto buf = new int[](3);
buf[] = only(1,2,3)[]; // cannot implicitly convert expression (only(1, 2, 3).opSlice()) of type OnlyResult!(int, 3u) to int[]
----

Instead, use a loop (or maybe `put`) to fill the array.

Third, don't treat text as bytes; encode your characters.

----
auto schema = EncodingScheme.create("utf-8");
auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);

auto buf = new ubyte[](100);
auto currentPos = buf;
while(!range.empty && schema.encodedLength(range.front) <= currentPos.length) {
    auto written = schema.encode(range.front, currentPos);
    currentPos = currentPos[written..$];
    range.popFront();
}
buf = buf[0..buf.length - currentPos.length];
----

(PS there ought to be a range in Phobos that encodes each character, something like map maybe)
Apr 20 2016
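Combining the cast-to-ubyte[] and fill-with-a-loop suggestions above gives something like the following sketch. `fillBuffer` is a hypothetical helper name, not a Phobos or std.net.curl function, and the constraint assumes the caller passes a range of char code units (for example via `byCodeUnit`) rather than a plain string.

```d
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical helper: copies as many code units from `r` as fit into `buf`
// and returns how many bytes were written. `r` is taken by ref so that
// repeated calls continue where the previous one stopped.
size_t fillBuffer(R)(ref R r, void[] buf)
if (isInputRange!R && is(Unqual!(ElementType!R) == char))
{
    auto bytes = cast(ubyte[]) buf; // typed view of the same memory
    size_t written = 0;
    while (written < bytes.length && !r.empty)
    {
        bytes[written++] = cast(ubyte) r.front;
        r.popFront();
    }
    return written;
}

void main()
{
    auto src = "h\u00E9llo".byCodeUnit; // 6 UTF-8 code units
    auto buf = new void[](4);

    // First call fills the whole 4-byte buffer.
    assert(fillBuffer(src, buf) == 4);
    ubyte[] firstChunk = [0x68, 0xC3, 0xA9, 0x6C];
    assert(cast(ubyte[]) buf == firstChunk);

    // Second call writes the remaining 2 code units.
    assert(fillBuffer(src, buf) == 2);
    ubyte[] secondChunk = [0x6C, 0x6F];
    assert((cast(ubyte[]) buf)[0 .. 2] == secondChunk);

    // The range is exhausted, so nothing more is written.
    assert(fillBuffer(src, buf) == 0);
}
```

Since the callback in the original post has the shape `delegate size_t(void[] buf)`, its body could presumably be reduced to returning the result of such a helper, with the range captured by the delegate.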
On 20.04.2016 23:59, Alex Parrill wrote:
> On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
>> [...]
> First, you can't assign anything to a void[], for the same reason you can't dereference a void*. This includes the slice assignment that you are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.

Not true. You can assign any dynamic array to a void[].

Regarding vector notation, the spec doesn't seem to mention how it interacts with void[], but dmd accepts this no problem:

----
int[] i = [1, 2, 3];
auto v = new void[](3 * int.sizeof);
v[] = i[];
----

[...]
> Second, don't use slicing on ranges (unless you need it). Not all ranges support it...

As far as I see, the slicing code is guarded by `static if (isArray!T)`. Arrays support slicing.

[...]
> Instead, use a loop (or maybe `put`) to fill the array.

That's what's done in the `else` path, no?

> Third, don't treat text as bytes; encode your characters.
>
> auto schema = EncodingScheme.create("utf-8");
> auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);
>
> auto buf = new ubyte[](100);
> auto currentPos = buf;
> while(!range.empty && schema.encodedLength(range.front) <= currentPos.length) {
>     auto written = schema.encode(range.front, currentPos);
>     currentPos = currentPos[written..$];
>     range.popFront();
> }
> buf = buf[0..buf.length - currentPos.length];

You're "converting" chars to UTF-8 here, right? That's a nop. char is a UTF-8 code unit already.

> (PS there ought to be a range in Phobos that encodes each character, something like map maybe)

std.utf.byChar and friends: https://dlang.org/phobos/std_utf.html#.byChar
Apr 20 2016
On Wednesday, 20 April 2016 at 22:44:37 UTC, ag0aep6g wrote:
> On 20.04.2016 23:59, Alex Parrill wrote:
>> On Wednesday, 20 April 2016 at 17:09:29 UTC, Matt Kline wrote:
>>> [...]
>> First, you can't assign anything to a void[], for the same reason you can't dereference a void*. This includes the slice assignment that you are trying to do in `buf[0..minLen] = remainingData[0..minLen];`.
> Not true. You can assign any dynamic array to a void[].

That's not assigning the elements of a void[]; it's just changing what the slice points to and adjusting the length, like doing `void* ptr = someOtherPtr;`

> Regarding vector notation, the spec doesn't seem to mention how it interacts with void[], but dmd accepts this no problem:
> ----
> int[] i = [1, 2, 3];
> auto v = new void[](3 * int.sizeof);
> v[] = i[];
> ----

It only seems to work on arrays, not arbitrary ranges, sliceable or not. Though see below.

> [...]
>> Second, don't use slicing on ranges (unless you need it). Not all ranges support it...
> As far as I see, the slicing code is guarded by `static if (isArray!T)`. Arrays support slicing.
> [...]
>> Instead, use a loop (or maybe `put`) to fill the array.
> That's what done in the `else` path, no?

Yes, I did not see the static if condition, my bad.

>> Third, don't treat text as bytes; encode your characters.
>>
>> auto schema = EncodingScheme.create("utf-8");
>> auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);
>>
>> auto buf = new ubyte[](100);
>> auto currentPos = buf;
>> while(!range.empty && schema.encodedLength(range.front) <= currentPos.length) {
>>     auto written = schema.encode(range.front, currentPos);
>>     currentPos = currentPos[written..$];
>>     range.popFront();
>> }
>> buf = buf[0..buf.length - currentPos.length];
> You're "converting" chars to UTF-8 here, right? That's a nop. char is a UTF-8 code unit already.

It can be either chars, wchars, or dchars.

>> (PS there ought to be a range in Phobos that encodes each character, something like map maybe)
> std.utf.byChar and friends: https://dlang.org/phobos/std_utf.html#.byChar

byChar would work. byWchar and byDchar might cause endianness issues.
Apr 20 2016
On 21.04.2016 04:35, Alex Parrill wrote:
> On Wednesday, 20 April 2016 at 22:44:37 UTC, ag0aep6g wrote:
>> On 20.04.2016 23:59, Alex Parrill wrote:
[...]
> That's not assigning the elements of a void[]; it's just changing what the slice points to and adjusting the length, like doing `void* ptr = someOtherPtr;`

True, but assigning elements is possible via slices as shown.

[...]
> It only seems to work on arrays, not arbitrary ranges, sliceable or not. Though see below.

Yes, assigning slices and more complex vector operations only works with dynamic arrays.

[...]
>>> [...]
>>> auto range = chain("hello", " ", "world").map!(ch => cast(char) ch);
>>> [...]
>>> auto written = schema.encode(range.front, currentPos);
>>> [...]
>> You're "converting" chars to UTF-8 here, right? That's a nop. char is a UTF-8 code unit already.
> It can be either chars, wchars, or dchars.

Your range specifically has element type char, though. Not wchar or dchar. And Matt Kline wants to work on char ranges (and maybe string), not on arbitrary ranges of char/wchar/dchar.

[...]
> byChar would work. byWchar and byDchar might cause endianness issues.

Easily combined with the endianness functions from std.bitmanip:

----
void main()
{
    import std.algorithm: equal, map;
    import std.bitmanip: nativeToBigEndian, nativeToLittleEndian;
    import std.utf: byWchar;

    string utf8 = "foobär";
    auto utf16le = utf8.byWchar.map!nativeToLittleEndian;
    auto utf16be = utf8.byWchar.map!nativeToBigEndian;

    assert(equal(utf16le, [['f', 0], ['o', 0], ['o', 0], ['b', 0], [0xE4, 0], ['r', 0]]));
    assert(equal(utf16be, [[0, 'f'], [0, 'o'], [0, 'o'], [0, 'b'], [0, 0xE4], [0, 'r']]));
}
----
Apr 20 2016