digitalmars.D.learn - evenChunks on a string - hasLength constraint fails?
- amarillion, Mar 14 2023
- Paul Backus, Mar 14 2023
- amarillion, Mar 16 2023
Hey, I'm trying to split a string down the middle. I thought the function std.range.evenChunks would be perfect for this:

```
import std.range;

void main() {
    string line = "abcdef";
    auto parts = evenChunks(line, 2);
    assert(parts == ["abc", "def"]);
}
```

But I'm getting a compiler error:

```
/usr/include/dmd/phobos/std/range/package.d(8569): Candidate is: `evenChunks(Source)(Source source, size_t chunkCount)`
  with `Source = string`
  whose parameters have the following constraints:
  `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
`  isForwardRange!Source
  > hasLength!Source
`  `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
./test.d(7): All possible candidates are marked as `deprecated` or `@disable`
```

I'm trying to understand why this doesn't work. I don't really understand the error. If I interpret this correctly, it's missing a length attribute on a string, but shouldn't length be there?
Mar 14 2023
On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
> I'm trying to understand why this doesn't work. I don't really understand the error. If I interpret this correctly, it's missing a length attribute on a string, but shouldn't length be there?

By default, D's standard library treats a `string` as a range of Unicode code points (i.e., a range of `dchar`s), encoded in UTF-8. Because UTF-8 is a variable-length encoding, it's impossible to know how many code points there are in a `string` without iterating it--which means that, as far as the standard library is concerned, `string` does not have a valid `.length` property.

This behavior is known as "auto decoding", and is described in more detail in this article by Jack Stouffer:

https://jackstouffer.com/blog/d_auto_decoding_and_you.html

If you do not want the standard library to treat your `string` as a range of code points, you must use a wrapper like [`std.utf.byCodeUnit`][1] (to get a range of `char`s) or [`std.string.representation`][2] (to get a range of `ubyte`s). For example:

```d
import std.utf : byCodeUnit;
auto parts = evenChunks(line.byCodeUnit, 2);
```

Of course, if you do this, there is a risk that you will split a code point in half and end up with invalid Unicode. If your program needs to handle Unicode input, you would be better off finding a different solution; for example, you could use [`std.range.primitives.walkLength`][3] to compute the midpoint of the range by hand, and split it using [`std.range.chunks`][4]:

```d
size_t length = line.walkLength;
// Round up so that an odd number of code points still yields two chunks.
auto parts = chunks(line, (length + 1) / 2);
```

[1]: https://phobos.dpldocs.info/std.utf.byCodeUnit.html
[2]: https://phobos.dpldocs.info/std.string.representation.html
[3]: https://phobos.dpldocs.info/std.range.primitives.walkLength.1.html
[4]: https://phobos.dpldocs.info/std.range.chunks.html
Mar 14 2023
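To make the explanation above concrete, here is a minimal sketch that checks the two traits named in the error message and then tries both suggested approaches. The sample strings are illustrative assumptions (the `\u00EF` escape is used so the example does not depend on the file's encoding), and rounding the midpoint up is one possible choice for odd lengths, not something prescribed by the library.

```d
import std.algorithm.comparison : equal;
import std.range : chunks, evenChunks;
import std.range.primitives : hasLength, isForwardRange, walkLength;
import std.utf : byCodeUnit;

void main()
{
    string line = "abcdef";

    // The built-in array property exists and counts UTF-8 code units.
    assert(line.length == 6);

    // The range traits are what evenChunks checks: string is a forward
    // range, but hasLength is false for it because of auto decoding.
    static assert(isForwardRange!string);
    static assert(!hasLength!string);

    // byCodeUnit yields a range of char that does report its length.
    static assert(hasLength!(typeof(line.byCodeUnit)));

    // Approach 1: split by code unit (only safe for ASCII-only input).
    auto parts = evenChunks(line.byCodeUnit, 2);
    assert(parts.length == 2);
    assert(parts.front.equal("abc"));
    parts.popFront();
    assert(parts.front.equal("def"));

    // Approach 2: split by code point, so no character is cut in half.
    string text = "na\u00EFve"; // 5 code points, 6 code units
    assert(text.length == 6);
    assert(text.walkLength == 5);

    size_t half = (text.walkLength + 1) / 2; // 3 code points per chunk
    auto halves = chunks(text, half);
    assert(halves.front.equal("na\u00EF"));
    halves.popFront();
    assert(halves.front.equal("ve"));
}
```

Note that the chunks are lazy ranges rather than `string`s, which is why the sketch compares them with `equal` instead of `==`.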
On Tuesday, 14 March 2023 at 18:41:50 UTC, Paul Backus wrote:
> On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
>> I'm trying to understand why this doesn't work. I don't really understand the error. If I interpret this correctly, it's missing a length attribute on a string, but shouldn't length be there?
>
> By default, D's standard library treats a `string` as a range of Unicode code points (i.e., a range of `dchar`s), encoded in UTF-8. Because UTF-8 is a variable-length encoding, it's impossible to know how many code points there are in a `string` without iterating it--which means that, as far as the standard library is concerned, `string` does not have a valid `.length` property.

Thanks for the clear explanation! I was already aware that you could iterate by codepoint with foreach(dchar c; s), but it just didn't cross my mind that the same concept was playing a role here. I guess it's just one of those things that you just have to know.

regards,
Amarillion
Mar 16 2023
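The `foreach (dchar c; s)` idiom mentioned in the last reply can be sketched as follows. The helper names `countCodeUnits` and `countCodePoints` are hypothetical, introduced only for illustration; the point is that the loop variable's type decides whether the loop sees raw UTF-8 code units or decoded code points, and that the decoded count is what `walkLength` reports for a `string`.

```d
import std.range.primitives : walkLength;

// Hypothetical helper: iterate without decoding, one UTF-8 code unit at a time.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: iterate with decoding, one Unicode code point at a time.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "na\u00EFve"; // the i-diaeresis takes two code units in UTF-8

    assert(countCodeUnits(s) == 6);
    assert(countCodePoints(s) == 5);

    // .length agrees with the code-unit count,
    // walkLength with the code-point count.
    assert(s.length == countCodeUnits(s));
    assert(s.walkLength == countCodePoints(s));
}
```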