digitalmars.D - Bug in countUntil?
- monarch_dodra (18/18) Oct 12 2012 I was looking in countUntil to fix another issue, and I think the
- Jonathan M Davis (13/37) Oct 12 2012 Many algorithms special case narrow strings for efficiency. However, in
- monarch_dodra (4/42) Oct 12 2012 yeah, that's what I thought, but wanted it double checked. I'll
- Sönke Ludwig (7/10) Oct 13 2012 Just wanted to mention that this kind of subtle change in behavior can
I was looking in countUntil to fix another issue, and I think the string support is broken.

This program:
//----
import std.algorithm;
import std.stdio;

void main()
{
    "日本語".countUntil('本').writeln();
}
//----
Will produce "3".

...

I'd have straight up said it was a bug, but the implementation goes out of its way to special case narrow strings, when the default implementation would have produced the right result anyway. So I was thinking it is somehow by design...?

Am I missing something, or is it just implementation silliness?
Oct 12 2012
On Friday, October 12, 2012 21:02:47 monarch_dodra wrote:
> I'd have straight up said it was a bug, but the implementation goes out of its way to special case narrow strings, when the default implementation would have produced the right result anyway. So I was thinking it is somehow by design...?

Many algorithms special case narrow strings for efficiency. However, in this case, it looks just plain wrong. countUntil is supposed to return the number of elements (i.e. code points in this case), but it looks like it's returning the number of code units. So, I'd say that it's definitely wrong. If you want code units, then use std.string.indexOf. countUntil is supposed to return the number of code points.

- Jonathan M Davis
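The distinction drawn above between code points and code units can be sketched as follows. This is a minimal illustration, assuming a Phobos release in which countUntil counts code points for narrow strings (the behavior described as intended in this thread); the variable name `s` is only for illustration.

//----
import std.algorithm : countUntil;
import std.stdio : writeln;
import std.string : indexOf;

void main()
{
    string s = "日本語";

    // Each of these three characters occupies three UTF-8 code units.
    assert(s.length == 9);

    // Assumed (fixed) behavior: countUntil counts range elements,
    // which for a string are code points.
    assert(s.countUntil('本') == 1);

    // indexOf returns an offset in code units.
    assert(s.indexOf('本') == 3);

    writeln(s.countUntil('本'), " ", s.indexOf('本'));
}
//----

Under that assumption, the "3" reported in the original post is the value indexOf is expected to give, not countUntil.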
Oct 12 2012
On Friday, 12 October 2012 at 19:17:13 UTC, Jonathan M Davis wrote:
> countUntil is supposed to return the number of elements (i.e. code points in this case), but it looks like it's returning the number of code units. So, I'd say that it's definitely wrong.

Yeah, that's what I thought, but I wanted it double-checked. I'll take care of it then.
Oct 12 2012
On 10/12/2012 9:27 PM, monarch_dodra wrote:
> yeah, that's what I thought, but wanted it double checked. I'll take care of it then.

Just wanted to mention that this kind of subtle change in behavior can break a lot of code in non-obvious ways. In any case, the documentation for countUntil, but more importantly for (last)IndexOf, needs to state clearly what it does for narrow strings (the countUntil docs at least imply this by using the term "elements", but an explicit statement can do no harm).
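One way such a change could break existing code, sketched here as a hypothetical case rather than one reported in this thread: code that uses the result as a slice bound. The sketch assumes the code-point-counting behavior of countUntil and uses std.utf.toUTFindex to convert a code point count into a code unit offset; the variable names are illustrative only.

//----
import std.algorithm : countUntil;
import std.string : indexOf;
import std.utf : toUTFindex;

void main()
{
    string s = "日本語";

    // A code unit offset can be used directly as a slice bound.
    auto unitOffset = s.indexOf('本');
    assert(s[0 .. unitOffset] == "日");

    // A code point count cannot: slicing with it cuts "日"
    // after its first byte.
    auto pointCount = s.countUntil('本');
    assert(pointCount == 1);
    assert(s[0 .. pointCount] != "日");

    // Converting the count to a code unit offset restores a valid slice.
    auto converted = s.toUTFindex(cast(size_t) pointCount);
    assert(converted == 3);
    assert(s[0 .. converted] == "日");
}
//----

Code written against the old result would have sliced correctly by accident, which is why an explicit statement in the documentation matters.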
Oct 13 2012