digitalmars.D.learn - std.string.assumeUTF() silently casting mutable to immutable?
- Forest (22/26) Feb 12 2024 I may have found a bug in assumeUTF(), but being new to D, I'm
- Jonathan M Davis (24/50) Feb 13 2024 It's not a bug in assumeUTF. if you changed your code to
I may have found a bug in assumeUTF(), but being new to D, I'm not sure.

The description:

> Assume the given array of integers arr is a well-formed UTF string and return it typed as a UTF string. ubyte becomes char, ushort becomes wchar and uint becomes dchar. Type qualifiers are preserved.

The declaration:

```d
auto assumeUTF(T)(T[] arr)
if (staticIndexOf!(immutable T, immutable ubyte, immutable ushort, immutable uint) != -1)
```

Shouldn't that precondition's `immutable T` be simply `T`?

As it stands, I can do this with no complaints from the compiler...

```d
string test(ubyte[] arr)
{
    import std.string;
    return arr.assumeUTF;
}
```

...and accidentally end up with a "string" pointing at mutable data.

Am I missing something?
Feb 12 2024
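A compile-time sketch of the documented "type qualifiers are preserved" behaviour, not taken from the thread. It assumes the `std.string.assumeUTF` signature quoted above, and the module-level variable names are hypothetical. Wrapping `T` in `immutable` inside the constraint appears to be how the template accepts `ubyte`, `const(ubyte)` and `immutable(ubyte)` alike, since `immutable` applied to any of them yields `immutable(ubyte)`; the qualifier on `T` itself then carries through to the returned slice:

```d
import std.string : assumeUTF;

// Hypothetical module-level slices, used only for type checks.
ubyte[] mutableBytes;
const(ubyte)[] constBytes;
immutable(ubyte)[] immutableBytes;
ushort[] shorts;
uint[] uints;

// `immutable` absorbs any other qualifier, so every qualified variant
// of ubyte matches `immutable ubyte` in the template constraint.
static assert(is(immutable(const(ubyte)) == immutable(ubyte)));
static assert(is(immutable(immutable(ubyte)) == immutable(ubyte)));

// The element qualifier of the input survives the cast to a UTF type.
static assert(is(typeof(mutableBytes.assumeUTF) == char[]));
static assert(is(typeof(constBytes.assumeUTF) == const(char)[]));
static assert(is(typeof(immutableBytes.assumeUTF) == immutable(char)[]));

// ushort maps to wchar and uint maps to dchar.
static assert(is(typeof(shorts.assumeUTF) == wchar[]));
static assert(is(typeof(uints.assumeUTF) == dchar[]));
```

So for a mutable `ubyte[]`, `assumeUTF` itself returns a mutable `char[]` that shares memory with its argument; it never claims the result is immutable.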
On Tuesday, February 13, 2024 12:40:57 AM MST Forest via Digitalmars-d-learn wrote:
> I may have found a bug in assumeUTF(), but being new to D, I'm not sure.

It's not a bug in assumeUTF. If you changed your code to

```d
string test(ubyte[] arr)
{
    import std.string;
    pragma(msg, typeof(arr.assumeUTF));
    return arr.assumeUTF;
}
```

then the compiler will output `char[]`, because assumeUTF retains the type qualifier of the original type (as its documentation explains).

Rather, it looks like the problem here is that dmd will implicitly change the constness of a return value when it thinks that it can do so to make the code work. Essentially, that means that the function has to be pure and that the return value can't have come from any of the function's arguments. And at a glance, that would be true here, because no char[] was passed into assumeUTF. However, casting from ubyte[] to char[] is @safe, so dmd should be taking that possibility into account, and it's apparently not.

So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.

- Jonathan M Davis
Feb 13 2024
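The conversion rule Jonathan describes can be seen in isolation in the sketch below; `makeBuffer`, `echo` and `fresh` are hypothetical helpers, not part of Phobos. A `pure` call whose arguments carry no mutable indirections returns memory that nothing else can reference, so dmd lets the result convert implicitly to `immutable`. When a mutable argument could be the very array that is returned, the conversion is refused. As Jonathan explains, the bug is that this check apparently does not account for a `ubyte[]` argument being cast to the `char[]` that `assumeUTF` returns.

```d
// Hypothetical helper: no mutable indirections in, freshly allocated memory out.
char[] makeBuffer(size_t n) pure @safe
{
    auto buf = new char[n];
    buf[] = 'a';
    return buf;
}

// Hypothetical helper: the returned array may be the argument itself.
char[] echo(char[] a) pure @safe
{
    return a;
}

// Accepted: makeBuffer's result is known to be unique,
// so it may implicitly convert to string (immutable(char)[]).
string fresh() @safe
{
    return makeBuffer(3);
}

// Rejected: echo's result could alias the caller's mutable buffer.
static assert(!__traits(compiles, {
    char[] mine = "abc".dup;
    string s = echo(mine);
}));
```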
On Tuesday, 13 February 2024 at 08:10:20 UTC, Jonathan M Davis wrote:
> So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.

This is a pretty severe bug.

Some test cases: https://d.godbolt.org/z/K1fjdj76M

```d
ubyte[] pure_ubyte(ubyte[] arr) pure @safe;
ubyte[] pure_void(void[] arr) pure @safe;
ubyte[] pure_int(int[] arr) pure @safe;
int[] pure_ubyte_to_int(ubyte[] arr) pure @safe;

// All cases below should not compile, yet some do.

immutable(ubyte)[] test(ubyte[] arr) @safe
{
    // return pure_ubyte(arr); // ERROR: OK
    return pure_void(arr); // No error: NOK!
}

immutable(ubyte)[] test(int[] arr) @safe
{
    return pure_int(arr); // No error: NOK!
}

immutable(int)[] test2(ubyte[] arr) @safe
{
    return pure_ubyte_to_int(arr); // No error: NOK!
}
```

-Johan
Feb 13 2024
On Tuesday, 13 February 2024 at 14:05:03 UTC, Johan wrote:
> On Tuesday, 13 February 2024 at 08:10:20 UTC, Jonathan M Davis wrote:
>> So, there's definitely a bug here, but it's a dmd bug. Its checks for whether it can safely change the constness of the return type apparently aren't sophisticated enough to catch this case.
>
> This is a pretty severe bug.

Thanks, gents. Reported on the tracker: https://issues.dlang.org/show_bug.cgi?id=24394
Feb 13 2024
On Wednesday, 14 February 2024 at 02:13:08 UTC, Forest wrote:
> Thanks, gents. Reported on the tracker: https://issues.dlang.org/show_bug.cgi?id=24394

This has already been fixed, you just need to use `-preview=fixImmutableConv`. This was put behind a preview flag as it introduces a breaking change.
Feb 14 2024
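Since the preview flag turns the silent conversion into an error, code that relied on it needs an explicit route to `immutable`. One option, sketched here and not suggested in the thread, is `std.exception.assumeUnique`, which casts a slice to `immutable` and nulls the reference it was given. It is only sound when no other mutable reference to the data survives; in this hypothetical `greeting` function the buffer is freshly allocated and the local `buf` is cleared by hand.

```d
import std.exception : assumeUnique;
import std.string : assumeUTF;

// Hypothetical example: build bytes that no other code can see,
// then hand them over as an immutable string without copying.
string greeting()
{
    ubyte[] buf = [0x68, 0x69];    // "hi" in UTF-8, freshly allocated
    char[] chars = buf.assumeUTF;  // still mutable, and still aliases buf
    buf = null;                    // drop the only other reference ourselves
    return assumeUnique(chars);    // explicit cast; also sets chars to null
}
```

Unlike the implicit conversion, the `assumeUnique` call makes the uniqueness promise visible in the source, so it can be found and reviewed.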
On Wednesday, 14 February 2024 at 10:57:42 UTC, RazvanN wrote:
> This has already been fixed, you just need to use -preview=fixImmutableConv. This was put behind a preview flag as it introduces a breaking change.

I just tried that flag on run.dlang.org, and although it fixes the case I posted earlier, it doesn't fix this one:

```d
string test(const(ubyte)[] arr)
{
    import std.string;
    return arr.assumeUTF;
}
```

Shouldn't this be rejected as well?
Feb 14 2024
On Wednesday, 14 February 2024 at 11:56:29 UTC, Forest wrote:
> Shouldn't this be rejected as well?

Indeed, that should be rejected as well; otherwise you can modify immutable data. This code currently compiles without complaint:

```d
string test(const(ubyte)[] arr)
{
    import std.string;
    return arr.assumeUTF;
}

void main()
{
    import std.stdio;
    ubyte[] arr = ['a', 'b', 'c'];
    auto t = test(arr);
    writeln(t);
    arr[0] = 'x';
    writeln(t);
}
```

And prints:

```
abc
xbc
```

However, this seems to be a different issue.
Feb 14 2024
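Until the `const` case is rejected too, the aliasing shown above can be avoided by copying into fresh immutable memory with `.idup`, at the cost of one allocation. A minimal sketch, not proposed in the thread; `testCopy` is a hypothetical name and the input is assumed to be valid UTF-8:

```d
import std.string : assumeUTF;

// Copying variant of `test`: the result no longer shares memory with `arr`.
string testCopy(const(ubyte)[] arr)
{
    // assumeUTF yields const(char)[]; .idup copies it into immutable(char)[]
    return arr.assumeUTF.idup;
}

unittest
{
    ubyte[] arr = ['a', 'b', 'c'];
    auto t = testCopy(arr);
    arr[0] = 'x';        // mutate the original buffer
    assert(t == "abc");  // the copy is unaffected
}
```

Because the `.idup` result is freshly allocated, no compiler flag is needed for it to be a genuine `string`.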