digitalmars.D - Interpolated strings and SQL
- Walter Bright (86/86) Jan 08 2024 Here's how SQL support is done for DIP1036:
- Nickolay Bukreyev (110/110) Jan 08 2024 Hello. It is fascinating to see string interpolation in D. Let me
- Nickolay Bukreyev (8/10) Jan 09 2024 Shame on me. `segregatedInterpolations(Args...)` should end with
- Walter Bright (25/50) Jan 09 2024 Thank you for your thoughts!
- Alexandru Ermicioi (11/15) Jan 09 2024 If that's the case, then 1036 wins imho, by simple thing of not
- Walter Bright (50/52) Jan 09 2024 Consider the overhead 1036 has by comparing it with plain writeln or wri...
- Alexandru Ermicioi (11/16) Jan 09 2024 How is this related to original argument of not requiring any
- Timon Gehr (16/33) Jan 09 2024 I think Alexandru and Nickolay already discharged the concerns about
- Walter Bright (39/42) Jan 10 2024 Yes, I used writeln instead of writefln. The similarity between the two ...
- Nickolay Bukreyev (39/46) Jan 10 2024 It looks very similar to what I presented in my later posts
- Walter Bright (8/11) Jan 10 2024 Thank you for the explanation. It was entirely missing from the spec, an...
- Nickolay Bukreyev (16/23) Jan 10 2024 Importance of the ability to do processing at compile time was
- Timon Gehr (3/6) Jan 11 2024 As far as I am concerned it is a must-have. For example, this is what
- Walter Bright (3/9) Jan 11 2024 Why does compile time make it a guarantee and runtime not?
- Richard (Rikki) Andrew Cattermole (10/21) Jan 11 2024 Where possible we absolutely should not be.
- Walter Bright (13/24) Jan 11 2024 I agree that compile time checking is preferable. But there is a cost in...
- Richard (Rikki) Andrew Cattermole (13/43) Jan 11 2024 So I guess the question is, do you want to hear from a company that they...
- Richard (Rikki) Andrew Cattermole (15/15) Jan 11 2024 Let's try something different.
- zjh (8/16) Jan 11 2024 I think D language can create an `attribute dictionary` for any
- Timon Gehr (10/21) Jan 12 2024 Because a SQL injection attack by definition is when a third party can
- Richard (Rikki) Andrew Cattermole (14/17) Jan 10 2024 Another potential solution would be to allow passing metadata on the
- Nickolay Bukreyev (16/22) Jan 10 2024 Sorry, I don’t understand how this can possibly work. After
- Richard (Rikki) Andrew Cattermole (6/33) Jan 10 2024 This has side effects. It affects ``ref`` and ``out``. It also affects
- Nickolay Bukreyev (6/11) Jan 10 2024 Thank you for the clarification. I see a downside that pretty
- Steven Schveighoffer (88/100) Jan 10 2024 Yes, DIP1036e has a lot of extra templates generated, and the
- Timon Gehr (19/83) Jan 11 2024 My point was with DIP1036e it either works or does not compile, not that...
- Walter Bright (4/6) Jan 11 2024 What's missing is why is a runtime check not good enough? The D compiler...
- Timon Gehr (3/10) Jan 12 2024 Sure.
- Steven Schveighoffer (15/26) Jan 09 2024 Yeah, and writeln could avoid those if it's that important. A
- Walter Bright (8/18) Jan 10 2024 I've been aware for a long time that writeln and writefln are very ineff...
- Hipreme (38/60) Jan 10 2024 Are you sure you really want to keep optimizing debug logging
- Walter Bright (4/18) Jan 10 2024 I regularly work on many of those problems. For example, without looking...
- Timon Gehr (3/23) Jan 11 2024 Thanks a lot for the incredible amount of work you have invested into D
- Walter Bright (2/4) Jan 11 2024 It is indeed my pleasure, especially the privilege of working with you g...
- Nickolay Bukreyev (20/33) Jan 09 2024 No. This line is inside `enum string query = () { ... }();`. So
- Paolo Invernizzi (60/93) Jan 09 2024 Compile time string creation when dealing with SQL give you the
- Timon Gehr (5/7) Jan 09 2024 Yes. Besides the usability benefits you allude to, it is simply a
- Nickolay Bukreyev (7/8) Jan 09 2024 Oh, I realized you might be reading this without a fancy Markdown
- Timon Gehr (2/6) Jan 09 2024 And I stand by that.
- Walter Bright (2/9) Jan 09 2024 But I showed that DIP1027 could do the SQL example.
- Timon Gehr (2/12) Jan 09 2024 You actually did not.
- Walter Bright (2/3) Jan 09 2024 See my other reply to you in this thread.
- Nickolay Bukreyev (5/7) Jan 09 2024 Also, when I said, _like in Swift_, in no event was I meaning,
- Nickolay Bukreyev (14/14) Jan 09 2024 I’ve just realized DIP1036 has an excellent feature that is not
- Walter Bright (15/33) Jan 09 2024 The compiler will indeed reject it (The error message would be a bit baf...
- Nickolay Bukreyev (41/61) Jan 10 2024 Yes! It would be brilliant if `alias` could refer to any
- Nickolay Bukreyev (18/32) Jan 10 2024 Well, `InterpolatedLiteral` and `InterpolatedExpression` don’t
- Walter Bright (22/24) Jan 10 2024 Structs with no fields have a size of 1 byte for D and C++ structs, and ...
- Timon Gehr (5/11) Jan 11 2024 I am not a big fan of this option. If we are going to allow passing
- Timon Gehr (18/59) Jan 11 2024 What we want that DIP1036e mostly provides is:
- Timon Gehr (6/10) Jan 11 2024 if (condition);
- Timon Gehr (23/127) Jan 09 2024 This is not ironic at all. The point is it _can_ do that, while DIP1027
- Walter Bright (18/19) Jan 09 2024 How so? Consider this:
- Timon Gehr (13/39) Jan 09 2024 It does not compile. The arg->args fix I'll grant you as it is a typo
- Walter Bright (21/35) Jan 09 2024 It was just a proof of concept piece of code. execi could check for form...
- Nickolay Bukreyev (59/71) Jan 09 2024 A valid point, thanks. Could you test if that fixes the issue?
- Walter Bright (18/49) Jan 09 2024 Yes, that works.
- Paolo Invernizzi (10/24) Jan 09 2024 No.
- Paolo Invernizzi (8/18) Jan 09 2024 You are underestimating what can be gained as value in catching
- Walter Bright (22/26) Jan 11 2024 Please expand on that. This is a very important topic. I want to know al...
- Paolo Invernizzi (71/78) Jan 12 2024 As a preamble, we are _currently_ doing all the SQL validations
- Timon Gehr (18/21) Jan 12 2024 This is not true, DIP1027 also suffers from other drawbacks. For example...
- Timon Gehr (4/6) Jan 12 2024 - In any case, DIP1027 cannot support nested expression sequences
- Steven Schveighoffer (44/65) Jan 12 2024 The point is to pass the things that the compiler knows to the
- Nickolay Bukreyev (110/135) Jan 09 2024 To sum up, it works with nested istrings poorly; it may even be
- Timon Gehr (4/11) Jan 11 2024 That is not true in the least. It validates conclusively that no SQL
- Timon Gehr (58/119) Jan 09 2024 Adam's `execi` partially runs at compile time and partially of course it...
- Walter Bright (1/1) Jan 11 2024 I'd like to see an example of how DIP1027 does not prevent an injection ...
- Timon Gehr (92/94) Jan 12 2024 ```d
- Timon Gehr (6/10) Jan 09 2024 One more point here is that `text` will of course only work with
- Steven Schveighoffer (76/138) Jan 09 2024 At the end of the day, DIP1027 is an improvement of `writef`, and
- Timon Gehr (10/20) Jan 09 2024 I had typed up a similar point in my post, but then thought that most
- Walter Bright (1/1) Jan 11 2024 Please post an example of a problem it cannot detect.
- Timon Gehr (9/10) Jan 12 2024 For example:
Here's how SQL support is done for DIP1036:

https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d

```
auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) {
    import arsd.sqlite;

    // sqlite lets you do ?1, ?2, etc

    enum string query = () {
        string sql;
        int number;
        import std.conv;
        foreach(idx, arg; Args)
            static if(is(arg == InterpolatedLiteral!str, string str))
                sql ~= str;
            else static if(is(arg == InterpolationHeader) || is(arg == InterpolationFooter))
                throw new Exception("Nested interpolation not supported");
            else static if(is(arg == InterpolatedExpression!code, string code))
                {   } // just skip it
            else
                sql ~= "?" ~ to!string(++number);
        return sql;
    }();

    auto statement = Statement(db, query);
    int number;
    foreach(arg; args) {
        static if(!isInterpolatedMetadata!(typeof(arg)))
            statement.bind(++number, arg);
    }

    return statement.execute();
}
```

This:

1. The istring, after being converted to a tuple of arguments, is passed to the `execi` template.
2. It loops over the arguments, essentially turning it (ironically!) back into a format string. The formats, instead of %s, are ?1, ?2, ?3, etc.
3. It skips all the Interpolation arguments inserted by DIP1036.
4. The remaining arguments are each bound to the indices 1, 2, 3, ...
5. Then it executes the sql statement.

Note that nested istrings are not supported.

Let's see how this can work with DIP1027:

```
auto execi(Args...)(Sqlite db, Args args) {
    import arsd.sqlite;

    // sqlite lets you do ?1, ?2, etc

    enum string query = () {
        string sql;
        int number;
        import std.conv;
        auto fmt = arg[0];
        for (size_t i = 0; i < fmt.length, ++i)
        {
            char c = fmt[i];
            if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
            {
                sql ~= "?" ~ to!string(++number);
                ++i;
            }
            else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                ++i;  // skip escaped %
            else
                sql ~= c;
        }
        return sql;
    }();

    auto statement = Statement(db, query);
    int number;
    foreach(arg; args[1 .. args.length]) {
        statement.bind(++number, arg);
    }

    return statement.execute();
}
```

This:

1. The istring, after being converted to a tuple of arguments, is passed to the `execi` template.
2. The first tuple element is the format string.
3. A replacement format string is created by replacing all instances of "%s" with "?n", where `n` is the index of the corresponding arg.
4. The replacement format string is bound to `statement`, and the arguments are bound to their indices.
5. Then it executes the sql statement.

It is equivalent.
Jan 08 2024
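The format-string rewriting step of the DIP1027 variant can be tried in isolation, without a database. The following is only a sketch: `toSqlPlaceholders` is a hypothetical helper name that does not appear in the thread, and emitting a single `%` for an escaped `%%` is an assumption of this sketch (the posted loop skips the escape without appending anything).

```d
import std.conv : to;

/// Hypothetical helper (not from the thread): rewrites a DIP1027-style
/// format string into SQLite's numbered placeholders ?1, ?2, ...
string toSqlPlaceholders(string fmt) pure @safe
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable char c = fmt[i];
        immutable bool hasNext = i + 1 < fmt.length;
        if (c == '%' && hasNext && fmt[i + 1] == 's')
        {
            // "%s" becomes the next numbered placeholder.
            sql ~= "?" ~ to!string(++number);
            ++i;
        }
        else if (c == '%' && hasNext && fmt[i + 1] == '%')
        {
            // Assumption of this sketch: "%%" is kept as one literal '%'.
            sql ~= '%';
            ++i;
        }
        else
            sql ~= c;
    }
    return sql;
}
```

Because the function takes an ordinary `string`, it can be called at runtime or in CTFE, but only when the format string itself is a compile-time value.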
Hello. It is fascinating to see string interpolation in D. Let me try to shed some light on it; I hope my thoughts will be useful.

1. First of all, I’d like to note that in the DIP1027 variant of the code we see:

> `auto fmt = arg[0];`

(`arg` is undeclared identifier here; I presume `args` was meant.) There is a problem: this line is executed at CTFE, but it cannot access `args`, which is a runtime parameter of `execi`. For this to work, the format string should go to a template parameter, and interpolated expressions should go to runtime parameters. How can DIP1027 accomplish this?

2.

> Note that nested istrings are not supported.

To clarify: “not supported” means one cannot write

```
db.execi(i"SELECT field FROM items WHERE server = $(i"europe$(number)")");
```

Instead, you have to be more explicit about what you want the inner string to become. This is legal:

```
db.execi(i"SELECT field FROM items WHERE server = $(i"europe$(number)".text)");
```

However, it is not hard to adjust `execi` so that it fully supports nested istrings:

```d
struct Span {
    size_t i, j;
    bool topLevel;
}

enum segregatedInterpolations(Args...) = {
    Span[] result;
    size_t processedTill;
    size_t depth;
    static foreach (i, T; Args)
        static if (is(T == InterpolationHeader)) {
            if (!depth++) {
                result ~= Span(processedTill, i, true);
                processedTill = i;
            }
        } else static if (is(T == InterpolationFooter))
            if (!--depth) {
                result ~= Span(processedTill, i + 1);
                processedTill = i + 1;
            }
    return result;
}();

auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) {
    import std.conv: text, to;
    import arsd.sqlite;

    // sqlite lets you do ?1, ?2, etc

    enum string query = () {
        string sql;
        int number;
        static foreach (span; segregatedInterpolations!Args)
            static if (span.topLevel) {
                static foreach (T; Args[span.i .. span.j])
                    static if (is(T == InterpolatedLiteral!str, string str))
                        sql ~= str;
                    else static if (is(T == InterpolatedExpression!code, string code))
                        sql ~= "?" ~ to!string(++number);
            }
        return sql;
    }();

    auto statement = Statement(db, query);
    int number;
    static foreach (span; segregatedInterpolations!Args)
        static if (span.topLevel) {
            static foreach (arg; args[span.i .. span.j])
                static if (!isInterpolatedMetadata!(typeof(arg)))
                    statement.bind(++number, arg);
        } else // Convert a nested interpolation to string with `.text`.
            statement.bind(++number, args[span.i .. span.j].text);

    return statement.execute();
}
```

Here, we just invoke `.text` on nested istrings. A more advanced implementation would allocate a buffer and reuse it. It could even be `@nogc` if it wanted.

3. DIP1036 appeals more to me because it passes rich, high-level information about parts of the string. With DIP1027, on the other hand, we have to extract that information ourselves by parsing the string character by character. But the compiler already tokenized the string; why do we have to do it again? (And no, lower level doesn’t imply broader possibilities here.)

It may have another implication: looping over characters might put the current CTFE engine in trouble if strings are large. Many more iterations need to be executed, and more memory is consumed in the process. We certainly need numbers here, but I thought it was important to at least bring attention to this point.

4. What I don’t like in both DIPs is a rather arbitrary selection of meta characters: `$`, `$$` and `%s`. In regular strings, all of them are just normal characters; in istrings, they gain special meaning.

I suppose a cleaner way would be to use `\(...)` syntax (like in Swift). So `i"a \(x) b"` interpolates `x` while `"a \(x) b"` is an immediate syntax error. First, it helps to catch bugs caused by missing `i`. Second, the question, how do we escape `$`, gets the most straightforward answer: we don’t.

A downside is that parentheses will always be required with this syntax. But the community preferred them anyway even with `$`.
Jan 08 2024
On Tuesday, 9 January 2024 at 07:30:57 UTC, Nickolay Bukreyev wrote:
> However, it is not hard to adjust `execi` so that it fully supports nested istrings:

Shame on me. `segregatedInterpolations(Args...)` should end with this:

```d
    result ~= Span(processedTill, Args.length, true);
    return result;
```
Jan 09 2024
Thank you for your thoughts!

On 1/8/2024 11:30 PM, Nickolay Bukreyev wrote:
> 1. First of all, I’d like to note that in the DIP1027 variant of the code we see:
> > `auto fmt = arg[0];`
> (`arg` is undeclared identifier here; I presume `args` was meant.)

Yes. I don't have sql on my system, so didn't try to compile it. I always make typos. Oof.

> There is a problem: this line is executed at CTFE,

It's executed at runtime. The code is not optimized for speed, I just wanted to show the concept. The speed doesn't particularly matter, because after all this is a call to a database which is going to be slow. Anyhow, DIP1036 also uses unoptimized code here.

> 3. DIP1036 appeals more to me because it passes rich, high-level information about parts of the string. With DIP1027, on the other hand, we have to extract that information ourselves by parsing the string character by character. But the compiler already tokenized the string; why do we have to do it again? (And no, lower level doesn’t imply broader possibilities here.)

DIP1036 also builds a new format string.

> It may have another implication: looping over characters might put the current CTFE engine in trouble if strings are large. Many more iterations need to be executed, and more memory is consumed in the process. We certainly need numbers here, but I thought it was important to at least bring attention to this point.

It happens at runtime.

> 4. What I don’t like in both DIPs is a rather arbitrary selection of meta characters: `$`, `$$` and `%s`. In regular strings, all of them are just normal characters; in istrings, they gain special meaning.

I looked at several schemes, and picked `$` because it looked the nicest.

> I suppose a cleaner way would be to use `\(...)` syntax (like in Swift). So `i"a \(x) b"` interpolates `x` while `"a \(x) b"` is an immediate syntax error. First, it helps to catch bugs caused by missing `i`.

I'm sorry to say, that looks like tty noise. Aesthetic appeal is a very important design consideration for D.

> Second, the question, how do we escape `$`, gets the most straightforward answer: we don’t.

It will rarely need to be escaped, but when one does need it, one needs it!

> A downside is that parentheses will always be required with this syntax. But the community preferred them anyway even with `$`.

DIP1027 does not require ( ) if it's just an identifier. That makes for the shortest, simplest istring syntax. The ( ) usage will be relatively rare. The idea is the most common cases should require the least syntactical noise.

Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)

The intent of DIP1027 is not to provide the most powerful, richest mechanism. It's meant to be the simplest I could think of, with the most attractive appearance, minimal runtime overhead, while handling the meat and potatoes use cases.
Jan 09 2024
On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
> The intent of DIP1027 is not to provide the most powerful, richest mechanism. It's meant to be the simplest I could think of, with the most attractive appearance, minimal runtime overhead, while handling the meat and potatoes use cases.

If that's the case, then 1036 wins imho, by simple thing of not doing any parsing of format string.

Note that other use cases might not require building a format string at all. What about logging functionality? In the case of 1036, a log function could just dump all text into a sink directly; for 1027 it would still need to parse the format string to find where to inject the arguments.

This use case makes 1036 more favourable than 1027, by your own criteria for a good mechanism.
Jan 09 2024
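The logging case Alexandru describes can be sketched without any format-string parsing. This is a minimal illustration, not DIP1036 itself: `Lit` is a hypothetical stand-in for `InterpolatedLiteral`, and `logTo` and `render` are made-up names. Literal text arrives encoded in argument types, so the function only has to walk the argument list and write each piece to the sink.

```d
import std.array : appender;
import std.conv : to;

/// Hypothetical stand-in for DIP1036's InterpolatedLiteral!str.
struct Lit(string s) {}

/// Writes literals and values to the sink in order; no parsing needed.
void logTo(Sink, Args...)(ref Sink sink, Args args)
{
    foreach (i, T; Args)
    {
        static if (is(T == Lit!str, string str))
            sink.put(str);                 // literal text, known at compile time
        else
            sink.put(to!string(args[i]));  // runtime value
    }
}

/// Convenience wrapper that collects the output into a string.
string render(Args...)(Args args)
{
    auto sink = appender!string();
    logTo(sink, args);
    return sink.data;
}
```

Under these assumptions, `render(Lit!"x = "(), 42, Lit!"!"())` produces the text that an istring like `i"x = $(x)!"` would be expected to log.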
On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
> If that's the case, then 1036 wins imho, by simple thing of not doing any parsing of format string.

Consider the overhead 1036 has by comparing it with plain writeln or writefln:

```
void test(int baz)
{
    writeln(i"$(baz + 4)");
    writeln(baz + 5);
    writefln("%d", baz + 6);
}
```

Generated code:

    0000:   55                       push      RBP
    0001:   48 8B EC                 mov       RBP,RSP
    0004:   48 83 EC 20              sub       RSP,020h
    0008:   48 89 5D E8              mov       -018h[RBP],RBX
    000c:   89 7D F8                 mov       -8[RBP],EDI            // baz
    000f:   48 83 EC 08              sub       RSP,8
    0013:   31 C0                    xor       EAX,EAX
    0015:   88 45 F0                 mov       -010h[RBP],AL
    0018:   48 8D 75 F0              lea       RSI,-010h[RBP]
    001c:   FF 36                    push      dword ptr [RSI]        // header
    001e:   88 45 F1                 mov       -0Fh[RBP],AL
    0021:   48 8D 5D F1              lea       RBX,-0Fh[RBP]
    0025:   FF 33                    push      dword ptr [RBX]        // expression!"baz + 4"
    0027:   8D 7F 04                 lea       EDI,4[RDI]             // baz + 4
    002a:   88 45 F2                 mov       -0Eh[RBP],AL
    002d:   48 8D 75 F2              lea       RSI,-0Eh[RBP]
    0031:   FF 36                    push      dword ptr [RSI]        // footer
    0033:   E8 00 00 00 00           call      writeln
    0038:   48 83 C4 20              add       RSP,020h
    003c:   8B 45 F8                 mov       EAX,-8[RBP]
    003f:   8D 78 05                 lea       EDI,5[RAX]             // baz + 5
    0042:   E8 00 00 00 00           call      writeln
    0047:   BA 00 00 00 00           mov       EDX,0                  // "%d".ptr
    004c:   BE 02 00 00 00           mov       ESI,2                  // "%d".length
    0051:   8B 4D F8                 mov       ECX,-8[RBP]
    0054:   8D 79 06                 lea       EDI,6[RCX]             // baz + 6
    0057:   E8 00 00 00 00           call      writefln
    005c:   48 8B 5D E8              mov       RBX,-018h[RBP]
    0060:   C9                       leave
    0061:   C3                       ret

With the istring, there are 4 calls to struct member functions that just return null. This can't be good for performance or program size.

We can compute the number of arguments passed to the function:

    istring:  1 + 3 * <number of arguments> + 1 + 1  (*)
    writeln:  <number of arguments>
    writefln: 1 + <number of arguments>

(*) includes string literals before, between, and after arguments
Jan 09 2024
On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
> > If that's the case, then 1036 wins imho, by simple thing of not doing any parsing of format string.
>
> Consider the overhead 1036 has by comparing it with plain writeln or writefln:

How is this related to the original argument, the one you replied to, about not requiring the user to do any parsing inside a function that accepts an istring?

I personally would be ok with any overhead 1036 adds as long as I don't need to do any extra work such as parsing.

Please also take into consideration the code inside the function that accepts the interpolated string. I'm pretty sure that parsing the format string inside a DIP1027 function would result in bigger and more complex generated code than the overhead you've mentioned for the 1036 version, for use cases similar to the logging one I've mentioned.
Jan 09 2024
On 1/9/24 20:05, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
> > If that's the case, then 1036 wins imho, by simple thing of not doing any parsing of format string.
>
> Consider the overhead 1036 has by comparing it with plain writeln or writefln:
>
> ```
> void test(int baz)
> {
>     writeln(i"$(baz + 4)");
>     writeln(baz + 5);
>     writefln("%d", baz + 6);
> }
> ```
> ...

I think Alexandru and Nickolay already discharged the concerns about overhead pretty well, but just note that with DIP1027, `test(3)` prints:

%s7
8
9

There is fundamentally no way to make this work correctly, due to how DIP1027 throws away the information about the format string.

With DIP1036e, `test(3)` prints:

7
8
9

And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable, but I think other compiler backends and linkers can be made to elide such symbols completely.)
Jan 09 2024
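The `%s7` output can be reproduced in today's D without either DIP. The sketch below assumes, as described at the top of the thread, that DIP1027 lowers `i"$(baz + 4)"` to the plain tuple `("%s", baz + 4)`; the function names are hypothetical. `std.conv.text` concatenates its arguments the way `writeln` prints them, while `std.format.format` interprets the first argument as a format string the way `writefln` does.

```d
import std.conv : text;
import std.format : format;

// Assumed DIP1027 lowering of i"$(baz + 4)": the tuple ("%s", baz + 4).

/// What a writeln-style function sees: two values to print back to back.
string viaWriteln(int baz)
{
    return text("%s", baz + 4);
}

/// What a writefln-style function sees: a format string plus one argument.
string viaWritefln(int baz)
{
    return format("%s", baz + 4);
}
```

The callee receives an ordinary `string` in both cases, so nothing in the types tells it whether the first argument is a format string or data.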
On 1/9/2024 2:38 PM, Timon Gehr wrote:
> %s7
> 8
> 9

Yes, I used writeln instead of writefln. The similarity between the two names is a source of error, but if that was a festering problem we'd have seen a lot of complaints about it by now.

> And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable,

Try it and see. I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.

As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string. For example:

```d
extern (C) pragma(printf) int printf(const(char*), ...);

enum Format : string;

void foo(Format f) { printf("Format %s\n", f.ptr); }
void foo(string s) { printf("string %s\n", s.ptr); }

void main()
{
    Format f = cast(Format)"f";
    foo(f);
    string s = "s";
    foo(s);
}
```

which prints:

Format f
string s

If we comment out `foo(string s)`:

test2.d(14): Error: function `test2.foo(Format f)` is not callable using argument types `(string)`
test2.d(14):        cannot pass argument `s` of type `string` to parameter `Format f`

If we comment out `foo(Format f)`:

string f
string s

This means that if execi()'s first parameter is of type `Format`, and the istring generates the format string with type `Format`, this key will fit the lock. A string generated by other means, such as `.text`, will not fit that lock.
Jan 10 2024
On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright wrote:
> I may have found a solution. I'm interested in your thoughts on it.

It looks very similar to what I presented in my later posts ([this](https://forum.dlang.org/post/qiyrmzwnoguzxxllgzcz@forum.dlang.org) and one following). It’s inspiring: we are probably getting closer to a common understanding of things.

> As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string.

Exactly. Let me try to explain why DIP1036 is doing what it is doing. For illustrative purposes, I’ll be drastically simplifying code; please excuse me for that.

Let there be `foo`, a function that would like to receive an istring. Inside it, we would like to transform its argument list at compile time into a new argument list. So what we essentially want is to pass an istring to a template parameter so that it is available to `foo` at compile time:

```d
int x;
foo!(cast(Format)"prefix ", 2 * x); // foo!(alias Format, alias int)()
```

Unfortunately, this does not work because `2 * x` cannot be passed to an `alias` parameter. _This is the root of the problem._ The only way to do that is to pass them to runtime parameters:

```d
int x;
foo(cast(Format)"prefix ", 2 * x); // foo!(Format, int)(Format, int)
```

However, now `foo` cannot access the format string at compile time: its type is simply `Format`, and its value becomes known only at runtime. So we encode the value into the type:

```d
int x;
foo(Format!"prefix "(), 2 * x); // foo!(Format!"prefix ", int)(Format!"prefix ", int)
```

This is more or less what DIP1036 is doing at the moment. Hope it became clear now.

I’d say DIP1036, as we see it now, relies on a clever workaround of a limitation imposed by the language. If that limitation is gone, the DIP will become simpler.
Jan 10 2024
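The "encode the value into the type" step from Nickolay's explanation can be made concrete with a small, self-contained sketch. The names `Format`, `formatOf` and `foo` follow his simplified example and are not the actual DIP1036 types; the point is only that `foo` can recover the literal text as an `enum`, even though it was passed as an ordinary runtime argument.

```d
/// Simplified stand-in: the string lives in the type, the value is empty.
struct Format(string s) {}

/// Collects the text of all Format!... types in an argument list, at compile time.
enum string formatOf(Args...) = () {
    string result;
    foreach (T; Args)
        static if (is(T == Format!str, string str))
            result ~= str;
    return result;
}();

/// Receives the pieces as runtime arguments, yet knows the text at compile time.
string foo(Args...)(Args args)
{
    enum fmt = formatOf!Args; // usable in static assert, mixins, etc.
    static assert(fmt.length > 0, "expected at least one Format!... argument");
    return fmt;
}
```

A call such as `foo(Format!"prefix "(), 2 * x)` instantiates `foo!(Format!"prefix ", int)`, which is why the compile-time processing needs no `alias` parameter for `2 * x`.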
On 1/10/2024 5:53 PM, Nickolay Bukreyev wrote:
> Exactly. Let me try to explain why DIP1036 is doing what it is doing. For illustrative purposes, I’ll be drastically simplifying code; please excuse me for that.

Thank you for the explanation. It was entirely missing from the spec, and I overlooked it in the code. (This is why reverse engineering a spec from code is not so easy.) It is indeed clever.

As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

The enum proposal is to obviate the requirement for a header and footer template, which is a big improvement.
Jan 10 2024
On Thursday, 11 January 2024 at 02:21:17 UTC, Walter Bright wrote:
> As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

Importance of the ability to do processing at compile time was stated by:

* Alexandru ([here](https://forum.dlang.org/post/yxrqncmaiyfmhxnvzgil@forum.dlang.org) and [here](https://forum.dlang.org/post/yqwxvjnvqaahhshrfohy@forum.dlang.org)),
* Timon ([here](https://forum.dlang.org/post/unjfb9$1ku5$1@digitalmars.com)),
* Paolo ([here](https://forum.dlang.org/post/rhpblxrebibhpnfxfihv@forum.dlang.org) and [here](https://forum.dlang.org/post/ajeqtckcwawuvtusbvxb@forum.dlang.org)),
* Steven ([here](https://forum.dlang.org/post/ilituyhcqipsqktqmfor@forum.dlang.org)).

> The enum proposal is to obviate the requirement for a header and footer template, which is a big improvement.

Header and footer are not templates; `InterpolatedLiteral` and `InterpolatedExpression` are. Yes, the latter two can be replaced by enums iff it becomes possible to pass arbitrary expressions to alias parameters. And I agree it would be a big improvement.

> Structs with no fields have a size of 1 byte for D and C++ structs, and 0 or 4 for C structs (depending on the target).

Yes, I mistakenly wrote, _zero-sized_, when I meant, _empty_.
Jan 10 2024
On 1/11/24 03:21, Walter Bright wrote:
> As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.
Jan 11 2024
On 1/11/2024 11:50 AM, Timon Gehr wrote:
> On 1/11/24 03:21, Walter Bright wrote:
> > As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.
>
> As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.

Why does compile time make it a guarantee and runtime not? We do array bounds checking at runtime.
Jan 11 2024
On 12/01/2024 6:28 PM, Walter Bright wrote:
> On 1/11/2024 11:50 AM, Timon Gehr wrote:
> > On 1/11/24 03:21, Walter Bright wrote:
> > > As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.
> >
> > As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.
>
> Why does compile time make it a guarantee and runtime not? We do array bounds checking at runtime.

Where possible we absolutely should not be.

Making things crash at runtime, because the compiler did not apply the knowledge it has is just ridiculous.

Imagine going to ``http://google.com/itsacrash`` and crashing Google. Or pressing a button too fast on an airplane and suddenly the fuel pumps turn off and then refuse to turn back on.

Instead of the compiler catching clearly bad logic that it has a full understanding of, you're disrupting service and making people lose money. This is not a good thing.
Jan 11 2024
On 1/11/2024 9:36 PM, Richard (Rikki) Andrew Cattermole wrote:
> Making things crash at runtime, because the compiler did not apply the knowledge it has is just ridiculous.
>
> Imagine going to ``http://google.com/itsacrash`` and crashing Google. Or pressing a button too fast on an airplane and suddenly the fuel pumps turn off and then refuse to turn back on.
>
> Instead of the compiler catching clearly bad logic that it has a full understanding of, you're disrupting service and making people lose money. This is not a good thing.

I agree that compile time checking is preferable. But there is a cost involved, as I explained more fully in another post. It isn't free.

Since the format string is a compile time creature, not a user input feature, if the fault only happened when the code is deployed, it means the code was *never* executed before it was shipped. This is an inexcusable failure for any avionics system, or any critical system, since we have simple tools that check coverage.

BTW, professional code is full of assert()s. Asserts check for faults in the code logic that are not the result of user input, but are the result of programming errors. We leave them as asserts because nobody knows how to get compilers to detect them, or it is too costly to detect them.

In other words, this is not an absolute thing. It's a weighing of cost and benefit.
Jan 11 2024
On 12/01/2024 8:00 PM, Walter Bright wrote:
> On 1/11/2024 9:36 PM, Richard (Rikki) Andrew Cattermole wrote:
> > Making things crash at runtime, because the compiler did not apply the knowledge it has is just ridiculous.
> >
> > Imagine going to ``http://google.com/itsacrash`` and crashing Google. Or pressing a button too fast on an airplane and suddenly the fuel pumps turn off and then refuse to turn back on.
> >
> > Instead of the compiler catching clearly bad logic that it has a full understanding of, you're disrupting service and making people lose money. This is not a good thing.
>
> I agree that compile time checking is preferable. But there is a cost involved, as I explained more fully in another post. It isn't free.
>
> Since the format string is a compile time creature, not a user input feature, if the fault only happened when the code is deployed, it means the code was *never* executed before it was shipped. This is an inexcusable failure for any avionics system, or any critical system, since we have simple tools that check coverage.
>
> BTW, professional code is full of assert()s. Asserts check for faults in the code logic that are not the result of user input, but are the result of programming errors. We leave them as asserts because nobody knows how to get compilers to detect them, or it is too costly to detect them.
>
> In other words, this is not an absolute thing. It's a weighing of cost and benefit.

So I guess the question is, do you want to hear from a company that they lost X amount of business because they used a language feature that could have caught errors at compile time, but instead continually crashed in a live environment?

I do not. That would be a total embarrassment.

I have an identical problem currently with ``@mustuse``. It errors out at runtime if you do not check to see if it has an error, if you try to get access to the value. It is hell. I could never recommend such an error prone design. I am only putting up with it until the language is capable of something better.

https://issues.dlang.org/show_bug.cgi?id=23998
Jan 11 2024
Let's try something different. Would you like me to write a small specification for an alternative method for passing metadata from the call site into the body that would allow a string interpolation feature to not use extra templates while still being compile time based? I described this to Adam Wilson yesterday:

```d
func(@metadata("hi!") 2);

void func(T)(T arg) {
    enum MetaData = __traits(getAttributes, arg);
    pragma(msg, MetaData);
}
```

This is essentially what 1036e is attempting to do, but it does it with extra templates.
Jan 11 2024
On Friday, 12 January 2024 at 07:31:49 UTC, Richard (Rikki) Andrew Cattermole wrote:

Let's try something different.

```d
func(@metadata("hi!") 2);

void func(T)(T arg) {
    enum MetaData = __traits(getAttributes, arg);
    pragma(msg, MetaData);
}
```

I think the D language could provide an `attribute dictionary` for any building block. In this way, the `attribute soup` can be simplified. It would be even better to simplify the method of `getting and setting` attributes. It can be used to facilitate the extraction of `metadata`.
Jan 11 2024
On 1/12/24 06:28, Walter Bright wrote:On 1/11/2024 11:50 AM, Timon Gehr wrote:Because a SQL injection attack by definition is when a third party can control safety-critical parts of your SQL query at runtime. The very fact that the whole prepared SQL query is known at compile-time, with runtime data only entering through the placeholders, conclusively rules this out. If the SQL query is constructed at runtime based on runtime data, `execi` is unable to check whether an SQL injection vulnerability is present.On 1/11/24 03:21, Walter Bright wrote:Why does compile time make it a guarantee and runtime not? ...As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.We do array bounds checking at runtime.You can check array bounds at runtime. You cannot check where a runtime-known string came from at runtime. It's simply not possible.
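The distinction drawn above (the query text fixed at compile time, runtime data entering only through placeholders) can be sketched as follows. This is a minimal illustration, not the `execi` from the thread: `execPrepared` and `countPlaceholders` are hypothetical names, and the "database call" only returns a description of what would be bound.

```d
import std.conv : text;

/// Counts `?` placeholders; usable at compile time (CTFE) and at run time.
size_t countPlaceholders(string sql)
{
    size_t n;
    foreach (char c; sql)
        if (c == '?')
            ++n;
    return n;
}

/// Hypothetical stand-in for a database call. The query is a template
/// argument, so it must be a compile-time constant; run-time data can
/// only arrive through `args`, i.e. through the placeholders.
string execPrepared(string sql, Args...)(Args args)
{
    static assert(countPlaceholders(sql) == Args.length,
        "placeholder count does not match the number of arguments");
    // A real binding would prepare `sql` and bind `args` here.
    return text(sql, " <- ", args);
}

void main()
{
    string name = "Robert'); DROP TABLE items; --";

    // The hostile text is bound as data; it never becomes part of the query.
    auto r = execPrepared!"INSERT INTO items VALUES (?)"(name);
    assert(r == "INSERT INTO items VALUES (?) <- " ~ name);

    // A string only known at run time cannot be used as the query at all.
    static assert(!__traits(compiles, execPrepared!name()));
}
```

The last `static assert` is the point of the argument: a runtime check cannot tell where a string came from, whereas a compile-time query parameter rules out runtime-constructed queries by construction.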
Jan 12 2024
On 11/01/2024 2:53 PM, Nickolay Bukreyev wrote:

I’d say DIP1036, as we see it now, relies on a clever workaround of a limitation imposed by the language. If that limitation is gone, the DIP will become simpler.

Another potential solution would be to allow passing metadata on the function call side, to the function. Consider: ``i"prefix${expr:format}suffix"`` Could be:

```d
func("prefix", @format("format") expr, "suffix");

void func(T...)(T args) {
    pragma(msg, __traits(getAttributes, args[1])); // format("format")
}
```

This is so much simpler than what 1036e is. But it does require another language feature.
Jan 10 2024
On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew Cattermole wrote:

```d
void func(T...)(T args) {
    pragma(msg, __traits(getAttributes, args[1])); // format("format")
}
```

Sorry, I don’t understand how this can possibly work. After `func` template is instantiated, its `T` is bound to, e.g., `AliasSeq!(string, int, string)`. `args` is just a local variable of type `AliasSeq!(string, int, string)`. How can `__traits` know what attributes were attached at call site? If, on the other hand, attributes do affect the type, then IMHO

```d
func("prefix", @format("format") expr, "suffix");
```

is not much different than

```d
func("prefix", format!"format"(expr), "suffix");
```

I.e., we can do it already.
Jan 10 2024
On 11/01/2024 5:31 PM, Nickolay Bukreyev wrote:
On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew Cattermole wrote:

This has side effects. It affects ``ref`` and ``out``. It also affects lifetime analysis. So we can't do it currently. But yes, it affects the type, without being in the type system explicitly as it is meta data.

```d
void func(T...)(T args) {
    pragma(msg, __traits(getAttributes, args[1])); // format("format")
}
```

Sorry, I don’t understand how this can possibly work. After `func` template is instantiated, its `T` is bound to, e.g., `AliasSeq!(string, int, string)`. `args` is just a local variable of type `AliasSeq!(string, int, string)`. How can `__traits` know what attributes were attached at call site? If, on the other hand, attributes do affect the type, then IMHO

```d
func("prefix", @format("format") expr, "suffix");
```

is not much different than

```d
func("prefix", format!"format"(expr), "suffix");
```

I.e., we can do it already.
Jan 10 2024
On Thursday, 11 January 2024 at 04:34:33 UTC, Richard (Rikki) Andrew Cattermole wrote:This has side effects. It affects ``ref`` and ``out``. It also affects lifetime analysis. So we can't do it currently. But yes, it affects the type, without being in the type system explicitly as it is meta data.Thank you for the clarification. I see a downside that pretty much any generic code should strip the annotations off its arguments after it inspected them, to reduce template bloating. However, we are probably going off-topic.
Jan 10 2024
On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright wrote:

Yes, DIP1036e has a lot of extra templates generated, and the mangled name is going to be large. Let's skip for a moment the template that writeln will generate (which I agree isn't ideal, but also is somewhat par for the course). This shouldn't be a huge problem for the interpolation *types* because the type doesn't get included in the binary. It is a big problem for the `toString` function, because that *is* included. However, we can mitigate the ones that return `null`:

```d
string __interpNull() => null;

struct InterpolatedExpression(string expr)
{
    alias toString = __interpNull;
}
... // and so on
```

I tested this and it does work. So this reduces all the `toString` member functions from `InterpolatedExpression` (and `InterpolationPrologue` and `InterpolationEpilog`, but those are not templated structs anyway) to one function in the binary. But we can't do this for `InterpolatedLiteral` (which by the way is improperly described in Atila's DIP, the associated `toString` member function should return the literal). We can do possibly a couple things here to mitigate:

1. We can modify how `std.format` works so it will accept the following as a `toString` hook:

```d
struct S
{
    enum toString = "I am an S";
}
```

This means, no function calls, no extra long symbols in the binary (since it's an enum, it should not go in), and I think even the compilation will be faster.

2. We modify it to be aware of `InterpolatedLiteral` types, and avoid depending on the `toString` API. After all, we own both Phobos and druntime, we can coordinate the release.

And as a further suggestion, though this is kind of off-topic, we may look into ways to have templates that *don't* make it into the binary explicitly. Basically, they are marked as shims or forwarders by the library author, and just serve as a way to write nicer syntax. This could help in more than just the interpolation DIP.

And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable,

I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.

As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string.

No. While I agree that having a different *type* makes it more useful and easier to hook, there is a fundamental problem being solved with the compile-time literals being passed to the function. Namely, tremendous power is available to validate, parse, prepare, etc. string data at compile time, for use during runtime. This simply *is not possible* with 1027. The runtime benefits are huge:

* No need to allocate anything (`@nogc`, `-betterC`, etc. all available)
* You get compiler errors instead of runtime errors (if you put in the work)
* It's possible to generate "perfect forwarding" to another function that does use another form. For example, `printf`.
* If you inline the call, it can be as if you called the forwarded function directly with the exactly correct parameters.

And I want to continue to point out, that a constructed "format string" mechanism just is inferior, regardless if it is another type, as long as you don't need formatting specifiers (and arguably, it's just a difference in taste otherwise). The compiler parsed it out, it knows the separate pieces. Giving those pieces directly to the library is both the most efficient way, and also the most obvious way.
The "format string" mechanism, while making sense for writef, *must* add an element of complexity to the receiving function, since it now has to know what "language" the translated string is. e.g. with DIP1027, one must know that `%s` is special and what it represents, and the user must know to escape `%s` to avoid miscommunication. With 1036e, there is no format string, so there is no complication there, or confusion. The value being passed is right where you would expect it, and you don't have to parse a separate thing to know. Note in YAIDIP, this was done partly through an interpolation header, which had all the compile-time information, and then strings and interpolated data were interspersed. I find this also a workable solution, and could even do without the strings being passed interspersed (as I said, we have control over `writeln` and `text`), but I think the ordering of the tuple to match what the actual string literal looks like is so intuitive, and we would be losing that if we did some kind of "format header" mechanism. -Steve
Jan 10 2024
On 1/10/24 20:53, Walter Bright wrote:On 1/9/2024 2:38 PM, Timon Gehr wrote: > %s7 8 9 Yes, I used writeln instead of writefln. The similarity between the two names is a source of error, but if that was a festering problem we'd have seen a lot of complaints about it by now. ...My point was with DIP1036e it either works or does not compile, not that you called the wrong function.I understand the drawbacks of DIP1036e which it shares with most non-trivial metaprogramming. D underdelivers in this department at the moment, but this still remains one of the key selling points of D. The issue is that DIP1027 is worse than DIP1036e. DIP1027 is also worse than nothing. It has been rejected for good reason. For some reason you however keep insisting it is essentially as useful as DIP1036e. That's just not the case. I think a much better answer to DIP1036e than a DIP1027 revival would have been to add a -preview=experimental-DIP1036e flag and do a call to action to resolve language issues and limitations that force DIP1036e to generate bloat. Maybe there would have been an even better way to handle this.And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable,Try it and see. I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker. ...As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions.Well, this is not the case, that is not the only advantage.Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string. 
For example: ```d extern (C) pragma(printf) int printf(const(char*), ...); enum Format : string; void foo(Format f) { printf("Format %s\n", f.ptr); } void foo(string s) { printf("string %s\n", s.ptr); } void main() { Format f = cast(Format)"f"; foo(f); string s = "s"; foo(s); } ``` which prints: Format f string s If we comment out `foo(string s)`: test2.d(14): Error: function `test2.foo(Format f)` is not callable using argument types `(string)` test2.d(14): cannot pass argument `s` of type `string` to parameter `Format f` If we comment out `foo(Format s)`: string f string s This means that if execi()'s first parameter is of type `Format`, and the istring generates the format string with type `Format`, this key will fit the lock. A string generated by other means, such as `.text`, will not fit that lock.Well, this is a step in the right direction, but rest assured if this was the only advantage of DIP1036e, then Adam would have gone with this suggestion. I am almost sure this is one of the ideas he discarded.
Jan 11 2024
On 1/11/2024 11:45 AM, Timon Gehr wrote:My point was with DIP1036e it either works or does not compile, not that you called the wrong function.What's missing is why is a runtime check not good enough? The D compiler emits more than one safety check at runtime. For example, array bounds checking, and switch statement default checks.
Jan 11 2024
On 1/12/24 06:33, Walter Bright wrote:On 1/11/2024 11:45 AM, Timon Gehr wrote:There is no runtime check, it just does the wrong thing.My point was with DIP1036e it either works or does not compile, not that you called the wrong function.What's missing is why is a runtime check not good enough?The D compiler emits more than one safety check at runtime. For example, array bounds checking, and switch statement default checks.Sure.
Jan 12 2024
On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:With the istring, there are 4 calls to struct member functions that just return null.Yeah, and writeln could avoid those if it's that important. A good optimizer will remove that call.This can't be good for performance or program size.Then use writeln the way you want? I don't see it as significant at all.We can compute the number of arguments passed to the function: ``` istring: 1 + 3 * <number of arguments> + 1 + 1 (*) writeln: <number of arguments> writefln: 1 + <number of arguments> ``` (*) includes string literals before, between, and after argumentsI find it bizarre to be concerned about the call performance of zero-sized structs and empty strings to writeln or writef, like the function is some shining example of performance or efficient argument passing. If you do not have inlining or optimizations enabled, do you think the call tree of writefln is going to be compact? Not to mention it eventually just calls into C opaquely. Note that you can write a simple wrapper that can be inlined, which will mitigate all of this via compile-time transformations. If you like, I can write it up and you can try it out! -Steve
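As a worked instance of the argument-count formulas quoted above, the sketch below simply evaluates them for a few values. The function names are made up for illustration, and the breakdown in the comment (header, three pieces per interpolated expression, trailing literal, footer) is one reading of the formula, not something the post spells out.

```d
import std.stdio : writefln;

/// istring: 1 + 3 * n + 1 + 1, read here as header
/// + (literal, expression metadata, value) per interpolated expression
/// + trailing literal + footer.
size_t istringArgs(size_t n) { return 1 + 3 * n + 1 + 1; }

/// writeln: just the values, as counted in the post.
size_t writelnArgs(size_t n) { return n; }

/// writefln: one format string plus the values.
size_t writeflnArgs(size_t n) { return 1 + n; }

void main()
{
    foreach (n; 1 .. 4)
        writefln("n=%s istring=%s writeln=%s writefln=%s",
            n, istringArgs(n), writelnArgs(n), writeflnArgs(n));
}
```

For two interpolated expressions this gives 9 arguments for the istring form against 3 for `writefln`. Whether that difference matters in practice is exactly what the two posters disagree about.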
Jan 09 2024
On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:I find it bizarre to be concerned about the call performance of zero-sized structs and empty strings to writeln or writef, like the function is some shining example of performance or efficient argument passing. If you do not have inlining or optimizations enabled, do you think the call tree of writefln is going to be compact? Not to mention it eventually just calls into C opaquely. Note that you can write a simple wrapper that can be inlined, which will mitigate all of this via compile-time transformations. If you like, I can write it up and you can try it out!I've been aware for a long time that writeln and writefln are very inefficient, and could use a re-engineering. A big part of the problem is the blizzard of templates resulting from using them. This issue doubles the number of templates. Even if they are optimized away, they sit in the object file. Anyhow, see my other reply to Timon. I may have found a solution. I'm interested in your thoughts on it.
Jan 10 2024
On Wednesday, 10 January 2024 at 20:19:46 UTC, Walter Bright wrote:
On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:

Are you sure you really want to keep optimizing debug logging functionality? Come on. The only reason to keep using `printf` and `writeln` is for debug logging. If you're going to show your log function to a user, it is going to be completely different. They are super easy to disable by simply creating a wrapper. If you want to know what increases the compilation time on them, it is `std.conv.to!float`. I have said this many times on the forums already. I don't know about people's hobbies, but caring about performance on logging is simply too much. Do me a favor: press F12 to open your browser's console, then type into it: `for(let i = 0; i < 10000; i ++) console.log(i);` You'll notice how slow it is. And this is not a JS problem. Logging is always slow, no matter how much you optimize. I personally find this a great loss of time that could be directed into a lot more useful tasks, such as:

- Improving debugging symbols in DMD and for macOS
- Improving importC until it actually works
- Listen to rikki's complaint about how slow it is to import UTF Tables
- Improving support for shared libraries on DMD (like not making it collect an interfaced object)
- Solve the problem with `init` property of structs containing memory reference which can be easily be corrupted
- Fix the problem when an abstract class implements an interface
- Make a D compiler daemon
- Help in the project of DMD as a library focused on helping WebFreak in code-d and serve-d
- Implement DMD support for Apple Silicon
- Revive newCTFE engine
- Implement ctfe caching

Those are the only things I can think of off the top of my head right now. Anyway, I'm not here to demand anything at all. I'm only giving examples on what could be done in fields I have no experience in how to make it better, but I know people out there can do it.
But for me, it is just a pity to see such genius wasting time on improving a rather antiquated debug functionalityI find it bizarre to be concerned about the call performance of zero-sized structs and empty strings to writeln or writef, like the function is some shining example of performance or efficient argument passing. If you do not have inlining or optimizations enabled, do you think the call tree of writefln is going to be compact? Not to mention it eventually just calls into C opaquely. Note that you can write a simple wrapper that can be inlined, which will mitigate all of this via compile-time transformations. If you like, I can write it up and you can try it out!I've been aware for a long time that writeln and writefln are very inefficient, and could use a re-engineering. A big part of the problem is the blizzard of templates resulting from using them. This issue doubles the number of templates. Even if they are optimized away, they sit in the object file. Anyhow, see my other reply to Timon. I may have found a solution. I'm interested in your thoughts on it.
Jan 10 2024
On 1/10/2024 12:56 PM, Hipreme wrote:- Improving debugging symbols in DMD and for macOS - Improving importC until it actually works - Listen to rikki's complaint about how slow it is to import UTF Tables - Improving support for shared libraries on DMD (like not making it collect an interfaced object) - Solve the problem with `init` property of structs containing memory reference which can be easily be corrupted - Fix the problem when an abstract class implements an interface - Make a D compiler daemon - Help in the project of DMD as a library focused on helping WebFreak in code-d and serve-d - Implement DMD support for Apple Silicon - Revive newCTFE engine - Implement ctfe cachingI regularly work on many of those problems. For example, without looking it up, I think I've fixed maybe 20 ImportC issues in the last month. I've also done a number of recent PRs aimed at making D more tractable as a library. So has Razvan.
Jan 10 2024
On 1/10/24 22:21, Walter Bright wrote:On 1/10/2024 12:56 PM, Hipreme wrote:Thanks a lot for the incredible amount of work you have invested into D over the years!- Improving debugging symbols in DMD and for macOS - Improving importC until it actually works - Listen to rikki's complaint about how slow it is to import UTF Tables - Improving support for shared libraries on DMD (like not making it collect an interfaced object) - Solve the problem with `init` property of structs containing memory reference which can be easily be corrupted - Fix the problem when an abstract class implements an interface - Make a D compiler daemon - Help in the project of DMD as a library focused on helping WebFreak in code-d and serve-d - Implement DMD support for Apple Silicon - Revive newCTFE engine - Implement ctfe cachingI regularly work on many of those problems. For example, without looking it up, I think I've fixed maybe 20 ImportC issues in the last month. I've also done a number of recent PRs aimed at making D more tractable as a library. So has Razvan.
Jan 11 2024
On 1/11/2024 12:20 PM, Timon Gehr wrote:Thanks a lot for the incredible amount of work you have invested into D over the years!It is indeed my pleasure, especially the privilege of working with you guys!
Jan 11 2024
On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:It happens at runtime.No. This line is inside `enum string query = () { ... }();`. So CTFE-performance considerations do apply.I'm sorry to say, that looks like tty noise.That’s sad. In my opinion, it is at least as readable, plus I see a few objective advantages in it. We don’t have to agree on this though.It will rarely need to be escaped, but when one does need it, one needs it!Yes, but I see a benefit in reducing the number of characters that _have_ to be escaped in the first place. While `$` rarely appeared in examples we’ve been thinking of so far, if someone faces a need to create a string full of dollars, escaping them all will uglify the string.DIP1027 does not require ( ) if it's just an identifier. That makes for the shortest, simplest istring syntax. The ( ) usage will be relatively rare. The idea is the most common cases should require the least syntactical noise.Totally agree. Personally, I prefer omitting parentheses in interpolations when a language supports such syntax, but it’s a matter of taste.Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)DIP1027 is unable to do it _at compile time_. I cannot argue that compile-time string creation doesn’t give us much if we call an SQL engine afterwards. So we need another example where CTFE-ability is desired. Alexandru Ermicioi asked about logging; I agree it is nice to rule out format-string parsing from every `log` call.
Jan 09 2024
On Tuesday, 9 January 2024 at 09:25:28 UTC, Nickolay Bukreyev wrote:On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:Compile time string creation when dealing with SQL give you the ability to validate the string for correctness at compile time. Here an example of what we are doing internally: ``` pinver utumno fieldmanager % bin/yab build ldc_lab_mac_i64_dg 2024-01-09T10:48:07.889 [info] melkor.d:235:executeReadyLabel executing ldc_lab_mac_i64_dg: /Users/pinver/dlang/ldc-1.36.0/bin/ldc2 -preview=dip1000 -i -Isrc -mtriple=x86_64-apple-darwin --vcolumns -J/Users/pinver/Lembas --d-version=env_dev_ --d-version=listen_for_nx_ --d-version=disable_ssl --d-version=disable_fixations --d-version=disable_metrics --d-version=disable_aggregator --d-debug -g -of/Users/pinver/Projects/DeepGlance/fieldmanager/bin/lab_mac_i64_dg /Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d src/sbx/raygui/c_raygui.c 2024-01-09T10:48:13.423 [error] melkor.d:247:executeReadyLabel build failed: src/ops/sql/semantics.d(489,31): Error: uncaught CTFE exception `object.Exception("42P01: relation \"snapshotsssss\" does not exist. 
SQL: select size_mm, size_px from snapshotsssss where snapshot_id = $1")` src/api3.d(41,9): thrown from here src/api3.d(51,43): called from here: `checkSql(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", Type.smallint, true, false)], [], [], ["pinver", "ipsos_analysis_operator", "i /Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d(644,45): Error: template instance `api3.forgeSqlCheckerForSchema!(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", T ``` or ``` pinver utumno fieldmanager % bin/yab build ldc_lab_mac_i64_dg 2024-01-09T10:52:36.220 [info] melkor.d:235:executeReadyLabel executing ldc_lab_mac_i64_dg: /Users/pinver/dlang/ldc-1.36.0/bin/ldc2 -preview=dip1000 -i -Isrc -mtriple=x86_64-apple-darwin --vcolumns -J/Users/pinver/Lembas --d-version=env_dev_ --d-version=listen_for_nx_ --d-version=disable_ssl --d-version=disable_fixations --d-version=disable_metrics --d-version=disable_aggregator --d-debug -g -of/Users/pinver/Projects/DeepGlance/fieldmanager/bin/lab_mac_i64_dg /Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d src/sbx/raygui/c_raygui.c 2024-01-09T10:52:37.254 [error] melkor.d:247:executeReadyLabel build failed: src/ops/sql/semantics.d(504,19): Error: uncaught CTFE exception `object.Exception("XXXX! role \"dummyuser\" can't select on table \"snapshots\". 
SQL: select size_mm, size_px from snapshots where snapshot_id = $1")` src/api3.d(41,9): thrown from here src/api3.d(51,43): called from here: `checkSql(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", Type.smallint, true, false)], [], [], ["pinver", "ipsos_analysis_operator", "i /Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d(644,45): Error: template instance `api3.forgeSqlCheckerForSchema!(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", T ``` CTFE support is a must IMHO /PIt happens at runtime.No. This line is inside `enum string query = () { ... }();`. So CTFE-performance considerations do apply.I'm sorry to say, that looks like tty noise.That’s sad. In my opinion, it is at least as readable, plus I see a few objective advantages in it. We don’t have to agree on this though.It will rarely need to be escaped, but when one does need it, one needs it!Yes, but I see a benefit in reducing the number of characters that _have_ to be escaped in the first place. While `$` rarely appeared in examples we’ve been thinking of so far, if someone faces a need to create a string full of dollars, escaping them all will uglify the string.DIP1027 does not require ( ) if it's just an identifier. That makes for the shortest, simplest istring syntax. The ( ) usage will be relatively rare. The idea is the most common cases should require the least syntactical noise.Totally agree. Personally, I prefer omitting parentheses in interpolations when a language supports such syntax, but it’s a matter of taste.Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)DIP1027 is unable to do it _at compile time_. 
I cannot argue that compile-time string creation doesn’t give us much if we call an SQL engine afterwards. So we need another example where CTFE-ability is desired. Alexandru Ermicioi asked about logging; I agree it is nice to rule out format-string parsing from every `log` call.
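The build logs above show a query being rejected during compilation. A much smaller sketch of the same idea follows, assuming a hard-coded list of table names and a deliberately naive parser. `tableOf`, `isKnownTable` and `checkedSql` are hypothetical names and have nothing to do with the `checkSql` in the logs, which checks against a full schema.

```d
/// Returns the identifier following the first " from " in `sql`, or null.
/// Naive on purpose: enough for the illustration, not a real SQL parser.
string tableOf(string sql)
{
    enum marker = " from ";
    foreach (i; 0 .. sql.length)
    {
        if (i + marker.length <= sql.length
            && sql[i .. i + marker.length] == marker)
        {
            size_t start = i + marker.length;
            size_t end = start;
            while (end < sql.length && sql[end] != ' ')
                ++end;
            return sql[start .. end];
        }
    }
    return null;
}

/// Assumed schema: just two table names taken from the logs above.
bool isKnownTable(string name)
{
    foreach (t; ["snapshots", "aggregators"])
        if (t == name)
            return true;
    return false;
}

/// Evaluates to `sql` if the check passes; otherwise compilation fails.
template checkedSql(string sql)
{
    static assert(isKnownTable(tableOf(sql)), "unknown relation in: " ~ sql);
    enum checkedSql = sql;
}

void main()
{
    // Accepted: the table exists in the assumed schema.
    enum ok = checkedSql!"select size_mm, size_px from snapshots where snapshot_id = $1";
    assert(ok.length > 0);

    // Rejected at compile time: misspelled table name.
    static assert(!__traits(compiles,
        checkedSql!"select size_mm, size_px from snapshotsssss where snapshot_id = $1"));
}
```

This only works because the query text is available to CTFE, which is the property being argued for in this part of the thread.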
Jan 09 2024
On 1/9/24 10:59, Paolo Invernizzi wrote:CTFE support is a must IMHOYes. Besides the usability benefits you allude to, it is simply a security feature. We absolutely do not want the constructed string to depend on dynamically entered runtime data. Constructing it at compile time ensures that this is the case.
Jan 09 2024
On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:that looks like tty noise.Oh, I realized you might be reading this without a fancy Markdown renderer. Backticks are part of Markdown syntax, not D. I only suggested using i"a \(x) b" rather than i"a $(x) b"
Jan 09 2024
On 1/9/24 09:29, Walter Bright wrote:Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)And I stand by that.
Jan 09 2024
On 1/9/2024 4:40 AM, Timon Gehr wrote:On 1/9/24 09:29, Walter Bright wrote:But I showed that DIP1027 could do the SQL example.Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)And I stand by that.
Jan 09 2024
On 1/9/24 20:06, Walter Bright wrote:On 1/9/2024 4:40 AM, Timon Gehr wrote:You actually did not.On 1/9/24 09:29, Walter Bright wrote:But I showed that DIP1027 could do the SQL example.Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036 and because I was told that DIP1027 couldn't do it :-)And I stand by that.
Jan 09 2024
On 1/9/2024 1:24 PM, Timon Gehr wrote:You actually did not.See my other reply to you in this thread.
Jan 09 2024
On Tuesday, 9 January 2024 at 07:30:57 UTC, Nickolay Bukreyev wrote:I suppose a cleaner way would be to use `\(...)` syntax (like in Swift).Also, when I said, _like in Swift_, in no event was I meaning, _Swift has it, therefore, D should do the same_. I meant, _there is at least one other language that does it this way_.
Jan 09 2024
I’ve just realized DIP1036 has an excellent feature that is not evident right away. Look at the signature of `execi`:

```d
auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) { ... }
```

`InterpolationHeader`/`InterpolationFooter` _require_ you to pass an istring. Consider this example:

```d
db.execi(i"INSERT INTO items VALUES ($(x))".text);
```

Here, we accidentally added `.text`. It would be an SQL injection… but the compiler rejects it! `typeof(i"...".text)` is `string`, and `execi` cannot be called with `(Sqlite, string)`.
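The rejection described above can be reproduced without an i-string-capable compiler. The sketch below is built on assumptions: the two marker structs are local stand-ins for the DIP1036e types, the `Sqlite` parameter is dropped, and the requirement is written as a template constraint instead of the explicit parameter list shown in the post.

```d
import std.conv : text;

// Local stand-ins for the DIP1036e marker types (assumption: the real
// ones would come from the compiler/runtime, not from user code).
struct InterpolationHeader {}
struct InterpolationFooter {}

/// Hypothetical `execi`-like entry point. It only accepts argument lists
/// that start with a header and end with a footer, which is the shape an
/// i-string lowers to. Returns the number of pieces in between.
size_t execi(Args...)(Args args)
    if (Args.length >= 2
        && is(Args[0] == InterpolationHeader)
        && is(Args[$ - 1] == InterpolationFooter))
{
    return Args.length - 2;
}

void main()
{
    int x = 42;

    // Roughly what an i-string call would pass (literal pieces omitted).
    assert(execi(InterpolationHeader(), x, InterpolationFooter()) == 1);

    // A pre-built string, however it was produced, does not fit.
    static assert(!__traits(compiles, execi("INSERT INTO items VALUES (42)")));
    static assert(!__traits(compiles,
        execi(text("INSERT INTO items VALUES (", x, ")"))));
}
```

The second `static assert` corresponds to the accidental `.text` in the post: once the tuple has been flattened to a `string`, the header and footer are gone and the call no longer type-checks.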
Jan 09 2024
On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:I’ve just realized DIP1036 has an excellent feature that is not evident right away. Look at the signature of `execi`: ```d auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) { ... } ``` `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an istring. Consider this example: ```d db.execi(i"INSERT INTO items VALUES ($(x))".text); ``` Here, we accidentally added `.text`. It would be an SQL injection… but the compiler rejects it! `typeof(i"...".text)` is `string`, and `execi` cannot be called with `(Sqlite, string)`.The compiler will indeed reject it (The error message would be a bit baffling to those who don't know what Interpolation types are), along with any attempt to call execi() with a pre-constructed string. The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want? Will that impede the use of tuples generally, or just impede the use of istrings? --- P.S. most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff. This is why: int* p; is initialized to zero, while: int* p = void; is left uninitialized. The user is unlikely to accidentally type "= void".
Jan 09 2024
On Monday, 8 January 2024 at 03:05:17 UTC, Walter Bright wrote:
On 1/7/2024 6:30 PM, Walter Bright wrote:

Yes! It would be brilliant if `alias` could refer to any Expression, not just symbols. If that was the case, we could just pass InterpolationHeader/Footer/etc. to template parameters (as opposed to runtime parameters, where they go now).

```d
// Desired syntax:
db.execi!i"INSERT INTO sample VALUES ($(id), $(2*x))";
// Desugars to:
db.execi!(
    InterpolationHeader(),
    InterpolatedLiteral!"INSERT INTO sample VALUES ("(),
    InterpolatedExpression!"id"(),
    id,
    InterpolatedLiteral!", "(),
    InterpolatedExpression!"2*x"(),
    2*x, // Currently illegal (`2*x` is not aliasable).
    InterpolatedLiteral!")"(),
    InterpolationFooter(),
);
// `execi!(...)` would expand to:
db.execImpl("INSERT INTO sample VALUES (?1, ?2)", id, 2*x);
```

With this approach, they are processed entirely via compile-time sequence manipulations. Zero-sized structs are never passed as arguments. Inlining is not necessary to get rid of them. An example with `writeln` (or just about any function alike):

```d
writeln(interpolate!i"prefix $(baz + 4) suffix");
// Desugars to:
writeln(interpolate!(
    InterpolationHeader(),
    InterpolatedLiteral!"prefix "(),
    InterpolatedExpression!"baz + 4"(),
    baz + 4,
    InterpolatedLiteral!" suffix"(),
    InterpolationFooter(),
));
// `interpolate!(...)` would expand to:
writeln("prefix ", baz + 4, " suffix");
```

On 1/7/2024 3:50 PM, Timon Gehr wrote:

I wonder if what we're missing are functions that operate on tuples and return tuples. We almost have them in the form of:

```
template tuple(A ...) { alias tuple = A; }
```

but the compiler wants A to only consist of symbols, types and expressions that can be computed at compile time. This is so the name mangling will work. But what if we don't bother doing name mangling for this kind of template?

This cannot work:

```
int x=readln.strip.split.to!int;
db.execi(xxx!i"INSERT INTO sample VALUES ($(id), $(2*x))");
```

True, you got me there. 
It's the `2*x` that cannot be turned into an alias. I'm going to think about this a bit.
Jan 10 2024
On Wednesday, 10 January 2024 at 15:07:42 UTC, Nickolay Bukreyev wrote:```d writeln(interpolate!i"prefix $(baz + 4) suffix"); // Desugars to: writeln(interpolate!( InterpolationHeader(), InterpolatedLiteral!"prefix "(), InterpolatedExpression!"baz + 4"(), baz + 4, InterpolatedLiteral!" suffix"(), InterpolationFooter(), )); // `interpolate!(...)` would expand to: writeln("prefix ", baz + 4, " suffix"); ```Well, `InterpolatedLiteral` and `InterpolatedExpression` don’t have to be templates anymore: ```d writeln(interpolate!i"prefix $(baz + 4) suffix"); // Desugars to: writeln(interpolate!( InterpolationHeader(), InterpolatedLiteral("prefix "), InterpolatedExpression("baz + 4"), baz + 4, InterpolatedLiteral(" suffix"), InterpolationFooter(), )); // `interpolate!(...)` would expand to: writeln("prefix ", baz + 4, " suffix"); ```
Jan 10 2024
On 1/10/2024 7:07 AM, Nickolay Bukreyev wrote:
> Zero-sized structs are never passed as arguments. Inlining is not necessary to get rid of them.

Structs with no fields have a size of 1 byte for D and C++ structs, and 0 or 4 for C structs (depending on the target). The rationale for a non-zero size is so that different struct instances will be at different addresses.

```d
struct S { }
void foo(S s);
void test(S s) { foo(s); }
```

```
push RBP
mov RBP,RSP
sub RSP,8
push dword ptr 010h[RBP]
call _D5test43fooFSQm1SZv@PC32
add RSP,010h
pop RBP
ret
```
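To make the size claim concrete, here is a minimal sketch. The names `Empty` and `Marker` are made up for illustration; `Marker` only stands in for a fieldless marker type and is not the real `core.interpolation` definition.

```d
// Stand-ins for illustration only; not the real core.interpolation types.
struct Empty { }
struct Marker(string text) { }

// A D struct with no fields still occupies one byte...
static assert(Empty.sizeof == 1);
static assert(Marker!"prefix ".sizeof == 1);

// ...so two instances stored side by side get distinct addresses.
unittest
{
    Empty[2] pair;
    assert(&pair[0] !is &pair[1]);
}
```

So a fieldless marker that is passed as a runtime argument is still a one-byte value as far as the type system is concerned; whether it costs anything after inlining is a separate question.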
Jan 10 2024
On 1/10/24 16:07, Nickolay Bukreyev wrote:
> Yes! It would be brilliant if `alias` could refer to any Expression, not just symbols. If that was the case, we could just pass InterpolationHeader/Footer/etc. to template parameters (as opposed to runtime parameters, where they go now).

I am not a big fan of this option. If we are going to allow passing runtime arguments as template parameters, we might as well just allow passing template parameters as runtime arguments instead. It's much more clear how to make that work.
Jan 11 2024
On 1/10/24 01:03, Walter Bright wrote:On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:What we want that DIP1036e mostly provides is: 0. The library can detect whether it is being passed an istring. 1. The library that accepts the istring decides how to process it. 2. The string parts of the istring are known to the library at compile time. 3. The expression parts of the istring can be evaluated only at runtime. 4. The expression parts of the istring can be passed arbitrarily, by ref, lazy, alias, ... (this part in fact works better with DIP1027). 5. The library can access the original expression, e.g. in string form. 6. A templated function that is called with an istring can do all of the above.I’ve just realized DIP1036 has an excellent feature that is not evident right away. Look at the signature of `execi`: ```d auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) { ... } ``` `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an istring. Consider this example: ```d db.execi(i"INSERT INTO items VALUES ($(x))".text); ``` Here, we accidentally added `.text`. It would be an SQL injection… but the compiler rejects it! `typeof(i"...".text)` is `string`, and `execi` cannot be called with `(Sqlite, string)`.The compiler will indeed reject it (The error message would be a bit baffling to those who don't know what Interpolation types are), along with any attempt to call execi() with a pre-constructed string. The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want?Will that impede the use of tuples generally, or just impede the use of istrings? ...It's just a way to achieve 0.-6. above relatively well with a simple patch to the lexer. I am not sure why it would impede anything except compile time and binary size.--- P.S. 
most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff. This is why: int* p; is initialized to zero, while: int* p = void; is left uninitialized. The user is unlikely to accidentally type "= void".The user (especially the kind of user that may be prone to accidentally introduce an SQL injection attack) is more likely to accidentally type `.format` or `.text` because that may be a relatively common way to use an istring in their code base.
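The rejection of an accidental `.text` can be sketched without any compiler support for istrings. Everything below is a stand-in: `Header`, `Footer` and `execiSketch` are hypothetical names that only mimic the shape of the DIP1036 signature quoted above.

```d
// Stand-ins for illustration; the real marker types live in core.interpolation.
struct Header { }
struct Footer { }

// Hypothetical entry point: only callable with a marker-delimited argument list.
size_t execiSketch(Args...)(Header, Args args, Footer)
{
    return args.length; // a real implementation would build and run a query
}

// A marker-delimited sequence is accepted...
static assert(__traits(compiles,
    execiSketch(Header(), "INSERT INTO items VALUES (", 42, ")", Footer())));

// ...while a pre-built string is rejected at compile time.
static assert(!__traits(compiles,
    execiSketch("INSERT INTO items VALUES (42)")));
```

The check costs nothing to write in the library: it falls out of overload resolution, because a lone `string` cannot match the `Header` parameter.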
Jan 11 2024
On 1/11/24 21:13, Timon Gehr wrote:
> P.S. most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff.

`if (condition); { ... }`

I think it's due to muscle memory and it does happen quite a bit.
Jan 11 2024
On 1/9/24 00:06, Walter Bright wrote:Here's how SQL support is done for DIP1036: https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d ``` auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, InterpolationFooter footer) { import arsd.sqlite; // sqlite lets you do ?1, ?2, etc enum string query = () { string sql; int number; import std.conv; foreach(idx, arg; Args) static if(is(arg == InterpolatedLiteral!str, string str)) sql ~= str; else static if(is(arg == InterpolationHeader) || is(arg == InterpolationFooter)) throw new Exception("Nested interpolation not supported"); else static if(is(arg == InterpolatedExpression!code, string code)) { } // just skip it else sql ~= "?" ~ to!string(++number); return sql; }(); auto statement = Statement(db, query); int number; foreach(arg; args) { static if(!isInterpolatedMetadata!(typeof(arg))) statement.bind(++number, arg); } return statement.execute(); } ``` This: 1. The istring, after converted to a tuple of arguments, is passed to the `execi` template. 2. It loops over the arguments, essentially turing it (ironically!) back into a formatThis is not ironic at all. The point is it _can_ do that, while DIP1027 _cannot_ do _either this or the opposite direction_. It is yourself who called the istring the building block instead of the end product, but now you are indeed failing to turn the sausage back into the cow.string. The formats, instead of %s, are ?1, ?2, ?3, etc. 3. It skips all the Interpolation arguments inserted by DIP1036. 4. The remaining argument are each bound to the indices 1, 2, 3, ... 5. Then it executes the sql statement. Note that nested istrings are not supported. ...But you get a useful error message that exactly pinpoints what the problem is. 
Also, they could be supported, which is the point.Let's see how this can work with DIP1027: ``` auto execi(Args...)(Sqlite db, Args args) { import arsd.sqlite; // sqlite lets you do ?1, ?2, etc enum string query = () { string sql; int number; import std.conv; auto fmt = arg[0]; for (size_t i = 0; i < fmt.length, ++i) { char c = fmt[i]; if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's') { sql ~= "?" ~ to!string(++number); ++i; } else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%') ++i; // skip escaped % else sql ~= c; } return sql; }(); auto statement = Statement(db, query); int number; foreach(arg; args[1 .. args.length]) { statement.bind(++number, arg); } return statement.execute(); } ``` This: ...This does not work.1. The istring, after converted to a tuple of arguments, is passed to the `execi` template. 2. The first tuple element is the format string. 3. A replacement format string is created by replacing all instances of "%s" with "?n", where `n` is the index of the corresponding arg. 4. The replacement format string is bound to `statement`, and the arguments are bound to their indices. 5. Then it executes the sql statement. It is equivalent.No. As Nickolay already explained, it is not equivalent. - It does not even compile, even if we fix the typo arg -> args. That is enough to dismiss DIP1027 for this example. However, let's for the sake of argument assume that, miraculously, `execi` can read the format string at compile time, then: - With this signature, if you pass a manually-constructed string to it, it would just accept the SQL injection. - It does not give a proper error message for nested istrings. - It has to manually parse the format string. It iterates over each character of the original format string. - It (ironically!) constructs a new format string, the original one was useless. 
- If you pass a bad format string to it (for example, by specifying a manual format), it will just do nonsense, while DIP1036e avoids bad format strings by construction.
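For reference, the `%s` to `?n` rewriting step on its own can be written as a small free function. This is only a sketch under assumptions: `toSqlPlaceholders` is a made-up name, and it keeps a literal `%` for `%%` rather than dropping it. It does not settle the dispute above, since with DIP1027 the format string reaches `execi` as a runtime argument; the `static assert` here works only because the literal is handed over directly.

```d
import std.conv : to;

/// Hypothetical helper: rewrites "%s" specifiers into SQLite's numbered
/// parameters ("?1", "?2", ...) and "%%" into a single literal percent sign.
string toSqlPlaceholders(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable c = fmt[i];
        if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);
            ++i; // skip the 's'
        }
        else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%'; // "%%" denotes one literal percent sign
            ++i;
        }
        else
            sql ~= c;
    }
    return sql;
}

// Usable in CTFE when the string is a compile-time constant.
static assert(toSqlPlaceholders("INSERT INTO sample VALUES (%s, %s)")
    == "INSERT INTO sample VALUES (?1, ?2)");
```

Called with a runtime string, the same function does the same work on every invocation, which is the cost being argued about.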
Jan 09 2024
On 1/9/2024 4:35 AM, Timon Gehr wrote:
> This does not work.

How so? Consider this:

```
import std.stdio;

auto execi(Args...)(Args args)
{
    auto fmt = args[0].dup;
    fmt[0] = 'k';
    writefln(fmt, args[1 .. args.length]);
}

void main()
{
    string b = "betty";
    execi(i"hello $b");
}
```

which compiles and runs, printing:

kello betty
Jan 09 2024
On 1/9/24 20:16, Walter Bright wrote:On 1/9/2024 4:35 AM, Timon Gehr wrote:It does not compile. The arg->args fix I'll grant you as it is a typo whose only significance is to make it even more clear that you never tried to run any version of the code, but then you still get another compile error. I suggest you mock out the SQL library, you don't actually need to install it to try your code. If we remove the `enum` then your code still does not work correctly, for example because it does not prevent an SQL injection attack if the user constructs the SQL string manually by accidentally using `format`. I and other people already pointed out this flaw and other flaws in other posts.This does not work.How so?Consider this: ``` import std.stdio; auto execi(Args...)(Args args) { auto fmt = args[0].dup; fmt[0] = 'k'; writefln(fmt, args[1 .. args.length]); } void main() { string b = "betty"; execi(i"hello $b"); } ``` which compiles and runs, printing: kello bettyI considered it and it did not have an impact on the way I view the DIP1027 `execi` implementation you have given.
Jan 09 2024
On 1/9/2024 4:35 AM, Timon Gehr wrote:However, let's for the sake of argument assume that, miraculously, `execi` can read the format string at compile time, then:Adam's implementation of execi() also runs at run time, not compile time.- With this signature, if you pass a manually-constructed string to it, it would just accept the SQL injection.It was just a proof of concept piece of code. execi could check for format strings that contain ?n sequences. It could also check the number of %s formats against the number of arguments.But you get a useful error message that exactly pinpoints what the problem is. Also, they could be supported, which is the point. - It does not give a proper error message for nested istrings.execi could be extended to reject arguments that contain %s sequences. Or, if there was an embedded istring, the number of %s formats can be checked against the number of arguments. An embedded istring would show a mismatch. I expect that use of nested istrings would be exceedingly rare. If they are used, wrapping them in text() will make work. Besides, would a nested istring in an sql call be intended as part of the sql format, or would a text string be the intended result?- It has to manually parse the format string. It iterates over each character of the original format string.Correct. And it does not need to iterate over and remove all the Interpolation arguments. Nor does it need the extra two arguments, which aren't free of cost.- It (ironically!) constructs a new format string, the original one was useless.Yes, it converts the format specifiers to the sql ones. Why is this a problem?- If you pass a bad format string to it (for example, by specifying a manual format), it will just do nonsense, while DIP1036e avoids bad format strings by construction.What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)" ? I didn't see any check for that. Of course, one could add such a check to the 1036 execi. 
printf format strings are checked by the compiler, and writef format strings are checked by writef. execi is also capable of being extended to check the format string to ensure the format matches the args.
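A rough sketch of the runtime check described here, assuming a DIP1027-style call where the format string comes first. `countSpecifiers` and `checkedExample` are hypothetical names, not part of either DIP or of Phobos.

```d
import std.exception : enforce;

/// Counts "%s" specifiers, treating "%%" as an escaped percent sign.
size_t countSpecifiers(string fmt)
{
    size_t count;
    for (size_t i = 0; i + 1 < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (fmt[i + 1] == 's')
            ++count;
        ++i; // skip the character that follows '%'
    }
    return count;
}

/// Hypothetical DIP1027-style consumer: format string first, then the values.
void checkedExample(Args...)(string fmt, Args args)
{
    enforce(countSpecifiers(fmt) == args.length,
        "number of %s specifiers does not match number of arguments");
    // ... bind and execute here ...
}

unittest
{
    import std.exception : assertThrown;

    checkedExample("hello %s", "betty"); // 1 specifier, 1 argument: passes

    // A flattened nested istring: 1 specifier, 2 arguments.
    assertThrown(checkedExample("prefix1 %s suffix1", "prefix2 %s suffix2", 5));
}
```

Note the limits of this approach: the mismatch is reported only when the call actually executes, and a hand-built string whose specifier count happens to match passes the check.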
Jan 09 2024
On Tuesday, 9 January 2024 at 20:01:34 UTC, Walter Bright wrote:
> With the istring, there are 4 calls to struct member functions that just return null. This can't be good for performance or program size.

A valid point, thanks. Could you test if that fixes the issue?

```d
import core.interpolation;
import std.meta: AliasSeq, staticMap;
import std.stdio;

template filterOutEmpty(alias arg) {
    alias T = typeof(arg);
    static if (is(T == InterpolatedLiteral!s, string s))
        static if (s.length)
            alias filterOutEmpty = s;
        else
            alias filterOutEmpty = AliasSeq!();
    else static if (
        is(T == InterpolationHeader) ||
        is(T == InterpolatedExpression!code, string code) ||
        is(T == InterpolationFooter)
    )
        alias filterOutEmpty = AliasSeq!();
    else
        alias filterOutEmpty = arg;
}

pragma(inline, true) // This pragma is necessary unless you compile with `-inline`.
void log(Args...)(InterpolationHeader, Args args, InterpolationFooter) {
    writeln(staticMap!(filterOutEmpty, args));
}

void main() {
    int baz = 3;
    log(i"$(baz + 4)");
    writeln(baz + 5);
}
```

> Adam's implementation of execi() also runs at run time, not compile time.

We are probably talking about different things. Adam’s implementation constructs a format string at compile time thanks to `enum` storage class [in line 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36). Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi@forum.dlang.org).

> execi could be extended to reject arguments that contain %s sequences.

I disagree. Storing a string that contains `%s` in a database should be allowed (storing any string should obviously be allowed, regardless of its contents).
But `execi` is unable to differentiate between a string that happens to contain `%s` and a nested format string: ``` // DIP1027 example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1"); // Gets rewritten as: example("prefix1 %s suffix1", "prefix2 %s suffix2", x); ``` I might be wrong, but it appears to me that DIP1027 is not able to deal with nested format strings, in a general case. DIP1036 has no such limitation (demonstrated in point 2 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).Nor does it need the extra two arguments, which aren't free of cost.I explained [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org) why these two arguments are valuable. Aren’t free of cost—correct unless you enable inlining. `execi` may require some changes (like `filterOutEmpty` I showed above) to make them free of cost, but it is doable.What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)"` ? I didn't see any check for that. Of course, one could add such a check to the 1036 execi.You are right, it doesn’t. Timon’s point (expressed as “This does not work”) is that DIP1036 is able to do validation at compile time while DIP1027 is only able to do it at runtime, when this function actually gets invoked.
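The compile-time side of this can be sketched with stand-in types, so it does not depend on a compiler that implements istrings. `Header`, `Footer`, `Literal`, `Expression` and `sqlFor` are all hypothetical names that only mirror the shape of the DIP1036 sequence; the point is that the query text is derived from the argument types alone and can therefore be an `enum`.

```d
import std.conv : to;

// Stand-ins for illustration; the real types live in core.interpolation.
struct Header { }
struct Footer { }
struct Literal(string text) { }
struct Expression(string code) { }

/// Builds the SQL text from the argument *types*, so the result is an enum.
template sqlFor(Args...)
{
    enum string sqlFor = () {
        string sql;
        int number;
        foreach (T; Args)
        {
            static if (is(T == Literal!text, string text))
                sql ~= text;
            else static if (is(T == Expression!code, string code))
            { } // the source text of the expression is not needed here
            else static if (is(T == Header) || is(T == Footer))
            { } // markers carry no SQL
            else
                sql ~= "?" ~ to!string(++number); // a runtime value to bind
        }
        return sql;
    }();
}

// The types a call like i"INSERT INTO sample VALUES ($(id), $(2*x))" would carry.
enum query = sqlFor!(Header,
    Literal!"INSERT INTO sample VALUES (", Expression!"id", int,
    Literal!", ", Expression!"2*x", int,
    Literal!")", Footer);

// Any CTFE-able validation can now run before the program exists.
static assert(query == "INSERT INTO sample VALUES (?1, ?2)");
```

Because a string value is just another runtime type in this list, a bound string that happens to contain `%s` is never confused with query text.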
Jan 09 2024
P.S. Thank you for your well constructed arguments. On 1/9/2024 1:35 PM, Nickolay Bukreyev wrote:A valid point, thanks. Could you test if that fixes the issue?Yes, that works.We are probably talking about different things. Adam’s implementation constructs a format string at compile time thanks to `enum` storage class [in line 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36).Yes, you're right.Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).That only checks one aspect of correctness - nested string interpolations.True, which is why a % that is not intended as a format specifier is entered as %%.execi could be extended to reject arguments that contain %s sequences.I disagree. Storing a string that contains `%s` in a database should be allowed (storing any string should obviously be allowed, regardless of its contents).But `execi` is unable to differentiate between a string that happens to contain `%s` and a nested format string: ``` // DIP1027 example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1"); // Gets rewritten as: example("prefix1 %s suffix1", "prefix2 %s suffix2", x); ``` I might be wrong, but it appears to me that DIP1027 is not able to deal with nested format strings, in a general case.The expansion for `example` has a mismatch in the number of formats (1) and number of arguments (2). This can be detected at runtime by `example`, as I've explained. A compile time way is DIP1027 can be modified to reject any arguments that consist of tuples with other than one element. This would eliminate nested istring tuples at compile time.DIP1036 has no such limitation (demonstrated in point 2 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).DIP1036 cannot detect other problems with the string literals. 
It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.I explained [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org) why these two arguments are valuable. Aren’t free of cost—correct unless you enable inlining. `execi` may require some changes (like `filterOutEmpty` I showed above) to make them free of cost, but it is doable.You'd have to also make every formatted writer a template, and add the filter to them.You are right, it doesn’t. Timon’s point (expressed as “This does not work”) is that DIP1036 is able to do validation at compile time while DIP1027 is only able to do it at runtime, when this function actually gets invoked.The only validation it does is check for nested string interpolations.
Jan 09 2024
On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:P.S. Thank you for your well constructed arguments. On 1/9/2024 1:35 PM, Nickolay Bukreyev wrote:No. If you look at the errors raised during the compilation of our codebase, we are checking FAR MORE, for example the second error is related to wrong missing grant condition on a select. I've included it as an example, just not syntax, table names, semantic or so, but also permissions, at compile time. And that's a concrete codebase, used in production, not speculations. /PA valid point, thanks. Could you test if that fixes the issue?Yes, that works.We are probably talking about different things. Adam’s implementation constructs a format string at compile time thanks to `enum` storage class [in line 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36).Yes, you're right.Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).That only checks one aspect of correctness - nested string interpolations.
Jan 09 2024
On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:<snip>Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).That only checks one aspect of correctness - nested string interpolations.You are underestimating what can be gained as value in catching SQL problems at compile time instead of runtime. And, believe me, it's not a matter of mocking the DB and relying on unittest and coverage. CTFE capability is needed. /PDIP1036 has no such limitation (demonstrated in point 2 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).DIP1036 cannot detect other problems with the string literals. It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.
Jan 09 2024
On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:You are underestimating what can be gained as value in catching SQL problems at compile time instead of runtime. And, believe me, it's not a matter of mocking the DB and relying on unittest and coverage.Please expand on that. This is a very important topic. I want to know all the relevant facts.CTFE capability is needed.I concur that compile time errors are better than runtime errors. But in this case, there's a continuing cost to have them, cost to other far more common use cases for istrings. The cost is in terms of complexity, about needing to filter out all the extra marker templates, about reducing its utility as a tuple generator with the unexpected extra elements, larger object files, much longer mangled names, and so on. Want to know the source of my unease about it? Simple things should be simple. This isn't. The extra complexity is always there, even for the simple cases, and the simple cases are far and away the most common use cases. Frankly, it reminds me of C++ template expressions, which caught the C++ world by storm for about 2 years, before it faded away into oblivion and nobody talks about them anymore. Fortunately for C++, template expressions could be ignored, as they were not a core language feature. But DIP1036 is a core language feature, a feature we would be stuck with forever. And I'll be the one who gets the heat for it. The compile-time vs runtime issue is the only thing left standing where the advantage goes to DIP1036. So it needs a very compelling case. P.S. You can do template expressions in D, too!
Jan 11 2024
On Friday, 12 January 2024 at 06:06:52 UTC, Walter Bright wrote:
On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:

As a preamble, we are _currently_ doing all the SQL validation against schemas at compile time: the semantics of the query, the correctness of the relations involved, type matching with D (and Elm) types, and the permissions granted to the roles that perform the query. That's not a problem at all, it's just something like:

sql!`select foo from bar where baz > 1` [1]

In the same way we also check this:

sql!`update foo set bag = ${d_variable_bag}`

But attaching sanitisation to what is inside `d_variable_bag`, checking its type, and actually binding the content for the SQL protocol is done by mixins, after the sql!string instantiation. As you can guess, that is by far the most common usage; the business logic is FULL of stuff like that. The security aspect is that you _always_ need to sanitise the data content of the D variable; the mixin takes care of that part, and you can't skip it.

That said, unit testing at runtime can be done against a real DB, or by mocking it. A real DB is onerous: sometimes you need additional licenses and resource management, and it's time consuming. Just imagine writing D code, but getting errors back not during compilation but only when the "autotester" CI task has completed!

Keep in mind that using a real DB is very common, for one simple reason: mocking a DB to the point of being useful for unit testing is a PITA. The common approach is simply to skip that and mock the _results_ of the data retrieved by the query, to unit test the business logic. The queries are not checked until they run against the dev DB.

The compile-time solution instead gives you immediate feedback on a wrong query or wrong type bindings, and that's invaluable, especially for one fundamental thing: refactoring of code, or schema changes.
If the DB schema is changed, the application simply does not compile anymore, until you align it again against the changed schema. And the compiler gently points you to the pieces of code you need to adjust, and the same if you change a D type that somewhere will be bond to a sql parameters. So you can refactor without fears, and if the application compiles, you are assured to have everything aligned. It's like extending the correctness of type system down to the db type system, and it's priceless. So, long story short: we will be forced to use mixin if we can't rely on CT interpolation, but having it will simplify the codebase. [1] well, query sometimes can be things like that: with dsx as (select face_id, bounding_box_px, gaze_yaw_deg, gaze_pitch_deg from dev_eyes where eye = ${sx}), ddx as (select face_id, bounding_box_px, gaze_yaw_deg, gaze_pitch_deg from dev_eyes where eye = ${dx}) select dfc.bounding_box_px as face, dfc.expression, dby.center_z_mm, dsx.bounding_box_px as eye_sx, dsx.gaze_pitch_deg, dsx.gaze_yaw_deg, ddx.bounding_box_px as eye_dx, ddx.gaze_pitch_deg, ddx.gaze_yaw_deg from dev_samples left join dev_bodies as dby using(sample_id) left join dev_faces as dfc using(body_id) left join dsx using(face_id) left join ddx using(face_id) where dev_samples.device_id = ${deviceId} and system_timestamp_ms = (select max(system_timestamp_ms) from dev_samples where dev_samples.device_id=${deviceId}) and dfc.bounding_box_px is not null` order by dby.center_z_mmYou are underestimating what can be gained as value in catching SQL problems at compile time instead of runtime. And, believe me, it's not a matter of mocking the DB and relying on unittest and coverage.Please expand on that. This is a very important topic. I want to know all the relevant facts.
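A toy version of the schema check described above, to show the mechanism only. The table list, the `tableAfterFrom` helper and this `sql` template are assumptions made for illustration; they are not Paolo's code, and the "parser" just looks at the word after the first `from`.

```d
import std.algorithm.searching : canFind;
import std.array : split;

// Hypothetical schema snapshot; a real system would generate it from the database.
enum string[] schemaTables = ["snapshots", "dev_samples"];

/// Naive helper: returns the word following the first "from", or null.
string tableAfterFrom(string query)
{
    auto words = query.split();
    foreach (i, word; words)
        if (word == "from" && i + 1 < words.length)
            return words[i + 1];
    return null;
}

/// Yields the query unchanged, but refuses to compile for unknown tables.
template sql(string query)
{
    static assert(schemaTables.canFind(tableAfterFrom(query)),
        `relation "` ~ tableAfterFrom(query) ~ `" does not exist. SQL: ` ~ query);
    enum string sql = query;
}

// A query against a known table instantiates fine...
static assert(sql!"select size_mm from snapshots where snapshot_id = $1".length > 0);

// ...while a misspelled table name stops compilation of that instantiation.
static assert(!__traits(compiles, sql!"select size_mm from snapshotsssss"));
```

Removing a table from `schemaTables` makes every query that mentions it fail to compile, which is the refactoring safety net being described.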
Jan 12 2024
On 1/12/24 07:06, Walter Bright wrote:The compile-time vs runtime issue is the only thing left standing where the advantage goes to DIP1036.This is not true, DIP1027 also suffers from other drawbacks. For example: - DIP1027 has already been rejected. - Format string has to be passed as a runtime argument. - Format string has to be parsed. (Whether at runtime or compile time.) - Format string is not transparent to the library user, they have to manually escape '%'. - No simple way to detect the end of the part of the argument list that is part of the istring. - Cannot support nested istrings. (I guess the `enum Format: string;` would mitigate this to some extent.) DIP1027 has the following advantages: - No interspersed runtime arguments not carrying any runtime data, this is a bit easier to consume. - Fewer template instantiations. In any case, I think the compile-time vs runtime issue is the most significant. I do not want a solution that does not integrate well with metaprogramming, it's just not worth it.
Jan 12 2024
On 1/12/24 16:27, Timon Gehr wrote:- Cannot support nested istrings. (I guess the `enum Format: string;` would mitigate this to some extent.)- In any case, DIP1027 cannot support nested expression sequences without the user passing a manual marker. DIP1036e can support them quite naturally.
Jan 12 2024
On Friday, 12 January 2024 at 06:06:52 UTC, Walter Bright wrote:On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:The point is to pass the things that the compiler knows to the library, namely the string literal parts. Within the current domain of the D language, the best way to do this is to use string template parameters. Necessarily, this is going to incur template symbol name explosion. I would love to solve this problem, especially in the cases where compile-time usage isn't needed. Having the compile-time expressions is essential when you need it, but is pretty ugly when you don't. Again, we can have wrapper templates that do this for you. The problem (as always) is that these wrapper templates are still in there, still taking up space. Is there any room for a solution here? I'm talking about the compiler being clued in that these functions shouldn't exist in the binary. Then the compiler can take a lot of shortcuts (like hashing the type data instead of making a demangleable symbol). But Timon is also right that the "format string" version is actually adding to the grief for library writers and users. There's no reason I can think of to add additional parsing requirements for the library. I'd prefer Jonathan Marler's solution of just interspersing strings and values if I had to pick between that and DIP1027. But that still leaves so much on the table of what *could be great*. I also think it's fine to tell users 'Hey, you want formatted output? it's writef("format", args)'. My target was not and never will be, `writef`.CTFE capability is needed.I concur that compile time errors are better than runtime errors. But in this case, there's a continuing cost to have them, cost to other far more common use cases for istrings. 
The cost is in terms of complexity, about needing to filter out all the extra marker templates, about reducing its utility as a tuple generator with the unexpected extra elements, larger object files, much longer mangled names, and so on.Want to know the source of my unease about it? Simple things should be simple. This isn't. The extra complexity is always there, even for the simple cases, and the simple cases are far and away the most common use cases.It actually is simple. It's a simple transformation from a parsed expression to the subexpressions contained within (sprinkling in types to make it easy to know what is what). What you *do* with the transformation might not be simple, but that's not necessary to use the feature.Frankly, it reminds me of C++ template expressions, which caught the C++ world by storm for about 2 years, before it faded away into oblivion and nobody talks about them anymore. Fortunately for C++, template expressions could be ignored, as they were not a core language feature. But DIP1036 is a core language feature, a feature we would be stuck with forever. And I'll be the one who gets the heat for it.I just looked it up and... no. It's not even close. There is no *requirement* to make this complicated. The transformation is simple and straightforward. It's easy to understand if you take 5 minutes to read the docs. If you want to build some insanely complex thing out of this, it's possible. But there is no requirement to use it that way. To reiterate, the *feature* is simple, what you can do with the feature is unbounded. This is like saying templates are too complicated because of what you *can do* with templates.P.S. You can do template expressions in D, too!I rest my case ;) -Steve
Jan 12 2024
On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:
> A compile time way is DIP1027 can be modified to reject any arguments that consist of tuples with other than one element. This would eliminate nested istring tuples at compile time.

To sum up, it works poorly with nested istrings; it may even be sensible to forbid them entirely for DIP1027. Glad we’ve reached a consensus on this point. This case doesn’t seem crucial at the moment though; now we can focus on more relevant questions.

> DIP1036 cannot detect other problems with the string literals. It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.

DIP1036 puts full CTFE capabilities at your disposal. You can validate _anything_ about a format string; any compile-time-executable hypothetical `validateSql(query)` will fit. I guess none of the examples presented so far featured such validation because it usually tends to be long and not illustrative. However, another of Adam’s examples [does perform](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/07-html.d#L13) non-trivial compile-time validation. Here is how it is [implemented](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/html.d#L97).

> > Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi@forum.dlang.org).
>
> That only checks one aspect of correctness - nested string interpolations.

They check a lot more. I agree it is hard to spot the error messages in the linked post, so I’ll copy them here:

    relation "snapshotsssss" does not exist. SQL: select size_mm, size_px from snapshotsssss where snapshot_id = $1
    role "dummyuser" can't select on table "snapshots". SQL: select size_mm, size_px from snapshots where snapshot_id = $1

As you can see, they check sophisticated business logic expressed in terms of relational databases. And all of that happens at compile time. Isn’t that a miracle?

> > I explained [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana@forum.dlang.org) why these two arguments are valuable. Aren’t free of cost: correct, unless you enable inlining. `execi` may require some changes (like `filterOutEmpty` I showed above) to make them free of cost, but it is doable.
>
> You'd have to also make every formatted writer a template,

Err… every formatted writer has to be a template anyway, doesn’t it? It needs to accept argument lists that may contain values of arbitrary types.

> …and add the filter to them.

Yeah. I admit this is a problem. As a rule of thumb, the most obvious code should yield the best results. With DIP1036, this is not the case at the moment: when you pass an interpolation sequence to a function not specifically designed for it, it wastes more stack space than necessary and passes useless junk in registers.

Others have mentioned that DIP1027 performs much worse in terms of speed (due to runtime parsing). While that is undeniable, I think DIP1036 should be tweaked to behave as well as possible.

There was an idea in this thread to improve the ABI so that it ignores empty structs, but I’m rather sceptical about it. Instead, let us note there are basically two patterns of usage for istrings:

1. Passing to a function that processes an istring and does something non-trivial. `execi` is a good example.
2. Passing to a function that simply stringifies every fragment, one after another. `writeln` is a good example.

Somewhat counterintuitively, case 1 is easier to address: the function already traverses the received sequence and transforms it. So it is only necessary to write it in such a way that it is inline-friendly.
By the way, what functions do we have in Phobos that fall into the case-2 category? `write`/`writeln`, `std.conv.text`, `std.logger.core.log`, and… is that all? Must be something else!.. Turns out there are only a handful of relevant functions in the entire stdlib. It shouldn’t be hard to put a filter in each of them. It also hints they are probably not that common in the wild.

However, when one encounters a third-party `write`-like function that is unaware of `InterpolationHeader`/etc., they should have a means to fix it from outside, i.e., without touching its source and ideally without writing a wrapper by hand. Unfortunately, I could not come up with a satisfactory solution for this. Will keep thinking. Perhaps someone else manages to find it faster.

---

An idea in a different direction. Currently, `InterpolationHeader`/etc. structs interoperate with `write`-like functions seamlessly (at the expense of passing zero-sized arguments) due to the fact they all have an appropriate `toString` method. If we remove those methods (and do nothing else), then `write(i"a$(x)b")` would produce something like:

    InterpolationHeader()InterpolatedLiteral!"a"()InterpolatedExpression!"x"()42InterpolatedLiteral!"b"()InterpolationFooter()

The program, rather than introducing a silent inefficiency, immediately tells the user they need to account for these types.

---

And one more idea. The current implementation of DIP1036 can emit empty chunks, i.e., `InterpolatedLiteral!""`; see for example `i"$(x)"`. If I were to guess why it does so, I would say it strives to produce consistent, regular sequences. On the one hand, it might ease the job of interpolation-sequence handlers: they can count on the fact that expressions and literals always alternate inside a sequence. On the other, they have to check whether a literal is empty and drop it if it is, so it actually makes their job harder.
I do not know whether not producing empty literals in the first place would be a positive or negative change. But it is something worth considering.

---

Slightly off-topic: when I was thinking about this, I was astonished by the fact istrings can work with `readf`/`formattedRead`/`scanf`. Just wanted to share this observation.

```d
readf(i" $(&x) $(&y)");
```

> The compiler will indeed reject it (The error message would be a bit baffling to those who don't know what Interpolation types are)

This is true. I suppose the docs should mention `InterpolationHeader` and friends when talking about istrings, explain what an istring is lowered to, and show examples. Then a programmer who has read the docs will have a mental association between “istring” and “InterpolationHeader/Footer/etc.” Those who don’t read the docs, well, they won’t have. Only googling will save them. To be honest, I’m not concerned about this point too much.

> along with any attempt to call execi() with a pre-constructed string. The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want?

I’d argue it is wonderful that `execi` cannot be called with a pre-constructed string. The API should provide another function instead, say, `execDynamicStatement(Sqlite, string, Args...)`. `execi` should be used for statically known SQL with interpolated arguments, and `execDynamicStatement` for arbitrary SQL constructed at runtime. A verbose name is intentional to discourage its usage in favour of `execi`.

> P.S. most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff.

That makes sense. Though you’ll never guess what beast can be spawned by careless refactoring. Extra protection won’t harm, especially if it’s zero-cost.

P.S. Zero-initialization of variables is one of D’s cool features, indeed.
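The filtering that a case-2 (`write`-like) function would need can be sketched roughly as follows. This is only an illustration under stated assumptions: the four marker structs below are hypothetical stand-ins declared locally so the snippet compiles on its own, and `stringify` is an invented name, not an existing Phobos function.

```d
import std.conv : text;

// Hypothetical stand-ins for the marker types named in this thread.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string str) {}
struct InterpolatedExpression(string code) {}

/// A `write`-like helper that knows about the markers: literals are taken
/// from their template argument, empty literals and all other markers are
/// dropped, and every remaining argument is stringified.
string stringify(Args...)(Args args)
{
    string result;
    foreach (arg; args) // unrolled at compile time, one pass per argument
    {
        alias T = typeof(arg);
        static if (is(T == InterpolatedLiteral!str, string str))
        {
            static if (str.length != 0) // skip empty chunks such as InterpolatedLiteral!""
                result ~= str;
        }
        else static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
            // begin/end markers carry no data
        }
        else static if (is(T == InterpolatedExpression!code, string code))
        {
            // source text of the expression; not needed for plain output
        }
        else
            result ~= text(arg); // an actual interpolated value
    }
    return result;
}

void main()
{
    int x = 42;
    // Hand-written equivalent of what i"a$(x)b" is described as lowering to.
    auto s = stringify(InterpolationHeader(), InterpolatedLiteral!"a"(),
                       InterpolatedExpression!"x"(), x,
                       InterpolatedLiteral!"b"(), InterpolationFooter());
    assert(s == "a42b");
}
```

Because every branch is selected with `static if`, the marker arguments contribute no runtime work inside the function body; whether they still occupy argument slots depends on inlining, as discussed above.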
Jan 09 2024
On 1/10/24 00:21, Walter Bright wrote:
> > You are right, it doesn’t. Timon’s point (expressed as “This does not work”) is that DIP1036 is able to do validation at compile time while DIP1027 is only able to do it at runtime, when this function actually gets invoked.
>
> The only validation it does is check for nested string interpolations.

That is not true in the least. It validates conclusively that no SQL injection attack is going on. This is the main feature of the example!

> ...

The other points I think have been adequately addressed already.
Jan 11 2024
On 1/9/24 21:01, Walter Bright wrote:
> On 1/9/2024 4:35 AM, Timon Gehr wrote:
> > However, let's for the sake of argument assume that, miraculously, `execi` can read the format string at compile time, then:
>
> Adam's implementation of execi() also runs at run time, not compile time. ...

Adam's `execi` partially runs at compile time and partially of course it will ultimately run at run time (like code generated by a metaprogram tends to do). The SQL statement is prepared at compile time. Therefore, by construction, it cannot depend on any runtime parameters, preventing an SQL injection. (And it can be checked at compile time, like people are already doing with less convenient syntax).

> > - With this signature, if you pass a manually-constructed string to it, it would just accept the SQL injection.
>
> It was just a proof of concept piece of code.

So is Adam's example code. In any case: I am talking about the function _signature_. Whatever crazy advanced thing you do in the implementation, the signature that DIP1027 expects `execi` to have is fundamentally significantly less safe.

> execi could check for format strings that contain ?n sequences. It could also check the number of %s formats against the number of arguments. ...

That does not fix the security issue.

> > But you get a useful error message that exactly pinpoints what the problem is.
> > Also, they could be supported, which is the point.
> > - It does not give a proper error message for nested istrings.
>
> execi could be extended to reject arguments that contain %s sequences.

And now suddenly you can no longer store anything that looks like a format string in your data base.

> Or, if there was an embedded istring, the number of %s formats can be checked against the number of arguments.

Maybe at runtime. But why introduce this failure mode in the first place?

> An embedded istring would show a mismatch. ...

The error message would be phrased in overly general terms and hence be confusing.

> I expect that use of nested istrings would be exceedingly rare. If they are used, wrapping them in text() will make it work.

Depends on how exactly they are used. For the SQL case, not allowing them is a decent option.

> Besides, would a nested istring in an sql call be intended as part of the sql format, or would a text string be the intended result? ...

Whatever it is, with DIP1036e and compile-time SQL construction, user data does not make it into the SQL expression sent to the database.

> > - It has to manually parse the format string. It iterates over each character of the original format string.
>
> Correct. And it does not need to iterate over and remove all the Interpolation arguments.

Adam's implementation does the filtering at compile time. The function body will be something like:

    auto statement = Statement(db, "...?1...?2...?3..."); // replace ... by query
    int number = 0;
    statement.bind(++number, firstArg);
    statement.bind(++number, secondArg);
    statement.bind(++number, thirdArg);

But yes, DIP1036e does make some concessions and it will indeed pass empty struct arguments in case the function is not inlined (one could use `pragma(inline, true)` to avoid it).

> Nor does it need the extra two arguments, which aren't free of cost. ...

Are you really going to argue that some extra empty struct arguments are in some way more expensive than runtime query construction including format string parsing and query construction using GC strings? But anyway, if you think interpolation is not worth runtime overhead that would perhaps need to be mitigated using additional features or an improved calling convention, that's up to you, but then DIP1027 loses too.

> > - It (ironically!) constructs a new format string, the original one was useless.
>
> Yes, it converts the format specifiers to the sql ones. Why is this a problem? ...

You argued earlier like it is in some way an ironic benefit of DIP1027 that the DB interface requires something that is similar to a format string under the hood. Well, it does not require the kind of format string that DMD is generating.

> > - If you pass a bad format string to it (for example, by specifying a manual format), it will just do nonsense, while DIP1036e avoids bad format strings by construction.
>
> What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)"`? I didn't see any check for that.

That's a fair point in general, but I was specifically talking about the format string that you pass into the function that accepts the istring, not similar kinds of strings that may or may not be generated in the implementation. In any case, DIP1027 istrings can also create a format string with `?3`, and there is no way to check within `execi` whether that `?3` came from malicious data that was read as input to the program or was put there by an incompetent programmer.

> Of course, one could add such a check to the 1036 execi. ...

With DIP1036e the check could be done at compile time.

> printf format strings are checked by the compiler,

As a one-off special case that only supports a specific kind of format string.

> and writef format strings are checked by writef.

`writef` allows the format string to be passed as a template parameter if compile-time parsing and checking is requested. DIP1027 does not naturally support this.

> execi is also capable of being extended to check the format string to ensure the format matches the args.

With DIP1027, you'd have to do it at runtime.
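For readers unfamiliar with the template-parameter form of `writef` mentioned above, here is a minimal sketch using Phobos' `format!` and `writefln!` overloads, which take the format string as a compile-time argument. The variable names are illustrative only.

```d
import std.format : format;
import std.stdio : writefln;

void main()
{
    int id = 42;
    string name = "Alice";

    // Format string passed as a template argument: it is parsed and checked
    // against the argument types during compilation.
    writefln!"%d: %s"(id, name);
    assert(format!"%d: %s"(id, name) == "42: Alice");

    // Mismatches become compile errors instead of runtime exceptions.
    static assert(!__traits(compiles, { auto s = format!"%d %d"(id); })); // missing argument
    static assert(!__traits(compiles, { auto s = format!"%d"(4.03); }));  // incompatible argument
}
```

This only works when the format string is a compile-time constant that the callee receives as a template argument, which is the distinction being drawn here.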
Jan 09 2024
I'd like to see an example of how DIP1027 does not prevent an injection attack.
Jan 11 2024
On 1/12/24 07:13, Walter Bright wrote:
> I'd like to see an example of how DIP1027 does not prevent an injection attack.

```d
// mock SQL
import std.format, std.variant;

class Sqlite{
    this(string){}
    Sqlite query(string command,scope Variant[int] args=null){
        writeln("EXECUTING");
        writeln(command);
        if(args.length){
            writeln("ARGS:");
            foreach(k,v;args){
                if(v!=Variant.init)
                    writefln(i"?$k = ($v)");
            }
        }
        writeln("DONE");
        return this;
    }
    struct Row{
        int opIndex(int i){ return 0; }
    }
    int opApply(scope int delegate(Row) dg){
        writeln("ITERATING OVER ROWS");
        return 0;
    }
}

struct Statement{
    Sqlite db;
    string query;
    Variant[int] args;
    void bind(T)(int i,T arg){
        args[i]=Variant(arg);
    }
    void execute(){
        db.query(query,args);
    }
}

auto execi(Args...)(Sqlite db, Args args) {
    // sqlite lets you do ?1, ?2, etc
    string query = () { // note: parsing done at runtime
        string sql;
        int number;
        import std.conv;
        auto fmt = args[0];
        for (size_t i = 0; i < fmt.length; ++i)
        {
            char c = fmt[i];
            if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
            {
                sql ~= "?" ~ to!string(++number);
                ++i;
            }
            else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                ++i; // skip escaped %
            else
                sql ~= c;
        }
        return sql;
    }();

    auto statement = Statement(db, query);
    int number;
    foreach(arg; args[1 .. args.length]) {
        statement.bind(++number, arg);
    }

    return statement.execute();
}

import std.stdio;

void main() {
    auto db = new Sqlite(":memory:");
    db.query("CREATE TABLE Students (id INTEGER, name TEXT)");

    // you might think this is sql injection... and you'd be right! the lib
    // cannot use rich metadata because it is not provided by the istring
    // therefore, it cannot verify that the user didn't construct the
    // query themselves in an unsafe way
    int id = 1;
    string name = "Robert'); DROP TABLE Students;--";
    db.execi(i"INSERT INTO sample VALUES ($(id), '$(name)')".format);

    foreach(row; db.query("SELECT * from sample"))
        writeln(row[0], ": ", row[1]);
}
```

Prints:

    EXECUTING
    CREATE TABLE Students (id INTEGER, name TEXT)
    DONE
    EXECUTING
    INSERT INTO sample VALUES (1, 'Robert'); DROP TABLE Students;--')
    DONE
    EXECUTING
    SELECT * from sample
    DONE
    ITERATING OVER ROWS

https://xkcd.com/327/
Jan 12 2024
On 1/9/24 21:01, Walter Bright wrote:
> I expect that use of nested istrings would be exceedingly rare. If they are used, wrapping them in text() will make it work.

One more point here is that `text` will of course only work with DIP1036e; with DIP1027 you need `format`.

In any case, unfortunately I have to bow out of this discussion now as it is consuming too much of my time right in front of a deadline. I can get back to this in a couple of days.
Jan 09 2024
At the end of the day, DIP1027 is an improvement of `writef`, and `writef` only (not even `printf` works correctly). The interpolation DIP Atila is writing (I'll call it IDIP) supports all manner of interpolated transformations, efficiently and effectively, with proper compiler checks.

Let's go through the points made...

On Monday, 8 January 2024 at 23:06:40 UTC, Walter Bright wrote:
> Here's how SQL support is done for DIP1036:
>
> https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d
>
> ...
>
> This:
>
> 1. The istring, after converted to a tuple of arguments, is passed to the `execi` template.

Yes, and with an explicit type to be matched against, enabling overloading. Note that `execi` could be called the same thing as the normal execution function, and then users could use whatever form they prefer -- sql string + args or istring. It's a seamless experience. Compare to DIP1027 where you can accidentally use the wrong form with string args.

> 2. It loops over the arguments, essentially turning it (ironically!) back into a format string. The formats, instead of %s, are ?1, ?2, ?3, etc.

There is no formatting, sqlite does not have any kind of format specifiers. No, it is not "turned back" into a format string, because there was no format string to begin with. The sql is *constructed* using the given information from the compiler clearly identifying which portions are sql and which portions are parameters. And the SQL query is built at compile time, not runtime (as DIP1027 *must do*). This incurs no memory allocations at runtime.

> 3. It skips all the Interpolation arguments inserted by DIP1036.

Sure, those are not necessary here. Should be a no-op, as no data is actually passed.

> 4. The remaining argument are each bound to the indices 1, 2, 3, ...

Yes.

> 5. Then it executes the sql statement.

Yes.

> Note that nested istrings are not supported.

Note that nested istrings can be *detected*. And they are not supported *as explicitly specified*!
This is not a defect or limitation but a choice of the particular example library. Noting this "limitation" is like noting the limitation that `void foo(int)` can't be called with a `string` argument.

> Let's see how this can work with DIP1027:
>
> ```d
> auto execi(Args...)(Sqlite db, Args args) {
>     import arsd.sqlite;
>
>     // sqlite lets you do ?1, ?2, etc
>     enum string query = () {
>         string sql;
>         int number;
>         import std.conv;
>         auto fmt = args[0];
>         for (size_t i = 0; i < fmt.length; ++i)
>         {
>             char c = fmt[i];
>             if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
>             {
>                 sql ~= "?" ~ to!string(++number);
>                 ++i;
>             }
>             else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
>                 ++i; // skip escaped %
>             else
>                 sql ~= c;
>         }
>         return sql;
>     }();
> ```

As mentioned several times, this fails to compile -- an enum cannot be built from the runtime variable `args`.

Now, you can just do this *without* an enum, and yes, it will compile, build a string at runtime, and you are now at the mercy of the user to not have put in specialized placeholder (poorly named as a "format specifier" in DIP1027 because it is solely focused on writef). No compiler help for you!

To put it another way, you have given up complete control of the API of your library to the compiler and the user. Instead of understanding what the user has said, you have to guess.

And BTW, this is valid SQL:

```sql
SELECT * FROM someTable WHERE fieldN LIKE '%something%'
```

Which means, the poor user needs to escape `%` in a way completely unrelated to the sql language *or* the istring specification, something that IDIP doesn't require. This is a further burden on the user that is wholly unnecessary, just because DIP1027 decided to use `%s` as "the definitive ~~placeholder~~ format specifier".

> ```d
>     auto statement = Statement(db, query);
>     int number;
>     foreach(arg; args[1 .. args.length]) {
>         statement.bind(++number, arg);
>     }
>
>     return statement.execute();
> }
> ```
>
> This:
>
> 1. The istring, after converted to a tuple of arguments, is passed to the `execi` template.

A tuple with an incorrect parameter that needs runtime transformation and allocations.

> 2. The first tuple element is the format string.
> 3. A replacement format string is created by replacing all instances of "%s" with "?n", where `n` is the index of the corresponding arg.

SQL doesn't use format strings, so the parameter must be transformed at runtime using memory allocations. And it does this without knowing whether the "%s" came from the "format string" or from a parameter. Not to mention the user can pass in other "format specifiers" at will.

> 4. The replacement format string is bound to `statement`, and the arguments are bound to their indices.

Maybe. sqlite frowns upon mismatching arguments because the library decided your search string was actually a placeholder in some unrelated domain specific language (the language of `writef`).

> 5. Then it executes the sql statement.

Maybe.

> It is equivalent.

It is most certainly not. The two are only slightly comparable. IDIP is a mechanism for an SQL library author (and many other domains, see Adam's repository) to effectively and gracefully consume succinct and intuitive instructions from a user to avoid SQL injections, and use the compiler to weed out problematic calls. Whereas DIP1027 is a loaded footgun which is built for `writef` that can be shoehorned into an SQL lib, which necessitates allocations and all checks are done at runtime.

-Steve
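To make the "constructed at compile time" point concrete, here is a rough sketch of how a query string can be derived purely from the *types* in an interpolation sequence. It is not Adam's implementation: the marker structs are hypothetical local stand-ins so the snippet is self-contained, and `buildQuery` is an invented helper name.

```d
import std.conv : to;
import std.meta : AliasSeq;

// Hypothetical stand-ins for the marker types discussed in this thread.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string str) {}
struct InterpolatedExpression(string code) {}

/// Builds the SQL text from the argument types alone, so it can run in CTFE.
/// Literal chunks are copied verbatim; every real argument becomes ?1, ?2, ...
string buildQuery(Args...)()
{
    string sql;
    int number;
    foreach (T; Args) // unrolled over the type list
    {
        static if (is(T == InterpolatedLiteral!str, string str))
            sql ~= str; // SQL written by the programmer
        else static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
            // begin/end markers, no data
        }
        else static if (is(T == InterpolatedExpression!code, string code))
        {
            // source text of the expression, not needed here
        }
        else
            sql ~= "?" ~ to!string(++number); // user data is only ever a placeholder
    }
    return sql;
}

void main()
{
    // The type list that i"SELECT * FROM t WHERE id = $(id) AND name = $(name)"
    // is assumed to produce for an int and a string argument.
    alias Seq = AliasSeq!(
        InterpolationHeader,
        InterpolatedLiteral!"SELECT * FROM t WHERE id = ",
        InterpolatedExpression!"id", int,
        InterpolatedLiteral!" AND name = ",
        InterpolatedExpression!"name", string,
        InterpolationFooter);

    enum query = buildQuery!Seq(); // evaluated during compilation
    static assert(query == "SELECT * FROM t WHERE id = ?1 AND name = ?2");
}
```

Since `query` is an `enum`, a hypothetical compile-time validator could inspect it before the program ever runs, and no value supplied at run time can end up inside the SQL text.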
Jan 09 2024
On 1/9/24 23:30, Steven Schveighoffer wrote:
> And BTW, this is valid SQL:
>
> ```sql
> SELECT * FROM someTable WHERE fieldN LIKE '%something%'
> ```
>
> Which means, the poor user needs to escape `%` in a way completely unrelated to the sql language *or* the istring specification, something that IDIP doesn't require.

I had typed up a similar point in my post, but then thought that most likely DIP1027 does the escaping automatically and dropped the line of inquiry. But actually checking it now, it indeed does not seem to do anything to prevent such hijacking.

https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md

https://github.com/dlang/dmd/compare/master...WalterBright:dmd:dip1027#diff-a556a8e6917dd4042f541bdb19673f96940149ec3d416b0156af4d0e4cc5e4bdR16347-R16452

Having the SQL library arbitrarily interpret a substring `%s` in your SQL query as a placeholder seems like unnecessary pain, and it also renders moot the idea that DIP1027 code is able to detect mismatches.
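The hijacking can be reproduced without any compiler support for istrings. The sketch below isolates the runtime `%s` to `?n` rewrite from the DIP1027 `execi` quoted in this thread; `toSqlitePlaceholders` is an invented name, and one deliberate deviation is noted in the comments: the `%%` branch here emits a literal `%`, which is an assumption about the intended behaviour.

```d
import std.conv : to;

/// Runtime rewrite in the style of the DIP1027 `execi` from this thread:
/// each "%s" becomes "?n". Assumption: "%%" is meant to yield a literal '%'.
string toSqlitePlaceholders(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        char c = fmt[i];
        if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);
            ++i; // consume the 's'
        }
        else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%';
            ++i; // consume the second '%'
        }
        else
            sql ~= c;
    }
    return sql;
}

void main()
{
    // Intended use: a genuine placeholder.
    assert(toSqlitePlaceholders("SELECT * FROM t WHERE id = %s")
        == "SELECT * FROM t WHERE id = ?1");

    // The LIKE pattern starts with "%s", so it is silently taken for a placeholder.
    assert(toSqlitePlaceholders("SELECT * FROM someTable WHERE fieldN LIKE '%something%'")
        == "SELECT * FROM someTable WHERE fieldN LIKE '?1omething%'");

    // The user has to double every '%' to get the SQL they actually wrote.
    assert(toSqlitePlaceholders("SELECT * FROM someTable WHERE fieldN LIKE '%%something%%'")
        == "SELECT * FROM someTable WHERE fieldN LIKE '%something%'");
}
```

The second assertion is the case under discussion: the function has no way to tell a `%s` that came from an interpolated expression from one that was simply part of the query text.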
Jan 09 2024
Please post an example of a problem it cannot detect.
Jan 11 2024
On 1/12/24 07:17, Walter Bright wrote:
> Please post an example of a problem it cannot detect.

For example:

```d
import std.stdio;

void main(){
    int x=2,y=3;
    writefln(i"%success: $y",x);
}
```
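To see what goes wrong here without a DIP1027-enabled compiler, the lowering can be written out by hand. This sketch assumes, as discussed in this thread, that the istring becomes a format string followed by its arguments and that a literal `%` in the istring is not escaped.

```d
import std.format : format;

void main()
{
    int x = 2, y = 3;

    // Assumed lowering of writefln(i"%success: $y", x):
    //   writefln("%success: %s", y, x)
    // The leading "%s" of "%success" is taken for a format specifier.
    string s = format("%success: %s", y, x);
    assert(s == "3uccess: 2");
}
```

The argument count happens to match the number of `%s` specifiers, so a count-based check would not flag it either.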
Jan 12 2024