digitalmars.D - Interpolated strings and SQL

reply Walter Bright <newshound2 digitalmars.com> writes:
Here's how SQL support is done for DIP1036:

https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d

```
auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args,
                    InterpolationFooter footer) {
     import arsd.sqlite;

     // sqlite lets you do ?1, ?2, etc

     enum string query = () {
         string sql;
         int number;
         import std.conv;
         foreach(idx, arg; Args)
             static if(is(arg == InterpolatedLiteral!str, string str))
                 sql ~= str;
             else static if(is(arg == InterpolationHeader) || is(arg == InterpolationFooter))
                 throw new Exception("Nested interpolation not supported");
             else static if(is(arg == InterpolatedExpression!code, string code))
                 {   } // just skip it
             else
                 sql ~= "?" ~ to!string(++number);
         return sql;
     }();

     auto statement = Statement(db, query);
     int number;
     foreach(arg; args) {
         static if(!isInterpolatedMetadata!(typeof(arg)))
             statement.bind(++number, arg);
     }

     return statement.execute();
}
```
This works as follows:

1. The istring, after being converted to a tuple of arguments, is passed to the
`execi` template.
2. It loops over the arguments, essentially turning them (ironically!) back into
a format string. The formats, instead of %s, are ?1, ?2, ?3, etc.
3. It skips all the Interpolation arguments inserted by DIP1036.
4. The remaining arguments are each bound to the indices 1, 2, 3, ...
5. Then it executes the sql statement.

Note that nested istrings are not supported.

Let's see how this can work with DIP1027:

```
auto execi(Args...)(Sqlite db, Args args) {
     import arsd.sqlite;

     // sqlite lets you do ?1, ?2, etc

     enum string query = () {
         string sql;
         int number;
         import std.conv;
         auto fmt = arg[0];
         for (size_t i = 0; i < fmt.length; ++i)
         {
             char c = fmt[i];
             if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
             {
                 sql ~= "?" ~ to!string(++number);
                 ++i;
             }
             else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                 ++i;  // skip escaped %
             else
                 sql ~= c;
         }
         return sql;
     }();

     auto statement = Statement(db, query);
     int number;
     foreach(arg; args[1 .. args.length]) {
         statement.bind(++number, arg);
     }

     return statement.execute();
}
```
This works as follows:

1. The istring, after being converted to a tuple of arguments, is passed to the
`execi` template.
2. The first tuple element is the format string.
3. A replacement format string is created by replacing all instances of "%s"
with "?n", where `n` is the index of the corresponding arg.
4. The replacement format string is bound to `statement`, and the arguments are
bound to their indices.
5. Then it executes the sql statement.

It is equivalent.
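
The "%s" to "?n" rewrite described in step 3 can be sketched as a standalone function. This is only an illustration, not code from either DIP: the name `toSqlitePlaceholders` is hypothetical, and emitting a single `%` for an escaped `%%` is an assumption about the intended behaviour.

```d
import std.conv : to;
import std.stdio : writeln;

/// Rewrites a "%s"-style format string into SQLite's numbered
/// placeholders: each "%s" becomes "?1", "?2", ... (hypothetical helper).
string toSqlitePlaceholders(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable char c = fmt[i];
        if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);
            ++i; // consume the 's'
        }
        else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%'; // assumption: "%%" stands for one literal '%'
            ++i;        // consume the second '%'
        }
        else
            sql ~= c;
    }
    return sql;
}

void main()
{
    // Ordinary run-time call.
    writeln(toSqlitePlaceholders("SELECT * FROM t WHERE a = %s AND b = %s"));
    // The same function also runs in CTFE when its input is a constant.
    static assert(toSqlitePlaceholders("x = %s") == "x = ?1");
}
```

Whether the rewrite happens at compile time or at run time depends only on whether the format string is a compile-time constant at the call.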
Jan 08 2024
next sibling parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
Hello. It is fascinating to see string interpolation in D. Let me 
try to shed some light on it; I hope my thoughts will be useful.

1.  First of all, I’d like to note that in the DIP1027 variant 
of the code we see:

     > `auto fmt = arg[0];`

     (`arg` is an undeclared identifier here; I presume `args` was 
meant.) There is a problem: this line is executed at CTFE, but it 
cannot access `args`, which is a runtime parameter of `execi`. 
For this to work, the format string should go to a template 
parameter, and interpolated expressions should go to runtime 
parameters. How can DIP1027 accomplish this?

2.
     > Note that nested istrings are not supported.

     To clarify: “not supported” means one cannot write

     ```
     db.execi(i"SELECT field FROM items WHERE server = $(i"europe$(number)")");
     ```

     Instead, you have to be more explicit about what you want the 
inner string to become. This is legal:

     ```
     db.execi(i"SELECT field FROM items WHERE server = $(i"europe$(number)".text)");
     ```

     However, it is not hard to adjust `execi` so that it fully 
supports nested istrings:

     ```d
     struct Span {
         size_t i, j;
         bool topLevel;
     }

     enum segregatedInterpolations(Args...) = {
         Span[ ] result;
         size_t processedTill;
         size_t depth;
         static foreach (i, T; Args)
             static if (is(T == InterpolationHeader)) {
                 if (!depth++) {
                     result ~= Span(processedTill, i, true);
                     processedTill = i;
                 }
             } else static if (is(T == InterpolationFooter))
                 if (!--depth) {
                     result ~= Span(processedTill, i + 1);
                     processedTill = i + 1;
                 }
         return result;
     }();

     auto execi(Args...)(Sqlite db, InterpolationHeader header,
                         Args args, InterpolationFooter footer) {
         import std.conv: text, to;
         import arsd.sqlite;

         // sqlite lets you do ?1, ?2, etc

         enum string query = () {
             string sql;
             int number;
             static foreach (span; segregatedInterpolations!Args)
                 static if (span.topLevel) {
                     static foreach (T; Args[span.i .. span.j])
                         static if (is(T == InterpolatedLiteral!str, string str))
                             sql ~= str;
                         else static if (is(T == InterpolatedExpression!code, string code))
                             sql ~= "?" ~ to!string(++number);
                 }
             return sql;
         }();

         auto statement = Statement(db, query);
         int number;
         static foreach (span; segregatedInterpolations!Args)
             static if (span.topLevel) {
                 static foreach (arg; args[span.i .. span.j])
                     static if (!isInterpolatedMetadata!(typeof(arg)))
                         statement.bind(++number, arg);
             } else // Convert a nested interpolation to string with `.text`.
                 statement.bind(++number, args[span.i .. span.j].text);

         return statement.execute();
     }
     ```

     Here, we just invoke `.text` on nested istrings. A more 
advanced implementation would allocate a buffer and reuse it. It 
could even be `@nogc` if it wanted.

3.  DIP1036 appeals more to me because it passes rich, high-level 
information about parts of the string. With DIP1027, on the other 
hand, we have to extract that information ourselves by parsing 
the string character by character. But the compiler already 
tokenized the string; why do we have to do it again? (And no, 
lower level doesn’t imply broader possibilities here.)

     It may have another implication: looping over characters 
might put the current CTFE engine in trouble if strings are large. 
Many more iterations need to be executed, and more memory is 
consumed in the process. We certainly need numbers here, but I 
thought it was important to at least bring attention to this 
point.

4.  What I don’t like in both DIPs is a rather arbitrary 
selection of meta characters: `$`, `$$` and `%s`. In regular 
strings, all of them are just normal characters; in istrings, 
they gain special meaning.

     I suppose a cleaner way would be to use `\(...)` syntax (like 
in Swift). So `i"a \(x) b"` interpolates `x` while `"a \(x) b"` 
is an immediate syntax error. First, it helps to catch bugs 
caused by missing `i`. Second, the question, how do we escape 
`$`, gets the most straightforward answer: we don’t.

     A downside is that parentheses will always be required with 
this syntax. But the community preferred them anyway even with 
`$`.
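
Returning to point 1, the difference between a format string passed as a template parameter and one passed as a runtime parameter can be sketched as follows. The names `countPlaceholders`, `viaTemplateParam` and `viaRuntimeParam` are hypothetical, and the placeholder counter deliberately ignores `%%` escapes to stay short.

```d
/// Counts "%s" placeholders; usable at run time and in CTFE.
/// Simplification: "%%" escapes are not handled.
size_t countPlaceholders(string fmt)
{
    size_t n;
    foreach (i; 0 .. fmt.length)
        if (fmt[i] == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
            ++n;
    return n;
}

/// Format string as a template (compile-time) parameter:
/// it can be inspected in CTFE and checked with static assert.
size_t viaTemplateParam(string fmt, Args...)(Args args)
{
    enum expected = countPlaceholders(fmt); // evaluated at compile time
    static assert(expected == Args.length, "placeholder/argument count mismatch");
    return expected;
}

/// Format string as an ordinary runtime parameter:
/// only run-time processing is possible.
size_t viaRuntimeParam(Args...)(string fmt, Args args)
{
    // `enum expected = countPlaceholders(fmt);` would not compile here,
    // because `fmt` cannot be read at compile time.
    return countPlaceholders(fmt);
}

void main()
{
    int a = 1, b = 2;
    assert(viaTemplateParam!"x = %s AND y = %s"(a, b) == 2);
    assert(viaRuntimeParam("x = %s AND y = %s", a, b) == 2);
    // With the template parameter, a mismatch is rejected at compile time.
    static assert(!__traits(compiles, viaTemplateParam!"x = %s"(a, b)));
}
```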
Jan 08 2024
next sibling parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Tuesday, 9 January 2024 at 07:30:57 UTC, Nickolay Bukreyev wrote:
> However, it is not hard to adjust `execi` so that it fully
> supports nested istrings:

Shame on me. `segregatedInterpolations(Args...)` should end with this:

```d
result ~= Span(processedTill, Args.length, true);
return result;
```
Jan 09 2024
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Thank you for your thoughts!

On 1/8/2024 11:30 PM, Nickolay Bukreyev wrote:
> 1.  First of all, I’d like to note that in the DIP1027 variant of the code we
> see:
>
>      > `auto fmt = arg[0];`
>
>      (`arg` is an undeclared identifier here; I presume `args` was meant.)

Yes. I don't have sql on my system, so didn't try to compile it. I always make typos. Oof.

> There is a problem: this line is executed at CTFE,

It's executed at runtime. The code is not optimized for speed; I just wanted to show the concept. The speed doesn't particularly matter, because after all this is a call to a database, which is going to be slow. Anyhow, DIP1036 also uses unoptimized code here.

> 3.  DIP1036 appeals more to me because it passes rich, high-level information
> about parts of the string. With DIP1027, on the other hand, we have to extract
> that information ourselves by parsing the string character by character. But
> the compiler already tokenized the string; why do we have to do it again? (And
> no, lower level doesn’t imply broader possibilities here.)

DIP1036 also builds a new format string.

> It may have another implication: looping over characters might put the
> current CTFE engine in trouble if strings are large. Many more iterations need
> to be executed, and more memory is consumed in the process. We certainly need
> numbers here, but I thought it was important to at least bring attention to
> this point.

It happens at runtime.

> 4.  What I don’t like in both DIPs is a rather arbitrary selection of meta
> characters: `$`, `$$` and `%s`. In regular strings, all of them are just
> normal characters; in istrings, they gain special meaning.

I looked at several schemes, and picked `$` because it looked the nicest.

> I suppose a cleaner way would be to use `\(...)` syntax (like in Swift).
> So `i"a \(x) b"` interpolates `x` while `"a \(x) b"` is an immediate syntax
> error. First, it helps to catch bugs caused by missing `i`.

I'm sorry to say, that looks like tty noise. Aesthetic appeal is a very important design consideration for D.

> Second, the question, how do we escape `$`, gets the most straightforward
> answer: we don’t.

It will rarely need to be escaped, but when one does need it, one needs it!

> A downside is that parentheses will always be required with this syntax.
> But the community preferred them anyway even with `$`.

DIP1027 does not require ( ) if it's just an identifier. That makes for the shortest, simplest istring syntax. The ( ) usage will be relatively rare. The idea is that the most common cases should require the least syntactical noise.

Also, the reason I picked the SQL example is because that is the one most cited as being needed and in showing the power of DIP1036, and because I was told that DIP1027 couldn't do it :-)

The intent of DIP1027 is not to provide the most powerful, richest mechanism. It's meant to be the simplest I could think of, with the most attractive appearance, minimal runtime overhead, while handling the meat and potatoes use cases.
Jan 09 2024
next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
> The intent of DIP1027 is not to provide the most powerful,
> richest mechanism. It's meant to be the simplest I could think
> of, with the most attractive appearance, minimal runtime
> overhead, while handling the meat and potatoes use cases.

If that's the case, then 1036 wins imho, by the simple fact of not doing any parsing of the format string. Note that other use cases might not require building a format string at all. What about logging functionality? With 1036, a log function could just dump all text into a sink directly; with 1027 it would still need to parse the format string to find where to inject arguments. This use case makes 1036 more favourable than 1027, by your own criteria for a good mechanism.
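
A rough sketch of such a log function, for illustration only. The four marker types below are hypothetical stand-ins declared locally so that the example compiles without compiler support for istrings; in particular the `value` member is an assumption, not the DIP's API. The call in `main` spells out by hand roughly the argument sequence an istring is described as lowering to.

```d
import std.array : appender;
import std.conv : to;
import std.traits : isInstanceOf;

// Hypothetical stand-ins for the DIP1036 marker types.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string text) { enum value = text; }
struct InterpolatedExpression(string code) {}

/// Appends literals and values to a sink in order.
/// No format string is parsed at any point.
string logLine(Args...)(Args args)
{
    auto sink = appender!string();
    foreach (arg; args)
    {
        alias T = typeof(arg);
        static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
            // start/end markers carry no text
        }
        else static if (isInstanceOf!(InterpolatedExpression, T))
        {
            // source text of the expression; not needed for logging
        }
        else static if (isInstanceOf!(InterpolatedLiteral, T))
            sink.put(T.value);       // literal text, known at compile time
        else
            sink.put(arg.to!string); // an interpolated runtime value
    }
    return sink.data;
}

void main()
{
    int count = 3;
    // Hand-written equivalent of something like i"found $(count) items".
    auto line = logLine(
        InterpolationHeader(),
        InterpolatedLiteral!"found "(),
        InterpolatedExpression!"count"(),
        count,
        InterpolatedLiteral!" items"(),
        InterpolationFooter());
    assert(line == "found 3 items");
}
```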
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
> If that's the case, then 1036 wins imho, by the simple fact of not doing any
> parsing of the format string.

Consider the overhead 1036 has by comparing it with plain writeln or writefln:

```
void test(int baz)
{
    writeln(i"$(baz + 4)");
    writeln(baz + 5);
    writefln("%d", baz + 6);
}
```

Generated code:

    0000:   55                 push    RBP
    0001:   48 8B EC           mov     RBP,RSP
    0004:   48 83 EC 20        sub     RSP,020h
    0008:   48 89 5D E8        mov     -018h[RBP],RBX
    000c:   89 7D F8           mov     -8[RBP],EDI          // baz
    000f:   48 83 EC 08        sub     RSP,8
    0013:   31 C0              xor     EAX,EAX
    0015:   88 45 F0           mov     -010h[RBP],AL
    0018:   48 8D 75 F0        lea     RSI,-010h[RBP]
    001c:   FF 36              push    dword ptr [RSI]      // header
    001e:   88 45 F1           mov     -0Fh[RBP],AL
    0021:   48 8D 5D F1        lea     RBX,-0Fh[RBP]
    0025:   FF 33              push    dword ptr [RBX]      // expression!"baz + 4"
    0027:   8D 7F 04           lea     EDI,4[RDI]           // baz + 4
    002a:   88 45 F2           mov     -0Eh[RBP],AL
    002d:   48 8D 75 F2        lea     RSI,-0Eh[RBP]
    0031:   FF 36              push    dword ptr [RSI]      // footer
    0033:   E8 00 00 00 00     call    writeln
    0038:   48 83 C4 20        add     RSP,020h
    003c:   8B 45 F8           mov     EAX,-8[RBP]
    003f:   8D 78 05           lea     EDI,5[RAX]           // baz + 5
    0042:   E8 00 00 00 00     call    writeln
    0047:   BA 00 00 00 00     mov     EDX,0                // "%d".ptr
    004c:   BE 02 00 00 00     mov     ESI,2                // "%d".length
    0051:   8B 4D F8           mov     ECX,-8[RBP]
    0054:   8D 79 06           lea     EDI,6[RCX]           // baz + 6
    0057:   E8 00 00 00 00     call    writefln
    005c:   48 8B 5D E8        mov     RBX,-018h[RBP]
    0060:   C9                 leave
    0061:   C3                 ret

With the istring, there are 4 calls to struct member functions that just return null. This can't be good for performance or program size.

We can compute the number of arguments passed to the function:

    istring:  1 + 3 * <number of arguments> + 1 + 1  (*)
    writeln:  <number of arguments>
    writefln: 1 + <number of arguments>

(*) includes string literals before, between, and after arguments
Jan 09 2024
next sibling parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
>> If that's the case, then 1036 wins imho, by the simple fact of
>> not doing any parsing of the format string.
>
> Consider the overhead 1036 has by comparing it with plain writeln or writefln:

How is this related to the original argument that you replied to, namely that the user does not need to do any parsing inside a function that accepts an istring? I personally would be ok with any overhead 1036 adds as long as I don't need to do any extra work such as parsing.

Please also take into consideration the code inside the function that accepts the interpolated string. I'm pretty sure that parsing the format string inside a DIP1027 function would result in bigger and more complex generated code than the overhead you've mentioned for the 1036 version, for use cases similar to the logging I've mentioned.
Jan 09 2024
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 20:05, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
>> If that's the case, then 1036 wins imho, by the simple fact of not doing
>> any parsing of the format string.
>
> Consider the overhead 1036 has by comparing it with plain writeln or writefln:
>
> ```
> void test(int baz)
> {
>     writeln(i"$(baz + 4)");
>     writeln(baz + 5);
>     writefln("%d", baz + 6);
> }
> ```
> ...

I think Alexandru and Nickolay already discharged the concerns about overhead pretty well, but just note that with DIP1027, `test(3)` prints:

    %s7
    8
    9

There is fundamentally no way to make this work correctly, due to how DIP1027 throws away the information about the format string.

With DIP1036e, `test(3)` prints:

    7
    8
    9

And you can get rid of the runtime overhead by adding a `pragma(inline, true)` `writeln` overload. (I guess with DMD that will still bloat the executable, but I think other compiler backends and linkers can be made to elide such symbols completely.)
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 2:38 PM, Timon Gehr wrote:
>     %s7
>     8
>     9

Yes, I used writeln instead of writefln. The similarity between the two names is a source of error, but if that was a festering problem we'd have seen a lot of complaints about it by now.

> And you can get rid of the runtime overhead by adding a `pragma(inline, true)`
> `writeln` overload. (I guess with DMD that will still bloat the executable,

Try it and see.

I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.

As far as I can tell, the only advantage of DIP1036 is the use of inserted templates to "key" the tuples to specific functions. Isn't that what the type system is supposed to do? Maybe the real issue is that a format string should be a different type than a conventional string. For example:

```d
extern (C) pragma(printf) int printf(const(char*), ...);

enum Format : string;

void foo(Format f) { printf("Format %s\n", f.ptr); }
void foo(string s) { printf("string %s\n", s.ptr); }

void main()
{
    Format f = cast(Format)"f";
    foo(f);
    string s = "s";
    foo(s);
}
```

which prints:

    Format f
    string s

If we comment out `foo(string s)`:

    test2.d(14): Error: function `test2.foo(Format f)` is not callable using argument types `(string)`
    test2.d(14):        cannot pass argument `s` of type `string` to parameter `Format f`

If we comment out `foo(Format f)`:

    string f
    string s

This means that if execi()'s first parameter is of type `Format`, and the istring generates the format string with type `Format`, this key will fit the lock. A string generated by other means, such as `.text`, will not fit that lock.
Jan 10 2024
next sibling parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright wrote:
> I may have found a solution. I'm interested in your thoughts on
> it.

It looks very similar to what I presented in my later posts ([this](https://forum.dlang.org/post/qiyrmzwnoguzxxllgzcz@forum.dlang.org) and one following). It’s inspiring: we are probably getting closer to common understanding of things.

> As far as I can tell, the only advantage of DIP1036 is the use
> of inserted templates to "key" the tuples to specific
> functions. Isn't that what the type system is supposed to do?
> Maybe the real issue is that a format string should be a
> different type than a conventional string.

Exactly. Let me try to explain why DIP1036 is doing what it is doing. For illustrative purposes, I’ll be drastically simplifying code; please excuse me for that.

Let there be `foo`, a function that would like to receive an istring. Inside it, we would like to transform its argument list at compile time into a new argument list. So what we essentially want is to pass an istring to a template parameter so that it is available to `foo` at compile time:

```d
int x;
foo!(cast(Format)"prefix ", 2 * x); // foo!(alias Format, alias int)()
```

Unfortunately, this does not work because `2 * x` cannot be passed to an `alias` parameter. _This is the root of the problem._ The only way to do that is to pass them to runtime parameters:

```d
int x;
foo(cast(Format)"prefix ", 2 * x); // foo!(Format, int)(Format, int)
```

However, now `foo` cannot access the format string at compile time: its type is simply `Format`, and its value becomes known only at runtime. So we encode the value into the type:

```d
int x;
foo(Format!"prefix "(), 2 * x); // foo!(Format!"prefix ", int)(Format!"prefix ", int)
```

This is more or less what DIP1036 is doing at the moment. Hope it became clear now.

I’d say DIP1036, as we see it now, relies on a clever workaround of a limitation imposed by the language. If that limitation is gone, the DIP will become simpler.
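
The last step, encoding the string value in a type, can be made concrete with a small self-contained sketch. `Format` here is a hypothetical one-line struct template written for this example (not the `enum Format : string` proposed earlier in the thread), and `describe` is likewise an invented name.

```d
import std.conv : to;

/// Hypothetical wrapper: the string lives in the *type*, not in a field.
struct Format(string s) { enum value = s; }

/// `F` is deduced as a type, so `F.value` is a compile-time constant
/// even though `f` itself is an ordinary runtime argument.
string describe(F, T)(F f, T x)
{
    enum prefix = F.value; // available at compile time
    static assert(prefix.length > 0, "empty prefix");
    return prefix ~ x.to!string;
}

void main()
{
    int x = 21;
    // `2 * x` is a plain runtime expression; the prefix travels in the type.
    assert(describe(Format!"prefix "(), 2 * x) == "prefix 42");
}
```

The cost of this trick is one template instantiation per distinct string, which is the template bloat discussed elsewhere in the thread.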
Jan 10 2024
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/10/2024 5:53 PM, Nickolay Bukreyev wrote:
> Exactly. Let me try to explain why DIP1036 is doing what it is doing. For
> illustrative purposes, I’ll be drastically simplifying code; please excuse
> me for that.

Thank you for the explanation. It was entirely missing from the spec, and I overlooked it in the code. (This is why reverse engineering a spec from code is not so easy.) It is indeed clever.

As for it being a required feature of string interpolation to do this processing at compile time, that's a nice feature, not a must have.

The enum proposal is to obviate the requirement for a header and footer template, which is a big improvement.
Jan 10 2024
next sibling parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Thursday, 11 January 2024 at 02:21:17 UTC, Walter Bright wrote:
> As for it being a required feature of string interpolation to
> do this processing at compile time, that's a nice feature, not
> a must have.

The importance of the ability to do processing at compile time was stated by:

* Alexandru ([here](https://forum.dlang.org/post/yxrqncmaiyfmhxnvzgil@forum.dlang.org) and [here](https://forum.dlang.org/post/yqwxvjnvqaahhshrfohy@forum.dlang.org)),
* Timon ([here](https://forum.dlang.org/post/unjfb9$1ku5$1@digitalmars.com)),
* Paolo ([here](https://forum.dlang.org/post/rhpblxrebibhpnfxfihv@forum.dlang.org) and [here](https://forum.dlang.org/post/ajeqtckcwawuvtusbvxb@forum.dlang.org)),
* Steven ([here](https://forum.dlang.org/post/ilituyhcqipsqktqmfor@forum.dlang.org)).

> The enum proposal is to obviate the requirement for a header
> and footer template, which is a big improvement.

Header and footer are not templates; `InterpolatedLiteral` and `InterpolatedExpression` are. Yes, the latter two can be replaced by enums iff it becomes possible to pass arbitrary expressions to alias parameters. And I agree it would be a big improvement.

> Structs with no fields have a size of 1 byte for D and C++
> structs, and 0 or 4 for C structs (depending on the target).

Yes, I mistakenly wrote, _zero-sized_, when I meant, _empty_.
Jan 10 2024
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/11/24 03:21, Walter Bright wrote:
> As for it being a required feature of string interpolation to do this
> processing at compile time, that's a nice feature, not a must have.

As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.
Jan 11 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/11/2024 11:50 AM, Timon Gehr wrote:
> On 1/11/24 03:21, Walter Bright wrote:
>> As for it being a required feature of string interpolation to do this
>> processing at compile time, that's a nice feature, not a must have.
>
> As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.

Why does compile time make it a guarantee and runtime not? We do array bounds checking at runtime.
Jan 11 2024
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 12/01/2024 6:28 PM, Walter Bright wrote:
> On 1/11/2024 11:50 AM, Timon Gehr wrote:
>> On 1/11/24 03:21, Walter Bright wrote:
>>> As for it being a required feature of string interpolation to do this
>>> processing at compile time, that's a nice feature, not a must have.
>>
>> As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.
>
> Why does compile time make it a guarantee and runtime not? We do array bounds checking at runtime.

Where possible we absolutely should not be.

Making things crash at runtime, because the compiler did not apply the knowledge it has is just ridiculous.

Imagine going to ``http://google.com/itsacrash`` and crashing Google.

Or pressing a button too fast on an airplane and suddenly the fuel pumps turn off and then refuse to turn back on.

Instead of the compiler catching clearly bad logic that it has a full understanding of, you're disrupting service and making people lose money. This is not a good thing.
Jan 11 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/11/2024 9:36 PM, Richard (Rikki) Andrew Cattermole wrote:
> Making things crash at runtime, because the compiler did not apply the
> knowledge it has is just ridiculous.
>
> Imagine going to ``http://google.com/itsacrash`` and crashing Google.
>
> Or pressing a button too fast on an airplane and suddenly the fuel pumps turn
> off and then refuse to turn back on.
>
> Instead of the compiler catching clearly bad logic that it has a full
> understanding of, you're disrupting service and making people lose money. This
> is not a good thing.

I agree that compile time checking is preferable. But there is a cost involved, as I explained more fully in another post. It isn't free.

Since the format string is a compile time creature, not a user input feature, if the fault only happened when the code is deployed, it means the code was *never* executed before it was shipped. This is an inexcusable failure for any avionics system, or any critical system, since we have simple tools that check coverage.

BTW, professional code is full of assert()s. Asserts check for faults in the code logic that are not the result of user input, but are the result of programming errors. We leave them as asserts because nobody knows how to get compilers to detect them, or it is too costly to detect them.

In other words, this is not an absolute thing. It's a weighing of cost and benefit.
Jan 11 2024
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 12/01/2024 8:00 PM, Walter Bright wrote:
> On 1/11/2024 9:36 PM, Richard (Rikki) Andrew Cattermole wrote:
>> Making things crash at runtime, because the compiler did not apply the
>> knowledge it has is just ridiculous.
>>
>> Imagine going to ``http://google.com/itsacrash`` and crashing Google.
>>
>> Or pressing a button too fast on an airplane and suddenly the fuel
>> pumps turn off and then refuse to turn back on.
>>
>> Instead of the compiler catching clearly bad logic that it has a full
>> understanding of, you're disrupting service and making people lose
>> money. This is not a good thing.
>
> I agree that compile time checking is preferable. But there is a cost involved, as I explained more fully in another post. It isn't free.
>
> Since the format string is a compile time creature, not a user input feature, if the fault only happened when the code is deployed, it means the code was *never* executed before it was shipped. This is an inexcusable failure for any avionics system, or any critical system, since we have simple tools that check coverage.
>
> BTW, professional code is full of assert()s. Asserts check for faults in the code logic that are not the result of user input, but are the result of programming errors. We leave them as asserts because nobody knows how to get compilers to detect them, or it is too costly to detect them.
>
> In other words, this is not an absolute thing. It's a weighing of cost and benefit.

So I guess the question is, do you want to hear from a company that they lost X amount of business because they used a language feature that could have caught errors at compile time, but instead continually crashed in a live environment?

I do not. That would be a total embarrassment.

I have an identical problem currently with ``@mustuse``. It errors out at runtime if you do not check to see if it has an error, if you try to get access to the value. It is hell. I could never recommend such an error prone design. I am only putting up with it until the language is capable of something better.

https://issues.dlang.org/show_bug.cgi?id=23998
Jan 11 2024
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
Let's try something different.

Would you like me to write a small specification for an alternative 
method for passing metadata from the call site into the body that would 
allow a string interpolation feature to not use extra templates while 
still being compile time based?

I described this to Adam Wilson yesterday:

```d
func(@metadata("hi!") 2);

void func(T)(T arg) {
	enum MetaData = __traits(getAttributes, arg);
	pragma(msg, MetaData);
}
```

This is essentially what 1036e is attempting to do, but it does it with 
extra templates.
Jan 11 2024
parent zjh <fqbqrr 163.com> writes:
On Friday, 12 January 2024 at 07:31:49 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Let's try something different.
> ```d
> func(@metadata("hi!") 2);
>
> void func(T)(T arg) {
> 	enum MetaData = __traits(getAttributes, arg);
> 	pragma(msg, MetaData);
> }
> ```

I think the D language could create an `attribute dictionary` for any building block. In this way, the `attribute soup` can be simplified. It would be even better to simplify the method of `getting and setting` attributes. It can be used to facilitate the extraction of `metadata`.
Jan 11 2024
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/12/24 06:28, Walter Bright wrote:
> On 1/11/2024 11:50 AM, Timon Gehr wrote:
>> On 1/11/24 03:21, Walter Bright wrote:
>>> As for it being a required feature of string interpolation to do this
>>> processing at compile time, that's a nice feature, not a must have.
>>
>> As far as I am concerned it is a must-have. For example, this is what prevents the SQL injection attack, it's a safety guarantee.
>
> Why does compile time make it a guarantee and runtime not? ...

Because a SQL injection attack by definition is when a third party can control safety-critical parts of your SQL query at runtime. The very fact that the whole prepared SQL query is known at compile-time, with runtime data only entering through the placeholders, conclusively rules this out. If the SQL query is constructed at runtime based on runtime data, `execi` is unable to check whether an SQL injection vulnerability is present.

> We do array bounds checking at runtime.

You can check array bounds at runtime. You cannot check where a runtime-known string came from at runtime. It's simply not possible.
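
The guarantee can be illustrated with a minimal sketch in which the query text is a template parameter and only the bound values are runtime data. `Prepared` and `prepare` are hypothetical names invented for this example; they stand in for a real prepared-statement API and do not talk to any database.

```d
import std.conv : to;

/// Hypothetical stand-in for a prepared statement.
struct Prepared
{
    string query;   // fixed at compile time
    string[] bound; // runtime values, kept apart from the query text
}

/// The query is a compile-time template argument; runtime data can
/// only enter through the bound values.
Prepared prepare(string query, Args...)(Args args)
{
    Prepared p = Prepared(query);
    foreach (arg; args)
        p.bound ~= arg.to!string;
    return p;
}

void main()
{
    string userInput = "x'; DROP TABLE items; --";

    auto p = prepare!"SELECT field FROM items WHERE name = ?1"(userInput);
    assert(p.query == "SELECT field FROM items WHERE name = ?1");
    assert(p.bound == [userInput]);

    // A query assembled from runtime data is rejected by the compiler,
    // because `userInput` cannot be read at compile time.
    static assert(!__traits(compiles,
        prepare!("SELECT field FROM items WHERE name = '" ~ userInput ~ "'")()));
}
```

A function that takes the query as an ordinary `string` parameter has no comparable way to tell the two calls apart.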
Jan 12 2024
prev sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 11/01/2024 2:53 PM, Nickolay Bukreyev wrote:
> I’d say DIP1036, as we see it now, relies on a clever workaround of a
> limitation imposed by the language. If that limitation is gone, the DIP
> will become simpler.

Another potential solution would be to allow passing metadata on the function call side, to the function.

Consider:

``i"prefix${expr:format}suffix"``

Could be:

```d
func("prefix", @format("format") expr, "suffix");

void func(T...)(T args) {
    pragma(msg, __traits(getAttributes, args[1])); // format("format")
}
```

This is so much simpler than what 1036e is. But it does require another language feature.
Jan 10 2024
parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew Cattermole wrote:
> ```d
> void func(T...)(T args) {
>     pragma(msg, __traits(getAttributes, args[1])); // format("format")
> }
> ```

Sorry, I don’t understand how this can possibly work. After `func` template is instantiated, its `T` is bound to, e.g., `AliasSeq!(string, int, string)`. `args` is just a local variable of type `AliasSeq!(string, int, string)`. How can `__traits` know what attributes were attached at call site?

If, on the other hand, attributes do affect the type, then IMHO

```d
func("prefix", @format("format") expr, "suffix");
```

is not much different than

```d
func("prefix", format!"format"(expr), "suffix");
```

I.e., we can do it already.
Jan 10 2024
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 11/01/2024 5:31 PM, Nickolay Bukreyev wrote:
 On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 ```d
 void func(T...)(T args) {
     pragma(msg, __traits(getAttributes, args[1])); // format("format")
 }
 ```
Sorry, I don’t understand how this can possibly work. After `func` template is instantiated, its `T` is bound to, e.g., `AliasSeq!(string, int, string)`. `args` is just a local variable of type `AliasSeq!(string, int, string)`. How can `__traits` know what attributes were attached at call site?

If, on the other hand, attributes do affect the type, then IMHO

```d
func("prefix", format("format") expr, "suffix");
```

is not much different than

```d
func("prefix", format!"format"(expr), "suffix");
```

I.e., we can do it already.
This has side effects. It affects ``ref`` and ``out``. It also affects lifetime analysis. So we can't do it currently. But yes, it affects the type, without being in the type system explicitly as it is meta data.
Jan 10 2024
parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Thursday, 11 January 2024 at 04:34:33 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 This has side effects. It affects ``ref`` and ``out``. It also 
 affects lifetime analysis.

 So we can't do it currently.

 But yes, it affects the type, without being in the type system 
 explicitly as it is meta data.
Thank you for the clarification. I see a downside that pretty much any generic code should strip the annotations off its arguments after it inspected them, to reduce template bloating. However, we are probably going off-topic.
Jan 10 2024
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright 
wrote:
 And you can get rid of the runtime overhead by adding a 
 `pragma(inline, true)` `writeln` overload. (I guess with DMD 
 that will still bloat the executable,
I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker.
Yes, DIP1036e has a lot of extra templates generated, and the mangled name is going to be large. Let's skip for a moment the template that writeln will generate (which I agree isn't ideal, but also is somewhat par for the course). This shouldn't be a huge problem for the interpolation *types* because the type doesn't get included in the binary. It is a big problem for the `toString` function, because that *is* included.

However, we can mitigate the ones that return `null`:

```d
string __interpNull() => null;

struct InterpolatedExpression(string expr)
{
    alias toString = __interpNull;
}
... // and so on
```

I tested this and it does work. So this reduces all the `toString` member functions from `InterpolatedExpression` (and `InterpolationPrologue` and `InterpolationEpilog`, but those are not templated structs anyway) to one function in the binary.

But we can't do this for `InterpolatedLiteral` (which by the way is improperly described in Atila's DIP, the associated `toString` member function should return the literal). We can possibly do a couple of things here to mitigate:

1. We can modify how `std.format` works so it will accept the following as a `toString` hook:

```d
struct S
{
    enum toString = "I am an S";
}
```

This means no function calls, no extra long symbols in the binary (since it's an enum, it should not go in), and I think even the compilation will be faster.

2. We modify it to be aware of `InterpolatedLiteral` types, and avoid depending on the `toString` API. After all, we own both Phobos and druntime, we can coordinate the release.

And as a further suggestion, though this is kind of off-topic, we may look into ways to have templates that *don't* make it into the binary explicitly. Basically, they are marked as shims or forwarders by the library author, and just serve as a way to write nicer syntax. This could help in more than just the interpolation DIP.
 As far as I can tell, the only advantage of DIP1036 is the use 
 of inserted templates to "key" the tuples to specific 
 functions. Isn't that what the type system is supposed to do? 
 Maybe the real issue is that a format string should be a 
 different type than a conventional string.
No. While I agree that having a different *type* makes it more useful and easier to hook, there is a fundamental problem being solved with the compile-time literals being passed to the function. Namely, tremendous power is available to validate, parse, prepare, etc. string data at compile time, for use during runtime. This simply *is not possible* with 1027.

The runtime benefits are huge:

* No need to allocate anything (`@nogc`, `-betterC`, etc. all available)
* You get compiler errors instead of runtime errors (if you put in the work)
* It's possible to generate "perfect forwarding" to another function that does use another form. For example, `printf`.
* If you inline the call, it can be as if you called the forwarded function directly with the exactly correct parameters.

And I want to continue to point out, that a constructed "format string" mechanism just is inferior, regardless if it is another type, as long as you don't need formatting specifiers (and arguably, it's just a difference in taste otherwise). The compiler parsed it out, it knows the separate pieces. Giving those pieces directly to the library is both the most efficient way, and also the most obvious way.

The "format string" mechanism, while making sense for writef, *must* add an element of complexity to the receiving function, since it now has to know what "language" the translated string is. e.g. with DIP1027, one must know that `%s` is special and what it represents, and the user must know to escape `%s` to avoid miscommunication.

With 1036e, there is no format string, so there is no complication there, or confusion. The value being passed is right where you would expect it, and you don't have to parse a separate thing to know.

Note in YAIDIP, this was done partly through an interpolation header, which had all the compile-time information, and then strings and interpolated data were interspersed. I find this also a workable solution, and could even do without the strings being passed interspersed (as I said, we have control over `writeln` and `text`), but I think the ordering of the tuple to match what the actual string literal looks like is so intuitive, and we would be losing that if we did some kind of "format header" mechanism.

-Steve
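As a rough illustration of the "perfect forwarding" point, the sketch below computes a `printf`-style format string purely from the argument types, so nothing is parsed or allocated at run time. `Lit` is a hypothetical stand-in for `InterpolatedLiteral`, and `formatStringOf` is an invented name; filtering the runtime arguments and the actual `printf` call are left out:

```d
// Hypothetical stand-in for InterpolatedLiteral: the text lives in the type.
struct Lit(string text) {}

// Builds a printf-style format string from the argument *types* alone.
template formatStringOf(Args...)
{
    enum string formatStringOf = () {
        string s;
        foreach (T; Args)
        {
            static if (is(T == Lit!lit, string lit))
            {
                // Literal text: copy it, escaping '%' so printf prints it verbatim.
                foreach (char c; lit)
                {
                    if (c == '%') s ~= "%%";
                    else s ~= c;
                }
            }
            else static if (is(T : long)) s ~= "%d";   // integral values
            else static if (is(T : double)) s ~= "%g"; // floating-point values
            else static assert(0, "unsupported type " ~ T.stringof);
        }
        return s;
    }();
}

unittest
{
    // Evaluated entirely during compilation.
    static assert(formatStringOf!(Lit!"x = ", int, Lit!" (100%)") == "x = %d (100%%)");
    static assert(formatStringOf!(Lit!"ratio: ", double) == "ratio: %g");
}
```

An unsupported argument type is reported as a compile error by the `static assert(0, ...)` branch rather than surfacing at run time.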
Jan 10 2024
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/10/24 20:53, Walter Bright wrote:
 On 1/9/2024 2:38 PM, Timon Gehr wrote:
  > %s7 8 9
 
 Yes, I used writeln instead of writefln. The similarity between the two 
 names is a source of error, but if that was a festering problem we'd 
 have seen a lot of complaints about it by now.
 ...
My point was with DIP1036e it either works or does not compile, not that you called the wrong function.
 
 And you can get rid of the runtime overhead by adding a 
 `pragma(inline, true)` `writeln` overload. (I guess with DMD that will 
 still bloat the executable,
Try it and see. I didn't mention the other kind of bloat - the rather massive number and size of template names being generated that go into the object file, as well as all the uncalled functions generated only to be removed by the linker. ...
I understand the drawbacks of DIP1036e which it shares with most non-trivial metaprogramming. D underdelivers in this department at the moment, but this still remains one of the key selling points of D. The issue is that DIP1027 is worse than DIP1036e. DIP1027 is also worse than nothing. It has been rejected for good reason. For some reason you however keep insisting it is essentially as useful as DIP1036e. That's just not the case. I think a much better answer to DIP1036e than a DIP1027 revival would have been to add a -preview=experimental-DIP1036e flag and do a call to action to resolve language issues and limitations that force DIP1036e to generate bloat. Maybe there would have been an even better way to handle this.
 As far as I can tell, the only advantage of DIP1036 is the use of 
 inserted templates to "key" the tuples to specific functions.
Well, this is not the case, that is not the only advantage.
 Isn't that 
 what the type system is supposed to do? Maybe the real issue is that a 
 format string should be a different type than a conventional string. For 
 example:
 
 ```d
 extern (C) pragma(printf) int printf(const(char*), ...);
 
 enum Format : string;
 
 void foo(Format f) { printf("Format %s\n", f.ptr); }
 void foo(string s) { printf("string %s\n", s.ptr); }
 
 void main()
 {
      Format f = cast(Format)"f";
      foo(f);
      string s = "s";
      foo(s);
 }
 ```
 which prints:
 
 Format f
 string s
 
 If we comment out `foo(string s)`:
 
 test2.d(14): Error: function `test2.foo(Format f)` is not callable using 
 argument types `(string)`
 test2.d(14):        cannot pass argument `s` of type `string` to 
 parameter `Format f`
 
 If we comment out `foo(Format s)`:
 
 string f
 string s
 
 This means that if execi()'s first parameter is of type `Format`, and 
 the istring generates the format string with type `Format`, this key 
 will fit the lock. A string generated by other means, such as `.text`, 
 will not fit that lock.
 
Well, this is a step in the right direction, but rest assured if this was the only advantage of DIP1036e, then Adam would have gone with this suggestion. I am almost sure this is one of the ideas he discarded.
Jan 11 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/11/2024 11:45 AM, Timon Gehr wrote:
 My point was with DIP1036e it either works or does not compile, not that you 
 called the wrong function.
What's missing is why is a runtime check not good enough? The D compiler emits more than one safety check at runtime. For example, array bounds checking, and switch statement default checks.
Jan 11 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/12/24 06:33, Walter Bright wrote:
 On 1/11/2024 11:45 AM, Timon Gehr wrote:
 My point was with DIP1036e it either works or does not compile, not 
 that you called the wrong function.
What's missing is why is a runtime check not good enough?
There is no runtime check, it just does the wrong thing.
 The D compiler emits more than one safety check at runtime. For example, array bounds
 checking, and switch statement default checks.
Sure.
Jan 12 2024
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:

 With the istring, there are 4 calls to struct member functions 
 that just return null.
Yeah, and writeln could avoid those if it's that important. A good optimizer will remove that call.
 This can't be good for performance or program size.
Then use writeln the way you want? I don't see it as significant at all.
 We can compute the number of arguments passed to the function:

 ```
 istring: 1 + 3 * <number of arguments> + 1 + 1  (*)
 writeln: <number of arguments>
 writefln: 1 + <number of arguments>
 ```

 (*) includes string literals before, between, and after 
 arguments
I find it bizarre to be concerned about the call performance of zero-sized structs and empty strings to writeln or writef, like the function is some shining example of performance or efficient argument passing. If you do not have inlining or optimizations enabled, do you think the call tree of writefln is going to be compact? Not to mention it eventually just calls into C opaquely. Note that you can write a simple wrapper that can be inlined, which will mitigate all of this via compile-time transformations. If you like, I can write it up and you can try it out! -Steve
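A minimal sketch of what such a wrapper could look like, assuming stand-in types `Lit` and `Expr` in place of `InterpolatedLiteral` and `InterpolatedExpression` (both names, and `textI`, are hypothetical). The dispatch on each argument happens at compile time, so the metadata arguments contribute no runtime work; it returns a string instead of printing to keep the example easy to check:

```d
import std.conv : to;

struct Lit(string text) {}   // hypothetical stand-in for InterpolatedLiteral
struct Expr(string code) {}  // hypothetical stand-in for InterpolatedExpression

pragma(inline, true)
string textI(Args...)(Args args)
{
    string result;
    foreach (i, arg; args) // unrolled at compile time
    {
        alias T = Args[i];
        static if (is(T == Lit!lit, string lit))
            result ~= lit;            // literal text comes from the type
        else static if (is(T == Expr!code, string code))
        {   }                         // expression source text: skipped
        else
            result ~= arg.to!string;  // an actual interpolated value
    }
    return result;
}

unittest
{
    assert(textI(Lit!"a = "(), Expr!"a"(), 7, Lit!"!"()) == "a = 7!");
}
```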
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:
 I find it bizarre to be concerned about the call performance of zero-sized
 structs and empty strings to writeln or writef, like the function is some
 shining example of performance or efficient argument passing. If you do not have
 inlining or optimizations enabled, do you think the call tree of writefln is
 going to be compact? Not to mention it eventually just calls into C opaquely.
 
 Note that you can write a simple wrapper that can be inlined, which will 
 mitigate all of this via compile-time transformations.
 
 If you like, I can write it up and you can try it out!
I've been aware for a long time that writeln and writefln are very inefficient, and could use a re-engineering. A big part of the problem is the blizzard of templates resulting from using them. This issue doubles the number of templates. Even if they are optimized away, they sit in the object file. Anyhow, see my other reply to Timon. I may have found a solution. I'm interested in your thoughts on it.
Jan 10 2024
parent reply Hipreme <msnmancini hotmail.com> writes:
On Wednesday, 10 January 2024 at 20:19:46 UTC, Walter Bright 
wrote:
 On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:
 I find it bizarre to be concerned about the call performance 
 of zero-sized structs and empty strings to writeln or writef, 
 like the function is some shining example of performance or 
 efficient argument passing. If you do not have inlining or 
 optimizations enabled, do you think the call tree of writefln 
 is going to be compact? Not to mention it eventually just 
 calls into C opaquely.
 
 Note that you can write a simple wrapper that can be inlined, 
 which will mitigate all of this via compile-time 
 transformations.
 
 If you like, I can write it up and you can try it out!
I've been aware for a long time that writeln and writefln are very inefficient, and could use a re-engineering. A big part of the problem is the blizzard of templates resulting from using them. This issue doubles the number of templates. Even if they are optimized away, they sit in the object file. Anyhow, see my other reply to Timon. I may have found a solution. I'm interested in your thoughts on it.
Are you sure you really want to keep optimizing debug logging functionality? Come on. The only reason to keep using `printf` and `writeln` is for debug logging. If you're going to show your log function to a user, it is going to be completely different. They are super easy to disable by simply creating a wrapper. If you want to know what increases the compilation time on them, it is `std.conv.to!float`. I have said this many times on forums already.

I don't know about people's hobby, but caring about performance on logging is simply too much. Do me a favor: Press F12 to open your browser's console, then write at it:

`for(let i = 0; i < 10000; i ++) console.log(i);`

You'll notice how slow it is. And this is not a JS problem. Logging is always slow, no matter how much you optimize.

I personally find this a great loss of time that could be directed into a lot more useful tasks, such as:

- Improving debugging symbols in DMD and for macOS
- Improving importC until it actually works
- Listen to rikki's complaint about how slow it is to import UTF Tables
- Improving support for shared libraries on DMD (like not making it collect an interfaced object)
- Solve the problem with `init` property of structs containing memory reference which can easily be corrupted
- Fix the problem when an abstract class implements an interface
- Make a D compiler daemon
- Help in the project of DMD as a library focused on helping WebFreak in code-d and serve-d
- Implement DMD support for Apple Silicon
- Revive newCTFE engine
- Implement ctfe caching

Those are the only things I can think of right now. Anyway, I'm not here to demand anything at all. I'm only giving examples on what could be done in fields I have no experience in how to make it better, but I know people out there can do it. But for me, it is just a pity to see such genius wasting time on improving a rather antiquated debug functionality
Jan 10 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/10/2024 12:56 PM, Hipreme wrote:
 - Improving debugging symbols in DMD and for macOS
 - Improving importC until it actually works
 - Listen to rikki's complaint about how slow it is to import UTF Tables
 - Improving support for shared libraries on DMD (like not making it collect an 
 interfaced object)
 - Solve the problem with `init` property of structs containing memory reference
 which can be easily be corrupted
 - Fix the problem when an abstract class implements an interface
 - Make a D compiler daemon
 - Help in the project of DMD as a library focused on helping WebFreak in code-d
 and serve-d
 - Implement DMD support for Apple Silicon
 - Revive newCTFE engine
 - Implement ctfe caching
I regularly work on many of those problems. For example, without looking it up, I think I've fixed maybe 20 ImportC issues in the last month. I've also done a number of recent PRs aimed at making D more tractable as a library. So has Razvan.
Jan 10 2024
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/10/24 22:21, Walter Bright wrote:
 On 1/10/2024 12:56 PM, Hipreme wrote:
 - Improving debugging symbols in DMD and for macOS
 - Improving importC until it actually works
 - Listen to rikki's complaint about how slow it is to import UTF Tables
 - Improving support for shared libraries on DMD (like not making it 
 collect an interfaced object)
 - Solve the problem with `init` property of structs containing memory 
 reference which can be easily be corrupted
 - Fix the problem when an abstract class implements an interface
 - Make a D compiler daemon
 - Help in the project of DMD as a library focused on helping WebFreak 
 in code-d and serve-d
 - Implement DMD support for Apple Silicon
 - Revive newCTFE engine
 - Implement ctfe caching
I regularly work on many of those problems. For example, without looking it up, I think I've fixed maybe 20 ImportC issues in the last month. I've also done a number of recent PRs aimed at making D more tractable as a library. So has Razvan.
Thanks a lot for the incredible amount of work you have invested into D over the years!
Jan 11 2024
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/11/2024 12:20 PM, Timon Gehr wrote:
 Thanks a lot for the incredible amount of work you have invested into D over the
 years!
It is indeed my pleasure, especially the privilege of working with you guys!
Jan 11 2024
prev sibling next sibling parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
 It happens at runtime.
No. This line is inside `enum string query = () { ... }();`. So CTFE-performance considerations do apply.
 I'm sorry to say, that looks like tty noise.
That’s sad. In my opinion, it is at least as readable, plus I see a few objective advantages in it. We don’t have to agree on this though.
 It will rarely need to be escaped, but when one does need it, 
 one needs it!
Yes, but I see a benefit in reducing the number of characters that _have_ to be escaped in the first place. While `$` rarely appeared in examples we’ve been thinking of so far, if someone faces a need to create a string full of dollars, escaping them all will uglify the string.
 DIP1027 does not require ( ) if it's just an identifier. That 
 makes for the shortest, simplest
 istring syntax. The ( ) usage will be relatively rare. The idea 
 is the most common cases should
 require the least syntactical noise.
Totally agree. Personally, I prefer omitting parentheses in interpolations when a language supports such syntax, but it’s a matter of taste.
 Also, the reason I picked the SQL example is because that is 
 the one most cited as being needed
 and in showing the power of DIP1036 and because I was told that 
 DIP1027 couldn't do it :-)
DIP1027 is unable to do it _at compile time_. I cannot argue that compile-time string creation doesn’t give us much if we call an SQL engine afterwards. So we need another example where CTFE-ability is desired. Alexandru Ermicioi asked about logging; I agree it is nice to rule out format-string parsing from every `log` call.
Jan 09 2024
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Tuesday, 9 January 2024 at 09:25:28 UTC, Nickolay Bukreyev 
wrote:
 On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
 It happens at runtime.
No. This line is inside `enum string query = () { ... }();`. So CTFE-performance considerations do apply.
 I'm sorry to say, that looks like tty noise.
That’s sad. In my opinion, it is at least as readable, plus I see a few objective advantages in it. We don’t have to agree on this though.
 It will rarely need to be escaped, but when one does need it, 
 one needs it!
Yes, but I see a benefit in reducing the number of characters that _have_ to be escaped in the first place. While `$` rarely appeared in examples we’ve been thinking of so far, if someone faces a need to create a string full of dollars, escaping them all will uglify the string.
 DIP1027 does not require ( ) if it's just an identifier. That 
 makes for the shortest, simplest
 istring syntax. The ( ) usage will be relatively rare. The 
 idea is the most common cases should
 require the least syntactical noise.
Totally agree. Personally, I prefer omitting parentheses in interpolations when a language supports such syntax, but it’s a matter of taste.
 Also, the reason I picked the SQL example is because that is 
 the one most cited as being needed
 and in showing the power of DIP1036 and because I was told 
 that DIP1027 couldn't do it :-)
DIP1027 is unable to do it _at compile time_. I cannot argue that compile-time string creation doesn’t give us much if we call an SQL engine afterwards. So we need another example where CTFE-ability is desired. Alexandru Ermicioi asked about logging; I agree it is nice to rule out format-string parsing from every `log` call.
Compile time string creation when dealing with SQL gives you the ability to validate the string for correctness at compile time.

Here is an example of what we are doing internally:

```
pinver utumno fieldmanager % bin/yab build ldc_lab_mac_i64_dg
2024-01-09T10:48:07.889 [info] melkor.d:235:executeReadyLabel executing ldc_lab_mac_i64_dg: /Users/pinver/dlang/ldc-1.36.0/bin/ldc2 -preview=dip1000 -i -Isrc -mtriple=x86_64-apple-darwin --vcolumns -J/Users/pinver/Lembas --d-version=env_dev_ --d-version=listen_for_nx_ --d-version=disable_ssl --d-version=disable_fixations --d-version=disable_metrics --d-version=disable_aggregator --d-debug -g -of/Users/pinver/Projects/DeepGlance/fieldmanager/bin/lab_mac_i64_dg /Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d src/sbx/raygui/c_raygui.c
2024-01-09T10:48:13.423 [error] melkor.d:247:executeReadyLabel build failed: src/ops/sql/semantics.d(489,31): Error: uncaught CTFE exception `object.Exception("42P01: relation \"snapshotsssss\" does not exist.
SQL: select size_mm, size_px from snapshotsssss where snapshot_id = $1")`
src/api3.d(41,9): thrown from here
src/api3.d(51,43): called from here: `checkSql(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", Type.smallint, true, false)], [], [], ["pinver", "ipsos_analysis_operator", "i
/Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d(644,45): Error: template instance `api3.forgeSqlCheckerForSchema!(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", T
```

or

```
pinver utumno fieldmanager % bin/yab build ldc_lab_mac_i64_dg
2024-01-09T10:52:36.220 [info] melkor.d:235:executeReadyLabel executing ldc_lab_mac_i64_dg: /Users/pinver/dlang/ldc-1.36.0/bin/ldc2 -preview=dip1000 -i -Isrc -mtriple=x86_64-apple-darwin --vcolumns -J/Users/pinver/Lembas --d-version=env_dev_ --d-version=listen_for_nx_ --d-version=disable_ssl --d-version=disable_fixations --d-version=disable_metrics --d-version=disable_aggregator --d-debug -g -of/Users/pinver/Projects/DeepGlance/fieldmanager/bin/lab_mac_i64_dg /Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d src/sbx/raygui/c_raygui.c
2024-01-09T10:52:37.254 [error] melkor.d:247:executeReadyLabel build failed: src/ops/sql/semantics.d(504,19): Error: uncaught CTFE exception `object.Exception("XXXX! role \"dummyuser\" can't select on table \"snapshots\".
SQL: select size_mm, size_px from snapshots where snapshot_id = $1")`
src/api3.d(41,9): thrown from here
src/api3.d(51,43): called from here: `checkSql(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", Type.smallint, true, false)], [], [], ["pinver", "ipsos_analysis_operator", "i
/Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d(644,45): Error: template instance `api3.forgeSqlCheckerForSchema!(Schema("public", ["aggregators":Table("aggregators", ["aggregated_till":Column("aggregated_till", Type.timestamp, true, false), "touchpoint_id":Column("touchpoint_id", T
```

CTFE support is a must IMHO

/P
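For readers who have not seen this style of check, here is a toy sketch of the general idea. It is not the internal checker from the logs above: `knownTables`, `tableOf` and `checkedSql` are invented for illustration, and the "parser" only looks at the word after `from`. The point is that the check runs during compilation and a bad query fails the build:

```d
import std.algorithm.searching : canFind;
import std.array : split;

// Hypothetical schema; a real checker would derive this from the database.
enum string[] knownTables = ["snapshots", "aggregators"];

// Very naive: returns the word following "from", or null if there is none.
string tableOf(string sql)
{
    auto words = sql.split; // splits on whitespace, usable in CTFE
    foreach (i, w; words)
        if (w == "from" && i + 1 < words.length)
            return words[i + 1];
    return null;
}

// Evaluates the check during compilation and yields the query unchanged.
template checkedSql(string sql)
{
    static assert(knownTables.canFind(tableOf(sql)),
        "relation \"" ~ tableOf(sql) ~ "\" does not exist. SQL: " ~ sql);
    enum string checkedSql = sql;
}

unittest
{
    enum ok = checkedSql!"select size_mm, size_px from snapshots where snapshot_id = $1";
    assert(ok == "select size_mm, size_px from snapshots where snapshot_id = $1");

    // A misspelled table name is rejected at compile time.
    static assert(!__traits(compiles,
        checkedSql!"select size_mm, size_px from snapshotsssss where snapshot_id = $1"));
}
```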
Jan 09 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 10:59, Paolo Invernizzi wrote:
 
 CTFE support is a must IMHO
Yes. Besides the usability benefits you allude to, it is simply a security feature. We absolutely do not want the constructed string to depend on dynamically entered runtime data. Constructing it at compile time ensures that this is the case.
Jan 09 2024
prev sibling next sibling parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
 that looks like tty noise.
Oh, I realized you might be reading this without a fancy Markdown renderer. Backticks are part of Markdown syntax, not D. I only suggested using i"a \(x) b" rather than i"a $(x) b"
Jan 09 2024
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 09:29, Walter Bright wrote:
 
 Also, the reason I picked the SQL example is because that is the one 
 most cited as being needed and in showing the power of DIP1036 and 
 because I was told that DIP1027 couldn't do it :-)
And I stand by that.
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 4:40 AM, Timon Gehr wrote:
 On 1/9/24 09:29, Walter Bright wrote:
 Also, the reason I picked the SQL example is because that is the one most 
 cited as being needed and in showing the power of DIP1036 and because I was 
 told that DIP1027 couldn't do it :-)
And I stand by that.
But I showed that DIP1027 could do the SQL example.
Jan 09 2024
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 20:06, Walter Bright wrote:
 On 1/9/2024 4:40 AM, Timon Gehr wrote:
 On 1/9/24 09:29, Walter Bright wrote:
 Also, the reason I picked the SQL example is because that is the one 
 most cited as being needed and in showing the power of DIP1036 and 
 because I was told that DIP1027 couldn't do it :-)
And I stand by that.
But I showed that DIP1027 could do the SQL example.
You actually did not.
Jan 09 2024
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 1:24 PM, Timon Gehr wrote:
 You actually did not.
See my other reply to you in this thread.
Jan 09 2024
prev sibling parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Tuesday, 9 January 2024 at 07:30:57 UTC, Nickolay Bukreyev 
wrote:
 I suppose a cleaner way would be to use `\(...)` syntax (like 
 in Swift).
Also, when I said, _like in Swift_, in no event was I meaning, _Swift has it, therefore, D should do the same_. I meant, _there is at least one other language that does it this way_.
Jan 09 2024
prev sibling next sibling parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
I’ve just realized DIP1036 has an excellent feature that is not 
evident right away. Look at the signature of `execi`:

```d
auto execi(Args...)(Sqlite db, InterpolationHeader header, Args 
args, InterpolationFooter footer) { ... }
```

`InterpolationHeader`/`InterpolationFooter` _require_ you to pass 
an istring. Consider this example:

```d
db.execi(i"INSERT INTO items VALUES ($(x))".text);
```

Here, we accidentally added `.text`. It would be an SQL 
injection… but the compiler rejects it! `typeof(i"...".text)` is 
`string`, and `execi` cannot be called with `(Sqlite, string)`.
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:
 I’ve just realized DIP1036 has an excellent feature that is not evident right
 away. Look at the signature of `execi`:
 
 ```d
 auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
 InterpolationFooter footer) { ... }
 ```
 
 `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an istring. 
 Consider this example:
 
 ```d
 db.execi(i"INSERT INTO items VALUES ($(x))".text);
 ```
 
 Here, we accidentally added `.text`. It would be an SQL injection… but the 
 compiler rejects it! `typeof(i"...".text)` is `string`, and `execi` cannot be 
 called with `(Sqlite, string)`.
The compiler will indeed reject it (The error message would be a bit baffling to those who don't know what Interpolation types are), along with any attempt to call execi() with a pre-constructed string.

The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want? Will that impede the use of tuples generally, or just impede the use of istrings?

---

P.S. most keyboarding bugs result from neglecting to add needed syntax, not typing extra stuff. This is why:

    int* p;

is initialized to zero, while:

    int* p = void;

is left uninitialized. The user is unlikely to accidentally type "= void".
Jan 09 2024
next sibling parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
On Monday, 8 January 2024 at 03:05:17 UTC, Walter Bright wrote:
 On 1/7/2024 6:30 PM, Walter Bright wrote:
 On 1/7/2024 3:50 PM, Timon Gehr wrote:
 This cannot work:

 ```
 int x=readln.strip.split.to!int;
 db.execi(xxx!i"INSERT INTO sample VALUES ($(id), $(2*x))");
 ```
True, you got me there. It's the 2\*x that is not turnable into an alias. I'm going to think about this a bit.
I wonder if what we're missing are functions that operate on tuples and return tuples. We almost have them in the form of:

```
template tuple(A ...) { alias tuple = A; }
```

but the compiler wants A to only consist of symbols, types and expressions that can be computed at compile time. This is so the name mangling will work. But what if we don't bother doing name mangling for this kind of template?
Yes! It would be brilliant if `alias` could refer to any Expression, not just symbols. If that was the case, we could just pass InterpolationHeader/Footer/etc. to template parameters (as opposed to runtime parameters, where they go now).

```d
// Desired syntax:
db.execi!i"INSERT INTO sample VALUES ($(id), $(2*x))";
// Desugars to:
db.execi!(
    InterpolationHeader(),
    InterpolatedLiteral!"INSERT INTO sample VALUES ("(),
    InterpolatedExpression!"id"(),
    id,
    InterpolatedLiteral!", "(),
    InterpolatedExpression!"2*x"(),
    2*x, // Currently illegal (`2*x` is not aliasable).
    InterpolatedLiteral!")"(),
    InterpolationFooter(),
);
// `execi!(...)` would expand to:
db.execImpl("INSERT INTO sample VALUES (?1, ?2)", id, 2*x);
```

With this approach, they are processed entirely via compile-time sequence manipulations. Zero-sized structs are never passed as arguments. Inlining is not necessary to get rid of them.

An example with `writeln` (or just about any function alike):

```d
writeln(interpolate!i"prefix $(baz + 4) suffix");
// Desugars to:
writeln(interpolate!(
    InterpolationHeader(),
    InterpolatedLiteral!"prefix "(),
    InterpolatedExpression!"baz + 4"(),
    baz + 4,
    InterpolatedLiteral!" suffix"(),
    InterpolationFooter(),
));
// `interpolate!(...)` would expand to:
writeln("prefix ", baz + 4, " suffix");
```
Jan 10 2024
next sibling parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Wednesday, 10 January 2024 at 15:07:42 UTC, Nickolay Bukreyev 
wrote:
 ```d
 writeln(interpolate!i"prefix $(baz + 4) suffix");
 // Desugars to:
 writeln(interpolate!(
     InterpolationHeader(),
     InterpolatedLiteral!"prefix "(),
     InterpolatedExpression!"baz + 4"(),
     baz + 4,
     InterpolatedLiteral!" suffix"(),
     InterpolationFooter(),
 ));
 // `interpolate!(...)` would expand to:
 writeln("prefix ", baz + 4, " suffix");
 ```
Well, `InterpolatedLiteral` and `InterpolatedExpression` don’t have to be templates anymore:

```d
writeln(interpolate!i"prefix $(baz + 4) suffix");
// Desugars to:
writeln(interpolate!(
    InterpolationHeader(),
    InterpolatedLiteral("prefix "),
    InterpolatedExpression("baz + 4"),
    baz + 4,
    InterpolatedLiteral(" suffix"),
    InterpolationFooter(),
));
// `interpolate!(...)` would expand to:
writeln("prefix ", baz + 4, " suffix");
```
Jan 10 2024
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/10/2024 7:07 AM, Nickolay Bukreyev wrote:
 Zero-sized structs are never passed as arguments. Inlining is not 
 necessary to get rid of them.
Structs with no fields have a size of 1 byte for D and C++ structs, and 0 or 4 for C structs (depending on the target). The rationale for a non-zero size is so that different struct instances will be at different addresses.

```d
struct S { }

void foo(S s);

void test(S s)
{
    foo(s);
}
```

```
push RBP
mov RBP,RSP
sub RSP,8
push dword ptr 010h[RBP]
call _D5test43fooFSQm1SZv PC32
add RSP,010h
pop RBP
ret
```
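The size claim for D can be checked directly with `static assert`. This is only a small sketch: `Header` and `Literal` are stand-ins invented here for illustration, not the real DIP1036 marker types.

```d
// A field-less struct, as in the example above.
struct S { }

// Hypothetical marker types (assumed names, not core.interpolation).
struct Header { }
struct Literal(string s) { }

// Every field-less D struct occupies one byte, templated or not.
static assert(S.sizeof == 1);
static assert(Header.sizeof == 1);
static assert(Literal!"prefix ".sizeof == 1);
```

So a marker argument that is not inlined away or filtered out still has to be passed like any other one-byte struct.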
Jan 10 2024
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/10/24 16:07, Nickolay Bukreyev wrote:

 
 Yes! It would be brilliant if `alias` could refer to any Expression, not 
 just symbols. If that was the case, we could just pass 
 InterpolationHeader/Footer/etc. to template parameters (as opposed to 
 runtime parameters, where they go now).
I am not a big fan of this option. If we are going to allow passing runtime arguments as template parameters, we might as well just allow passing template parameters as runtime arguments instead. It's much more clear how to make that work.
Jan 11 2024
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/10/24 01:03, Walter Bright wrote:
 On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:
 I’ve just realized DIP1036 has an excellent feature that is not 
 evident right away. Look at the signature of `execi`:

 ```d
 auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
 InterpolationFooter footer) { ... }
 ```

 `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an 
 istring. Consider this example:

 ```d
 db.execi(i"INSERT INTO items VALUES ($(x))".text);
 ```

 Here, we accidentally added `.text`. It would be an SQL injection… but 
 the compiler rejects it! `typeof(i"...".text)` is `string`, and 
 `execi` cannot be called with `(Sqlite, string)`.
The compiler will indeed reject it (The error message would be a bit baffling to those who don't know what Interpolation types are), along with any attempt to call execi() with a pre-constructed string. The end result is that to do manipulation with istring tuples, the programmer is alternately faced with adding Interpolation elements or filtering them out. Is that really what we want?
What we want that DIP1036e mostly provides is:

0. The library can detect whether it is being passed an istring.
1. The library that accepts the istring decides how to process it.
2. The string parts of the istring are known to the library at compile time.
3. The expression parts of the istring can be evaluated only at runtime.
4. The expression parts of the istring can be passed arbitrarily, by ref, lazy, alias, ... (this part in fact works better with DIP1027).
5. The library can access the original expression, e.g. in string form.
6. A templated function that is called with an istring can do all of the above.
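Points 2 and 3 can be sketched without the real feature. The snippet below assumes hypothetical marker types (`Header`, `Footer`, `Literal`, `Expression`) that only mimic the shape of the DIP1036e ones, and a made-up helper `buildQuery`. It builds the SQLite-style query from the argument types alone, so the result is available as a compile-time constant while the values stay runtime data.

```d
import std.conv : to;

// Hypothetical stand-ins for the DIP1036e marker types (assumed names).
struct Header { }
struct Footer { }
struct Literal(string s) { }
struct Expression(string code) { }

// Build the query text from the types only; no runtime values are needed.
string buildQuery(Args...)()
{
    string sql;
    int number;
    foreach (T; Args)
    {
        static if (is(T == Literal!str, string str))
            sql ~= str;                       // literal part, known at compile time
        else static if (is(T == Expression!code, string code))
        { }                                   // source text of the expression: skipped
        else static if (is(T == Header) || is(T == Footer))
        { }                                   // markers carry no data
        else
            sql ~= "?" ~ to!string(++number); // placeholder for a runtime value
    }
    return sql;
}

// The argument type list an istring such as
// i"INSERT INTO sample VALUES ($(id), $(2*x))" would roughly produce.
enum query = buildQuery!(
    Header,
    Literal!"INSERT INTO sample VALUES (", Expression!"id", int,
    Literal!", ", Expression!"2*x", int,
    Literal!")",
    Footer)();

static assert(query == "INSERT INTO sample VALUES (?1, ?2)");
```

Because `query` is an `enum`, any further check on it can also run during compilation.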
 Will that impede the use of tuples generally, or just impede the use of
istrings?
 ...
It's just a way to achieve 0.-6. above relatively well with a simple patch to the lexer. I am not sure why it would impede anything except compile time and binary size.
 ---
 
 P.S. most keyboarding bugs result from neglecting to add needed syntax, 
 not typing extra stuff. This is why:
 
      int* p;
 
 is initialized to zero, while:
 
      int* p = void;
 
 is left uninitialized. The user is unlikely to accidentally type "= void".
The user (especially the kind of user that may be prone to accidentally introduce an SQL injection attack) is more likely to accidentally type `.format` or `.text` because that may be a relatively common way to use an istring in their code base.
Jan 11 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/11/24 21:13, Timon Gehr wrote:
 
 
 P.S. most keyboarding bugs result from neglecting to add needed syntax, 
 not typing extra stuff.
     if (condition); { ... }

I think it's due to muscle memory and it does happen quite a bit.
Jan 11 2024
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 00:06, Walter Bright wrote:
 Here's how SQL support is done for DIP1036:
 
 https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d
 
 ```
 auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
 InterpolationFooter footer) {
      import arsd.sqlite;
 
      // sqlite lets you do ?1, ?2, etc
 
      enum string query = () {
          string sql;
          int number;
          import std.conv;
          foreach(idx, arg; Args)
              static if(is(arg == InterpolatedLiteral!str, string
str))
                  sql ~= str;
              else static if(is(arg == InterpolationHeader) ||
is(arg == 
 InterpolationFooter))
                  throw new Exception("Nested interpolation not
supported");
              else static if(is(arg == InterpolatedExpression!code, 
 string code))
                  {   } // just skip it
              else
                  sql ~= "?" ~ to!string(++number);
          return sql;
      }();
 
      auto statement = Statement(db, query);
      int number;
      foreach(arg; args) {
          static if(!isInterpolatedMetadata!(typeof(arg)))
              statement.bind(++number, arg);
      }
 
      return statement.execute();
 }
 ```
 This:
 
 1. The istring, after converted to a tuple of arguments, is passed to 
 the `execi` template.
 2. It loops over the arguments, essentially turing it (ironically!) back 
 into a format
This is not ironic at all. The point is it _can_ do that, while DIP1027 _cannot_ do _either this or the opposite direction_. It is yourself who called the istring the building block instead of the end product, but now you are indeed failing to turn the sausage back into the cow.
 string. The formats, instead of %s, are ?1, ?2, ?3, etc.
 3. It skips all the Interpolation arguments inserted by DIP1036.
 4. The remaining argument are each bound to the indices 1, 2, 3, ...
 5. Then it executes the sql statement.
 
 Note that nested istrings are not supported.
 ...
But you get a useful error message that exactly pinpoints what the problem is. Also, they could be supported, which is the point.
 Let's see how this can work with DIP1027:
 
 ```
 auto execi(Args...)(Sqlite db, Args args) {
      import arsd.sqlite;
 
      // sqlite lets you do ?1, ?2, etc
 
      enum string query = () {
          string sql;
          int number;
          import std.conv;
          auto fmt = arg[0];
          for (size_t i = 0; i < fmt.length, ++i)
          {
              char c = fmt[i];
              if (c == '%' && i + 1 < fmt.length && fmt[i + 1] ==
's')
              {
                  sql ~= "?" ~ to!string(++number);
                  ++i;
              }
              else if (c == '%' && i + 1 < fmt.length && fmt[i + 1]
== '%')
                  ++i;  // skip escaped %
              else
                  sql ~= c;
          }
          return sql;
      }();
 
      auto statement = Statement(db, query);
      int number;
      foreach(arg; args[1 .. args.length]) {
          statement.bind(++number, arg);
      }
 
      return statement.execute();
 }
 ```
 This:
 ...
This does not work.
 1. The istring, after converted to a tuple of arguments, is passed to 
 the `execi` template.
 2. The first tuple element is the format string.
 3. A replacement format string is created by replacing all instances of 
 "%s" with
 "?n", where `n` is the index of the corresponding arg.
 4. The replacement format string is bound to `statement`, and the 
 arguments are bound
 to their indices.
 5. Then it executes the sql statement.
 
 It is equivalent.
No. As Nickolay already explained, it is not equivalent.

- It does not even compile, even if we fix the typo arg -> args.

That is enough to dismiss DIP1027 for this example. However, let's for the sake of argument assume that, miraculously, `execi` can read the format string at compile time, then:

- With this signature, if you pass a manually-constructed string to it, it would just accept the SQL injection.
- It does not give a proper error message for nested istrings.
- It has to manually parse the format string. It iterates over each character of the original format string.
- It (ironically!) constructs a new format string, the original one was useless.
- If you pass a bad format string to it (for example, by specifying a manual format), it will just do nonsense, while DIP1036e avoids bad format strings by construction.
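The first point about the signature can be illustrated with mocks. Everything named below (`Db`, `Header`, `Footer`, `execStrict`, `execLoose`) is invented for this sketch and is not the arsd or `core.interpolation` API. It only shows that a header/footer signature rejects a pre-built string at compile time, while an `Args...`-only signature accepts it.

```d
// Hypothetical stand-ins (assumed names, for illustration only).
struct Db { }      // mock database handle
struct Header { }  // plays the role of InterpolationHeader
struct Footer { }  // plays the role of InterpolationFooter

// DIP1036e-style signature: callable only with a header ... footer sequence.
size_t execStrict(Args...)(Db db, Header, Args args, Footer)
{
    return Args.length; // number of elements between the markers
}

// DIP1027-style signature: anything goes, including a pre-built string.
size_t execLoose(Args...)(Db db, Args args)
{
    return Args.length;
}

// A marker-delimited call compiles.
static assert( __traits(compiles, execStrict(Db(), Header(), 42, Footer())));
// A manually constructed string is rejected by the strict signature ...
static assert(!__traits(compiles, execStrict(Db(), "INSERT INTO items VALUES (42)")));
// ... but silently accepted by the loose one.
static assert( __traits(compiles, execLoose(Db(), "INSERT INTO items VALUES (42)")));
```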
Jan 09 2024
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 4:35 AM, Timon Gehr wrote:
 This does not work.
How so? Consider this:

```
import std.stdio;

auto execi(Args...)(Args args)
{
     auto fmt = args[0].dup;
     fmt[0] = 'k';
     writefln(fmt, args[1 .. args.length]);
}

void main()
{
     string b = "betty";
     execi(i"hello $b");
}
```

which compiles and runs, printing:

kello betty
Jan 09 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 20:16, Walter Bright wrote:
 On 1/9/2024 4:35 AM, Timon Gehr wrote:
 This does not work.
How so?
It does not compile. The arg->args fix I'll grant you as it is a typo whose only significance is to make it even more clear that you never tried to run any version of the code, but then you still get another compile error. I suggest you mock out the SQL library, you don't actually need to install it to try your code. If we remove the `enum` then your code still does not work correctly, for example because it does not prevent an SQL injection attack if the user constructs the SQL string manually by accidentally using `format`. I and other people already pointed out this flaw and other flaws in other posts.
 Consider this:
 
 ```
 import std.stdio;
 
 auto execi(Args...)(Args args)
 {
      auto fmt = args[0].dup;
      fmt[0] = 'k';
      writefln(fmt, args[1 .. args.length]);
 }
 
 void main()
 {
      string b = "betty";
      execi(i"hello $b");
 }
 ```
 
 which compiles and runs, printing:
 
 kello betty
I considered it and it did not have an impact on the way I view the DIP1027 `execi` implementation you have given.
Jan 09 2024
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 4:35 AM, Timon Gehr wrote:
 However, let's for the sake 
 of argument assume that, miraculously, `execi` can read the format string at 
 compile time, then:
Adam's implementation of execi() also runs at run time, not compile time.
 - With this signature, if you pass a manually-constructed string to it, it
would 
 just accept the SQL injection.
It was just a proof of concept piece of code. execi could check for format strings that contain ?n sequences. It could also check the number of %s formats against the number of arguments.
 But you get a useful error message that exactly pinpoints what the problem is.
 Also, they could be supported, which is the point.
 - It does not give a proper error message for nested istrings.
execi could be extended to reject arguments that contain %s sequences. Or, if there was an embedded istring, the number of %s formats can be checked against the number of arguments. An embedded istring would show a mismatch. I expect that use of nested istrings would be exceedingly rare. If they are used, wrapping them in text() will make it work. Besides, would a nested istring in an sql call be intended as part of the sql format, or would a text string be the intended result?
 - It has to manually parse the format string. It iterates over each character
of 
 the original format string.
Correct. And it does not need to iterate over and remove all the Interpolation arguments. Nor does it need the extra two arguments, which aren't free of cost.
 - It (ironically!) constructs a new format string, the original one was
useless.
Yes, it converts the format specifiers to the sql ones. Why is this a problem?
 - If you pass a bad format string to it (for example, by specifying a manual 
 format), it will just do nonsense, while DIP1036e avoids bad format strings by 
 construction.
What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)"` ? I didn't see any check for that. Of course, one could add such a check to the 1036 execi. printf format strings are checked by the compiler, and writef format strings are checked by writef. execi is also capable of being extended to check the format string to ensure the format matches the args.
Jan 09 2024
next sibling parent reply Nickolay Bukreyev <buknik95 ya.ru> writes:
On Tuesday, 9 January 2024 at 20:01:34 UTC, Walter Bright wrote:
 With the istring, there are 4 calls to struct member functions 
 that just return null.
 This can't be good for performance or program size.
A valid point, thanks. Could you test if that fixes the issue?

```d
import core.interpolation;
import std.meta: AliasSeq, staticMap;
import std.stdio;

template filterOutEmpty(alias arg) {
    alias T = typeof(arg);
    static if (is(T == InterpolatedLiteral!s, string s))
        static if (s.length)
            alias filterOutEmpty = s;
        else
            alias filterOutEmpty = AliasSeq!();
    else static if (
        is(T == InterpolationHeader) ||
        is(T == InterpolatedExpression!code, string code) ||
        is(T == InterpolationFooter)
    )
        alias filterOutEmpty = AliasSeq!();
    else
        alias filterOutEmpty = arg;
}

pragma(inline, true) // This pragma is necessary unless you compile with `-inline`.
void log(Args...)(InterpolationHeader, Args args, InterpolationFooter) {
    writeln(staticMap!(filterOutEmpty, args));
}

void main() {
    int baz = 3;
    log(i"$(baz + 4)");
    writeln(baz + 5);
}
```
 Adam's implementation of execi() also runs at run time, not 
 compile time.
We are probably talking about different things. Adam’s implementation constructs a format string at compile time thanks to `enum` storage class [in line 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36). Constructing it at compile time is essential so that we can validate the generated SQL and abort compilation, as Paolo [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).
 execi could be extended to reject arguments that contain %s 
sequences.
I disagree. Storing a string that contains `%s` in a database should be allowed (storing any string should obviously be allowed, regardless of its contents). But `execi` is unable to differentiate between a string that happens to contain `%s` and a nested format string:

```
// DIP1027
example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1");
// Gets rewritten as:
example("prefix1 %s suffix1", "prefix2 %s suffix2", x);
```

I might be wrong, but it appears to me that DIP1027 is not able to deal with nested format strings, in a general case. DIP1036 has no such limitation (demonstrated in point 2 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).
 Nor does it need the extra two arguments, which aren't free of 
 cost.
I explained [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org) why these two arguments are valuable. Aren’t free of cost—correct unless you enable inlining. `execi` may require some changes (like `filterOutEmpty` I showed above) to make them free of cost, but it is doable.
 What happens when ?3 is included in a DIP1036 istring? 
 `i"string ?3 ($betty)"` ? I didn't see any check for that. Of 
 course, one could add such a check to the 1036 execi.
You are right, it doesn’t. Timon’s point (expressed as “This does not work”) is that DIP1036 is able to do validation at compile time while DIP1027 is only able to do it at runtime, when this function actually gets invoked.
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
P.S. Thank you for your well constructed arguments.

On 1/9/2024 1:35 PM, Nickolay Bukreyev wrote:
 A valid point, thanks. Could you test if that fixes the issue?
Yes, that works.
 We are probably talking about different things. Adam’s implementation
constructs 
 a format string at compile time thanks to `enum` storage class [in line 
 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36).
Yes, you're right.
 Constructing it at compile time is essential so that we can validate the
generated SQL and abort compilation, as Paolo
[demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).
That only checks one aspect of correctness - nested string interpolations.
  execi could be extended to reject arguments that contain %s sequences.
I disagree. Storing a string that contains `%s` in a database should be allowed (storing any string should obviously be allowed, regardless of its contents).
True, which is why a % that is not intended as a format specifier is entered as %%.
 But `execi` is unable to differentiate between a string that happens to
contain 
 `%s` and a nested format string:
 
 ```
 // DIP1027
 example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1");
 // Gets rewritten as:
 example("prefix1 %s suffix1", "prefix2 %s suffix2", x);
 ```
 
 I might be wrong, but it appears to me that DIP1027 is not able to deal with 
 nested format strings, in a general case.
The expansion for `example` has a mismatch in the number of formats (1) and number of arguments (2). This can be detected at runtime by `example`, as I've explained. A compile time way is DIP1027 can be modified to reject any arguments that consist of tuples with other than one element. This would eliminate nested istring tuples at compile time.
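A rough sketch of the runtime check described here, under the assumption that only `%s` and `%%` appear in the format string. `countSpecs` and `arityMatches` are hypothetical helper names, not part of any library.

```d
// Count "%s" specifiers, treating "%%" as an escaped percent sign.
size_t countSpecs(string fmt)
{
    size_t n;
    for (size_t i = 0; i + 1 < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (fmt[i + 1] == 's')
            ++n;
        ++i; // skip the character that follows '%'
    }
    return n;
}

// Hypothetical check: exactly one argument per "%s" in the format string.
bool arityMatches(Args...)(string fmt, Args args)
{
    return countSpecs(fmt) == Args.length;
}
```

For the nested example, `arityMatches("prefix1 %s suffix1", "prefix2 %s suffix2", x)` sees one specifier and two arguments and reports a mismatch. The check only runs once the call is actually executed.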
 DIP1036 has no such limitation 
 (demonstrated in point 2 
 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).
DIP1036 cannot detect other problems with the string literals. It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.
 I explained 
 [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org) why 
 these two arguments are valuable. Aren’t free of cost—correct unless you
enable 
 inlining. `execi` may require some changes (like `filterOutEmpty` I showed 
 above) to make them free of cost, but it is doable.
You'd have to also make every formatted writer a template, and add the filter to them.
 You are right, it doesn’t. Timon’s point (expressed as “This does not
work”) is 
 that DIP1036 is able to do validation at compile time while DIP1027 is only
able 
 to do it at runtime, when this function actually gets invoked.
The only validation it does is check for nested string interpolations.
Jan 09 2024
next sibling parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:
 P.S. Thank you for your well constructed arguments.

 On 1/9/2024 1:35 PM, Nickolay Bukreyev wrote:
 A valid point, thanks. Could you test if that fixes the issue?
Yes, that works.
 We are probably talking about different things. Adam’s 
 implementation constructs a format string at compile time 
 thanks to `enum` storage class [in line 
 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36).
Yes, you're right.
 Constructing it at compile time is essential so that we can 
 validate the generated SQL and abort compilation, as Paolo 
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).
That only checks one aspect of correctness - nested string interpolations.
No. If you look at the errors raised during the compilation of our codebase, we are checking FAR MORE: for example, the second error is related to a missing grant condition on a select. I've included it as an example of checking not just syntax, table names, semantics and so on, but also permissions, at compile time. And that's a concrete codebase, used in production, not speculation. /P
Jan 09 2024
prev sibling next sibling parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:

 Constructing it at compile time is essential so that we can 
 validate the generated SQL and abort compilation, as Paolo 
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).
That only checks one aspect of correctness - nested string interpolations.
<snip>
 DIP1036 has no such limitation (demonstrated in point 2 
 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).
DIP1036 cannot detect other problems with the string literals. It seems like a lot of complexity to deal with only one issue with malformed strings at compile time rather than runtime.
You are underestimating what can be gained as value in catching SQL problems at compile time instead of runtime. And, believe me, it's not a matter of mocking the DB and relying on unittest and coverage. CTFE capability is needed. /P
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:
 You are underestimating what can be gained as value in catching SQL problems
at 
 compile time instead of runtime. And, believe me, it's not a matter of mocking 
 the DB and relying on unittest and coverage.
Please expand on that. This is a very important topic. I want to know all the relevant facts.
 CTFE capability is needed.
I concur that compile time errors are better than runtime errors. But in this case, there's a continuing cost to have them, cost to other far more common use cases for istrings. The cost is in terms of complexity, about needing to filter out all the extra marker templates, about reducing its utility as a tuple generator with the unexpected extra elements, larger object files, much longer mangled names, and so on.

Want to know the source of my unease about it? Simple things should be simple. This isn't. The extra complexity is always there, even for the simple cases, and the simple cases are far and away the most common use cases.

Frankly, it reminds me of C++ template expressions, which caught the C++ world by storm for about 2 years, before it faded away into oblivion and nobody talks about them anymore. Fortunately for C++, template expressions could be ignored, as they were not a core language feature. But DIP1036 is a core language feature, a feature we would be stuck with forever. And I'll be the one who gets the heat for it.

The compile-time vs runtime issue is the only thing left standing where the advantage goes to DIP1036. So it needs a very compelling case.

P.S. You can do template expressions in D, too!
Jan 11 2024
next sibling parent Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Friday, 12 January 2024 at 06:06:52 UTC, Walter Bright wrote:
 On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:
 You are underestimating what can be gained as value in 
 catching SQL problems at compile time instead of runtime. And, 
 believe me, it's not a matter of mocking the DB and relying on 
 unittest and coverage.
Please expand on that. This is a very important topic. I want to know all the relevant facts.
As a preamble, we are _currently_ doing all the SQL validations against schemas at compile time: semantics of the query, correctness of the relations involved, types matching with D (and Elm types), permissions granted to roles that are performing the query.

That's not a problem at all, it's just something like:

    sql!`select foo from bar where baz > 1` [1]

In the same way we also check this:

    sql!`update foo set bag = ${d_variable_bag}`

But attaching sanitising functionality to what is inside `d_variable_bag`, checking its type, and actually binding the content for the SQL protocol is done by mixins, after the sql!string instantiation.

As you can guess, that is the most common usage, by far; the business logic is FULL of stuff like that.

The security aspect is related to the fact that you _always_ need to sanitise the data content of the D variable; the mixin takes care of that part, and you can't skip it.

That said, unittesting at runtime can be done against a real db, or by mocking it. A real db is onerous: sometimes you need additional licenses and resource management, and it's time consuming. Just imagine writing D code, but getting errors back not during compilation but only when the "autotester" CI task has completed!

Keep in mind that using a real db is very common, for one simple reason: mocking a db to the point of being useful for unit testing is a PITA. The common approach is simply to skip that, and mock the _results_ of the data retrieved by the query, to unittest the business logic. The queries are not checked until they run against the dev db.

The compile time solutions, instead, give you immediate feedback on a wrong query or wrong type bindings, and that's invaluable especially regarding two fundamental things: refactoring of code, or schema changes.

If the DB schema is changed, the application simply does not compile anymore, until you align it again against the changed schema. And the compiler gently points you to the pieces of code you need to adjust, and the same if you change a D type that somewhere will be bound to an SQL parameter.

So you can refactor without fear, and if the application compiles, you are assured to have everything aligned.

It's like extending the correctness of the type system down to the db type system, and it's priceless.

So, long story short: we will be forced to use mixins if we can't rely on CT interpolation, but having it will simplify the codebase.

[1] well, queries can sometimes be things like this:

    with
    dsx as (select face_id, bounding_box_px, gaze_yaw_deg, gaze_pitch_deg from dev_eyes where eye = ${sx}),
    ddx as (select face_id, bounding_box_px, gaze_yaw_deg, gaze_pitch_deg from dev_eyes where eye = ${dx})
    select dfc.bounding_box_px as face, dfc.expression, dby.center_z_mm,
           dsx.bounding_box_px as eye_sx, dsx.gaze_pitch_deg, dsx.gaze_yaw_deg,
           ddx.bounding_box_px as eye_dx, ddx.gaze_pitch_deg, ddx.gaze_yaw_deg
    from dev_samples
    left join dev_bodies as dby using(sample_id)
    left join dev_faces as dfc using(body_id)
    left join dsx using(face_id)
    left join ddx using(face_id)
    where dev_samples.device_id = ${deviceId}
      and system_timestamp_ms = (select max(system_timestamp_ms) from dev_samples where dev_samples.device_id=${deviceId})
      and dfc.bounding_box_px is not null`
    order by dby.center_z_mm
Jan 12 2024
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/12/24 07:06, Walter Bright wrote:
 
 The compile-time vs runtime issue is the only thing left standing where 
 the advantage goes to DIP1036.
This is not true, DIP1027 also suffers from other drawbacks. For example:

- DIP1027 has already been rejected.
- Format string has to be passed as a runtime argument.
- Format string has to be parsed. (Whether at runtime or compile time.)
- Format string is not transparent to the library user, they have to manually escape '%'.
- No simple way to detect the end of the part of the argument list that is part of the istring.
- Cannot support nested istrings. (I guess the `enum Format: string;` would mitigate this to some extent.)

DIP1027 has the following advantages:

- No interspersed runtime arguments not carrying any runtime data, this is a bit easier to consume.
- Fewer template instantiations.

In any case, I think the compile-time vs runtime issue is the most significant. I do not want a solution that does not integrate well with metaprogramming, it's just not worth it.
Jan 12 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/12/24 16:27, Timon Gehr wrote:
 - Cannot support nested istrings. (I guess the `enum Format: string;` 
 would mitigate this to some extent.)
- In any case, DIP1027 cannot support nested expression sequences without the user passing a manual marker. DIP1036e can support them quite naturally.
Jan 12 2024
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Friday, 12 January 2024 at 06:06:52 UTC, Walter Bright wrote:
 On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:
 CTFE capability is needed.
I concur that compile time errors are better than runtime errors. But in this case, there's a continuing cost to have them, cost to other far more common use cases for istrings. The cost is in terms of complexity, about needing to filter out all the extra marker templates, about reducing its utility as a tuple generator with the unexpected extra elements, larger object files, much longer mangled names, and so on.
The point is to pass the things that the compiler knows to the library, namely the string literal parts. Within the current domain of the D language, the best way to do this is to use string template parameters. Necessarily, this is going to incur template symbol name explosion.

I would love to solve this problem, especially in the cases where compile-time usage isn't needed. Having the compile-time expressions is essential when you need it, but is pretty ugly when you don't.

Again, we can have wrapper templates that do this for you. The problem (as always) is that these wrapper templates are still in there, still taking up space. Is there any room for a solution here? I'm talking about the compiler being clued in that these functions shouldn't exist in the binary. Then the compiler can take a lot of shortcuts (like hashing the type data instead of making a demangleable symbol).

But Timon is also right that the "format string" version is actually adding to the grief for library writers and users. There's no reason I can think of to add additional parsing requirements for the library. I'd prefer Jonathan Marler's solution of just interspersing strings and values if I had to pick between that and DIP1027. But that still leaves so much on the table of what *could be great*.

I also think it's fine to tell users 'Hey, you want formatted output? it's writef("format", args)'. My target was not and never will be, `writef`.
 Want to know the source of my unease about it? Simple things 
 should be simple. This isn't. The extra complexity is always 
 there, even for the simple cases, and the simple cases are far 
 and away the most common use cases.
It actually is simple. It's a simple transformation from a parsed expression to the subexpressions contained within (sprinkling in types to make it easy to know what is what). What you *do* with the transformation might not be simple, but that's not necessary to use the feature.
 Frankly, it reminds me of C++ template expressions, which 
 caught the C++ world by storm for about 2 years, before it 
 faded away into oblivion and nobody talks about them anymore. 
 Fortunately for C++, template expressions could be ignored, as 
 they were not a core language feature. But DIP1036 is a core 
 language feature, a feature we would be stuck with forever. And 
 I'll be the one who gets the heat for it.
I just looked it up and... no. It's not even close. There is no *requirement* to make this complicated. The transformation is simple and straightforward. It's easy to understand if you take 5 minutes to read the docs. If you want to build some insanely complex thing out of this, it's possible. But there is no requirement to use it that way. To reiterate, the *feature* is simple, what you can do with the feature is unbounded. This is like saying templates are too complicated because of what you *can do* with templates.
 P.S. You can do template expressions in D, too!
I rest my case ;) -Steve
Jan 12 2024
prev sibling next sibling parent Nickolay Bukreyev <buknik95 ya.ru> writes:
On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:
 A compile time way is DIP1027 can be modified to reject any 
 arguments that consist of tuples with other than one element. 
 This would eliminate nested istring tuples at compile time.
To sum up, it works with nested istrings poorly; it may even be sensible to forbid them entirely for DIP1027. Glad we’ve reached a consensus on this point. This case doesn’t seem crucial at the moment though; now we can focus on more relevant questions.
 DIP1036 cannot detect other problems with the string literals. 
 It seems like a lot of complexity to deal with only one issue 
 with malformed strings at compile time rather than runtime.
DIP1036 puts full CTFE capabilities at your disposal. You can validate _anything_ about a format string; any compile-time-executable hypothetical `validateSql(query)` will fit. I guess none of the examples presented so far featured such validation because it usually tends to be long and not illustrative. However, another of Adam’s examples [does perform](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/07-html.d#L13) non-trivial compile-time validation. Here is how it is [implemented](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/html.d#L97).
 Constructing it at compile time is essential so that we can 
 validate the generated SQL and abort compilation, as Paolo 
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi@forum.dlang.org).
That only checks one aspect of correctness - nested string interpolations.
They check a lot more. I agree it is hard to spot the error messages in the linked post so I’ll copy them here:

relation "snapshotsssss" does not exist. SQL: select size_mm, size_px from snapshotsssss where snapshot_id = $1

role "dummyuser" can't select on table "snapshots". SQL: select size_mm, size_px from snapshots where snapshot_id = $1

As you can see, they check sophisticated business logic expressed in terms of relational databases. And all of that happens at compile time. Isn’t that a miracle?
 I explained 
 [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana@forum.dlang.org) why
these two arguments are valuable. Aren’t free of cost—correct unless you
enable inlining. `execi` may require some changes (like `filterOutEmpty` I
showed above) to make them free of cost, but it is doable.
You'd have to also make every formatted writer a template,
Err… every formatted writer has to be a template anyway, doesn’t it? It needs to accept argument lists that may contain values of arbitrary types.
 …and add the filter to them.
Yeah. I admit this is a problem. As a rule of thumb, the most obvious code should yield the best results. With DIP1036, this is not the case at the moment: when you pass an interpolation sequence to a function not specifically designed for it, it wastes more stack space than necessary and passes useless junk in registers.

Others have mentioned that DIP1027 performs much worse in terms of speed (due to runtime parsing). While that is undoubtedly true, I think DIP1036 should be tweaked to behave as well as possible.

There was an idea in this thread to improve the ABI so that it ignores empty structs, but I’m rather sceptical about it. Instead, let us note there are basically two patterns of usage for istrings:

1. Passing to a function that processes an istring and does something non-trivial. `execi` is a good example.
2. Passing to a function that simply stringifies every fragment, one after another. `writeln` is a good example.

Somewhat counterintuitively, case 1 is easier to address: the function already traverses the received sequence and transforms it. So it is only necessary to write it in such a way that it is inline-friendly.

By the way, what functions do we have in Phobos that fall into the case-2 category? `write`/`writeln`, `std.conv.text`, `std.logger.core.log`, and… is that all? Must be something else!.. Turns out there are only a handful of relevant functions in the entire stdlib. It shouldn’t be hard to put a filter in each of them. It also hints they are probably not that common in the wild.

However, when one encounters a third-party `write`-like function that is unaware of `InterpolationHeader`/etc., they should have a means to fix it from outside, i.e., without touching its source and ideally without writing a wrapper by hand. Unfortunately, I could not come up with a satisfactory solution for this. Will keep thinking. Perhaps someone else manages to find it faster.

---

An idea in a different direction. Currently, `InterpolationHeader`/etc. structs interoperate with `write`-like functions seamlessly (at the expense of passing zero-sized arguments) due to the fact they all have an appropriate `toString` method. If we remove those methods (and do nothing else), then `write(i"a$(x)b")` would produce something like:

InterpolationHeader()InterpolatedLiteral!"a"()InterpolatedExpression!"x"()42InterpolatedLiteral!"b"()InterpolationFooter()

The program, rather than introducing a silent inefficiency, immediately tells the user they need to account for these types.

---

And one more idea. The current implementation of DIP1036 can emit empty chunks, i.e., `InterpolatedLiteral!""` (see for example `i"$(x)"`). If I were to guess why it does so, I would say it strives to produce consistent, regular sequences. On the one hand, it might ease the job of interpolation-sequence handlers: they can count on the fact that expressions and literals always alternate inside a sequence. On the other, they have to check whether a literal is empty and drop it if so, which actually makes their job harder. I do not know whether not producing empty literals in the first place would be a positive or negative change. But it is something worth considering.

---

Slightly off-topic: when I was thinking about this, I was astonished by the fact istrings can work with `readf`/`formattedRead`/`scanf`. Just wanted to share this observation.

```d
readf(i" $(&x) $(&y)");
```
 The compiler will indeed reject it (The error message would be 
 a bit baffling to those who don't know what Interpolation types 
 are)
This is true. I suppose the docs should mention `InterpolationHeader` and friends when talking about istrings, explain what an istring is lowered to, and show examples. Then a programmer who has read the docs will have a mental association between “istring” and “InterpolationHeader/Footer/etc.” Those who don’t read the docs won’t have it; only googling will save them. To be honest, I’m not too concerned about this point.
 along with any attempt to call execi() with a pre-constructed 
 string.

 The end result is that to do manipulation with istring tuples, 
 the programmer is alternately faced with adding Interpolation 
 elements or filtering them out. Is that really what we want?
I’d argue it is wonderful that `execi` cannot be called with a pre-constructed string. The API should provide another function instead—say, `execDynamicStatement(Sqlite, string, Args...)`. `execi` should be used for statically known SQL with interpolated arguments, and `execDynamicStatement`—for arbitrary SQL constructed at runtime. A verbose name is intentional to discourage its usage in favour of `execi`.
 P.S. most keyboarding bugs result from neglecting to add needed 
 syntax, not typing extra stuff.
That makes sense. Though you’ll never guess what beast can be spawned by careless refactoring. Extra protection won’t hurt, especially if it’s zero-cost. P.S. Zero-initialization of variables is one of D’s cool features, indeed.
Jan 09 2024
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/10/24 00:21, Walter Bright wrote:
 ...
The other points I think have been adequately addressed already.
 You are right, it doesn’t. Timon’s point (expressed as “This does not 
 work”) is that DIP1036 is able to do validation at compile time while 
 DIP1027 is only able to do it at runtime, when this function actually 
 gets invoked.
The only validation it does is check for nested string interpolations.
That is not true in the least. It validates conclusively that no SQL injection attack is going on. This is the main feature of the example!
Jan 11 2024
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 21:01, Walter Bright wrote:
 On 1/9/2024 4:35 AM, Timon Gehr wrote:
 However, let's for the sake of argument assume that, miraculously, 
 `execi` can read the format string at compile time, then:
Adam's implementation of execi() also runs at run time, not compile time. ...
Adam's `execi` partially runs at compile time and partially of course it will ultimately run at run time (like code generated by a metaprogram tends to do). The SQL statement is prepared at compile time. Therefore, by construction, it cannot depend on any runtime parameters, preventing an SQL injection. (And it can be checked at compile time, like people are already doing with less convenient syntax).
 
 - With this signature, if you pass a manually-constructed string to 
 it, it would just accept the SQL injection.
It was just a proof of concept piece of code.
So is Adam's example code. In any case: I am talking about the function _signature_. Whatever crazy advanced thing you do in the implementation, the signature that DIP1027 expects `execi` to have is fundamentally significantly less safe.
 execi could check for 
 format strings that contain ?n sequences. It could also check the number 
 of %s formats against the number of arguments.
 ...
That does not fix the security issue.
 
  > But you get a useful error message that exactly pinpoints what the 
 problem is.
  > Also, they could be supported, which is the point.
 - It does not give a proper error message for nested istrings.
execi could be extended to reject arguments that contain %s sequences.
And now suddenly you can no longer store anything that looks like a format string in your data base.
 Or, if there was an embedded istring, the number of %s formats can be 
 checked against the number of arguments.
Maybe at runtime. But why introduce this failure mode in the first place?
 An embedded istring would show a mismatch.
 ...
The error message would be phrased in overly general terms and hence be confusing.
 I expect that use of nested istrings would be exceedingly rare. If they 
 are used, wrapping them in text() will make it work.
Depends on how exactly they are used. For the SQL case, not allowing them is a decent option.
 Besides, would a 
 nested istring in an sql call be intended as part of the sql format, or 
 would a text string be the intended result?
 ...
Whatever it is, with DIP1036e and compile-time SQL construction, user data does not make it into the SQL expression sent to the database.
 
 - It has to manually parse the format string. It iterates over each 
 character of the original format string.
Correct. And it does not need to iterate over and remove all the Interpolation arguments.
Adam's implementation does the filtering at compile time. The function body will be something like:

auto statement = Statement(db, "...?1...?2...?3..."); // replace ... by query
int number = 0;
statement.bind(++number, firstArg);
statement.bind(++number, secondArg);
statement.bind(++number, thirdArg);

But yes, DIP1036e does make some concessions and it will indeed pass empty struct arguments in case the function is not inlined (could use pragma(inline, true) to avoid it.)
 Nor does it need the extra two arguments, which 
 aren't free of cost.
 ...
Are you really going to argue that some extra empty struct arguments are in some way more expensive than runtime query construction including format string parsing and query construction using GC strings? But anyway, if you think interpolation is not worth runtime overhead that would perhaps need to be mitigated using additional features or an improved calling convention, that's up to you, but then DIP1027 loses too.
 
 - It (ironically!) constructs a new format string, the original one 
 was useless.
Yes, it converts the format specifiers to the sql ones. Why is this a problem? ...
You argued earlier like it is in some way an ironic benefit of DIP1027 that the DB interface requires something that is similar to a format string under the hood. Well, it does not require the kind of format string that DMD is generating.
 
 - If you pass a bad format string to it (for example, by specifying a 
 manual format), it will just do nonsense, while DIP1036e avoids bad 
 format strings by construction.
What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)"`? I didn't see any check for that.
That's a fair point in general, but I was specifically talking about the format string that you pass into the function that accepts the istring, not similar kinds of strings that may or may not be generated in the implementation. In any case, DIP1027 istrings can also create a format string with `?3`, and there is no way to check within `execi` whether that `?3` came from malicious data that was read as input to the program or was put there by an incompetent programmer.
 Of course, one could add such a check to the 1036 execi.
 ...
With DIP1036e the check could be done at compile time.
 printf format strings are checked by the compiler,
As a one-off special case that only supports a specific kind of format string.
 and writef format strings are checked by writef.
`writef` allows the format string to be passed as a template parameter if compile-time parsing and checking is requested. DIP1027 does not naturally support this.
 execi is also capable of being extended 
 to check the format string to ensure the format matches the args.
With DIP1027, you'd have to do it at runtime.
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
I'd like to see an example of how DIP1027 does not prevent an injection attack.
Jan 11 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/12/24 07:13, Walter Bright wrote:
 I'd like to see an example of how DIP1027 does not prevent an injection 
 attack.
```d
// mock SQL
import std.format, std.variant;

class Sqlite{
    this(string){}

    Sqlite query(string command,scope Variant[int] args=null){
        writeln("EXECUTING");
        writeln(command);
        if(args.length){
            writeln("ARGS:");
            foreach(k,v;args){
                if(v!=Variant.init)
                    writefln(i"?$k = ($v)");
            }
        }
        writeln("DONE");
        return this;
    }
    struct Row{
        int opIndex(int i){ return 0; }
    }
    int opApply(scope int delegate(Row) dg){
        writeln("ITERATING OVER ROWS");
        return 0;
    }
}

struct Statement{
    Sqlite db;
    string query;
    Variant[int] args;
    void bind(T)(int i,T arg){
        args[i]=Variant(arg);
    }
    void execute(){
        db.query(query,args);
    }
}

auto execi(Args...)(Sqlite db, Args args) {
    // sqlite lets you do ?1, ?2, etc

    string query = () { // note: parsing done at runtime
        string sql;
        int number;
        import std.conv;
        auto fmt = args[0];
        for (size_t i = 0; i < fmt.length; ++i)
        {
            char c = fmt[i];
            if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
            {
                sql ~= "?" ~ to!string(++number);
                ++i;
            }
            else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                ++i;  // skip escaped %
            else
                sql ~= c;
        }
        return sql;
    }();

    auto statement = Statement(db, query);
    int number;
    foreach(arg; args[1 .. args.length]) {
        statement.bind(++number, arg);
    }

    return statement.execute();
}

import std.stdio;

void main() {
    auto db = new Sqlite(":memory:");
    db.query("CREATE TABLE Students (id INTEGER, name TEXT)");

    // you might think this is sql injection... and you'd be right! the lib
    // cannot use rich metadata because it is not provided by the istring
    // therefore, it cannot verify that the user didn't construct the
    // query themselves in an unsafe way
    int id = 1;
    string name = "Robert'); DROP TABLE Students;--";
    db.execi(i"INSERT INTO sample VALUES ($(id), '$(name)')".format);

    foreach(row; db.query("SELECT * from sample"))
        writeln(row[0], ": ", row[1]);
}
```

Prints:

EXECUTING
CREATE TABLE Students (id INTEGER, name TEXT)
DONE
EXECUTING
INSERT INTO sample VALUES (1, 'Robert'); DROP TABLE Students;--')
DONE
EXECUTING
SELECT * from sample
DONE
ITERATING OVER ROWS

https://xkcd.com/327/
Jan 12 2024
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 21:01, Walter Bright wrote:
 
 
 I expect that use of nested istrings would be exceedingly rare. If they 
 are used, wrapping them in text() will make it work.
One more point here is that `text` will of course only work with DIP1036e; with DIP1027 you need `format`. In any case, unfortunately I have to bow out of this discussion now as it is consuming too much of my time right in front of a deadline. I can get back to this in a couple of days.
Jan 09 2024
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
At the end of the day, DIP1027 is an improvement of `writef`, and 
`writef` only (not even `printf` works correctly). The 
interpolation DIP Atila is writing (I'll call it IDIP) supports 
all manner of interpolated transformations, efficiently and 
effectively, with proper compiler checks.

Let's go through the points made...

On Monday, 8 January 2024 at 23:06:40 UTC, Walter Bright wrote:
 Here's how SQL support is done for DIP1036:

 https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d
...
 This:

 1. The istring, after converted to a tuple of arguments, is 
 passed to the `execi` template.
Yes, and with an explicit type to be matched against, enabling overloading. Note that `execi` could be called the same thing as the normal execution function, and then users could use whatever form they prefer -- sql string + args or istring. It's a seamless experience. Compare to DIP1027 where you can accidentally use the wrong form with string args.
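A minimal sketch of the overloading point, under stated assumptions: `Header` is a hypothetical local stand-in for the interpolation header marker (so the sketch compiles without istring support), and both `exec` overloads are invented for illustration and only report which form was chosen.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical stand-in for the marker that starts an interpolation
// sequence; defined locally so no istring support is needed.
struct Header {}

// Ordinary form: SQL text followed by its arguments.
string exec(Args...)(string sql, Args args)
{
    return "plain form, " ~ to!string(Args.length) ~ " argument(s)";
}

// Interpolated form: selected only when the first argument is the marker.
string exec(Args...)(Header, Args args)
{
    return "interpolated form, " ~ to!string(Args.length) ~ " element(s)";
}

void main()
{
    writeln(exec("SELECT * FROM t WHERE id = ?1", 42));
    writeln(exec(Header(), 42));
}
```

Because the marker is a distinct type, overload resolution separates the two forms; a plain `string` can never select the interpolated overload by accident.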
 2. It loops over the arguments, essentially turning it 
 (ironically!) back into a format
 string. The formats, instead of %s, are ?1, ?2, ?3, etc.
There is no formatting, sqlite does not have any kind of format specifiers. No, it is not "turned back" into a format string, because there was no format string to begin with. The sql is *constructed* using the given information from the compiler clearly identifying which portions are sql and which portions are parameters. And the SQL query is built at compile time, not runtime (as DIP1027 *must do*). This incurs no memory allocations at runtime.
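A minimal sketch of what "built at compile time" means here, assuming a hypothetical local marker type `Literal` in place of the real interpolation types, and a hypothetical helper `sqlFor`. It loosely follows the loop in the `execi` example quoted at the top of the thread and is not the actual library code.

```d
import std.conv : to;
import std.meta : AliasSeq;
import std.stdio : writeln;

// Hypothetical stand-in for the literal marker: the text lives in the
// type, so it is known to the compiler.
struct Literal(string text) {}

// Builds the SQL text from the argument *types* alone.
template sqlFor(Args...)
{
    enum string sqlFor = () {
        string sql;
        int number;
        foreach (Arg; Args)
        {
            static if (is(Arg == Literal!text, string text))
                sql ~= text;                       // literal part, copied as-is
            else
                sql ~= "?" ~ to!string(++number);  // any value becomes ?n
        }
        return sql;
    }();
}

// Assumed type list for: INSERT INTO Students VALUES ($(id), $(name))
alias Seq = AliasSeq!(Literal!"INSERT INTO Students VALUES (", int,
                      Literal!", ", string, Literal!")");

// Checked during compilation; no runtime value is involved.
static assert(sqlFor!Seq == "INSERT INTO Students VALUES (?1, ?2)");

void main()
{
    writeln(sqlFor!Seq);
}
```

Since only types feed into `sqlFor`, the contents of `id` or `name` at runtime cannot change the query text.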
 3. It skips all the Interpolation arguments inserted by DIP1036.
Sure, those are not necessary here. Should be a no-op, as no data is actually passed.
 4. The remaining arguments are each bound to the indices 1, 2, 
 3, ...
Yes.
 5. Then it executes the sql statement.
Yes.
 Note that nested istrings are not supported.
Note that nested istrings can be *detected*. And they are not supported *as explicitly specified*! This is not a defect or limitation but a choice of the particular example library. Noting this "limitation" is like noting the limitation that `void foo(int)` can't be called with a `string` argument.
 Let's see how this can work with DIP1027:

 ```d
 auto execi(Args...)(Sqlite db, Args args) {
     import arsd.sqlite;

     // sqlite lets you do ?1, ?2, etc

     enum string query = () {
         string sql;
         int number;
         import std.conv;
         auto fmt = args[0];
         for (size_t i = 0; i < fmt.length, ++i)
         {
             char c = fmt[i];
             if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 
 's')
             {
                 sql ~= "?" ~ to!string(++number);
                 ++i;
             }
             else if (c == '%' && i + 1 < fmt.length && fmt[i + 
 1] == '%')
                 ++i;  // skip escaped %
             else
                 sql ~= c;
         }
         return sql;
     }();
 ```
As mentioned several times, this fails to compile -- an enum cannot be built from the runtime variable `args`.

Now, you can just do this *without* an enum, and yes, it will compile, build a string at runtime, and you are now at the mercy of the user to not have put in a specialized placeholder (poorly named a "format specifier" in DIP1027 because it is solely focused on writef). No compiler help for you!

To put it another way, you have given up complete control of the API of your library to the compiler and the user. Instead of understanding what the user has said, you have to guess.

And BTW, this is valid SQL:

```sql
SELECT * FROM someTable WHERE fieldN LIKE '%something%'
```

Which means, the poor user needs to escape `%` in a way completely unrelated to the sql language *or* the istring specification, something that IDIP doesn't require. This is a further burden on the user that is wholly unnecessary, just because DIP1027 decided to use `%s` as "the definitive ~~placeholder~~ format specifier".
 ```d
     auto statement = Statement(db, query);
     int number;
     foreach(arg; args[1 .. args.length]) {
         statement.bind(++number, arg);
     }

     return statement.execute();
 }
 ```
 This:

 1. The istring, after converted to a tuple of arguments, is 
 passed to the `execi` template.
A tuple with an incorrect parameter that needs runtime transformation and allocations.
 2. The first tuple element is the format string.
 3. A replacement format string is created by replacing all 
 instances of "%s" with
 "?n", where `n` is the index of the corresponding arg.
SQL doesn't use format strings, so the parameter must be transformed at runtime using memory allocations. And it does this without knowing whether the "%s" came from the "format string" or from a parameter. Not to mention the user can pass in other "format specifiers" at will.
 4. The replacement format string is bound to `statement`, and 
 the arguments are bound
 to their indices.
Maybe. sqlite frowns upon mismatching arguments because the library decided your search string was actually a placeholder in some unrelated domain specific language (the language of `writef`).
 5. Then it executes the sql statement.
Maybe.
 It is equivalent.
It is most certainly not. The two are only slightly comparable. IDIP is a mechanism for an SQL library author (and many other domains, see Adam's repository) to effectively and gracefully consume succinct and intuitive instructions from a user to avoid SQL injections, and use the compiler to weed out problematic calls. Whereas DIP1027 is a loaded footgun which is built for `writef` that can be shoehorned into an SQL lib, which necessitates allocations and all checks are done at runtime. -Steve
Jan 09 2024
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 1/9/24 23:30, Steven Schveighoffer wrote:
 
 And BTW, this is valid SQL:
 
 ```sql
 SELECT * FROM someTable WHERE fieldN LIKE '%something%'
 ```
 
 Which means, the poor user needs to escape `%` in a way completely 
 unrelated to the sql language *or* the istring specification, something 
 that IDIP doesn't require.
I had typed up a similar point in my post, but then thought that most likely DIP1027 does the escaping automatically and dropped the line of inquiry. But actually checking it now, it indeed does not seem to do anything to prevent such hijacking.

https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md

https://github.com/dlang/dmd/compare/master...WalterBright:dmd:dip1027#diff-a556a8e6917dd4042f541bdb19673f96940149ec3d416b0156af4d0e4cc5e4bdR16347-R16452

Having the SQL library arbitrarily interpret a substring `%s` in your SQL query as a placeholder seems like unnecessary pain, and it also renders moot the idea that DIP1027 code is able to detect mismatches.
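A minimal sketch of the hijacking, assuming a hypothetical helper `toSqlitePlaceholders` modelled on the runtime loop quoted earlier in the thread. One difference is an assumption of this sketch: it emits a single `%` for an escaped `%%`, whereas the quoted loop skips it.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: rewrites every "%s" to "?n" at runtime.
string toSqlitePlaceholders(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable c = fmt[i];
        if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);  // treated as a placeholder
            ++i;
        }
        else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%';                        // escaped percent sign
            ++i;
        }
        else
            sql ~= c;
    }
    return sql;
}

void main()
{
    // Intended use: the "%s" really is a placeholder.
    writeln(toSqlitePlaceholders("SELECT * FROM t WHERE id = %s"));
    // The LIKE pattern also contains "%s", so it is rewritten as well.
    writeln(toSqlitePlaceholders(
        "SELECT * FROM someTable WHERE fieldN LIKE '%something%'"));
}
```

The second call returns `SELECT * FROM someTable WHERE fieldN LIKE '?1omething%'`: the start of the search pattern is consumed as a parameter slot, and the function has no way to tell the two cases apart.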
Jan 09 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Please post an example of a problem it cannot detect.
Jan 11 2024
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 1/12/24 07:17, Walter Bright wrote:
 Please post an example of a problem it cannot detect.
For example:

```d
import std.stdio;

void main(){
    int x=2,y=3;
    writefln(i"%success: $y",x);
}
```
Jan 12 2024