digitalmars.D - Interpolated strings and SQL

Walter Bright <newshound2 digitalmars.com> writes:

Here's how SQL support is done for DIP1036:

https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d

```
auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
InterpolationFooter footer) {
     import arsd.sqlite;

     // sqlite lets you do ?1, ?2, etc

     enum string query = () {
         string sql;
         int number;
         import std.conv;
         foreach(idx, arg; Args)
             static if(is(arg == InterpolatedLiteral!str, string str))
                 sql ~= str;
             else static if(is(arg == InterpolationHeader) || is(arg == 
InterpolationFooter))
                 throw new Exception("Nested interpolation not supported");
             else static if(is(arg == InterpolatedExpression!code, string code))
                 {   } // just skip it
             else
                 sql ~= "?" ~ to!string(++number);
         return sql;
     }();

     auto statement = Statement(db, query);
     int number;
     foreach(arg; args) {
         static if(!isInterpolatedMetadata!(typeof(arg)))
             statement.bind(++number, arg);
     }

     return statement.execute();
}
```
This does the following:

1. The istring, after being converted to a tuple of arguments, is passed to the
`execi` template.
2. It loops over the arguments, essentially turning them (ironically!) back into a
format string. The formats, instead of %s, are ?1, ?2, ?3, etc.
3. It skips all the Interpolation arguments inserted by DIP1036.
4. The remaining arguments are each bound to the indices 1, 2, 3, ...
5. Then it executes the sql statement.

Note that nested istrings are not supported.
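
To make the compile-time part of that loop concrete, here is a minimal, self-contained sketch of deriving the query text from the argument types alone, with the SQLite binding left out. The four marker structs are hypothetical stand-ins declared locally (not the real DIP1036 definitions), and `queryFor` is an illustrative name that does not appear in the linked code:

```d
import std.conv : to;
import std.meta : AliasSeq;

// Hypothetical stand-ins for the marker types a DIP1036-style lowering passes.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string str) {}
struct InterpolatedExpression(string code) {}

// Builds the SQL text purely from the argument *types*, so the result
// is a compile-time constant.
enum string queryFor(Args...) = () {
    string sql;
    int number;
    foreach (T; Args)
    {
        static if (is(T == InterpolatedLiteral!str, string str))
            sql ~= str;                        // literal text is copied as-is
        else static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
        }                                      // markers carry no text
        else static if (is(T == InterpolatedExpression!code, string code))
        {
        }                                      // source text of the expression: skipped
        else
            sql ~= "?" ~ to!string(++number);  // a real value becomes ?1, ?2, ...
    }
    return sql;
}();

void main()
{
    // Roughly the types one might expect for
    // i"SELECT * FROM t WHERE id = $(id) AND name = $(name)"
    alias Args = AliasSeq!(
        InterpolatedLiteral!"SELECT * FROM t WHERE id = ",
        InterpolatedExpression!"id", int,
        InterpolatedLiteral!" AND name = ",
        InterpolatedExpression!"name", string);
    static assert(queryFor!Args == "SELECT * FROM t WHERE id = ?1 AND name = ?2");
}
```

Because only types are inspected, no runtime value is needed to produce the query string.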

Let's see how this can work with DIP1027:

```
auto execi(Args...)(Sqlite db, Args args) {
     import arsd.sqlite;

     // sqlite lets you do ?1, ?2, etc

     enum string query = () {
         string sql;
         int number;
         import std.conv;
         auto fmt = arg[0];
         for (size_t i = 0; i < fmt.length, ++i)
         {
             char c = fmt[i];
             if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
             {
                 sql ~= "?" ~ to!string(++number);
                 ++i;
             }
             else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                 ++i;  // skip escaped %
             else
                 sql ~= c;
         }
         return sql;
     }();

     auto statement = Statement(db, query);
     int number;
     foreach(arg; args[1 .. args.length]) {
         statement.bind(++number, arg);
     }

     return statement.execute();
}
```
This does the following:

1. The istring, after being converted to a tuple of arguments, is passed to the
`execi` template.
2. The first tuple element is the format string.
3. A replacement format string is created by replacing all instances of "%s"
with "?n", where `n` is the index of the corresponding arg.
4. The replacement format string is bound to `statement`, and the arguments are
bound to their indices.
5. Then it executes the sql statement.

It is equivalent.
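
Step 3 can be tried in isolation. The following is only a sketch: `toSqlitePlaceholders` is a hypothetical helper name, and emitting a single `%` for `%%` is an assumption (the loop above drops the escaped character instead). Written as a plain function of a `string`, it runs at run time and also at compile time whenever the format string is itself a compile-time constant:

```d
import std.conv : to;

// Hypothetical helper: rewrites each "%s" as "?1", "?2", ... in order.
// Assumption: "%%" is an escaped percent sign and becomes a single "%".
string toSqlitePlaceholders(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable c = fmt[i];
        if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);
            ++i; // consume the 's'
        }
        else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%';
            ++i; // consume the second '%'
        }
        else
            sql ~= c;
    }
    return sql;
}

void main()
{
    // Run-time use.
    assert(toSqlitePlaceholders("SELECT * FROM t WHERE a = %s AND b = %s")
        == "SELECT * FROM t WHERE a = ?1 AND b = ?2");
    // Compile-time use: only possible because the argument is a constant.
    static assert(toSqlitePlaceholders("x = %s") == "x = ?1");
}
```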

Jan 08 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

Hello. It is fascinating to see string interpolation in D. Let me 
try to shed some light on it; I hope my thoughts will be useful.

1.  First of all, I’d like to note that in the DIP1027 variant 
of the code we see:

     > `auto fmt = arg[0];`

     (`arg` is undeclared identifier here; I presume `args` was 
meant.) There is a problem: this line is executed at CTFE, but it 
cannot access `args`, which is a runtime parameter of `execi`. 
For this to work, the format string should go to a template 
parameter, and interpolated expressions should go to runtime 
parameters. How can DIP1027 accomplish this?

2.
     > Note that nested istrings are not supported.

     To clarify: “not supported” means one cannot write

     ```
     db.execi(i"SELECT field FROM items WHERE server = 
$(i"europe$(number)")");
     ```

     Instead, you have to be more explicit about what you want the 
inner string to become. This is legal:

     ```
     db.execi(i"SELECT field FROM items WHERE server = 
$(i"europe$(number)".text)");
     ```

     However, it is not hard to adjust `execi` so that it fully 
supports nested istrings:

     ```d
     struct Span {
         size_t i, j;
         bool topLevel;
     }

     enum segregatedInterpolations(Args...) = {
         Span[ ] result;
         size_t processedTill;
         size_t depth;
         static foreach (i, T; Args)
             static if (is(T == InterpolationHeader)) {
                 if (!depth++) {
                     result ~= Span(processedTill, i, true);
                     processedTill = i;
                 }
             } else static if (is(T == InterpolationFooter))
                 if (!--depth) {
                     result ~= Span(processedTill, i + 1);
                     processedTill = i + 1;
                 }
         return result;
     }();

     auto execi(Args...)(Sqlite db, InterpolationHeader header, 
Args args, InterpolationFooter footer) {
         import std.conv: text, to;
         import arsd.sqlite;

         // sqlite lets you do ?1, ?2, etc

         enum string query = () {
             string sql;
             int number;
             static foreach (span; segregatedInterpolations!Args)
                 static if (span.topLevel) {
                     static foreach (T; Args[span.i .. span.j])
                         static if (is(T == 
InterpolatedLiteral!str, string str))
                             sql ~= str;
                         else static if (is(T == 
InterpolatedExpression!code, string code))
                             sql ~= "?" ~ to!string(++number);
                 }
             return sql;
         }();

         auto statement = Statement(db, query);
         int number;
         static foreach (span; segregatedInterpolations!Args)
             static if (span.topLevel) {
                 static foreach (arg; args[span.i .. span.j])
                     static if 
(!isInterpolatedMetadata!(typeof(arg)))
                         statement.bind(++number, arg);
             } else // Convert a nested interpolation to string with `.text`.
                 statement.bind(++number, args[span.i .. span.j].text);

         return statement.execute();
     }
     ```

     Here, we just invoke `.text` on nested istrings. A more 
advanced implementation would allocate a buffer and reuse it. It 
could even be `@nogc` if it wanted.

3.  DIP1036 appeals more to me because it passes rich, high-level 
information about parts of the string. With DIP1027, on the other 
hand, we have to extract that information ourselves by parsing 
the string character by character. But the compiler already 
tokenized the string; why do we have to do it again? (And no, 
lower level doesn’t imply broader possibilities here.)

     It may have another implication: looping over characters 
might put the current CTFE engine in trouble if strings are large. 
Many more iterations need to be executed, and more memory is 
consumed in the process. We certainly need numbers here, but I 
thought it was important to at least bring attention to this 
point.

4.  What I don’t like in both DIPs is a rather arbitrary 
selection of meta characters: `$`, `$$` and `%s`. In regular 
strings, all of them are just normal characters; in istrings, 
they gain special meaning.

     I suppose a cleaner way would be to use `\(...)` syntax (like 
in Swift). So `i"a \(x) b"` interpolates `x` while `"a \(x) b"` 
is an immediate syntax error. First, it helps to catch bugs 
caused by missing `i`. Second, the question, how do we escape 
`$`, gets the most straightforward answer: we don’t.

     A downside is that parentheses will always be required with 
this syntax. But the community preferred them anyway even with 
`$`.

Jan 08 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Tuesday, 9 January 2024 at 07:30:57 UTC, Nickolay Bukreyev 
wrote:
> However, it is not hard to adjust `execi` so that it fully 
> supports nested istrings:

Shame on me. `segregatedInterpolations(Args...)` should end with 
this:

```d
result ~= Span(processedTill, Args.length, true);
return result;
```

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

Thank you for your thoughts!

On 1/8/2024 11:30 PM, Nickolay Bukreyev wrote:
> 1.  First of all, I’d like to note that in the DIP1027 variant of the code we
> see:
>
>      > `auto fmt = arg[0];`
>
>      (`arg` is undeclared identifier here; I presume `args` was meant.)

Yes. I don't have sql on my system, so didn't try to compile it. I always make 
typos. Oof.

> There is a problem: this line is executed at CTFE,

It's executed at runtime. The code is not optimized for speed, I just wanted to 
show the concept. The speed doesn't particularly matter, because after all this 
is a call to a database which is going to be slow. Anyhow, DIP1036 also uses 
unoptimized code here.


> 3.  DIP1036 appeals more to me because it passes rich, high-level information
> about parts of the string. With DIP1027, on the other hand, we have to extract
> that information ourselves by parsing the string character by character. But
> the compiler already tokenized the string; why do we have to do it again? (And no,
> lower level doesn’t imply broader possibilities here.)

DIP1036 also builds a new format string.

>      It may have another implication: looping over characters might put the
> current CTFE engine in trouble if strings are large. Many more iterations need
> to be executed, and more memory is consumed in the process. We certainly need
> numbers here, but I thought it was important to at least bring attention to
> this point.

It happens at runtime.


> 4.  What I don’t like in both DIPs is a rather arbitrary selection of meta
> characters: `$`, `$$` and `%s`. In regular strings, all of them are just normal
> characters; in istrings, they gain special meaning.

I looked at several schemes, and picked `$` because it looked the nicest.

>      I suppose a cleaner way would be to use `\(...)` syntax (like in Swift). So
> `i"a \(x) b"` interpolates `x` while `"a \(x) b"` is an immediate syntax error.
> First, it helps to catch bugs caused by missing `i`.

I'm sorry to say, that looks like tty noise. Aesthetic appeal is a very important 
design consideration for D.

> Second, the question, how
> do we escape `$`, gets the most straightforward answer: we don’t.

It will rarely need to be escaped, but when one does need it, one needs it!


>      A downside is that parentheses will always be required with this syntax.
> But the community preferred them anyway even with `$`.

DIP1027 does not require ( ) if it's just an identifier. That makes for the 
shortest, simplest istring syntax. The ( ) usage will be relatively rare. The 
idea is the most common cases should require the least syntactical noise.

Also, the reason I picked the SQL example is because that is the one most cited 
as being needed and in showing the power of DIP1036 and because I was told that 
DIP1027 couldn't do it :-)

The intent of DIP1027 is not to provide the most powerful, richest mechanism. 
It's meant to be the simplest I could think of, with the most attractive 
appearance, minimal runtime overhead, while handling the meat and potatoes use 
cases.

Jan 09 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
> The intent of DIP1027 is not to provide the most powerful, 
> richest mechanism. It's meant to be the simplest I could think 
> of, with the most attractive appearance, minimal runtime 
> overhead, while handling the meat and potatoes use cases.

If that's the case, then 1036 wins imho, by simple thing of not 
doing any parsing of format string.

Note, that other use cases might not require building of a format 
string.

What about logging functionality?

In the case of 1036, a log function could just dump all text into 
the sink directly; for 1027 it would still need to parse the format 
string to find where to inject arguments. This use case makes 
1036 more favourable than 1027, by your own criteria for a good 
mechanism.
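
As a rough illustration of the logging case, the sketch below writes every piece straight into a sink without scanning a format string. The marker structs are hypothetical stand-ins declared locally, `render` is an illustrative name, and the call in `main` only imitates by hand what a DIP1036-style lowering of `i"found $(count) rows"` might pass:

```d
import std.array : appender;
import std.conv : to;

// Hypothetical stand-ins for the DIP1036 marker types.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string str) {}
struct InterpolatedExpression(string code) {}

// Appends literals and values to a sink in order; nothing is parsed.
string render(Args...)(Args args)
{
    auto sink = appender!string;
    foreach (arg; args)
    {
        alias T = typeof(arg);
        static if (is(T == InterpolatedLiteral!str, string str))
            sink.put(str);            // literal text is known from the type
        else static if (is(T == InterpolationHeader) || is(T == InterpolationFooter))
        {
        }                             // markers carry no text
        else static if (is(T == InterpolatedExpression!code, string code))
        {
        }                             // source text of the expression: skipped
        else
            sink.put(arg.to!string);  // an interpolated value
    }
    return sink.data;
}

void main()
{
    int count = 3;
    auto line = render(InterpolationHeader(),
        InterpolatedLiteral!"found "(),
        InterpolatedExpression!"count"(), count,
        InterpolatedLiteral!" rows"(),
        InterpolationFooter());
    assert(line == "found 3 rows");
}
```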

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
> If that's the case, then 1036 wins imho, by simple thing of not doing any 
> parsing of format string.

Consider the overhead 1036 has by comparing it with plain writeln or writefln:

```
void test(int baz)
{
     writeln(i"$(baz + 4)");
     writeln(baz + 5);
     writefln("%d", baz + 6);
}
```

Generated code:

0000:   55                       push      RBP
0001:   48 8B EC                 mov       RBP,RSP
0004:   48 83 EC 20              sub       RSP,020h
0008:   48 89 5D E8              mov       -018h[RBP],RBX

000c:   89 7D F8                 mov       -8[RBP],EDI          // baz
000f:   48 83 EC 08              sub       RSP,8
0013:   31 C0                    xor       EAX,EAX
0015:   88 45 F0                 mov       -010h[RBP],AL
0018:   48 8D 75 F0              lea       RSI,-010h[RBP]
001c:   FF 36                    push      dword ptr [RSI]      // header
001e:   88 45 F1                 mov       -0Fh[RBP],AL
0021:   48 8D 5D F1              lea       RBX,-0Fh[RBP]
0025:   FF 33                    push      dword ptr [RBX]      // expression!"baz + 4"
0027:   8D 7F 04                 lea       EDI,4[RDI]           // baz + 4
002a:   88 45 F2                 mov       -0Eh[RBP],AL
002d:   48 8D 75 F2              lea       RSI,-0Eh[RBP]
0031:   FF 36                    push      dword ptr [RSI]      // footer
0033:   E8 00 00 00 00           call      writeln

0038:   48 83 C4 20              add       RSP,020h
003c:   8B 45 F8                 mov       EAX,-8[RBP]
003f:   8D 78 05                 lea       EDI,5[RAX]           // baz + 5
0042:   E8 00 00 00 00           call      writeln

0047:   BA 00 00 00 00           mov       EDX,0                // "%d".ptr
004c:   BE 02 00 00 00           mov       ESI,2                // "%d".length
0051:   8B 4D F8                 mov       ECX,-8[RBP]
0054:   8D 79 06                 lea       EDI,6[RCX]           // baz + 6
0057:   E8 00 00 00 00           call      writefln

005c:   48 8B 5D E8              mov       RBX,-018h[RBP]
0060:   C9                       leave
0061:   C3                       ret

With the istring, there are 4 calls to struct member functions that just return 
null.
This can't be good for performance or program size.
We can compute the number of arguments passed to the function:

istring: 1 + 3 * <number of arguments> + 1 + 1  (*)
writeln: <number of arguments>
writefln: 1 + <number of arguments>

(*) includes string literals before, between, and after arguments

Jan 09 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
>> If that's the case, then 1036 wins imho, by simple thing of 
>> not doing any parsing of format string.
>
> Consider the overhead 1036 has by comparing it with plain 
> writeln or writefln:

How is this related to original argument of not requiring any 
parsing to be done by user inside function that accepts istring, 
that you replied to?

I personally would be ok with any overhead 1036 adds as long as I 
don't need to do any extra work such as parsing.

Please take into consideration also code inside function that 
does accept interpolated string. I'm pretty sure that parsing of 
format string inside dip1027 function would result in bigger and 
more complex generated code, than overhead you've mentioned for 
1036 version, for use cases similar to logging I've mentioned.

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 20:05, Walter Bright wrote:
> On 1/9/2024 12:45 AM, Alexandru Ermicioi wrote:
>> If that's the case, then 1036 wins imho, by simple thing of not doing 
>> any parsing of format string.
>
> Consider the overhead 1036 has by comparing it with plain writeln or 
> writefln:
>
> ```
> void test(int baz)
> {
>      writeln(i"$(baz + 4)");
>      writeln(baz + 5);
>      writefln("%d", baz + 6);
> }
> ```
>
> ...

I think Alexandru and Nickolay already discharged the concerns about 
overhead pretty well, but just note that with DIP1027, `test(3)` prints:

%s7
8
9

There is fundamentally no way to make this work correctly, due to how 
DIP1027 throws away the information about the format string.

With DIP1036e, `test(3)` prints:

7
8
9

And you can get rid of the runtime overhead by adding a `pragma(inline, 
true)` `writeln` overload. (I guess with DMD that will still bloat the 
executable, but I think other compiler backends and linkers can be made 
to elide such symbols completely.)

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 2:38 PM, Timon Gehr wrote:
> %s7 8 9

Yes, I used writeln instead of writefln. The similarity between the two names is 
a source of error, but if that was a festering problem we'd have seen a lot of 
complaints about it by now.


> And you can get rid of the runtime overhead by adding a `pragma(inline, true)` 
> `writeln` overload. (I guess with DMD that will still bloat the executable,

Try it and see.

I didn't mention the other kind of bloat - the rather massive number and size of 
template names being generated that go into the object file, as well as all the 
uncalled functions generated only to be removed by the linker.

As far as I can tell, the only advantage of DIP1036 is the use of inserted 
templates to "key" the tuples to specific functions. Isn't that what the type 
system is supposed to do? Maybe the real issue is that a format string should be 
a different type than a conventional string. For example:

```d
extern (C) pragma(printf) int printf(const(char*), ...);

enum Format : string;

void foo(Format f) { printf("Format %s\n", f.ptr); }
void foo(string s) { printf("string %s\n", s.ptr); }

void main()
{
     Format f = cast(Format)"f";
     foo(f);
     string s = "s";
     foo(s);
}
```
which prints:

Format f
string s

If we comment out `foo(string s)`:

test2.d(14): Error: function `test2.foo(Format f)` is not callable using 
argument types `(string)`
test2.d(14):        cannot pass argument `s` of type `string` to parameter 
`Format f`

If we comment out `foo(Format f)`:

string f
string s

This means that if execi()'s first parameter is of type `Format`, and the 
istring generates the format string with type `Format`, this key will fit the 
lock. A string generated by other means, such as `.text`, will not fit that lock.
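
The "key and lock" idea can be checked mechanically. The sketch below reuses the `enum Format : string;` declaration from the example above; `countPlaceholders` is a hypothetical consumer invented for illustration, and the `cast` stands in for what the compiler would do when lowering an istring:

```d
enum Format : string; // distinct type whose base type is string

// Hypothetical consumer: only a value typed as Format is accepted.
// (For brevity it does not treat "%%" specially.)
size_t countPlaceholders(Format fmt)
{
    string s = fmt; // a Format converts implicitly to its base type
    size_t n;
    foreach (i; 0 .. s.length)
        if (s[i] == '%' && i + 1 < s.length && s[i + 1] == 's')
            ++n;
    return n;
}

void main()
{
    auto f = cast(Format) "a = %s AND b = %s";
    assert(countPlaceholders(f) == 2);

    // A plain string, such as the result of `.text`, does not fit the lock:
    // the call is rejected at compile time.
    static assert(!__traits(compiles, countPlaceholders("a = %s")));
}
```

Note that this guards only the type of the first argument; the value of the format string is still known to the callee only at run time.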

Jan 10 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright 
wrote:
> I may have found a solution. I'm interested in your thoughts on 
> it.

It looks very similar to what I presented in my later posts 
([this](https://forum.dlang.org/post/qiyrmzwnoguzxxllgzcz forum.dlang.org) and
one following). It’s inspiring: we are probably getting closer to common
understanding of things.

> As far as I can tell, the only advantage of DIP1036 is the use 
> of inserted templates to "key" the tuples to specific 
> functions. Isn't that what the type system is supposed to do? 
> Maybe the real issue is that a format string should be a 
> different type than a conventional string.

Exactly. Let me try to explain why DIP1036 is doing what it is 
doing. For illustrative purposes, I’ll be drastically simplifying 
code; please excuse me for that.

Let there be `foo`, a function that would like to receive an 
istring. Inside it, we would like to transform its argument list 
at compile time into a new argument list. So what we essentially 
want is to pass an istring to a template parameter so that it is 
available to `foo` at compile time:

```d
int x;
foo!(cast(Format)"prefix ", 2 * x); // foo!(alias Format, alias 
int)()
```

Unfortunately, this does not work because `2 * x` cannot be 
passed to an `alias` parameter. _This is the root of the 
problem._ The only way to do that is to pass them to runtime 
parameters:

```d
int x;
foo(cast(Format)"prefix ", 2 * x); // foo!(Format, int)(Format, 
int)
```

However, now `foo` cannot access the format string at compile 
time—its type is simply `Format`, and its value becomes known 
only at runtime. So we encode the value into the type:

```d
int x;
foo(Format!"prefix "(), 2 * x); // foo!(Format!"prefix ", 
int)(Format!"prefix ", int)
```

This is more or less what DIP1036 is doing at the moment. Hope it 
became clear now.

I’d say DIP1036, as we see it now, relies on a clever workaround 
of a limitation imposed by the language. If that limitation is 
gone, the DIP will become simpler.
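
A compilable version of the last step might look like the following. It is a simplified sketch in the same spirit as the snippets above: `Format` here is a hypothetical struct template (unrelated to the enum in the earlier post), and `foo` merely concatenates, to show that the literal is recoverable at compile time from the argument's type while `2 * x` remains an ordinary runtime value:

```d
import std.conv : to;

// Hypothetical wrapper: the string lives in the type, not in a field.
struct Format(string s)
{
    enum value = s;
}

string foo(F, T)(F, T x)
        if (is(F == Format!s, string s))
{
    // Known at compile time, even though F arrived as a runtime argument.
    enum prefix = F.value;
    static assert(prefix.length > 0);
    return prefix ~ x.to!string;
}

void main()
{
    int x = 21;
    // foo!(Format!"prefix ", int)(Format!"prefix ", int)
    assert(foo(Format!"prefix "(), 2 * x) == "prefix 42");
}
```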

Jan 10 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/10/2024 5:53 PM, Nickolay Bukreyev wrote:
> Exactly. Let me try to explain why DIP1036 is doing what it is doing. For 
> illustrative purposes, I’ll be drastically simplifying code; please excuse me 
> for that.

Thank you for the explanation. It was entirely missing from the spec, and I 
overlooked it in the code. (This is why reverse engineering a spec from code is 
not so easy.) It is indeed clever.

As for it being a required feature of string interpolation to do this processing 
at compile time, that's a nice feature, not a must have.

The enum proposal is to obviate the requirement for a header and footer 
template, which is a big improvement.

Jan 10 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Thursday, 11 January 2024 at 02:21:17 UTC, Walter Bright wrote:
> As for it being a required feature of string interpolation to 
> do this processing at compile time, that's a nice feature, not 
> a must have.

Importance of the ability to do processing at compile time was 
stated by:
* Alexandru 
([here](https://forum.dlang.org/post/yxrqncmaiyfmhxnvzgil forum.dlang.org) and
[here](https://forum.dlang.org/post/yqwxvjnvqaahhshrfohy forum.dlang.org)),
* Timon 
([here](https://forum.dlang.org/post/unjfb9$1ku5$1 digitalmars.com)),
* Paolo 
([here](https://forum.dlang.org/post/rhpblxrebibhpnfxfihv forum.dlang.org) and
[here](https://forum.dlang.org/post/ajeqtckcwawuvtusbvxb forum.dlang.org)),
* Steven 
([here](https://forum.dlang.org/post/ilituyhcqipsqktqmfor forum.dlang.org)).

> The enum proposal is to obviate the requirement for a header 
> and footer template, which is a big improvement.

Header and footer are not templates; `InterpolatedLiteral` and 
`InterpolatedExpression` are. Yes, the latter two can be replaced 
by enums iff it becomes possible to pass arbitrary expressions to 
alias parameters. And I agree it would be a big improvement.

> Structs with no fields have a size of 1 byte for D and C++ 
> structs, and 0 or 4 for C structs (depending on the target).

Yes, I mistakenly wrote, _zero-sized_, when I meant, _empty_.

Jan 10 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/11/24 03:21, Walter Bright wrote:
 
> As for it being a required feature of string interpolation to do this 
> processing at compile time, that's a nice feature, not a must have.

As far as I am concerned it is a must-have. For example, this is what 
prevents the SQL injection attack, it's a safety guarantee.

Jan 11 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/11/2024 11:50 AM, Timon Gehr wrote:
> On 1/11/24 03:21, Walter Bright wrote:
>> As for it being a required feature of string interpolation to do this 
>> processing at compile time, that's a nice feature, not a must have.
>
> As far as I am concerned it is a must-have. For example, this is what prevents 
> the SQL injection attack, it's a safety guarantee.

Why does compile time make it a guarantee and runtime not?

We do array bounds checking at runtime.

Jan 11 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 12/01/2024 6:28 PM, Walter Bright wrote:
> On 1/11/2024 11:50 AM, Timon Gehr wrote:
>> On 1/11/24 03:21, Walter Bright wrote:
>>> As for it being a required feature of string interpolation to do this 
>>> processing at compile time, that's a nice feature, not a must have.
>>
>> As far as I am concerned it is a must-have. For example, this is what 
>> prevents the SQL injection attack, it's a safety guarantee.
>
> Why does compile time make it a guarantee and runtime not?
>
> We do array bounds checking at runtime.

Where possible we absolutely should not be.

Making things crash at runtime, because the compiler did not apply the 
knowledge it has is just ridiculous.

Imagine going to ``http://google.com/itsacrash`` and crashing Google.

Or pressing a button too fast on an airplane and suddenly the fuel pumps 
turn off and then refuse to turn back on.

Instead of the compiler catching clearly bad logic that it has a full 
understanding of, you're disrupting service and making people lose 
money. This is not a good thing.

Jan 11 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/11/2024 9:36 PM, Richard (Rikki) Andrew Cattermole wrote:
 Making things crash at runtime, because the compiler did not apply the knowledge 
 it has is just ridiculous.
 
 Imagine going to ``http://google.com/itsacrash`` and crashing Google.
 
 Or pressing a button too fast on an airplane and suddenly the fuel pumps turn 
 off and then refuse to turn back on.
 
 Instead of the compiler catching clearly bad logic that it has a full 
 understanding of, you're disrupting service and making people lose money. This 
 is not a good thing.

I agree that compile time checking is preferable. But there is a cost involved, 
as I explained more fully in another post. It isn't free.

Since the format string is a compile time creature, not a user input feature, if 
the fault only happened when the code is deployed, it means the code was *never* 
executed before it was shipped.

This is an inexcusable failure for any avionics system, or any critical system, 
since we have simple tools that check coverage.

BTW, professional code is full of assert()s. Asserts check for faults in the 
code logic that are not the result of user input, but are the result of 
programming errors. We leave them as asserts because nobody knows how to get 
compilers to detect them, or it is too costly to detect them.

In other words, this is not an absolute thing. It's a weighing of cost and benefit.

Jan 11 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 12/01/2024 8:00 PM, Walter Bright wrote:
 On 1/11/2024 9:36 PM, Richard (Rikki) Andrew Cattermole wrote:
 Making things crash at runtime, because the compiler did not apply the 
 knowledge it has is just ridiculous.

 Imagine going to ``http://google.com/itsacrash`` and crashing Google.

 Or pressing a button too fast on an airplane and suddenly the fuel 
 pumps turn off and then refuse to turn back on.

 Instead of the compiler catching clearly bad logic that it has a full 
 understanding of, you're disrupting service and making people lose 
 money. This is not a good thing.

 
 I agree that compile time checking is preferable. But there is a cost 
 involved, as I explained more fully in another post. It isn't free.
 
 Since the format string is a compile time creature, not a user input 
 feature, if the fault only happened when the code is deployed, it means 
 the code was *never* executed before it was shipped.
 
 This is an inexcusable failure for any avionics system, or any critical 
 system, since we have simple tools that check coverage.
 
 BTW, professional code is full of assert()s. Asserts check for faults in 
 the code logic that are not the result of user input, but are the result 
 of programming errors. We leave them as asserts because nobody knows how 
 to get compilers to detect them, or is too costly to detect them.
 
 In other words, this is not an absolute thing. It's a weighing of cost 
 and benefit.

So I guess the question is, do you want to hear from a company that they 
lost X amount of business because they used a language feature that 
could have caught errors at compile time, but instead continually 
crashed in a live environment?

I do not.

That would be a total embarrassment.

I have an identical problem currently with ``@mustuse``.
It errors out at runtime if you do not check to see if it has an error, 
if you try to get access to the value.

It is hell. I could never recommend such an error prone design. I am 
only putting up with it until the language is capable of something better.

https://issues.dlang.org/show_bug.cgi?id=23998

Jan 11 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

Let's try something different.

Would you like me to write a small specification for an alternative 
method for passing metadata from the call site into the body that would 
allow a string interpolation feature to not use extra templates while 
still being compile time based?

I described this to Adam Wilson yesterday:

```d
func(@metadata("hi!") 2);

void func(T)(T arg) {
	enum MetaData = __traits(getAttributes, arg);
	pragma(msg, MetaData);
}
```

This is essentially what 1036e is attempting to do, but it does it with 
extra templates.

Jan 11 2024

zjh <fqbqrr 163.com> writes:

On Friday, 12 January 2024 at 07:31:49 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 Let's try something different.

 ```d
 func(@metadata("hi!") 2);

 void func(T)(T arg) {
 	enum MetaData = __traits(getAttributes, arg);
 	pragma(msg, MetaData);
 }
 ```



I think D language can create an `attribute dictionary` for any 
building block

In this way, the `attribute soup` can be simplified. It would be 
even better to simplify the method of `getting and setting` 
attributes. It can be used to facilitate the extraction of 
`metadata`

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/12/24 06:28, Walter Bright wrote:
 On 1/11/2024 11:50 AM, Timon Gehr wrote:
 On 1/11/24 03:21, Walter Bright wrote:
 As for it being a required feature of string interpolation to do this 
 processing at compile time, that's a nice feature, not a must have.

 As far as I am concerned it is a must-have. For example, this is what 
 prevents the SQL injection attack, it's a safety guarantee.

 
 Why does compile time make it a guarantee and runtime not?
 ...

Because a SQL injection attack by definition is when a third party can 
control safety-critical parts of your SQL query at runtime.

The very fact that the whole prepared SQL query is known at 
compile-time, with runtime data only entering through the placeholders, 
conclusively rules this out. If the SQL query is constructed at runtime 
based on runtime data, `execi` is unable to check whether an SQL 
injection vulnerability is present.

 We do array bounds checking at runtime.

You can check array bounds at runtime. You cannot check where a 
runtime-known string came from at runtime. It's simply not possible.
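
To make the distinction concrete, here is a minimal sketch. It is not the `execi` discussed in this thread: `Prepared`, `prepare` and `countPlaceholders` are hypothetical names, and the `?` counting is naive (it ignores quoted literals). The query text is a template argument, so it has to be a compile-time constant; runtime data can only travel as a bound parameter.

```d
import std.conv : to;

// Hypothetical stand-in for a prepared statement.
struct Prepared
{
    string sql;      // fixed at compile time
    string[] params; // runtime data, kept apart from the SQL text
}

// Naive placeholder count; usable in CTFE.
size_t countPlaceholders(string sql)
{
    size_t n;
    foreach (char c; sql)
        if (c == '?')
            ++n;
    return n;
}

// `sql` is a template value parameter, so only compile-time constants are accepted.
Prepared prepare(string sql, Args...)(Args args)
{
    static assert(countPlaceholders(sql) == Args.length,
        "placeholder count does not match argument count");
    Prepared p;
    p.sql = sql;
    foreach (arg; args)
        p.params ~= arg.to!string;
    return p;
}

void main()
{
    string name = "x'; DROP TABLE students; --"; // hostile runtime input
    auto p = prepare!"SELECT * FROM students WHERE name = ?"(name);
    assert(p.sql == "SELECT * FROM students WHERE name = ?");
    assert(p.params == [name]);

    // string q = "SELECT * FROM students WHERE name = '" ~ name ~ "'";
    // prepare!q(); // does not compile: `q` cannot be read at compile time
}
```

A function that takes the query as an ordinary `string` parameter has no way to make the same distinction, because both the safe and the unsafe call hand it a value of the same type.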

Jan 12 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 11/01/2024 2:53 PM, Nickolay Bukreyev wrote:
 I’d say DIP1036, as we see it now, relies on a clever workaround of a 
 limitation imposed by the language. If that limitation is gone, the DIP 
 will become simpler.

Another potential solution would be to allow passing metadata on the 
function call side, to the function.

Consider:

``i"prefix${expr:format}suffix"``

Could be:

```d
func("prefix", @format("format") expr, "suffix");

void func(T...)(T args) {
	pragma(msg, __traits(getAttributes, args[1])); // format("format")
}

```

This is so much simpler than what 1036e is.

But it does require another language feature.

Jan 10 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 ```d
 void func(T...)(T args) {
     pragma(msg, __traits(getAttributes, args[1])); // format("format")
 }
 ```

Sorry, I don’t understand how this can possibly work. After 
`func` template is instantiated, its `T` is bound to, e.g., 
`AliasSeq!(string, int, string)`. `args` is just a local variable 
of type `AliasSeq!(string, int, string)`. How can `__traits` know 
what attributes were attached at call site?

If, on the other hand, attributes do affect the type, then IMHO

```d
func("prefix", @format("format") expr, "suffix");
```

is not much different than

```d
func("prefix", format!"format"(expr), "suffix");
```

I.e., we can do it already.
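
For reference, a minimal sketch of the wrapper approach in current D. `Formatted`, `withFormat`, `isFormatted` and `describe` are hypothetical names chosen for illustration (the wrapper is not called `format` here only to avoid a clash with `std.format.format`).

```d
// Wrapper type: the format spec is part of the type, the value is the payload.
struct Formatted(string fmt, T)
{
    enum spec = fmt;
    T value;
}

auto withFormat(string fmt, T)(T value)
{
    return Formatted!(fmt, T)(value);
}

// True if A is some instantiation of Formatted.
enum isFormatted(A) = is(A == Formatted!(fmt, T), string fmt, T);

// Reports, per argument, the spec attached at the call site ("-" if none).
string describe(Args...)(Args args)
{
    string result;
    foreach (arg; args)
    {
        static if (isFormatted!(typeof(arg)))
            result ~= "[" ~ typeof(arg).spec ~ "]";
        else
            result ~= "[-]";
    }
    return result;
}

void main()
{
    int expr = 42;
    assert(describe("prefix", withFormat!"%05d"(expr), "suffix") == "[-][%05d][-]");
}
```

The spec is visible inside `describe` only because it is part of the argument's type, so every distinct spec yields a distinct instantiation.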

Jan 10 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 11/01/2024 5:31 PM, Nickolay Bukreyev wrote:
 On Thursday, 11 January 2024 at 02:35:00 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 ```d
 void func(T...)(T args) {
     pragma(msg, __traits(getAttributes, args[1])); // format("format")
 }
 ```

 
 Sorry, I don’t understand how this can possibly work. After `func` 
 template is instantiated, its `T` is bound to, e.g., `AliasSeq!(string, 
 int, string)`. `args` is just a local variable of type 
 `AliasSeq!(string, int, string)`. How can `__traits` know what 
 attributes were attached at call site?
 
 If, on the other hand, attributes do affect the type, then IMHO
 
 ```d
 func("prefix", @format("format") expr, "suffix");
 ```
 
 is not much different than
 
 ```d
 func("prefix", format!"format"(expr), "suffix");
 ```
 
 I.e., we can do it already.

This has side effects. It affects ``ref`` and ``out``. It also affects 
lifetime analysis.

So we can't do it currently.

But yes, it affects the type, without being in the type system 
explicitly as it is meta data.

Jan 10 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Thursday, 11 January 2024 at 04:34:33 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 This has side effects. It affects ``ref`` and ``out``. It also 
 affects lifetime analysis.

 So we can't do it currently.

 But yes, it affects the type, without being in the type system 
 explicitly as it is meta data.

Thank you for the clarification. I see a downside that pretty 
much any generic code should strip the annotations off its 
arguments after it inspected them, to reduce template bloating. 
However, we are probably going off-topic.

Jan 10 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright 
wrote:
 And you can get rid of the runtime overhead by adding a 
 `pragma(inline, true)` `writeln` overload. (I guess with DMD 
 that will still bloat the executable,

 I didn't mention the other kind of bloat - the rather massive 
 number and size of template names being generated that go into 
 the object file, as well as all the uncalled functions 
 generated only to be removed by the linker.

Yes, DIP1036e has a lot of extra templates generated, and the 
mangled name is going to be large.

Let's skip for a moment the template that writeln will generate 
(which I agree isn't ideal, but also is somewhat par for the 
course).

This shouldn't be a huge problem for the interpolation *types* 
because the type doesn't get included in the binary. It is a big 
problem for the `toString` function, because that *is* included.

However, we can mitigate the ones that return `null`:

```d
string __interpNull() => null;

struct InterpolatedExpression(string expr)
{
   alias toString = __interpNull;
}

... // and so on
```

I tested this and it does work. So this reduces all the 
`toString` member functions from `InterpolatedExpression` (and 
`InterpolationPrologue` and `InterpolationEpilog`, but those are 
not templated structs anyway) to one function in the binary.

But we can't do this for `InterpolatedLiteral` (which by the way 
is improperly described in Atila's DIP, the associated `toString` 
member function should return the literal).

We can possibly do a couple of things here to mitigate this:

1. We can modify how `std.format` works so it will accept the 
following as a `toString` hook:

```d
struct S
{
    enum toString = "I am an S";
}
```

This means, no function calls, no extra long symbols in the 
binary (since it's an enum, it should not go in), and I think 
even the compilation will be faster.

2. We modify it to be aware of `InterpolatedLiteral` types, and 
avoid depending on the `toString` API. After all, we own both 
Phobos and druntime, we can coordinate the release.

And as a further suggestion, though this is kind of off-topic, we 
may look into ways to have templates that *don't* make it into 
the binary explicitly. Basically, they are marked as shims or 
forwarders by the library author, and just serve as a way to 
write nicer syntax. This could help in more than just the 
interpolation DIP.

 As far as I can tell, the only advantage of DIP1036 is the use 
 of inserted templates to "key" the tuples to specific 
 functions. Isn't that what the type system is supposed to do? 
 Maybe the real issue is that a format string should be a 
 different type than a conventional string.

No. While I agree that having a different *type* makes it more 
useful and easier to hook, there is a fundamental problem being 
solved with the compile-time literals being passed to the 
function. Namely, tremendous power is available to validate, 
parse, prepare, etc. string data at compile time, for use during 
runtime. This simply *is not possible* with 1027.

The runtime benefits are huge:
* No need to allocate anything (`@nogc`, `-betterC`, etc. all 
available)
* You get compiler errors instead of runtime errors (if you put 
in the work)
* It's possible to generate "perfect forwarding" to another function 
that does use another form. For example, `printf`.
* If you inline the call, it can be as if you called the 
forwarded function directly with the exactly correct parameters.

And I want to continue to point out, that a constructed "format 
string" mechanism just is inferior, regardless if it is another 
type, as long as you don't need formatting specifiers (and 
arguably, it's just a difference in taste otherwise). The 
compiler parsed it out, it knows the separate pieces. Giving 
those pieces directly to the library is both the most efficient 
way, and also the most obvious way. The "format string" 
mechanism, while making sense for writef, *must* add an element 
of complexity to the receiving function, since it now has to know 
what "language" the translated string is. e.g. with DIP1027, one 
must know that `%s` is special and what it represents, and the 
user must know to escape `%s` to avoid miscommunication. With 
1036e, there is no format string, so there is no complication 
there, or confusion. The value being passed is right where you 
would expect it, and you don't have to parse a separate thing to 
know.
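
As a rough illustration of that compile-time processing, here is a sketch with stand-in types, not the types DIP1036e actually generates: `Lit`, `isLit`, `specFor`, `escapePercent` and `formatOf` are hypothetical names. Literal fragments arrive as types carrying their text, so a `printf` format string can be assembled entirely at compile time, and the `%` escaping is done by the library rather than by the user.

```d
import core.stdc.stdio : printf;
import std.meta : AliasSeq;

// Stand-in for a literal fragment: the text is a template argument.
struct Lit(string s)
{
    enum text = s;
}

enum isLit(A) = is(A == Lit!s, string s);

// printf conversion for a runtime argument type (only two types in this sketch).
template specFor(T)
{
    static if (is(T == int))
        enum specFor = "%d";
    else static if (is(T == double))
        enum specFor = "%f";
    else
        static assert(false, "no printf conversion for " ~ T.stringof);
}

// Double every '%' so literal text is not mistaken for a conversion.
string escapePercent(string s)
{
    string r;
    foreach (char c; s)
    {
        if (c == '%')
            r ~= "%%";
        else
            r ~= c;
    }
    return r;
}

// Assemble the format string from the argument types alone.
string formatOf(Args...)()
{
    string fmt;
    static foreach (A; Args)
    {
        static if (isLit!A)
            fmt ~= escapePercent(A.text);
        else
            fmt ~= specFor!A;
    }
    return fmt;
}

void main()
{
    alias Pieces = AliasSeq!(Lit!"x = ", int, Lit!", 100% of y = ", double, Lit!"\n");
    enum fmt = formatOf!Pieces();
    static assert(fmt == "x = %d, 100%% of y = %f\n");
    printf(fmt.ptr, 3, 2.5);
}
```

The sketch stops at building the format string; a real forwarder would also have to drop the literal pieces from the argument list before calling `printf`.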

Note in YAIDIP, this was done partly through an interpolation 
header, which had all the compile-time information, and then 
strings and interpolated data were interspersed. I find this also 
a workable solution, and could even do without the strings being 
passed interspersed (as I said, we have control over `writeln` 
and `text`), but I think the ordering of the tuple to match what 
the actual string literal looks like is so intuitive, and we 
would be losing that if we did some kind of "format header" 
mechanism.

-Steve

Jan 10 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/10/24 20:53, Walter Bright wrote:
 On 1/9/2024 2:38 PM, Timon Gehr wrote:
  > %s7 8 9
 
 Yes, I used writeln instead of writefln. The similarity between the two 
 names is a source of error, but if that was a festering problem we'd 
 have seen a lot of complaints about it by now.
 ...

My point was with DIP1036e it either works or does not compile, not that 
you called the wrong function.

 
 And you can get rid of the runtime overhead by adding a 
 `pragma(inline, true)` `writeln` overload. (I guess with DMD that will 
 still bloat the executable,

 
 Try it and see.
 
 I didn't mention the other kind of bloat - the rather massive number and 
 size of template names being generated that go into the object file, as 
 well as all the uncalled functions generated only to be removed by the 
 linker.
 ...

I understand the drawbacks of DIP1036e which it shares with most 
non-trivial metaprogramming. D underdelivers in this department at the 
moment, but this still remains one of the key selling points of D.

The issue is that DIP1027 is worse than DIP1036e. DIP1027 is also worse 
than nothing. It has been rejected for good reason. For some reason you 
however keep insisting it is essentially as useful as DIP1036e. That's 
just not the case.

I think a much better answer to DIP1036e than a DIP1027 revival would 
have been to add a -preview=experimental-DIP1036e flag and do a call to 
action to resolve language issues and limitations that force DIP1036e to 
generate bloat. Maybe there would have been an even better way to handle 
this.

 As far as I can tell, the only advantage of DIP1036 is the use of 
 inserted templates to "key" the tuples to specific functions.

Well, this is not the case, that is not the only advantage.

 Isn't that 
 what the type system is supposed to do? Maybe the real issue is that a 
 format string should be a different type than a conventional string. For 
 example:
 
 ```d
 extern (C) pragma(printf) int printf(const(char*), ...);
 
 enum Format : string;
 
 void foo(Format f) { printf("Format %s\n", f.ptr); }
 void foo(string s) { printf("string %s\n", s.ptr); }
 
 void main()
 {
      Format f = cast(Format)"f";
      foo(f);
      string s = "s";
      foo(s);
 }
 ```
 which prints:
 
 Format f
 string s
 
 If we comment out `foo(string s)`:
 
 test2.d(14): Error: function `test2.foo(Format f)` is not callable using 
 argument types `(string)`
 test2.d(14):        cannot pass argument `s` of type `string` to 
 parameter `Format f`
 
 If we comment out `foo(Format s)`:
 
 string f
 string s
 
 This means that if execi()'s first parameter is of type `Format`, and 
 the istring generates the format string with type `Format`, this key 
 will fit the lock. A string generated by other means, such as `.text`, 
 will not fit that lock.
 

Well, this is a step in the right direction, but rest assured if this 
was the only advantage of DIP1036e, then Adam would have gone with this 
suggestion. I am almost sure this is one of the ideas he discarded.

Jan 11 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/11/2024 11:45 AM, Timon Gehr wrote:
 My point was with DIP1036e it either works or does not compile, not that you 
 called the wrong function.

What's missing is why is a runtime check not good enough? The D compiler emits 
more than one safety check at runtime. For example, array bounds checking, and 
switch statement default checks.

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/12/24 06:33, Walter Bright wrote:
 On 1/11/2024 11:45 AM, Timon Gehr wrote:
 My point was with DIP1036e it either works or does not compile, not 
 that you called the wrong function.

 
 What's missing is why is a runtime check not good enough?

There is no runtime check, it just does the wrong thing.

 The D compiler emits more than one safety check at runtime. For example, array 
 bounds checking, and switch statement default checks.

Sure.

Jan 12 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Tuesday, 9 January 2024 at 19:05:40 UTC, Walter Bright wrote:

 With the istring, there are 4 calls to struct member functions 
 that just return null.

Yeah, and writeln could avoid those if it's that important. A 
good optimizer will remove that call.

 This can't be good for performance or program size.

Then use writeln the way you want? I don't see it as significant 
at all.

 We can compute the number of arguments passed to the function:

 ```
 istring: 1 + 3 * <number of arguments> + 1 + 1  (*)
 writeln: <number of arguments>
 writefln: 1 + <number of arguments>
 ```

 (*) includes string literals before, between, and after 
 arguments

I find it bizarre to be concerned about the call performance of 
zero-sized structs and empty strings to writeln or writef, like 
the function is some shining example of performance or efficient 
argument passing. If you do not have inlining or optimizations 
enabled, do you think the call tree of writefln is going to be 
compact? Not to mention it eventually just calls into C opaquely.

Note that you can write a simple wrapper that can be inlined, 
which will mitigate all of this via compile-time transformations.

If you like, I can write it up and you can try it out!

-Steve

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:
 I find it bizarre to be concerned about the call performance of zero-sized 
 structs and empty strings to writeln or writef, like the function is some 
 shining example of performance or efficient argument passing. If you do not have 
 inlining or optimizations enabled, do you think the call tree of writefln is 
 going to be compact? Not to mention it eventually just calls into C opaquely.
 
 Note that you can write a simple wrapper that can be inlined, which will 
 mitigate all of this via compile-time transformations.
 
 If you like, I can write it up and you can try it out!

I've been aware for a long time that writeln and writefln are very inefficient, 
and could use a re-engineering.

A big part of the problem is the blizzard of templates resulting from using 
them. This issue doubles the number of templates. Even if they are optimized 
away, they sit in the object file.

Anyhow, see my other reply to Timon. I may have found a solution. I'm interested 
in your thoughts on it.

Jan 10 2024

Hipreme <msnmancini hotmail.com> writes:

On Wednesday, 10 January 2024 at 20:19:46 UTC, Walter Bright 
wrote:
 On 1/9/2024 3:33 PM, Steven Schveighoffer wrote:
 I find it bizarre to be concerned about the call performance 
 of zero-sized structs and empty strings to writeln or writef, 
 like the function is some shining example of performance or 
 efficient argument passing. If you do not have inlining or 
 optimizations enabled, do you think the call tree of writefln 
 is going to be compact? Not to mention it eventually just 
 calls into C opaquely.
 
 Note that you can write a simple wrapper that can be inlined, 
 which will mitigate all of this via compile-time 
 transformations.
 
 If you like, I can write it up and you can try it out!

 I've been aware for a long time that writeln and writefln are 
 very inefficient, and could use a re-engineering.

 A big part of the problem is the blizzard of templates 
 resulting from using them. This issue doubles the number of 
 templates. Even if they are optimized away, they sit in the 
 object file.

 Anyhow, see my other reply to Timon. I may have found a 
 solution. I'm interested in your thoughts on it.

Are you sure you really want to keep optimizing debug logging 
functionality? Come on. The only reason to keep using `printf` 
and `writeln` is for debug logging. If you're going to show your 
log function to a user, it is going to be completely different.

They are super easy to disable by simply creating a wrapper.
If you want to know what increases the compilation time on them, 
it is `std.conv.to!float`. I have said this many times on forums 
already. I don't know about people's hobby, but caring about 
performance on logging is simply too much.

Do me a favor: Press F12 to open your browser's console, then 
write at it: `for(let i = 0; i < 10000; i ++) console.log(i);`

You'll notice how slow it is. And this is not a JS problem. Logging 
is always slow, no matter how much you optimize. I personally 
find this a great loss of time that could be directed into a lot 
more useful tasks, such as:
- Improving debugging symbols in DMD and for macOS
- Improving importC until it actually works
- Listen to rikki's complaint about how slow it is to import UTF 
Tables
- Improving support for shared libraries on DMD (like not making 
it collect an interfaced object)
- Solve the problem with `init` property of structs containing 
memory reference which can easily be corrupted
- Fix the problem when an abstract class implements an interface
- Make a D compiler daemon
- Help in the project of DMD as a library focused on helping 
WebFreak in code-d and serve-d
- Implement DMD support for Apple Silicon
- Revive newCTFE engine
- Implement ctfe caching


Those are the only things that come to my mind right now. Anyway, 
I'm not here to demand anything at all. I'm only giving examples 
on what could be done in fields I have no experience in how to 
make it better, but I know people out there can do it. But for 
me, it is just a pity to see such genius wasting time on 
improving a rather antiquated debug functionality

Jan 10 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/10/2024 12:56 PM, Hipreme wrote:
 - Improving debugging symbols in DMD and for macOS
 - Improving importC until it actually works
 - Listen to rikki's complaint about how slow it is to import UTF Tables
 - Improving support for shared libraries on DMD (like not making it collect an 
 interfaced object)
 - Solve the problem with `init` property of structs containing memory reference 
 which can easily be corrupted
 - Fix the problem when an abstract class implements an interface
 - Make a D compiler daemon
 - Help in the project of DMD as a library focused on helping WebFreak in code-d 
 and serve-d
 - Implement DMD support for Apple Silicon
 - Revive newCTFE engine
 - Implement ctfe caching

I regularly work on many of those problems. For example, without looking it up, 
I think I've fixed maybe 20 ImportC issues in the last month. I've also done a 
number of recent PRs aimed at making D more tractable as a library. So has Razvan.

Jan 10 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/10/24 22:21, Walter Bright wrote:
 On 1/10/2024 12:56 PM, Hipreme wrote:
 - Improving debugging symbols in DMD and for macOS
 - Improving importC until it actually works
 - Listen to rikki's complaint about how slow it is to import UTF Tables
 - Improving support for shared libraries on DMD (like not making it 
 collect an interfaced object)
 - Solve the problem with `init` property of structs containing memory 
 reference which can easily be corrupted
 - Fix the problem when an abstract class implements an interface
 - Make a D compiler daemon
 - Help in the project of DMD as a library focused on helping WebFreak 
 in code-d and serve-d
 - Implement DMD support for Apple Silicon
 - Revive newCTFE engine
 - Implement ctfe caching

 
 I regularly work on many of those problems. For example, without looking 
 it up, I think I've fixed maybe 20 ImportC issues in the last month. 
 I've also done a number of recent PRs aimed at making D more tractable 
 as a library. So has Razvan.

Thanks a lot for the incredible amount of work you have invested into D 
over the years!

Jan 11 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/11/2024 12:20 PM, Timon Gehr wrote:
 Thanks a lot for the incredible amount of work you have invested into D over the 
 years!

It is indeed my pleasure, especially the privilege of working with you guys!

Jan 11 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
 It happens at runtime.

No. This line is inside `enum string query = () { ... }();`. So 
CTFE-performance considerations do apply.

 I'm sorry to say, that looks like tty noise.

That’s sad. In my opinion, it is at least as readable, plus I see 
a few objective advantages in it. We don’t have to agree on this 
though.

 It will rarely need to be escaped, but when one does need it, 
 one needs it!

Yes, but I see a benefit in reducing the number of characters 
that _have_ to be escaped in the first place. While `$` rarely 
appeared in examples we’ve been thinking of so far, if someone 
faces a need to create a string full of dollars, escaping them 
all will uglify the string.

 DIP1027 does not require ( ) if it's just an identifier. That 
 makes for the shortest, simplest
 istring syntax. The ( ) usage will be relatively rare. The idea 
 is the most common cases should
 require the least syntactical noise.

Totally agree. Personally, I prefer omitting parentheses in 
interpolations when a language supports such syntax, but it’s a 
matter of taste.

 Also, the reason I picked the SQL example is because that is 
 the one most cited as being needed
 and in showing the power of DIP1036 and because I was told that 
 DIP1027 couldn't do it :-)

DIP1027 is unable to do it _at compile time_. I cannot argue that 
compile-time string creation doesn’t give us much if we call an 
SQL engine afterwards. So we need another example where 
CTFE-ability is desired. Alexandru Ermicioi asked about logging; 
I agree it is nice to rule out format-string parsing from every 
`log` call.

Jan 09 2024

Paolo Invernizzi <paolo.invernizzi gmail.com> writes:

On Tuesday, 9 January 2024 at 09:25:28 UTC, Nickolay Bukreyev 
wrote:
 On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
 It happens at runtime.

 No. This line is inside `enum string query = () { ... }();`. So 
 CTFE-performance considerations do apply.

 I'm sorry to say, that looks like tty noise.

 That’s sad. In my opinion, it is at least as readable, plus I 
 see a few objective advantages in it. We don’t have to agree on 
 this though.

 It will rarely need to be escaped, but when one does need it, 
 one needs it!

 Yes, but I see a benefit in reducing the number of characters 
 that _have_ to be escaped in the first place. While `$` rarely 
 appeared in examples we’ve been thinking of so far, if someone 
 faces a need to create a string full of dollars, escaping them 
 all will uglify the string.

 DIP1027 does not require ( ) if it's just an identifier. That 
 makes for the shortest, simplest
 istring syntax. The ( ) usage will be relatively rare. The 
 idea is the most common cases should
 require the least syntactical noise.

 Totally agree. Personally, I prefer omitting parentheses in 
 interpolations when a language supports such syntax, but it’s a 
 matter of taste.

 Also, the reason I picked the SQL example is because that is 
 the one most cited as being needed
 and in showing the power of DIP1036 and because I was told 
 that DIP1027 couldn't do it :-)

 DIP1027 is unable to do it _at compile time_. I cannot argue 
 that compile-time string creation doesn’t give us much if we 
 call an SQL engine afterwards. So we need another example where 
 CTFE-ability is desired. Alexandru Ermicioi asked about 
 logging; I agree it is nice to rule out format-string parsing 
 from every `log` call.

Compile time string creation when dealing with SQL gives you the 
ability to validate the string for correctness at compile time.

Here is an example of what we are doing internally:


```
pinver utumno fieldmanager % bin/yab build ldc_lab_mac_i64_dg
2024-01-09T10:48:07.889 [info] melkor.d:235:executeReadyLabel 
executing ldc_lab_mac_i64_dg:
/Users/pinver/dlang/ldc-1.36.0/bin/ldc2     -preview=dip1000 -i 
-Isrc -mtriple=x86_64-apple-darwin --vcolumns 
-J/Users/pinver/Lembas --d-version=env_dev_ 
--d-version=listen_for_nx_ --d-version=disable_ssl 
--d-version=disable_fixations --d-version=disable_metrics 
--d-version=disable_aggregator --d-debug -g 
-of/Users/pinver/Projects/DeepGlance/fieldmanager/bin/lab_mac_i64_dg
/Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d
src/sbx/raygui/c_raygui.c
2024-01-09T10:48:13.423 [error] melkor.d:247:executeReadyLabel 
build failed:
src/ops/sql/semantics.d(489,31): Error: uncaught CTFE exception 
`object.Exception("42P01: relation \"snapshotsssss\" does not 
exist. SQL: select size_mm, size_px from snapshotsssss where 
snapshot_id = $1")`
src/api3.d(41,9):        thrown from here
src/api3.d(51,43):        called from here: 
`checkSql(Schema("public", ["aggregators":Table("aggregators", 
["aggregated_till":Column("aggregated_till", Type.timestamp, 
true, false), "touchpoint_id":Column("touchpoint_id", 
Type.smallint, true, false)], [], [], ["pinver", 
"ipsos_analysis_operator", "i
/Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d(644,45):
Error: template instance `api3.forgeSqlCheckerForSchema!(Schema("public",
["aggregators":Table("aggregators",
["aggregated_till":Column("aggregated_till", Type.timestamp, true, false),
"touchpoint_id":Column("touchpoint_id", T
```

or

```

pinver utumno fieldmanager % bin/yab build ldc_lab_mac_i64_dg
2024-01-09T10:52:36.220 [info] melkor.d:235:executeReadyLabel 
executing ldc_lab_mac_i64_dg:
/Users/pinver/dlang/ldc-1.36.0/bin/ldc2     -preview=dip1000 -i 
-Isrc -mtriple=x86_64-apple-darwin --vcolumns 
-J/Users/pinver/Lembas --d-version=env_dev_ 
--d-version=listen_for_nx_ --d-version=disable_ssl 
--d-version=disable_fixations --d-version=disable_metrics 
--d-version=disable_aggregator --d-debug -g 
-of/Users/pinver/Projects/DeepGlance/fieldmanager/bin/lab_mac_i64_dg
/Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d
src/sbx/raygui/c_raygui.c
2024-01-09T10:52:37.254 [error] melkor.d:247:executeReadyLabel 
build failed:

src/ops/sql/semantics.d(504,19): Error: uncaught CTFE exception 
`object.Exception("XXXX! role \"dummyuser\" can't select on table 
\"snapshots\". SQL: select size_mm, size_px from snapshots where 
snapshot_id = $1")`
src/api3.d(41,9):        thrown from here
src/api3.d(51,43):        called from here: 
`checkSql(Schema("public", ["aggregators":Table("aggregators", 
["aggregated_till":Column("aggregated_till", Type.timestamp, 
true, false), "touchpoint_id":Column("touchpoint_id", 
Type.smallint, true, false)], [], [], ["pinver", 
"ipsos_analysis_operator", "i
/Users/pinver/Projects/DeepGlance/fieldmanager/src/application.d(644,45):
Error: template instance `api3.forgeSqlCheckerForSchema!(Schema("public",
["aggregators":Table("aggregators",
["aggregated_till":Column("aggregated_till", Type.timestamp, true, false),
"touchpoint_id":Column("touchpoint_id", T
```

CTFE support is a must IMHO

/P

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 10:59, Paolo Invernizzi wrote:
 
 CTFE support is a must IMHO

Yes. Besides the usability benefits you allude to, it is simply a 
security feature. We absolutely do not want the constructed string to 
depend on dynamically entered runtime data. Constructing it at compile 
time ensures that this is the case.

Jan 09 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Tuesday, 9 January 2024 at 08:29:08 UTC, Walter Bright wrote:
 that looks like tty noise.

Oh, I realized you might be reading this without a fancy Markdown 
renderer. Backticks are part of Markdown syntax, not D. I only 
suggested using

     i"a \(x) b"

rather than

     i"a $(x) b"

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 09:29, Walter Bright wrote:
 
 Also, the reason I picked the SQL example is because that is the one 
 most cited as being needed and in showing the power of DIP1036 and 
 because I was told that DIP1027 couldn't do it :-)

And I stand by that.

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 4:40 AM, Timon Gehr wrote:
 On 1/9/24 09:29, Walter Bright wrote:
 Also, the reason I picked the SQL example is because that is the one most 
 cited as being needed and in showing the power of DIP1036 and because I was 
 told that DIP1027 couldn't do it :-)

 
 And I stand by that.

But I showed that DIP1027 could do the SQL example.

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 20:06, Walter Bright wrote:
 On 1/9/2024 4:40 AM, Timon Gehr wrote:
 On 1/9/24 09:29, Walter Bright wrote:
 Also, the reason I picked the SQL example is because that is the one 
 most cited as being needed and in showing the power of DIP1036 and 
 because I was told that DIP1027 couldn't do it :-)

 And I stand by that.

 
 But I showed that DIP1027 could do the SQL example.

You actually did not.

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 1:24 PM, Timon Gehr wrote:
 You actually did not.

See my other reply to you in this thread.

Jan 09 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Tuesday, 9 January 2024 at 07:30:57 UTC, Nickolay Bukreyev 
wrote:
 I suppose a cleaner way would be to use `\(...)` syntax (like 
 in Swift).

Also, when I said, _like in Swift_, in no event was I meaning, 
_Swift has it, therefore, D should do the same_. I meant, _there 
is at least one other language that does it this way_.

Jan 09 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

I’ve just realized DIP1036 has an excellent feature that is not 
evident right away. Look at the signature of `execi`:

```d
auto execi(Args...)(Sqlite db, InterpolationHeader header, Args 
args, InterpolationFooter footer) { ... }
```

`InterpolationHeader`/`InterpolationFooter` _require_ you to pass 
an istring. Consider this example:

```d
db.execi(i"INSERT INTO items VALUES ($(x))".text);
```

Here, we accidentally added `.text`. It would be an SQL 
injection… but the compiler rejects it! `typeof(i"...".text)` is 
`string`, and `execi` cannot be called with `(Sqlite, string)`.
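
A minimal sketch of why the call is rejected. `Header` and `Footer` are hypothetical stand-ins for `InterpolationHeader`/`InterpolationFooter`, and this `execi` only reports how many values it would bind; none of it is taken from the real library.

```d
import std.conv : text;

// Hypothetical stand-ins for InterpolationHeader / InterpolationFooter.
struct Header {}
struct Footer {}

// Same shape as the `execi` signature above: the markers bracket the payload.
string execi(Args...)(Header, Args args, Footer)
{
    return text("bound ", args.length, " argument(s)");
}

void main()
{
    int x = 42;

    // Roughly what an istring call expands to: the markers are present, so it matches.
    assert(execi(Header(), x, Footer()) == "bound 1 argument(s)");

    // A pre-built string carries no markers, so the function cannot be called with it.
    static assert(!__traits(compiles, execi("INSERT INTO items VALUES (42)")));
}
```

The rejection comes purely from overload matching, so it costs nothing at run time.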

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:
 I’ve just realized DIP1036 has an excellent feature that is not evident right
 away. Look at the signature of `execi`:
 
 ```d
 auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
 InterpolationFooter footer) { ... }
 ```
 
 `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an istring. 
 Consider this example:
 
 ```d
 db.execi(i"INSERT INTO items VALUES ($(x))".text);
 ```
 
 Here, we accidentally added `.text`. It would be an SQL injection… but the 
 compiler rejects it! `typeof(i"...".text)` is `string`, and `execi` cannot be 
 called with `(Sqlite, string)`.

The compiler will indeed reject it (The error message would be a bit baffling to
those who don't know what Interpolation types are), along with any attempt to
call execi() with a pre-constructed string.

The end result is that to do manipulation with istring tuples, the programmer is
alternately faced with adding Interpolation elements or filtering them out. Is
that really what we want? Will that impede the use of tuples generally, or just
impede the use of istrings?

---

P.S. most keyboarding bugs result from neglecting to add needed syntax, not 
typing extra stuff. This is why:

     int* p;

is initialized to zero, while:

     int* p = void;

is left uninitialized. The user is unlikely to accidentally type "= void".

Jan 09 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Monday, 8 January 2024 at 03:05:17 UTC, Walter Bright wrote:
 On 1/7/2024 6:30 PM, Walter Bright wrote:
 On 1/7/2024 3:50 PM, Timon Gehr wrote:
 This cannot work:

 ```
 int x=readln.strip.split.to!int;
 db.execi(xxx!i"INSERT INTO sample VALUES ($(id), $(2*x))");
 ```

 
 True, you got me there. It's the 2\*x that is not turnable 
 into an alias. I'm going to think about this a bit.

 I wonder if what we're missing are functions that operate on 
 tuples and return tuples. We almost have them in the form of:

 ```
 template tuple(A ...) { alias tuple = A; }
 ```

 but the compiler wants A to only consist of symbols, types and 
 expressions that can be computed at compile time. This is so 
 the name mangling will work. But what if we don't bother doing 
 name mangling for this kind of template?

Yes! It would be brilliant if `alias` could refer to any 
Expression, not just symbols. If that was the case, we could just 
pass InterpolationHeader/Footer/etc. to template parameters (as 
opposed to runtime parameters, where they go now).

```d
// Desired syntax:
db.execi!i"INSERT INTO sample VALUES ($(id), $(2*x))";
// Desugars to:
db.execi!(
     InterpolationHeader(),
     InterpolatedLiteral!"INSERT INTO sample VALUES ("(),
     InterpolatedExpression!"id"(),
     id,
     InterpolatedLiteral!", "(),
     InterpolatedExpression!"2*x"(),
     2*x, // Currently illegal (`2*x` is not aliasable).
     InterpolatedLiteral!")"(),
     InterpolationFooter(),
);
// `execi!(...)` would expand to:
db.execImpl("INSERT INTO sample VALUES (?1, ?2)", id, 2*x);
```

With this approach, they are processed entirely via compile-time 
sequence manipulations. Zero-sized structs are never passed as 
arguments. Inlining is not necessary to get rid of them.

An example with `writeln` (or just about any function alike):

```d
writeln(interpolate!i"prefix $(baz + 4) suffix");
// Desugars to:
writeln(interpolate!(
     InterpolationHeader(),
     InterpolatedLiteral!"prefix "(),
     InterpolatedExpression!"baz + 4"(),
     baz + 4,
     InterpolatedLiteral!" suffix"(),
     InterpolationFooter(),
));
// `interpolate!(...)` would expand to:
writeln("prefix ", baz + 4, " suffix");
```

Jan 10 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Wednesday, 10 January 2024 at 15:07:42 UTC, Nickolay Bukreyev 
wrote:
 ```d
 writeln(interpolate!i"prefix $(baz + 4) suffix");
 // Desugars to:
 writeln(interpolate!(
     InterpolationHeader(),
     InterpolatedLiteral!"prefix "(),
     InterpolatedExpression!"baz + 4"(),
     baz + 4,
     InterpolatedLiteral!" suffix"(),
     InterpolationFooter(),
 ));
 // `interpolate!(...)` would expand to:
 writeln("prefix ", baz + 4, " suffix");
 ```

Well, `InterpolatedLiteral` and `InterpolatedExpression` don’t 
have to be templates anymore:

```d
writeln(interpolate!i"prefix $(baz + 4) suffix");
// Desugars to:
writeln(interpolate!(
     InterpolationHeader(),
     InterpolatedLiteral("prefix "),
     InterpolatedExpression("baz + 4"),
     baz + 4,
     InterpolatedLiteral(" suffix"),
     InterpolationFooter(),
));
// `interpolate!(...)` would expand to:
writeln("prefix ", baz + 4, " suffix");
```

Jan 10 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/10/2024 7:07 AM, Nickolay Bukreyev wrote:
 Zero-sized structs are never passed as arguments. Inlining is not 
 necessary to get rid of them.

Structs with no fields have a size of 1 byte for D and C++ structs, and 0 or 4
for C structs (depending on the target). The rationale for a non-zero size is so
that different struct instances will be at different addresses.

```d
struct S { }

void foo(S s);

void test(S s)
{
     foo(s);
}
```

```
                 push    RBP
                 mov     RBP,RSP
                 sub     RSP,8
                 push    dword ptr 010h[RBP]
                 call      _D5test43fooFSQm1SZv PC32
                 add     RSP,010h
                 pop     RBP
                 ret
```
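
The size claim for D structs can be checked directly. A small sketch, not part of the original example:

```d
struct S {}

// An empty D struct still occupies one byte...
static assert(S.sizeof == 1);

void main()
{
    S a, b;
    // ...so two instances can be told apart by their addresses.
    assert(&a !is &b);
}
```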

Jan 10 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/10/24 16:07, Nickolay Bukreyev wrote:

 
 Yes! It would be brilliant if `alias` could refer to any Expression, not 
 just symbols. If that was the case, we could just pass 
 InterpolationHeader/Footer/etc. to template parameters (as opposed to 
 runtime parameters, where they go now).

I am not a big fan of this option. If we are going to allow passing 
runtime arguments as template parameters, we might as well just allow 
passing template parameters as runtime arguments instead. It's much more 
clear how to make that work.

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/10/24 01:03, Walter Bright wrote:
 On 1/9/2024 12:04 AM, Nickolay Bukreyev wrote:
 I’ve just realized DIP1036 has an excellent feature that is not 
 evident right away. Look at the signature of `execi`:

 ```d
 auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
 InterpolationFooter footer) { ... }
 ```

 `InterpolationHeader`/`InterpolationFooter` _require_ you to pass an 
 istring. Consider this example:

 ```d
 db.execi(i"INSERT INTO items VALUES ($(x))".text);
 ```

 Here, we accidentally added `.text`. It would be an SQL injection… but 
 the compiler rejects it! `typeof(i"...".text)` is `string`, and 
 `execi` cannot be called with `(Sqlite, string)`.

 
 The compiler will indeed reject it (The error message would be a bit 
 baffling to those who don't know what Interpolation types are), along 
 with any attempt to call execi() with a pre-constructed string.
 
 The end result is that to do manipulation with istring tuples, the 
 programmer is alternately faced with adding Interpolation elements or 
 filtering them out. Is that really what we want?

What we want that DIP1036e mostly provides is:
0. The library can detect whether it is being passed an istring.
1. The library that accepts the istring decides how to process it.
2. The string parts of the istring are known to the library at compile time.
3. The expression parts of the istring can be evaluated only at runtime.
4. The expression parts of the istring can be passed arbitrarily, by 
ref, lazy, alias, ... (this part in fact works better with DIP1027).
5. The library can access the original expression, e.g. in string form.
6. A templated function that is called with an istring can do all of the 
above.

 Will that impede the use of tuples generally, or just impede the use of istrings?
 ...

It's just a way to achieve 0.-6. above relatively well with a simple 
patch to the lexer. I am not sure why it would impede anything except 
compile time and binary size.

 ---
 
 P.S. most keyboarding bugs result from neglecting to add needed syntax, 
 not typing extra stuff. This is why:
 
      int* p;
 
 is initialized to zero, while:
 
      int* p = void;
 
 is left uninitialized. The user is unlikely to accidentally type "= void".

The user (especially the kind of user that may be prone to accidentally 
introduce an SQL injection attack) is more likely to accidentally type 
`.format` or `.text` because that may be a relatively common way to use 
an istring in their code base.

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/11/24 21:13, Timon Gehr wrote:
 
 
 P.S. most keyboarding bugs result from neglecting to add needed syntax, 
 not typing extra stuff.

if (condition);
{
     ...
}

I think it's due to muscle memory and it does happen quite a bit.

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 00:06, Walter Bright wrote:
 Here's how SQL support is done for DIP1036:
 
 https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d
 
 ```
 auto execi(Args...)(Sqlite db, InterpolationHeader header, Args args, 
 InterpolationFooter footer) {
      import arsd.sqlite;
 
      // sqlite lets you do ?1, ?2, etc
 
      enum string query = () {
          string sql;
          int number;
          import std.conv;
          foreach(idx, arg; Args)
              static if(is(arg == InterpolatedLiteral!str, string str))
                  sql ~= str;
              else static if(is(arg == InterpolationHeader) || is(arg == InterpolationFooter))
                  throw new Exception("Nested interpolation not supported");
              else static if(is(arg == InterpolatedExpression!code, string code))
                  {   } // just skip it
              else
                  sql ~= "?" ~ to!string(++number);
          return sql;
      }();
 
      auto statement = Statement(db, query);
      int number;
      foreach(arg; args) {
          static if(!isInterpolatedMetadata!(typeof(arg)))
              statement.bind(++number, arg);
      }
 
      return statement.execute();
 }
 ```
 This:
 
 1. The istring, after converted to a tuple of arguments, is passed to 
 the `execi` template.
 2. It loops over the arguments, essentially turning it (ironically!) back 
 into a format

This is not ironic at all. The point is it _can_ do that, while DIP1027 
_cannot_ do _either this or the opposite direction_. It is yourself who 
called the istring the building block instead of the end product, but 
now you are indeed failing to turn the sausage back into the cow.

 string. The formats, instead of %s, are ?1, ?2, ?3, etc.
 3. It skips all the Interpolation arguments inserted by DIP1036.
 4. The remaining arguments are each bound to the indices 1, 2, 3, ...
 5. Then it executes the sql statement.
 
 Note that nested istrings are not supported.
 ...

But you get a useful error message that exactly pinpoints what the 
problem is. Also, they could be supported, which is the point.

 Let's see how this can work with DIP1027:
 
 ```
 auto execi(Args...)(Sqlite db, Args args) {
      import arsd.sqlite;
 
      // sqlite lets you do ?1, ?2, etc
 
      enum string query = () {
          string sql;
          int number;
          import std.conv;
          auto fmt = arg[0];
          for (size_t i = 0; i < fmt.length, ++i)
          {
              char c = fmt[i];
              if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
              {
                  sql ~= "?" ~ to!string(++number);
                  ++i;
              }
              else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                  ++i;  // skip escaped %
              else
                  sql ~= c;
          }
          return sql;
      }();
 
      auto statement = Statement(db, query);
      int number;
      foreach(arg; args[1 .. args.length]) {
          statement.bind(++number, arg);
      }
 
      return statement.execute();
 }
 ```
 This:
 ...

This does not work.

 1. The istring, after converted to a tuple of arguments, is passed to 
 the `execi` template.
 2. The first tuple element is the format string.
 3. A replacement format string is created by replacing all instances of 
 "%s" with
 "?n", where `n` is the index of the corresponding arg.
 4. The replacement format string is bound to `statement`, and the 
 arguments are bound
 to their indices.
 5. Then it executes the sql statement.
 
 It is equivalent.

No. As Nickolay already explained, it is not equivalent.

- It does not even compile, even if we fix the typo arg -> args.

That is enough to dismiss DIP1027 for this example. However, let's for 
the sake of argument assume that, miraculously, `execi` can read the 
format string at compile time, then:

- With this signature, if you pass a manually-constructed string to it, 
it would just accept the SQL injection.
- It does not give a proper error message for nested istrings.
- It has to manually parse the format string. It iterates over each 
character of the original format string.
- It (ironically!) constructs a new format string, the original one was 
useless.
- If you pass a bad format string to it (for example, by specifying a 
manual format), it will just do nonsense, while DIP1036e avoids bad 
format strings by construction.

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 4:35 AM, Timon Gehr wrote:
 This does not work.

How so? Consider this:

```
import std.stdio;

auto execi(Args...)(Args args)
{
     auto fmt = args[0].dup;
     fmt[0] = 'k';
     writefln(fmt, args[1 .. args.length]);
}

void main()
{
     string b = "betty";
     execi(i"hello $b");
}
```

which compiles and runs, printing:

kello betty

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 20:16, Walter Bright wrote:
 On 1/9/2024 4:35 AM, Timon Gehr wrote:
 This does not work.

 
 How so?

It does not compile. The arg->args fix I'll grant you as it is a typo 
whose only significance is to make it even more clear that you never 
tried to run any version of the code, but then you still get another 
compile error. I suggest you mock out the SQL library, you don't 
actually need to install it to try your code.

If we remove the `enum` then your code still does not work correctly, 
for example because it does not prevent an SQL injection attack if the 
user constructs the SQL string manually by accidentally using `format`. 
I and other people already pointed out this flaw and other flaws in 
other posts.
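
A sketch of the compile error, with the SQL library mocked away. `toSql` is a hypothetical helper doing the `%s` to `?n` rewrite (here `%%` is emitted as a single `%`), and `execi` just returns the query text instead of executing it.

```d
import std.conv : to;

// Rewrites DIP1027-style "%s" specifiers into SQLite-style "?1", "?2", ...
// In this sketch "%%" is emitted as a single literal '%'.
string toSql(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);
            ++i;
        }
        else if (fmt[i] == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%';
            ++i;
        }
        else
            sql ~= fmt[i];
    }
    return sql;
}

// CTFE is fine when the format string is a compile-time constant.
static assert(toSql("INSERT INTO sample VALUES (%s, %s)")
    == "INSERT INTO sample VALUES (?1, ?2)");

string execi(Args...)(Args args)
{
    // `args[0]` is an ordinary runtime parameter, so it cannot initialise an `enum`.
    static assert(!__traits(compiles, { enum q = toSql(args[0]); }));
    return toSql(args[0]); // the conversion can only happen at run time
}

void main()
{
    assert(execi("INSERT INTO sample VALUES (%s, %s)", 1, 2)
        == "INSERT INTO sample VALUES (?1, ?2)");
}
```

Once the `enum` is dropped the function compiles, but the query text is then only known when the call actually runs.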

 Consider this:
 
 ```
 import std.stdio;
 
 auto execi(Args...)(Args args)
 {
      auto fmt = args[0].dup;
      fmt[0] = 'k';
      writefln(fmt, args[1 .. args.length]);
 }
 
 void main()
 {
      string b = "betty";
      execi(i"hello $b");
 }
 ```
 
 which compiles and runs, printing:
 
 kello betty

I considered it and it did not have an impact on the way I view the 
DIP1027 `execi` implementation you have given.

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 4:35 AM, Timon Gehr wrote:
 However, let's for the sake 
 of argument assume that, miraculously, `execi` can read the format string at 
 compile time, then:

Adam's implementation of execi() also runs at run time, not compile time.


 - With this signature, if you pass a manually-constructed string to it, it would
 just accept the SQL injection.

It was just a proof of concept piece of code. execi could check for format 
strings that contain ?n sequences. It could also check the number of %s formats 
against the number of arguments.


 But you get a useful error message that exactly pinpoints what the problem is.
 Also, they could be supported, which is the point.
 - It does not give a proper error message for nested istrings.

execi could be extended to reject arguments that contain %s sequences. Or, if 
there was an embedded istring, the number of %s formats can be checked against 
the number of arguments. An embedded istring would show a mismatch.

I expect that use of nested istrings would be exceedingly rare. If they are
used, wrapping them in text() will make it work. Besides, would a nested istring in
an sql call be intended as part of the sql format, or would a text string be the
intended result?


 - It has to manually parse the format string. It iterates over each character of
 the original format string.

Correct. And it does not need to iterate over and remove all the Interpolation 
arguments. Nor does it need the extra two arguments, which aren't free of cost.


 - It (ironically!) constructs a new format string, the original one was useless.

Yes, it converts the format specifiers to the sql ones. Why is this a problem?


 - If you pass a bad format string to it (for example, by specifying a manual 
 format), it will just do nonsense, while DIP1036e avoids bad format strings by 
 construction.

What happens when ?3 is included in a DIP1036 istring? `i"string ?3 ($betty)"` ?
I didn't see any check for that. Of course, one could add such a check to the 
1036 execi.

printf format strings are checked by the compiler, and writef format strings are
checked by writef. execi is also capable of being extended to check the format
string to ensure the format matches the args.

Jan 09 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Tuesday, 9 January 2024 at 20:01:34 UTC, Walter Bright wrote:
 With the istring, there are 4 calls to struct member functions 
 that just return null.
 This can't be good for performance or program size.

A valid point, thanks. Could you test if that fixes the issue?

```d
import core.interpolation;
import std.meta: AliasSeq, staticMap;
import std.stdio;

template filterOutEmpty(alias arg) {
     alias T = typeof(arg);

     static if (is(T == InterpolatedLiteral!s, string s))
         static if (s.length)
             alias filterOutEmpty = s;
         else
             alias filterOutEmpty = AliasSeq!();
     else static if (
         is(T == InterpolationHeader) ||
         is(T == InterpolatedExpression!code, string code) ||
         is(T == InterpolationFooter)
     )
         alias filterOutEmpty = AliasSeq!();
     else
         alias filterOutEmpty = arg;
}

pragma(inline, true) // This pragma is necessary unless you compile with `-inline`.
void log(Args...)(InterpolationHeader, Args args, 
InterpolationFooter) {
     writeln(staticMap!(filterOutEmpty, args));
}

void main() {
     int baz = 3;
     log(i"$(baz + 4)");
     writeln(baz + 5);
}
```

 Adam's implementation of execi() also runs at run time, not 
 compile time.

We are probably talking about different things. Adam’s 
implementation constructs a format string at compile time thanks 
to `enum` storage class [in line 
36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36). 
Constructing it at compile time is essential so that we can 
validate the generated SQL and abort compilation, as Paolo 
[demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).

 execi could be extended to reject arguments that contain %s 
sequences.

I disagree. Storing a string that contains `%s` in a database 
should be allowed (storing any string should obviously be 
allowed, regardless of its contents). But `execi` is unable to 
differentiate between a string that happens to contain `%s` and a 
nested format string:

```
// DIP1027
example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1");
// Gets rewritten as:
example("prefix1 %s suffix1", "prefix2 %s suffix2", x);
```

I might be wrong, but it appears to me that DIP1027 is not able 
to deal with nested format strings, in a general case. DIP1036 
has no such limitation (demonstrated in point 2 
[here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).

 Nor does it need the extra two arguments, which aren't free of 
 cost.

I explained 
[here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org) why
these two arguments are valuable. Aren’t free of cost—correct unless you
enable inlining. `execi` may require some changes (like `filterOutEmpty` I
showed above) to make them free of cost, but it is doable.

 What happens when ?3 is included in a DIP1036 istring? 
 `i"string ?3 ($betty)"` ? I didn't see any check for that. Of 
 course, one could add such a check to the 1036 execi.

You are right, it doesn’t. Timon’s point (expressed as “This does 
not work”) is that DIP1036 is able to do validation at compile 
time while DIP1027 is only able to do it at runtime, when this 
function actually gets invoked.
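
To make the compile-time part concrete, here is a sketch with the marker types mocked. `Header`, `Footer`, `Literal` and `Expression` are hypothetical stand-ins for the `core.interpolation` types, the call in `main` is written out by hand, and `execi` returns the query text rather than executing anything.

```d
import std.conv : to;

// Hypothetical stand-ins for the core.interpolation marker types.
struct Header {}
struct Footer {}
struct Literal(string s) {}
struct Expression(string code) {}

// Builds the SQL text purely from the argument *types*, so it runs at compile time.
template buildQuery(Args...)
{
    enum string buildQuery = () {
        string sql;
        int number;
        foreach (Arg; Args)
        {
            static if (is(Arg == Literal!str, string str))
                sql ~= str;                       // literal text is part of the type
            else static if (is(Arg == Expression!code, string code))
            {
                // source text of the expression; not needed for the query
            }
            else
                sql ~= "?" ~ to!string(++number); // a runtime value: emit a placeholder
        }
        return sql;
    }();
}

string execi(Args...)(Header, Args args, Footer)
{
    enum query = buildQuery!Args;
    // Any compile-time validation can go here; this check is only illustrative.
    static assert(query.length > 0, "empty SQL statement");
    return query; // a real library would bind `args` and execute the statement
}

void main()
{
    int id = 7, x = 3;
    // Approximately what i"INSERT INTO sample VALUES ($(id), $(2*x))" expands to.
    auto q = execi(Header(),
        Literal!"INSERT INTO sample VALUES ("(), Expression!"id"(), id,
        Literal!", "(), Expression!"2*x"(), 2 * x,
        Literal!")"(), Footer());
    assert(q == "INSERT INTO sample VALUES (?1, ?2)");
}
```

Because `query` is an `enum`, a `static assert` or a CTFE-run checker can reject it before the program is ever built.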

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

P.S. Thank you for your well constructed arguments.

On 1/9/2024 1:35 PM, Nickolay Bukreyev wrote:
 A valid point, thanks. Could you test if that fixes the issue?

Yes, that works.

 We are probably talking about different things. Adam’s implementation constructs
 a format string at compile time thanks to `enum` storage class [in line
 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36).

Yes, you're right.

 Constructing it at compile time is essential so that we can validate the
 generated SQL and abort compilation, as Paolo
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).

That only checks one aspect of correctness - nested string interpolations.

 execi could be extended to reject arguments that contain %s sequences.

 I disagree. Storing a string that contains `%s` in a database should be allowed
 (storing any string should obviously be allowed, regardless of its contents).

True, which is why a % that is not intended as a format specifier is entered as %%.

 But `execi` is unable to differentiate between a string that happens to contain
 `%s` and a nested format string:

 ```
 // DIP1027
 example(i"prefix1 $(i"prefix2 $(x) suffix2") suffix1");
 // Gets rewritten as:
 example("prefix1 %s suffix1", "prefix2 %s suffix2", x);
 ```

 I might be wrong, but it appears to me that DIP1027 is not able to deal with
 nested format strings, in a general case.

The expansion for `example` has a mismatch in the number of formats (1) and
number of arguments (2). This can be detected at runtime by `example`, as I've
explained.

A compile time way is DIP1027 can be modified to reject any arguments that
consist of tuples with other than one element. This would eliminate nested
istring tuples at compile time.
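
A sketch of the runtime check described above. `countSpecs` and `arityMatches` are hypothetical helper names, not part of any existing library.

```d
// Counts "%s" specifiers, treating "%%" as an escaped percent sign.
size_t countSpecs(string fmt)
{
    size_t n;
    for (size_t i = 0; i + 1 < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (fmt[i + 1] == 's')
            ++n;
        ++i; // skip the character that follows '%'
    }
    return n;
}

// True when the number of specifiers equals the number of remaining arguments.
bool arityMatches(Args...)(string fmt, Args args)
{
    return countSpecs(fmt) == args.length;
}

void main()
{
    int x = 5;
    assert(arityMatches("hello %s", "betty"));
    // The nested-istring expansion: one specifier, two arguments.
    assert(!arityMatches("prefix1 %s suffix1", "prefix2 %s suffix2", x));
    assert(countSpecs("100%% of %s") == 1);
}
```

The mismatch is only reported when the call actually executes.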

 DIP1036 has no such limitation (demonstrated in point 2
 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).

DIP1036 cannot detect other problems with the string literals. It seems like a
lot of complexity to deal with only one issue with malformed strings at compile
time rather than runtime.

 I explained
 [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org) why
 these two arguments are valuable. Aren’t free of cost—correct unless you enable
 inlining. `execi` may require some changes (like `filterOutEmpty` I showed
 above) to make them free of cost, but it is doable.

You'd have to also make every formatted writer a template, and add the filter to
them.

 You are right, it doesn’t. Timon’s point (expressed as “This does not work”) is
 that DIP1036 is able to do validation at compile time while DIP1027 is only able
 to do it at runtime, when this function actually gets invoked.

The only validation it does is check for nested string interpolations.

Jan 09 2024

Paolo Invernizzi <paolo.invernizzi gmail.com> writes:

On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:
 P.S. Thank you for your well constructed arguments.

 On 1/9/2024 1:35 PM, Nickolay Bukreyev wrote:
 A valid point, thanks. Could you test if that fixes the issue?

 Yes, that works.

 We are probably talking about different things. Adam’s 
 implementation constructs a format string at compile time 
 thanks to `enum` storage class [in line 
 36](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/sql.d#L36).

 Yes, you're right.

 Constructing it at compile time is essential so that we can 
 validate the generated SQL and abort compilation, as Paolo 
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).

 That only checks one aspect of correctness - nested string 
 interpolations.

No.

If you look at the errors raised during the compilation of our 
codebase, we are checking FAR MORE; for example, the second error 
is about a missing grant on a select.

I included it as an example: we check not just syntax, table names, 
and semantics, but also permissions, at compile time.

And that's a concrete codebase, used in production, not 
speculation.

/P

Jan 09 2024

Paolo Invernizzi <paolo.invernizzi gmail.com> writes:

On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:

 Constructing it at compile time is essential so that we can 
 validate the generated SQL and abort compilation, as Paolo 
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).

 That only checks one aspect of correctness - nested string 
 interpolations.

<snip>

 DIP1036 has no such limitation (demonstrated in point 2 
 [here](https://forum.dlang.org/post/lizjwxdgsnmgykaoczyf forum.dlang.org)).

 DIP1036 cannot detect other problems with the string literals. 
 It seems like a lot of complexity to deal with only one issue 
 with malformed strings at compile time rather than runtime.

You are underestimating what can be gained as value in catching 
SQL problems at compile time instead of runtime. And, believe me, 
it's not a matter of mocking the DB and relying on unittest and 
coverage.

CTFE capability is needed.

/P

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:
 You are underestimating what can be gained as value in catching SQL problems at
 compile time instead of runtime. And, believe me, it's not a matter of mocking
 the DB and relying on unittest and coverage.

Please expand on that. This is a very important topic. I want to know all the 
relevant facts.


 CTFE capability is needed.

I concur that compile time errors are better than runtime errors. But in this 
case, there's a continuing cost to have them, cost to other far more common use 
cases for istrings. The cost is in terms of complexity, about needing to filter 
out all the extra marker templates, about reducing its utility as a tuple 
generator with the unexpected extra elements, larger object files, much longer 
mangled names, and so on.

Want to know the source of my unease about it? Simple things should be simple.
This isn't. The extra complexity is always there, even for the simple cases, and
the simple cases are far and away the most common use cases.

Frankly, it reminds me of C++ template expressions, which caught the C++ world 
by storm for about 2 years, before it faded away into oblivion and nobody talks 
about them anymore. Fortunately for C++, template expressions could be ignored, 
as they were not a core language feature. But DIP1036 is a core language 
feature, a feature we would be stuck with forever. And I'll be the one who gets 
the heat for it.

The compile-time vs runtime issue is the only thing left standing where the 
advantage goes to DIP1036.

So it needs a very compelling case.

P.S. You can do template expressions in D, too!

Jan 11 2024

Paolo Invernizzi <paolo.invernizzi gmail.com> writes:

On Friday, 12 January 2024 at 06:06:52 UTC, Walter Bright wrote:
 On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:
 You are underestimating what can be gained as value in 
 catching SQL problems at compile time instead of runtime. And, 
 believe me, it's not a matter of mocking the DB and relying on 
 unittest and coverage.

 Please expand on that. This is a very important topic. I want 
 to know all the relevant facts.

As a preamble, we are _currently_ doing all the SQL validations 
against schemas at compile time: semantics of the query, 
correctness of the relations involved, types matching with D (and 
Elm types), permissions granted to the roles that are performing the 
query.

That's not a problem at all, it's just something like:

    sql!`select foo from bar where baz > 1` [1]

In the same way we check also this:

   sql!`update foo set bag = ${d_variable_bag}`

But attaching sanitisation to what is inside 
`d_variable_bag`, checking its type, and actually binding the 
content for the SQL protocol are done by mixins, after the 
`sql!` string instantiation. As you can guess, that is by far the 
most common usage: the business logic is FULL of stuff like 
that.

The security aspect is related to the fact that you _always_ need 
to sanitise the data content of the d variable, the mixin takes 
care of that part, and you can't skip it.

That said, unittesting at runtime can be done against a real db, 
or by mocking it.

A real db is onerous: sometimes you need additional licenses and 
resource management, and it's time consuming. Just imagine 
writing D code, but getting errors back not during compilation 
but only when the "autotester" CI task has completed!

Keep in mind that using a real db is very common, for one 
simple reason: mocking a db to the point of being useful for unit 
testing is a PITA. The common approach is simply to skip that 
and mock the _results_ of the data retrieved by the query, to 
unittest the business logic. The queries are not checked until 
they run against the dev db.

The compile-time solution instead gives you immediate feedback 
on wrong queries and wrong type bindings, and that's invaluable, 
especially for one fundamental thing: refactoring of code, or 
schema changes.

If the DB schema is changed, the application simply does not 
compile anymore until you align it again with the changed 
schema. The compiler gently points you to the pieces of code 
you need to adjust, and does the same if you change a D type that 
is bound to an SQL parameter somewhere. So you can refactor 
without fear, and if the application compiles, you are assured 
that everything is aligned.

It's like extending the correctness of the type system down to the 
db type system, and it's priceless.

So, long story short: we will be forced to use mixin if we can't 
rely on CT interpolation, but having it will simplify the 
codebase.
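
To illustrate the kind of check described above, here is a minimal sketch, not the actual implementation discussed in this post. It assumes a hypothetical hard-coded table list (`knownTables`) in place of a real schema, a hypothetical `checkQuery` helper that only looks at the word following `from`, and a `sql` template that takes the query as a string template parameter so the check runs during compilation.

```d
// Hypothetical schema: a real checker would load this from the database
// or from a schema description at build time.
enum string[] knownTables = ["snapshots", "dev_samples"];

// Splits on spaces, tabs and newlines; simple enough to run in CTFE.
string[] words(string s)
{
    string[] result;
    string current;
    foreach (char c; s)
    {
        if (c == ' ' || c == '\n' || c == '\t')
        {
            if (current.length)
                result ~= current;
            current = null;
        }
        else
            current ~= c;
    }
    if (current.length)
        result ~= current;
    return result;
}

// Returns an error message, or an empty string if every table that
// follows a "from" keyword is known. (Toy check, assumption only.)
string checkQuery(string query)
{
    auto w = words(query);
    foreach (i, word; w)
    {
        if (word == "from" && i + 1 < w.length)
        {
            bool found = false;
            foreach (t; knownTables)
            {
                if (t == w[i + 1])
                    found = true;
            }
            if (!found)
                return `relation "` ~ w[i + 1] ~ `" does not exist`;
        }
    }
    return "";
}

// The query is a template argument, so a bad query stops the build.
template sql(string query)
{
    static assert(checkQuery(query).length == 0,
        checkQuery(query) ~ ". SQL: " ~ query);
    enum sql = query;
}

// A valid query compiles; one naming an unknown table does not.
static assert(sql!"select size_mm from snapshots where snapshot_id = $1".length > 0);
static assert(!__traits(compiles, sql!"select size_mm from snapshotsssss"));
```

With this shape, renaming a table in `knownTables` turns every stale query into a compile error at its point of use, which is the refactoring benefit described above.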

[1] well, query sometimes can be things like that:

     with
         dsx as (select face_id, bounding_box_px, gaze_yaw_deg, gaze_pitch_deg
                 from dev_eyes where eye = ${sx}),
         ddx as (select face_id, bounding_box_px, gaze_yaw_deg, gaze_pitch_deg
                 from dev_eyes where eye = ${dx})
     select
         dfc.bounding_box_px as face, dfc.expression, dby.center_z_mm,
         dsx.bounding_box_px as eye_sx, dsx.gaze_pitch_deg, dsx.gaze_yaw_deg,
         ddx.bounding_box_px as eye_dx, ddx.gaze_pitch_deg, ddx.gaze_yaw_deg
     from dev_samples
         left join dev_bodies as dby using(sample_id)
         left join dev_faces as dfc using(body_id)
         left join dsx using(face_id)
         left join ddx using(face_id)
     where dev_samples.device_id = ${deviceId}
         and system_timestamp_ms = (select max(system_timestamp_ms)
             from dev_samples where dev_samples.device_id=${deviceId})
         and dfc.bounding_box_px is not null
     order by dby.center_z_mm

Jan 12 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/12/24 07:06, Walter Bright wrote:
 
 The compile-time vs runtime issue is the only thing left standing where 
 the advantage goes to DIP1036.

This is not true, DIP1027 also suffers from other drawbacks. For example:

- DIP1027 has already been rejected.
- Format string has to be passed as a runtime argument.
- Format string has to be parsed. (Whether at runtime or compile time.)
- Format string is not transparent to the library user, they have to 
manually escape '%'.
- No simple way to detect the end of the part of the argument list that 
is part of the istring.
- Cannot support nested istrings. (I guess the `enum Format: string;` 
would mitigate this to some extent.)

DIP1027 has the following advantages:
- No interspersed runtime arguments not carrying any runtime data, this 
is a bit easier to consume.
- Fewer template instantiations.


In any case, I think the compile-time vs runtime issue is the most 
significant. I do not want a solution that does not integrate well with 
metaprogramming, it's just not worth it.

Jan 12 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/12/24 16:27, Timon Gehr wrote:
 - Cannot support nested istrings. (I guess the `enum Format: string;` 
 would mitigate this to some extent.)

- In any case, DIP1027 cannot support nested expression sequences 
without the user passing a manual marker. DIP1036e can support them 
quite naturally.

Jan 12 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

On Friday, 12 January 2024 at 06:06:52 UTC, Walter Bright wrote:
 On 1/9/2024 3:49 PM, Paolo Invernizzi wrote:
 CTFE capability is needed.

 I concur that compile time errors are better than runtime 
 errors. But in this case, there's a continuing cost to have 
 them, cost to other far more common use cases for istrings. The 
 cost is in terms of complexity, about needing to filter out all 
 the extra marker templates, about reducing its utility as a 
 tuple generator with the unexpected extra elements, larger 
 object files, much longer mangled names, and so on.

The point is to pass the things that the compiler knows to the 
library, namely the string literal parts. Within the current 
domain of the D language, the best way to do this is to use 
string template parameters.

Necessarily, this is going to incur template symbol name 
explosion. I would love to solve this problem, especially in the 
cases where compile-time usage isn't needed. Having the 
compile-time expressions is essential when you need it, but is 
pretty ugly when you don't.

Again, we can have wrapper templates that do this for you. The 
problem (as always) is that these wrapper templates are still in 
there, still taking up space. Is there any room for a solution 
here? I'm talking about the compiler being clued in that these 
functions shouldn't exist in the binary. Then the compiler can 
take a lot of shortcuts (like hashing the type data instead of 
making a demangleable symbol).

But Timon is also right that the "format string" version is 
actually adding to the grief for library writers and users. 
There's no reason I can think of to add additional parsing 
requirements for the library. I'd prefer Jonathan Marler's 
solution of just interspersing strings and values if I had to 
pick between that and DIP1027. But that still leaves so much on 
the table of what *could be great*.

I also think it's fine to tell users 'Hey, you want formatted 
output? it's writef("format", args)'. My target was not and never 
will be, `writef`.

 Want to know the source of my unease about it? Simple things 
 should be simple. This isn't. The extra complexity is always 
 there, even for the simple cases, and the simple cases are far 
 and away the most common use cases.

It actually is simple. It's a simple transformation from a parsed 
expression to the subexpressions contained within (sprinkling in 
types to make it easy to know what is what). What you *do* with 
the transformation might not be simple, but that's not necessary 
to use the feature.
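
As a rough illustration of how little a consumer has to do, here is a minimal sketch. The four marker structs are local stand-ins defined only so the example compiles on its own (their names follow the types mentioned in this thread, but these definitions are assumptions), and `render` is a hypothetical helper, not a library function.

```d
import std.conv : to;
import std.traits : isInstanceOf;

// Local stand-ins for the marker types; assumptions for illustration.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string text) {}
struct InterpolatedExpression(string code) {}

// Concatenates an interpolation sequence into a single string.
string render(Args...)(Args args)
{
    string result;
    foreach (i, arg; args)
    {
        alias T = Args[i];
        static if (is(T == InterpolatedLiteral!lit, string lit))
            result ~= lit;            // literal text, known at compile time
        else static if (is(T == InterpolationHeader) || is(T == InterpolationFooter)
                || isInstanceOf!(InterpolatedExpression, T))
        {
            // markers carry no runtime data, nothing to append
        }
        else
            result ~= arg.to!string;  // an interpolated value
    }
    return result;
}
```

Calling `render(InterpolationHeader(), InterpolatedLiteral!"a"(), InterpolatedExpression!"x"(), 42, InterpolatedLiteral!"b"(), InterpolationFooter())`, which is roughly the sequence an istring like `i"a$(x)b"` with `x == 42` would produce under these assumptions, returns `"a42b"`.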

 Frankly, it reminds me of C++ template expressions, which 
 caught the C++ world by storm for about 2 years, before it 
 faded away into oblivion and nobody talks about them anymore. 
 Fortunately for C++, template expressions could be ignored, as 
 they were not a core language feature. But DIP1036 is a core 
 language feature, a feature we would be stuck with forever. And 
 I'll be the one who gets the heat for it.

I just looked it up and... no. It's not even close. There is no 
*requirement* to make this complicated. The transformation is 
simple and straightforward. It's easy to understand if you take 5 
minutes to read the docs.

If you want to build some insanely complex thing out of this, 
it's possible. But there is no requirement to use it that way. To 
reiterate, the *feature* is simple, what you can do with the 
feature is unbounded.

This is like saying templates are too complicated because of what 
you *can do* with templates.

 P.S. You can do template expressions in D, too!

I rest my case ;)

-Steve

Jan 12 2024

Nickolay Bukreyev <buknik95 ya.ru> writes:

On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:
 A compile time way is DIP1027 can be modified to reject any 
 arguments that consist of tuples with other than one element. 
 This would eliminate nested istring tuples at compile time.

To sum up, it works with nested istrings poorly; it may even be 
sensible to forbid them entirely for DIP1027. Glad we’ve reached 
a consensus on this point. This case doesn’t seem crucial at the 
moment though; now we can focus on more relevant questions.

 DIP1036 cannot detect other problems with the string literals. 
 It seems like a lot of complexity to deal with only one issue 
 with malformed strings at compile time rather than runtime.

DIP1036 provides full CTFE capabilities at your disposal. You can 
validate _anything_ about a format string; any 
compile-time-executable hypothetical `validateSql(query)` will 
fit. I guess none of the examples presented so far featured such 
validation because it usually tends to be long and not 
illustrative.

However, another Adam’s example [does 
perform](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3
4673/07-html.d#L13) non-trivial compile-time validation. Here is how it is
[implemented](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/html.d#L97).

 Constructing it at compile time is essential so that we can 
 validate the generated SQL and abort compilation, as Paolo 
 [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi forum.dlang.org).

 That only checks one aspect of correctness - nested string 
 interpolations.

They check a lot more. I agree it is hard to spot the error 
messages in the linked post so I’ll copy them here:

     relation "snapshotsssss" does not exist. SQL: select size_mm, size_px from snapshotsssss where snapshot_id = $1

     role "dummyuser" can't select on table "snapshots". SQL: select size_mm, size_px from snapshots where snapshot_id = $1

As you can see, they check sophisticated business logic expressed 
in terms of relational databases. And all of that happens at 
compile time. Isn’t that a miracle?

 I explained
 [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana forum.dlang.org)
 why these two arguments are valuable. Aren’t free of cost: correct, unless
 you enable inlining. `execi` may require some changes (like
 `filterOutEmpty` I showed above) to make them free of cost, but it is doable.

 You'd have to also make every formatted writer a template,

Err… every formatted writer has to be a template anyway, doesn’t 
it? It needs to accept argument lists that may contain values of 
arbitrary types.

 …and add the filter to them.

Yeah. I admit this is a problem. As a rule of thumb, the most 
obvious code should yield the best results. With DIP1036, this is 
not the case at the moment: when you pass an interpolation 
sequence to a function not specifically designed for it, it 
wastes more stack space than necessary and passes useless junk in 
registers.

Others have mentioned that DIP1027 performs much worse in terms 
of speed (due to runtime parsing). While that is undeniable, I 
think DIP1036 should be tweaked to behave as well as possible.

There was an idea in this thread to improve the ABI so that it 
ignores empty structs, but I’m rather sceptical about it.

Instead, let us note there are basically two patterns of usage 
for istrings:

1. Passing to a function that processes an istring and does 
something non-trivial. `execi` is a good example.
2. Passing to a function that simply stringifies every fragment, 
one after another. `writeln` is a good example.

Somewhat counterintuitively, case 1 is easier to address: the 
function already traverses the received sequence and transforms 
it. So it is only necessary to write it in such a way that it is 
inline-friendly.

By the way, what functions do we have in Phobos that fall into 
the case-2 category? `write`/`writeln`, `std.conv.text`, 
`std.logger.core.log`, and… is that all? Must be something else!..

Turns out there are only a handful of relevant functions in the 
entire stdlib. It shouldn’t be hard to put a filter in each of 
them. It also hints they are probably not that common in the wild.

However, when one encounters a third-party `write`-like function 
that is unaware of `InterpolationHeader`/etc., they should have a 
means to fix it from outside, i.e., without touching its source 
and ideally without writing a wrapper by hand. Unfortunately, I 
could not come up with a satisfactory solution for this. Will 
keep thinking. Perhaps someone else manages to find it faster.

---

An idea in a different direction. Currently, 
`InterpolationHeader`/etc. structs interoperate with `write`-like 
functions seamlessly (at the expense of passing zero-sized 
arguments) due to the fact they all have an appropriate 
`toString` method. If we remove those methods (and do nothing 
else), then `write(i"a$(x)b")` would produce something like:

     InterpolationHeader()InterpolatedLiteral!"a"()InterpolatedExpression!"x"()42InterpolatedLiteral!"b"()InterpolationFooter()

The program, rather than introducing a silent inefficiency, 
immediately tells the user they need to account for these types.

---

And one more idea. Current implementation of DIP1036 can emit 
empty chunks—i.e., `InterpolatedLiteral!""`—see for example 
`i"$(x)"`. If I was making a guess why it does so, I would say it 
strives to produce consistent, regular sequences. On the one 
hand, it might ease the job of interpolation-sequence handlers: 
they can count on the fact that expressions and literals always 
alternate inside a sequence. On the other, they have to check whether 
a literal is empty and drop it if it is, so it actually makes 
their job harder.

I do not know whether not producing empty literals in the first 
place would be a positive or negative change. But it is something 
worth considering.

---

Slightly off-topic: when I was thinking about this, I was 
astonished by the fact istrings can work with 
`readf`/`formattedRead`/`scanf`. Just wanted to share this 
observation.

```d
readf(i" $(&x) $(&y)");
```

 The compiler will indeed reject it (The error message would be 
 a bit baffling to those who don't know what Interpolation types 
 are)

This is true. I suppose the docs should mention 
`InterpolationHeader` and friends when talking about istrings, 
explain what an istring is lowered to, and show examples. Then a 
programmer who has read the docs will have a mental association 
between “istring” and “InterpolationHeader/Footer/etc.” Those who 
don’t read the docs—well, they won’t have. Only googling will 
save them.

To be honest, I’m not concerned about this point too much.

 along with any attempt to call execi() with a pre-constructed 
 string.

 The end result is that to do manipulation with istring tuples, 
 the programmer is alternately faced with adding Interpolation 
 elements or filtering them out. Is that really what we want?

I’d argue it is wonderful that `execi` cannot be called with a 
pre-constructed string. The API should provide another function 
instead—say, `execDynamicStatement(Sqlite, string, Args...)`. 
`execi` should be used for statically known SQL with interpolated 
arguments, and `execDynamicStatement`—for arbitrary SQL 
constructed at runtime. A verbose name is intentional to 
discourage its usage in favour of `execi`.
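
A minimal sketch of that split, under stated assumptions: the two marker structs are local stand-ins, the `Sqlite` parameter is omitted for brevity, and both functions return a tag string instead of touching a database. The point is only that a template constraint lets `execi` refuse a plain `string` at compile time.

```d
// Local stand-ins for the marker types; assumptions for illustration.
struct InterpolationHeader {}
struct InterpolationFooter {}

// True only when the argument list is bracketed by header and footer.
template isInterpolation(Args...)
{
    static if (Args.length >= 2)
        enum isInterpolation = is(Args[0] == InterpolationHeader)
            && is(Args[$ - 1] == InterpolationFooter);
    else
        enum isInterpolation = false;
}

// Accepts interpolation sequences only (hypothetical signature).
string execi(Args...)(Args args)
if (isInterpolation!Args)
{
    return "static";
}

// The deliberately verbose escape hatch for SQL built at runtime.
string execDynamicStatement(Args...)(string sql, Args args)
{
    return "dynamic";
}

// A pre-constructed string is rejected by execi at compile time.
static assert(!__traits(compiles, execi("DROP TABLE Students")));
static assert(__traits(compiles, execi(InterpolationHeader(), InterpolationFooter())));
```

A caller who really needs runtime-built SQL then has to spell out `execDynamicStatement`, which makes such call sites easy to find in review.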

 P.S. most keyboarding bugs result from neglecting to add needed 
 syntax, not typing extra stuff.

That makes sense. Though you’ll never guess what beast can be 
spawned by careless refactoring. Extra protection won’t harm, 
especially if it’s zero-cost.

P.S. Zero-initialization of variables is one of D’s cool 
features, indeed.

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/10/24 00:21, Walter Bright wrote:
 ...

The other points I think have been adequately addressed already.

 You are right, it doesn’t. Timon’s point (expressed as “This does not 
 work”) is that DIP1036 is able to do validation at compile time while 
 DIP1027 is only able to do it at runtime, when this function actually 
 gets invoked.

 
 The only validation it does is check for nested string interpolations.

That is not true in the least. It validates conclusively that no SQL 
injection attack is going on. This is the main feature of the example!

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 21:01, Walter Bright wrote:
 On 1/9/2024 4:35 AM, Timon Gehr wrote:
 However, let's for the sake of argument assume that, miraculously, 
 `execi` can read the format string at compile time, then:

 Adam's implementation of execi() also runs at run time, not compile time.
 ...

Adam's `execi` partially runs at compile time and partially of course it 
will ultimately run at run time (like code generated by a metaprogram 
tends to do).

The SQL statement is prepared at compile time. Therefore, by 
construction, it cannot depend on any runtime parameters, preventing an 
SQL injection. (And it can be checked at compile time, like people are 
already doing with less convenient syntax).

 - With this signature, if you pass a manually-constructed string to 
 it, it would just accept the SQL injection.

 It was just a proof of concept piece of code.

So is Adam's example code. In any case:

I am talking about the function _signature_. Whatever crazy advanced 
thing you do in the implementation, the signature that DIP1027 expects 
`execi` to have is fundamentally significantly less safe.

 execi could check for 
 format strings that contain ?n sequences. It could also check the number 
 of %s formats against the number of arguments.
 ...

That does not fix the security issue.

  > But you get a useful error message that exactly pinpoints what the 
 problem is.
  > Also, they could be supported, which is the point.
 - It does not give a proper error message for nested istrings.

 execi could be extended to reject arguments that contain %s sequences. 

And now suddenly you can no longer store anything that looks like a 
format string in your data base.

 Or, if there was an embedded istring, the number of %s formats can be 
 checked against the number of arguments.

Maybe at runtime. But why introduce this failure mode in the first place?

 An embedded istring would show a mismatch.
 ...

The error message would be phrased in overly general terms and hence be 
confusing.

 I expect that use of nested istrings would be exceedingly rare. If they 
 are used, wrapping them in text() will make it work.

Depends on how exactly they are used. For the SQL case, not allowing 
them is a decent option.

 Besides, would a 
 nested istring in an sql call be intended as part of the sql format, or 
 would a text string be the intended result?
 ...

Whatever it is, with DIP1036e and compile-time SQL construction, user 
data does not make it into the SQL expression sent to the database.

 - It has to manually parse the format string. It iterates over each 
 character of the original format string.

 Correct. And it does not need to iterate over and remove all the 
 Interpolation arguments.

Adam's implementation does the filtering at compile time.

The function body will be something like:

auto statement = Statement(db, "...?1...?2...?3..."); // replace ... by query
int number = 0;
statement.bind(++number, firstArg);
statement.bind(++number, secondArg);
statement.bind(++number, thirdArg);

But yes, DIP1036e does make some concessions and it will indeed pass 
empty struct arguments in case the function is not inlined (could use 
pragma(inline, true) to avoid it.)
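
To make the compile-time filtering concrete, here is a minimal sketch. It is not Adam's code: the marker structs are local stand-ins, `Statement` is a mock that records bindings as strings, and `buildQuery`/`prepare` are hypothetical names. The query text is computed from the argument types alone, so it is a compile-time constant and no runtime value can end up inside it.

```d
import std.conv : to;
import std.traits : isInstanceOf;

// Local stand-ins for the marker types; assumptions for illustration.
struct InterpolationHeader {}
struct InterpolationFooter {}
struct InterpolatedLiteral(string text) {}
struct InterpolatedExpression(string code) {}

enum isMarker(T) = is(T == InterpolationHeader) || is(T == InterpolationFooter)
    || isInstanceOf!(InterpolatedLiteral, T)
    || isInstanceOf!(InterpolatedExpression, T);

// Builds the SQL text from the argument *types* only, so it runs in CTFE.
string buildQuery(Args...)()
{
    string sql;
    int number;
    foreach (T; Args)
    {
        static if (is(T == InterpolatedLiteral!lit, string lit))
            sql ~= lit;                        // SQL fragment from the literal
        else static if (!isMarker!T)
            sql ~= "?" ~ (++number).to!string; // placeholder for a value
    }
    return sql;
}

// Mock statement: records what would be bound to each index.
struct Statement
{
    string query;
    string[int] bound;
    void bind(T)(int index, T value) { bound[index] = value.to!string; }
}

Statement prepare(Args...)(Args args)
{
    enum query = buildQuery!Args();  // forced compile-time evaluation
    auto statement = Statement(query);
    int number;
    foreach (i, arg; args)
    {
        static if (!isMarker!(Args[i]))
            statement.bind(++number, arg);  // values are bound, never spliced
    }
    return statement;
}

static assert(buildQuery!(InterpolationHeader, InterpolatedLiteral!"id = ",
    InterpolatedExpression!"x", int, InterpolationFooter)() == "id = ?1");
```

Even if a bound value contains SQL syntax, it only ever reaches `bind`; the `enum query` line would not compile if the query depended on runtime data.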

 Nor does it need the extra two arguments, which 
 aren't free of cost.
 ...

Are you really going to argue that some extra empty struct arguments are 
in some way more expensive than runtime query construction including 
format string parsing and query construction using GC strings?

But anyway, if you think interpolation is not worth runtime overhead 
that would perhaps need to be mitigated using additional features or an 
improved calling convention, that's up to you, but then DIP1027 loses too.

 - It (ironically!) constructs a new format string, the original one 
 was useless.

 Yes, it converts the format specifiers to the sql ones. Why is this a 
 problem?
 ...

You argued earlier like it is in some way an ironic benefit of DIP1027 
that the DB interface requires something that is similar to a format 
string under the hood. Well, it does not require the kind of format 
string that DMD is generating.

 - If you pass a bad format string to it (for example, by specifying a 
 manual format), it will just do nonsense, while DIP1036e avoids bad 
 format strings by construction.

 What happens when ?3 is included in a DIP1036 istring? `i"string ?3 
 ($betty)" ? I didn't see any check for that.

That's a fair point in general, but I was specifically talking about the 
format string that you pass into the function that accepts the istring, 
not similar kinds of strings that may or may not be generated in the 
implementation.

In any case, DIP1027 istrings can also create a format string with `?3`, 
and there is no way to check within `execi` if that `?3` came from 
malicious data that was read as input to the program or was put there by 
an incompetent programmer.

 Of course, one could add such a check to the 1036 execi.
 ...

With DIP1036e the check could be done at compile time.

 printf format strings are checked by the compiler,

As a one-off special case that only supports a specific kind of format 
string.

 and writef format strings are checked by writef.

`writef` allows the format string to be passed as a template parameter 
if compile-time parsing and checking is requested. DIP1027 does not 
naturally support this.

 execi is also capable of being extended 
 to check the format string to ensure the format matches the args.

With DIP1027, you'd have to do it at runtime.

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

I'd like to see an example of how DIP1027 does not prevent an injection attack.

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/12/24 07:13, Walter Bright wrote:
 I'd like to see an example of how DIP1027 does not prevent an injection 
 attack.

```d
// mock SQL
import std.format, std.variant;
class Sqlite{
     this(string){}
     Sqlite query(string command,scope Variant[int] args=null){
         writeln("EXECUTING");
         writeln(command);
         if(args.length){
             writeln("ARGS:");
             foreach(k,v;args){
                 if(v!=Variant.init)
                     writefln(i"?$k = ($v)");
             }
         }
         writeln("DONE");
         return this;
     }
     struct Row{ int opIndex(int i){ return 0; } }
     int opApply(scope int delegate(Row) dg){
         writeln("ITERATING OVER ROWS");
         return 0;
     }
}
struct Statement{
     Sqlite db;
     string query;
     Variant[int] args;
     void bind(T)(int i,T arg){
         args[i]=Variant(arg);
     }
     void execute(){
         db.query(query,args);
     }
}

auto execi(Args...)(Sqlite db, Args args) {
     // sqlite lets you do ?1, ?2, etc

     string query = () { // note: parsing done at runtime
         string sql;
         int number;
         import std.conv;
         auto fmt = args[0];
         for (size_t i = 0; i < fmt.length; ++i)
         {
             char c = fmt[i];
             if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
             {
                 sql ~= "?" ~ to!string(++number);
                 ++i;
             }
             else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                 ++i;  // skip escaped %
             else
                 sql ~= c;
         }
         return sql;
     }();

     auto statement = Statement(db, query);
     int number;
     foreach(arg; args[1 .. args.length]) {
         statement.bind(++number, arg);
     }

     return statement.execute();
}

import std.stdio;

void main() {
     auto db = new Sqlite(":memory:");
     db.query("CREATE TABLE Students (id INTEGER, name TEXT)");

     // you might think this is sql injection... and you'd be right! the lib
     // cannot use rich metadata because it is not provided by the istring
     // therefore, it cannot verify that the user didn't construct the
     // query themselves in an unsafe way
     int id = 1;
     string name = "Robert'); DROP TABLE Students;--";
     db.execi(i"INSERT INTO sample VALUES ($(id), '$(name)')".format);

     foreach(row; db.query("SELECT * from sample"))
         writeln(row[0], ": ", row[1]);
}
```

Prints:
EXECUTING
CREATE TABLE Students (id INTEGER, name TEXT)
DONE
EXECUTING
INSERT INTO sample VALUES (1, 'Robert'); DROP TABLE Students;--')
DONE
EXECUTING
SELECT * from sample
DONE
ITERATING OVER ROWS


https://xkcd.com/327/

Jan 12 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 21:01, Walter Bright wrote:
 
 
 I expect that use of nested istrings would be exceedingly rare. If they 
 are used, wrapping them in text() will make it work.

One more point here is that `text` will of course only work with 
DIP1036e, with DIP1027 you need `format`. In any case, unfortunately I 
have to bow out of this discussion now as it is consuming too much of my 
time right in front of a deadline. I can get back to this in a couple of 
days.

Jan 09 2024

Steven Schveighoffer <schveiguy gmail.com> writes:

At the end of the day, DIP1027 is an improvement of `writef`, and 
`writef` only (not even `printf` works correctly). The 
interpolation DIP Atila is writing (I'll call it IDIP) supports 
all manner of interpolated transformations, efficiently and 
effectively, with proper compiler checks.

Let's go through the points made...

On Monday, 8 January 2024 at 23:06:40 UTC, Walter Bright wrote:
 Here's how SQL support is done for DIP1036:

 https://github.com/adamdruppe/interpolation-examples/blob/master/lib/sql.d

...

 This:

 1. The istring, after converted to a tuple of arguments, is 
 passed to the `execi` template.

Yes, and with an explicit type to be matched against, enabling 
overloading. Note that `execi` could be called the same thing as 
the normal execution function, and then users could use whatever 
form they prefer -- sql string + args or istring. It's a seamless 
experience.

Compare to DIP1027 where you can accidentally use the wrong form 
with string args.

 2. It loops over the arguments, essentially turning it 
 (ironically!) back into a format string. The formats, instead 
 of %s, are ?1, ?2, ?3, etc.

There is no formatting, sqlite does not have any kind of format 
specifiers.

No, it is not "turned back" into a format string, because there 
was no format string to begin with. The sql is *constructed* 
using the given information from the compiler clearly identifying 
which portions are sql and which portions are parameters.

And the SQL query is built at compile time, not runtime (as 
DIP1027 *must do*). This incurs no memory allocations at runtime.

 3. It skips all the Interpolation arguments inserted by DIP1036.

Sure, those are not necessary here. Should be a no-op, as no data 
is actually passed.

 4. The remaining arguments are each bound to the indices 1, 2, 
 3, ...

Yes.

 5. Then it executes the sql statement.

Yes.

 Note that nested istrings are not supported.

Note that nested istrings can be *detected*.

And they are not supported *as explicitly specified*! This is not 
a defect or limitation but a choice of the particular example 
library.

Noting this "limitation" is like noting the limitation that `void 
foo(int)` can't be called with a `string` argument.

 Let's see how this can work with DIP1027:

 ```d
 auto execi(Args...)(Sqlite db, Args args) {
     import arsd.sqlite;

     // sqlite lets you do ?1, ?2, etc

     enum string query = () {
         string sql;
         int number;
         import std.conv;
         auto fmt = args[0];
         for (size_t i = 0; i < fmt.length; ++i)
         {
             char c = fmt[i];
             if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
             {
                 sql ~= "?" ~ to!string(++number);
                 ++i;
             }
             else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
                 ++i;  // skip escaped %
             else
                 sql ~= c;
         }
         return sql;
     }();
 ```


As mentioned several times, this fails to compile -- an enum 
cannot be built from the runtime variable `args`.

Now, you can just do this *without* an enum, and yes, it will 
compile, build a string at runtime, and you are now at the mercy 
of the user to not have put in a specialized placeholder (poorly 
named as a "format specifier" in DIP1027 because it is solely 
focused on writef). No compiler help for you!

To put it another way, you have given up complete control of the 
API of your library to the compiler and the user. Instead of 
understanding what the user has said, you have to guess.

And BTW, this is valid SQL:

```sql
SELECT * FROM someTable WHERE fieldN LIKE '%something%'
```

Which means, the poor user needs to escape `%` in a way 
completely unrelated to the sql language *or* the istring 
specification, something that IDIP doesn't require.

This is a further burden on the user that is wholly unnecessary, 
just because DIP1027 decided to use `%s` as "the definitive 
~~placeholder~~ format specifier".
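
The hijacking can be reproduced with a small sketch. `toSqlPlaceholders` is a hypothetical helper that follows the same `%s` scan as the quoted `execi`, with one deliberate difference labelled in the code: an escaped `%%` is emitted as a single `%`.

```d
import std.conv : to;

// Rewrites "%s" into "?1", "?2", ... the way a DIP1027-style consumer
// would have to, since it only sees one flat format string.
string toSqlPlaceholders(string fmt)
{
    string sql;
    int number;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable c = fmt[i];
        if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == 's')
        {
            sql ~= "?" ~ to!string(++number);
            ++i;
        }
        else if (c == '%' && i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            sql ~= '%';  // assumption: "%%" unescapes to one literal '%'
            ++i;
        }
        else
            sql ~= c;
    }
    return sql;
}

// The "%s" at the start of '%something%' is mistaken for a placeholder,
// and the real parameter is pushed to index 2.
static assert(toSqlPlaceholders(
    "SELECT * FROM someTable WHERE fieldN LIKE '%something%' AND id = %s")
    == "SELECT * FROM someTable WHERE fieldN LIKE '?1omething%' AND id = ?2");

// Only manual escaping, unrelated to SQL, gives the intended query.
static assert(toSqlPlaceholders(
    "SELECT * FROM someTable WHERE fieldN LIKE '%%something%%' AND id = %s")
    == "SELECT * FROM someTable WHERE fieldN LIKE '%something%' AND id = ?1");
```

The function cannot tell the two cases apart because the information about which parts were literal SQL has already been flattened into the string.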

 ```d
     auto statement = Statement(db, query);
     int number;
     foreach(arg; args[1 .. args.length]) {
         statement.bind(++number, arg);
     }

     return statement.execute();
 }
 ```
 This:

 1. The istring, after converted to a tuple of arguments, is 
 passed to the `execi` template.

A tuple with an incorrect parameter that needs runtime 
transformation and allocations.

 2. The first tuple element is the format string.
 3. A replacement format string is created by replacing all 
 instances of "%s" with
 "?n", where `n` is the index of the corresponding arg.

SQL doesn't use format strings, so the parameter must be 
transformed at runtime using memory allocations.

And it does this without knowing whether the "%s" came from the 
"format string" or from a parameter.

Not to mention the user can pass in other "format specifiers" at 
will.

 4. The replacement format string is bound to `statement`, and 
 the arguments are bound
 to their indices.

Maybe. sqlite frowns upon mismatching arguments because the 
library decided your search string was actually a placeholder in 
some unrelated domain specific language (the language of 
`writef`).

 5. Then it executes the sql statement.

Maybe.

 It is equivalent.

It is most certainly not. The two are only slightly comparable. 
IDIP is a mechanism for an SQL library author (and many other 
domains, see Adam's repository) to effectively and gracefully 
consume succinct and intuitive instructions from a user to avoid 
SQL injections, and use the compiler to weed out problematic 
calls.

Whereas DIP1027 is a loaded footgun built for `writef` 
that can be shoehorned into an SQL lib only at the price of 
runtime allocations and having all checks done at runtime.

-Steve

Jan 09 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/9/24 23:30, Steven Schveighoffer wrote:
 
 And BTW, this is valid SQL:
 
 ```sql
 SELECT * FROM someTable WHERE fieldN LIKE '%something%'
 ```
 
 Which means, the poor user needs to escape `%` in a way completely 
 unrelated to the sql language *or* the istring specification, something 
 that IDIP doesn't require.

I had typed up a similar point in my post, but then thought that most 
likely DIP1027 does the escaping automatically and dropped the line of 
inquiry. But actually checking it now, it indeed does not seem to do 
anything to prevent such hijacking.

https://github.com/dlang/DIPs/blob/master/DIPs/rejected/DIP1027.md
https://github.com/dlang/dmd/compare/master...WalterBright:dmd:dip1027#diff-a556a8e6917dd4042f541bdb19673f96940149ec3d416b0156af4d0e4cc5e4bdR16347-R16452

Having the SQL library arbitrarily interpret a substring `%s` in your 
SQL query as a placeholder seems like unnecessary pain, and it also 
renders moot the idea that DIP1027 code is able to detect mismatches.

Jan 09 2024

Walter Bright <newshound2 digitalmars.com> writes:

Please post an example of a problem it cannot detect.

Jan 11 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/12/24 07:17, Walter Bright wrote:
 Please post an example of a problem it cannot detect.

For example:

```d
import std.stdio;
void main(){
     int x=2,y=3;
     writefln(i"%success: $y",x);
}
```

Jan 12 2024
