
digitalmars.D - D performance

reply serge <abc abc.com> writes:
D has sparked my interest, so last week I started looking into the 
language and completed a D course on Pluralsight. One of the areas 
where I would like to apply D is web applications/cloud. Golang is 
not bad, but I think D seems more powerful. During my research, 
however, I found some interesting facts:

1) The fib test (https://github.com/drujensen/fib) with D (compiled 
with ldc) showed really good performance results.
2) The various web performance tests at 
https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance for D using the vibe.d and hunt frameworks/libraries, 
regardless of the type of test - json, single query, multiple query and so on.

My understanding is that D is in a similar ballpark performance 
league as C, C++, and Rust. Hence the question - why is the web 
performance of those two frameworks so poor compared to rivals? I 
would say a few times worse. This is not a troll post by any means. 
I am researching whether D and a D web framework can be used as a 
replacement for Python/Django/Flask within our company. But if a D 
web framework shows worse performance than Go, then it is probably 
not the right tool for the job.
Any comments and feedback would be appreciated.
Apr 22 2020
next sibling parent reply welkam <wwwelkam gmail.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementations in C, C++, D, Rust and Nim compiled with the same compiler backend should give the exact same machine code. What you see in online language comparisons is mostly a comparison of different implementations and of how much time people spent on optimizing.
 why web performance of those 2 web frameworks is so poor 
 compared to rivals?
Differences in implementation. My guess is that the people writing those servers didn't have time to spend on optimizations.
Apr 22 2020
parent reply Arine <arine1283798123 gmail.com> writes:
On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.
Apr 22 2020
next sibling parent reply drug <drug2004 bk.ru> writes:
23.04.2020 01:34, Arine wrote:
 On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.
You forgot to add "in some cases Rust may generate better assembly than C/C++/D because..." But this is not the answer to the question the OP asked. Rust has an LLVM-based backend like LDC, so nothing prevents LDC from being as fast as any other LLVM-based compiler. Nothing. The question is how much effort you put into it.
Apr 23 2020
parent reply Arine <arine1283798123 gmail.com> writes:
On Thursday, 23 April 2020 at 11:05:35 UTC, drug wrote:
23.04.2020 01:34, Arine wrote:
 On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.
You forget to add "in some cases Rust may generate better assembly than C/C++/D because..." But this is not the answer to the question OP asked. Rust has llvm based backend like ldc so nothing prevents ldc to be as fast as any other llvm based compiler. Nothing. The question is how many efforts you put into it.
I wasn't replying to the author of the thread. I was replying to a misinformed individual in the thread. If that's the way you want to think about it, you can create your own compiler and language - "it's just about how much effort you put into it", even if that means making your own language and compiler. How much effort you have to put into something is a factor in that decision. You'd basically have to remake Rust in D to get the same assembly results and the same guarantee regarding aliasing.
Apr 23 2020
parent reply drug <drug2004 bk.ru> writes:
23.04.2020 18:13, Arine wrote:
 I wasn't replying to the author of the thread. I was replying to a 
 misinformed individual in the thread.
 
 If that's the way you want to think about, you can create your own 
 compiler and language. "It's just about how many efforts you put into 
 it", even if that means making your own language and compiler. How much 
 "efforts" you have to put into something is a factor in that decision. 
 You'd basically have to remake Rust in D to get the same assembly 
 results and guarantee regarding aliasing.
Well, you're right, I used the wrong wording to express my thoughts. I meant that C/C++/Rust/D belong to the same performance league. The difference appears in specific cases, of course, but in general they are equal. And your statement that Rust assembly output is better is wrong.
Apr 23 2020
parent reply Arine <arine1283798123 gmail.com> writes:
On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
 And your statement that Rust assembly output is better is wrong.
https://godbolt.org/z/g_euiT

D:

    int foo(ref int a, ref int b) {
        a = 0;
        b = 1;
        return a;
    }

    int example.foo(ref int, ref int):
            movl    $0, (%rsi)
            movl    $1, (%rdi)
            movl    (%rsi), %eax
            retq

Rust:

    pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
        *x = 0;
        *y = 1;
        *x
    }

    example::foo:
            mov     dword ptr [rdi], 0
            mov     dword ptr [rsi], 1
            xor     eax, eax
            ret

There most definitely is a difference, and the assembly generated for Rust is better. This is just a simple example to illustrate the difference. If you don't know why the difference is significant or why it is happening, there are a lot of great articles out there. Sadly, there are people such as yourself spreading misinformation who don't know what a borrow checker is and don't know Rust or why it has gone as far as it has. This is why the borrow checker for D is going to fail: the person designing it, such as yourself, doesn't have any idea what they are redoing and has never even bothered to touch Rust or learn about it. Anyway, I'm not your babysitter; if you don't understand the above (as most people don't seem to bother to learn assembly anymore), you're on your own.
Apr 24 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 https://godbolt.org/z/g_euiT

 D:

     int foo(ref int a, ref int b) {
         a = 0;
         b = 1;
         return a;
     }

 int example.foo(ref int, ref int):
         movl    $0, (%rsi)
         movl    $1, (%rdi)
         movl    (%rsi), %eax
         retq

 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
Would DIP 1000 enable such optimization possibility in D?
Apr 24 2020
parent reply Les De Ridder <les lesderid.net> writes:
On Friday, 24 April 2020 at 22:52:31 UTC, IGotD- wrote:
 On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 https://godbolt.org/z/g_euiT

 D:

     int foo(ref int a, ref int b) {
         a = 0;
         b = 1;
         return a;
     }

 int example.foo(ref int, ref int):
         movl    $0, (%rsi)
         movl    $1, (%rdi)
         movl    (%rsi), %eax
         retq

 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
Would DIP 1000 enable such optimization possibility in D?
Technically DIP 1021 could.
Apr 24 2020
parent Les De Ridder <les lesderid.net> writes:
On Friday, 24 April 2020 at 23:03:25 UTC, Les De Ridder wrote:
 On Friday, 24 April 2020 at 22:52:31 UTC, IGotD- wrote:
 On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 [...]
Would DIP 1000 enable such optimization possibility in D?
Technically DIP 1021 could.
Actually, nevermind: https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md#limitations
Apr 24 2020
prev sibling next sibling parent reply mipri <mipri minimaltype.com> writes:
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
eh, this isn't Rust; it's "Rust +nightly with an unstable codegen flag." Marginal codegen improvements aren't going to turn heavy usage of dynamic arrays into heavy usage of std.container.array either, so they're not that relevant to expected performance of real-world programs in D vs. other languages.
Apr 24 2020
parent reply SrMordred <patric.dexheimer gmail.com> writes:
On Friday, 24 April 2020 at 23:24:49 UTC, mipri wrote:
 On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
eh, this isn't Rust; it's "Rust +nightly with an unstable codegen flag." Marginal codegen improvements aren't going to turn heavy usage of dynamic arrays into heavy usage of std.container.array either, so they're not that relevant to expected performance of real-world programs in D vs. other languages.
I believe that "-enable-scoped-noalias=true" should be the D equivalent (with LDC), but it didn't change anything. Also, you can achieve the same asm with @llvmAttr("noalias") in front of at least one argument.
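For reference, a minimal sketch of that second approach (assuming LDC with the `ldc.attributes` module; whether the attribute is honoured on a parameter depends on the compiler version):

    import ldc.attributes : llvmAttr;

    // Marking `a` as noalias lets LDC assume it cannot overlap with `b`,
    // so the final read of `a` can be folded into a constant return.
    int foo(@llvmAttr("noalias") ref int a, ref int b)
    {
        a = 0;
        b = 1;
        return a;
    }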
Apr 24 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2020 8:06 PM, SrMordred wrote:
 Also u can achieve the same asm with  llvmAttr("noalias") in front of at least 
 one argument.
The following C code:

    int test(int * __restrict__ x, int * __restrict__ y) {
        *x = 0;
        *y = 1;
        return *x;
    }

compiled with gcc -O:

    test:
            mov     dword ptr [RDI],0
            mov     dword ptr [RSI],1
            mov     EAX,0
            ret

It's not a unique property of Rust; C99 has it too. DMC doesn't implement it, but it probably should.
Apr 25 2020
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly generated with rust is 
 better.
D's @live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
Apr 25 2020
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:
 On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly 
 generated with rust is better.
D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences.

My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down to the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff.

I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference.

There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context-switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed.

BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this:

    // extract array instance from reusable pool,
    // and set its length to zero so that you can
    // write into it from the start
    x = buffer_pool.get();
    x.length = 0;
    assumeSafeAppend(x);   // a cost each time you do this

    // now append stuff into x to
    // create your response

    // now publish your response

    // with the response published, clean
    // up by recycling the buffer back into
    // the pool
    buffer_pool.recycle(x);

This is the kind of pattern that Sociomantic used a lot.  In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending.  But having to call `assumeSafeAppend` each time does carry a performance cost.

IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.
Apr 25 2020
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, April 25, 2020 4:34:44 AM MDT Joseph Rushton Wakeling via 
Digitalmars-d wrote:
 IIRC Mathias has suggested that it should be possible to tag
 arrays as intended for this kind of re-use, so that stomping
 prevention will never trigger, and you don't have to
 `assumeSafeAppend` each time you reduce the length.
You could probably do that, but I'm not sure that it could be considered @safe. It would probably make more sense to just use a custom array type if that's what you really needed, though of course, that causes its own set of difficulties (including having to duplicate the array appending logic). - Jonathan M Davis
Apr 25 2020
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 25 April 2020 at 15:21:03 UTC, Jonathan M Davis 
wrote:
 You could probably do that, but I'm not sure that it could be 
 considered  safe.
I think it would be OK to have it as a non-@safe tool. But ...
 It would probably make more sense to just use a custom array 
 type if that's what you really needed, though of course, that 
 causes its own set of difficulties (including having to 
 duplicate the array appending logic).
... I think that could possibly make more sense. One thing that I really don't like about the original idea of an `alwaysAssumeSafeAppend(x)` is that it makes behaviour dependent on the instance rather than the type. It would probably be better to have a clear type-based separation. OTOH in my experience custom types are often finnicky in terms of how they interact with functions that expect a slice as input. So there could be a convenience in having it as an option for regular dynamic arrays. Or it could just be that the custom type would need a bit more work in its implementation :-)
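To make the comparison concrete, a rough sketch of such a custom type (illustration only, not the ocean implementation) might look like this:

    // A reusable append buffer: truncation never triggers stomping
    // prevention because the type tracks its own "used" length.
    struct ReusableBuffer(T)
    {
        private T[] storage;    // backing allocation, managed by the GC
        private size_t used;

        void reset() { used = 0; }   // cheap: no assumeSafeAppend needed

        void put(scope const(T)[] items)
        {
            immutable newLen = used + items.length;
            if (newLen > storage.length)
                storage.length = newLen;    // grow the backing array
            storage[used .. newLen] = items[];
            used = newLen;
        }

        // Borrow the current contents as an ordinary slice.
        inout(T)[] data() inout { return storage[0 .. used]; }
    }

The catch is exactly the one discussed above: everything downstream has to accept either the slice returned by data() or the custom type itself.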
Apr 26 2020
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/25/2020 3:34 AM, Joseph Rushton Wakeling wrote:
 In any case, I seriously doubt those kinds of optimization have anything to do 
 with the web framework performance differences.
I agree. I also generally structure my code so that optimization wouldn't make a difference. But it's still a worthwhile benefit to add it for live functions.
 I suppose that in a more complicated app there could be some multiplicative 
 impact, but where high-throughput web frameworks are concerned I'm pretty sure 
 that the memory allocation and reuse strategy is going to be what makes 99% of 
 the difference.
My experience is that if the code has never been profiled, there's one obscure function unexpectedly consuming the bulk of the run time, which is easily recoded. Programs that have been runtime profiled tend to have a pretty flat graph of which functions eat the time.
      // extract array instance from reusable pool,
      // and set its length to zero so that you can
      // write into it from the start
      x = buffer_pool.get();
      x.length = 0;
      assumeSafeAppend(x);   // a cost each time you do this
 
      // now append stuff into x to
      // create your response
 
      // now publish your response
 
      // with the response published, clean
      // up by recycling the buffer back into
      // the pool
      buffer_pool.recycle(x);
 
 This is the kind of pattern that Sociomantic used a lot.  In D1 it was easy 
 because there was no array stomping prevention -- you could just set length ==
0 
 and start appending.  But having to call `assumeSafeAppend` each time does
carry 
 a performance cost.
This is why I use OutBuffer for such activities.
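For example (a minimal sketch, assuming OutBuffer.clear() to reset between uses; the request strings and the publishing step are just stand-ins):

    import std.outbuffer : OutBuffer;
    import std.stdio : writeln;

    void main()
    {
        auto buf = new OutBuffer();
        foreach (request; ["a", "b", "c"])   // stand-in for incoming requests
        {
            buf.clear();                     // reset the length, keep the allocation
            buf.write("response for ");
            buf.write(request);
            writeln(cast(string) buf.toBytes());   // "publish" the response
        }
    }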
 IIRC Mathias has suggested that it should be possible to tag arrays as
intended 
 for this kind of re-use, so that stomping prevention will never trigger, and
you 
 don't have to `assumeSafeAppend` each time you reduce the length.
Sounds like an idea worth exploring. How about taking point on that? But I would be concerned about tagging such arrays, and then stomping them unintentionally, leading to memory corruption bugs. OutBuffer is memory safe.
Apr 25 2020
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 25 April 2020 at 22:15:44 UTC, Walter Bright wrote:
 My experience is if the code has never been profiled, there's 
 one obscure function unexpectedly consuming the bulk of the run 
 time, which is easily recoded. A programs that have been 
 runtime profiled tend to have a pretty flat graph of which 
 functions eat the time.
Yes! :-) And in particular, that computational complexity in the real world is very different from theoretical arguments about O(...). One can gain a lot by being clear headed about what the actual problem is and what is optimal for that particular problem with that particular data.
 This is why I use OutBuffer for such activities.
Yes, I have some memory of talking with Dicebot about whether this would be an appropriate tool for the other side of the D2 conversion. I don't remember if any firm conclusions were drawn, though.
 Sounds like an idea worth exploring. How about taking point on 
 that? But I would be concerned about tagging such arrays, and 
 then stomping them unintentionally, leading to memory 
 corruption bugs. OutBuffer is memory safe.
Yes, it's clear (as Jonathan noted) that an always-stompable array could probably not be @safe. That said, what does OutBuffer do that means that it _is_ safe in this context?

Of course, Sociomantic never had @safe to play with. In practice I don't recall there ever being an issue with unintentional stomping (I'm not saying it never happened, but I have no recollection of it being a common issue). That did however rest on a program structure that made it less likely anyone would make such a mistake.

About stepping up with a feature contribution: the idea is lovely but I'm very aware of how limited my time is right now, so I don't want to make offers I can't guarantee to follow up on. There's a reason I post so rarely in the forums these days! But I will ping Mathias L. to let him know, as the idea was his to start with.
Apr 26 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2020 4:30 AM, Joseph Rushton Wakeling wrote:
 That said, what does OutBuffer do that means that it 
 _is_ safe in this context?
It manages its own memory privately, and presents the results as dynamic arrays which do their own bounds checking. It's been a reliable solution for me for maybe 30 years.
Apr 26 2020
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Saturday, 25 April 2020 at 22:15:44 UTC, Walter Bright wrote:
 On 4/25/2020 3:34 AM, Joseph Rushton Wakeling wrote:
 In any case, I seriously doubt those kinds of optimization 
 have anything to do with the web framework performance 
 differences.
I agree. I also generally structure my code so that optimization wouldn't make a difference. But it's still a worthwhile benefit to add it for live functions.
I heard that proving that two pointers do not alias is a big problem in compiler backends, and that some or most auto-vectorization optimizations do not fire because the compiler can't prove there is no aliasing. A new language used in the Unity game engine is designed such that references do not alias by default, for optimization reasons. I haven't looked into this topic further, but I believe it's worth checking out. Data science people would benefit greatly from auto-vectorization.
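A small sketch of the kind of loop this affects (names made up): without a no-aliasing guarantee the compiler has to assume that writing dst[i] could change a later src[j], so it either emits a runtime overlap check or skips vectorizing.

    // The compiler cannot assume dst and src point to disjoint memory.
    void scale(float[] dst, const(float)[] src)
    {
        foreach (i; 0 .. dst.length)
            dst[i] = src[i] * 2.0f;
    }

    void main()
    {
        auto a = new float[](8);
        auto b = [1.0f, 2, 3, 4, 5, 6, 7, 8];
        scale(a, b);
    }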
Apr 26 2020
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Saturday, 25 April 2020 at 10:34:44 UTC, Joseph Rushton 
Wakeling wrote:
 In any case, I seriously doubt those kinds of optimization have 
 anything to do with the web framework performance differences.

 My experience of writing number-crunching stuff in D and Rust 
 is that Rust seems to have a small but consistent performance 
 edge that could quite possibly be down the kind of 
 optimizations that Arine mentions (that's speculation: I 
 haven't verified).  However, it's small differences, not 
 order-of-magnitude stuff.

 I suppose that in a more complicated app there could be some 
 multiplicative impact, but where high-throughput web frameworks 
 are concerned I'm pretty sure that the memory allocation and 
 reuse strategy is going to be what makes 99% of the difference.

 There may also be a bit of an impact from the choice of futures 
 vs. fibers for managing asynchronous tasks (there's a context 
 switching cost for fibers), but I would expect that to only 
 make a difference at the extreme upper end of performance, once 
 other design factors have been addressed.

 BTW, on the memory allocation front, Mathias Lang has pointed 
 out that there is quite a nasty impact from `assumeSafeAppend`.
  Imagine that your request processing looks something like this:

     // extract array instance from reusable pool,
     // and set its length to zero so that you can
     // write into it from the start
     x = buffer_pool.get();
     x.length = 0;
     assumeSafeAppend(x);   // a cost each time you do this

     // now append stuff into x to
     // create your response

     // now publish your response

     // with the response published, clean
     // up by recycling the buffer back into
     // the pool
     buffer_pool.recycle(x);

 This is the kind of pattern that Sociomantic used a lot.  In D1 
 it was easy because there was no array stomping prevention -- 
 you could just set length == 0 and start appending.  But having 
 to call `assumeSafeAppend` each time does carry a performance 
 cost.

 IIRC Mathias has suggested that it should be possible to tag 
 arrays as intended for this kind of re-use, so that stomping 
 prevention will never trigger, and you don't have to 
 `assumeSafeAppend` each time you reduce the length.
I understand that it was an annoying breaking change, but aside from the difficulty of migrating I don't understand why a custom type isn't the appropriate solution for this problem. I think I heard "We want to use the built-in slices", but I never understood the technical argument behind that, or how it stacked up against not getting the desired behaviour. My sense was that the irritation at the breakage was influencing the technical debate.
Apr 26 2020
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:
 I understand that it was an annoying breaking change, but aside 
 from the difficulty of migrating I don't understand why a 
 custom type isn't the appropriate solution for this problem. I 
 think I heard "We want to use the built-in slices", but I never 
 understood the technical argument behind that, or how it 
 stacked up against not getting the desired behaviour.
Can you imagine replacing every usage of slices with a custom type in your code? And making sure programmers joining the company do the same? And having converters that e.g. accept ubyte arrays from libraries and convert them into yours?
Apr 26 2020
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 26 April 2020 at 11:59:27 UTC, Stefan Koch wrote:
 On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:
 I understand that it was an annoying breaking change, but 
 aside from the difficulty of migrating I don't understand why 
 a custom type isn't the appropriate solution for this problem. 
 I think I heard "We want to use the built-in slices", but I 
 never understood the technical argument behind that, or how it 
 stacked up against not getting the desired behaviour.
Can you imagine replacing every usage of slices with a custom type in your code? And making sure programmers joining the company do the same? and having converts that e.g. accept ubyte arrays from libraries and convert them into yours?
I suppose nowadays that custom type can use a scoped ubyte slice to expose its temp buffer.
Apr 26 2020
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:
 I understand that it was an annoying breaking change, but aside 
 from the difficulty of migrating I don't understand why a 
 custom type isn't the appropriate solution for this problem. I 
 think I heard "We want to use the built-in slices", but I never 
 understood the technical argument behind that, or how it 
 stacked up against not getting the desired behaviour.

 My sense was that the irritation at the breakage was 
 influencing the technical debate.
That's not entirely unfair, but I think it does help to appreciate the magnitude of the problem:

* there's a very large codebase, including many different applications and a large amount of common library code, all containing a lot of functions that expect slice input (because the concept of a range was never in D1, and because slices were the only use case)

* most of the library functionality shouldn't have to care whether its input is a reusable buffer or any other kind of slice

* you can't rewrite to use range-based generics because that's D2 only and you need to keep D1 compatibility until the last application has migrated

* there are _very_ extreme performance and reliability constraints on some of the key applications, meaning that validating D2 transition efforts is very time consuming

* you can't use any Phobos functionality until the codebase is D2 only, and even then you probably want to limit how much of it you use because it is not written with these extreme performance concerns in mind

* all the time spent on those transitional efforts is time taken away from feature development

It's very easy to look back and say something like, "Well, if you'd written with introspection-based design from the start, you would have had a much easier migration effort", but that in itself would have been trickier to do in D1, and would have carried extra maintenance and development costs (particularly w.r.t. forcing devs to write what would have seemed like very boilerplate-y code compared to the actual set of use cases).

Even with the D1 compatibility requirement dropped, there still remains a big burden to transition all the reusable buffers to a different type. IIRC the focus would probably have been on using `Appender`.

Note that many of these concerns still apply if we want to preserve a future for any of the (very well crafted) library and application code that Sociomantic open-sourced. They are all now D2-only, but the effort required to rewrite around dedicated reusable-buffer types would still be quite substantial.
Apr 26 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/25/20 6:34 AM, Joseph Rushton Wakeling wrote:
 On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:
 On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly generated with 
 rust is better.
D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences. My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff. I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference. There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed. BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this:     // extract array instance from reusable pool,     // and set its length to zero so that you can     // write into it from the start     x = buffer_pool.get();     x.length = 0;     assumeSafeAppend(x);   // a cost each time you do this     // now append stuff into x to     // create your response     // now publish your response     // with the response published, clean     // up by recycling the buffer back into     // the pool     buffer_pool.recycle(x); This is the kind of pattern that Sociomantic used a lot.  In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending.  But having to call `assumeSafeAppend` each time does carry a performance cost.
In terms of performance, depending on the task at hand, D1 code is slower than D2 appending, because there's a thread-local cache for appending in D2, and D1 only has a global one-array cache for the same. However, I'm assuming that since you were focused on D1, your usage naturally was written to take advantage of what D1 has to offer.

The assumeSafeAppend call also uses this cache, and so it should be quite fast. But setting length to 0 is a ton faster, because you aren't calling an opaque function.

So depending on the usage pattern, D2 with assumeSafeAppend can be faster, or it could be slower.
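A small sketch of what that looks like in practice (the exact capacity values depend on the runtime and block size):

    void main()
    {
        import std.stdio : writeln;

        int[] buf = new int[](16);
        buf.length = 0;            // shrink; the old contents might still be referenced
        writeln(buf.capacity);     // typically 0: an append here would reallocate

        assumeSafeAppend(buf);     // promise that nobody needs the old contents
        writeln(buf.capacity);     // non-zero again: appending reuses the block
        buf ~= 42;                 // grows in place, no new allocation
    }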
 
 IIRC Mathias has suggested that it should be possible to tag arrays as 
 intended for this kind of re-use, so that stomping prevention will never 
 trigger, and you don't have to `assumeSafeAppend` each time you reduce 
 the length.
I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or a custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D).

By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.

Note that this was before (I think) destructor calls were added. The destructor calls are something that assumeSafeAppend is going to do, and won't be done with just setting length to 0.

However, there are other options. We could introduce a druntime configuration option so when this specific situation happens (slice points at start of block and has 0 length), assumeSafeAppend is called automatically on the first append. Jonathan is right that this is not @safe, but it could be an opt-in configuration option.

I don't think configuring specific arrays makes a lot of sense, as this would require yet another optional bit that would have to be checked and allocated for all arrays.

-Steve
Apr 26 2020
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer 
wrote:
 In terms of performance, depending on the task at hand, D1 code 
 is slower than D2 appending, by the fact that there's a 
 thread-local cache for appending for D2, and D1 only has a 
 global one-array cache for the same. However, I'm assuming that 
 since you were focused on D1, your usage naturally was written 
 to take advantage of what D1 has to offer.

 The assumeSafeAppend call also uses this cache, and so it 
 should be quite fast. But setting length to 0 is a ton faster, 
 because you aren't calling an opaque function.

 So depending on the usage pattern, D2 with assumeSafeAppend can 
 be faster, or it could be slower.
That makes sense. I just know that Mathias L. seemed to be quite concerned about the `assumeSafeAppend` performance impact. I think he was not looking for a D1/D2 comparison but in terms of getting the most performant behaviour in future. It's not that it was slower than D1, it's that it was a per-use speed hit.
 I spoke for a while with Dicebot at Dconf 2016 or 17 about this 
 issue. IIRC, I suggested either using a custom type or custom 
 runtime. He was not interested in either of these ideas, and it 
 makes sense (large existing code base, didn't want to stray 
 from mainline D).
Yes. To be fair I think in that context, at that stage of transition, that probably made more sense: it was easier to just mandate that everybody start putting `assumeSafeAppend` into their code (actually we implemented a transitional wrapper, `enableStomping`, which was a no-op in D1 and called `assumeSafeAppend` in D2).
 By far, the best mechanism to use is a custom type. Not only 
 will that fix this problem as you can implement whatever 
 behavior you want, but you also do not need to call opaque 
 functions for appending either. It should outperform everything 
 you could do in a generic runtime.

 Note that this was before (I think) destructor calls were 
 added. The destructor calls are something that assumeSafeAppend 
 is going to do, and won't be done with just setting length to 0.

 However, there are other options. We could introduce a druntime 
 configuration option so when this specific situation happens 
 (slice points at start of block and has 0 length), 
 assumeSafeAppend is called automatically on the first append. 
 Jonathan is right that this is not  safe, but it could be an 
 opt-in configuration option.

 I don't think configuring specific arrays makes a lot of sense, 
 as this would require yet another optional bit that would have 
 to be checked and allocated for all arrays.
The druntime option does sound interesting, although I'm leery about the idea of creating 2 different language behaviours.
Apr 26 2020
prev sibling parent reply Mathias LANG <geod24 gmail.com> writes:
On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer 
wrote:
 In terms of performance, depending on the task at hand, D1 code 
 is slower than D2 appending, by the fact that there's a 
 thread-local cache for appending for D2, and D1 only has a 
 global one-array cache for the same. However, I'm assuming that 
 since you were focused on D1, your usage naturally was written 
 to take advantage of what D1 has to offer.

 The assumeSafeAppend call also uses this cache, and so it 
 should be quite fast. But setting length to 0 is a ton faster, 
 because you aren't calling an opaque function.

 So depending on the usage pattern, D2 with assumeSafeAppend can 
 be faster, or it could be slower.
Well, Sociomantic didn't use any kind of multi-threading in "user code". We had single-threaded fibers for concurrency, and process-level scaling for parallelism. Some corner cases used threads, but that was for low-level things (e.g. low-latency file IO on Linux), which were highly scrutinized and stayed clear of the GC AFAIK.

Note that accessing TLS *does* have a cost which is higher than accessing a global. By this reasoning, I would assume that D2 appending would definitely be slower, although I never profiled it. What I did profile, though, is `assumeSafeAppend`. The fact that it looks up GC metadata (taking the GC lock in the process) made it quite expensive given how often it was called (in D1 it was simply a no-op, and called defensively).
 IIRC Mathias has suggested that it should be possible to tag 
 arrays as intended for this kind of re-use, so that stomping 
 prevention will never trigger, and you don't have to 
 `assumeSafeAppend` each time you reduce the length.
I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.
Well... Here's something I never really quite understood, actually: Mihails *did* introduce a buffer type. See https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130

And we also had a (very old) similar utility here: https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d

I always wanted to unify this, but never got to it. But if you look at the first link, it calls `assumeSafeAppend` twice, before and after setting the length. In practice it is only necessary *after* reducing the length, but as I mentioned, this is defensive programming.

For reference, most of our applications had a principled buffer use. The buffers would rarely be appended to from more than one, perhaps two, places. However, slices to the buffer would be passed around quite liberally. So a buffer type from which one could borrow would indeed have been optimal.
 Note that this was before (I think) destructor calls were 
 added. The destructor calls are something that assumeSafeAppend 
 is going to do, and won't be done with just setting length to 0.

 However, there are other options. We could introduce a druntime 
 configuration option so when this specific situation happens 
 (slice points at start of block and has 0 length), 
 assumeSafeAppend is called automatically on the first append. 
 Jonathan is right that this is not  safe, but it could be an 
 opt-in configuration option.

 I don't think configuring specific arrays makes a lot of sense, 
 as this would require yet another optional bit that would have 
 to be checked and allocated for all arrays.

 -Steve
I don't even know if we had a single case where we had arrays of objects with destructors. The vast majority of our buffers were `char[]` and `ubyte[]`. We had some elaborate types, but I think destructors + buffer would have been frowned upon in code review.

Also, the reason we didn't modify druntime to just have the D1 behavior (that would have been a trivial change) was because of how dependent on the new behavior druntime had become. It was also the motivation for the suggestion Joe mentioned. AFAIR I mentioned it in an internal issue, did a PoC implementation, but never got it to a state where it was mergeable.

Also, while a custom type might sound better, it doesn't really interact well with the rest of the runtime, and it's an extra word to pass around (if passed by value).
Apr 26 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/27/20 1:04 AM, Mathias LANG wrote:
 On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer wrote:
 In terms of performance, depending on the task at hand, D1 code is 
 slower than D2 appending, by the fact that there's a thread-local 
 cache for appending for D2, and D1 only has a global one-array cache 
 for the same. However, I'm assuming that since you were focused on D1, 
 your usage naturally was written to take advantage of what D1 has to 
 offer.

 The assumeSafeAppend call also uses this cache, and so it should be 
 quite fast. But setting length to 0 is a ton faster, because you 
 aren't calling an opaque function.

 So depending on the usage pattern, D2 with assumeSafeAppend can be 
 faster, or it could be slower.
Well, Sociomantic didn't use any kind of multi-threading in "user code". We had single-threaded fibers for concurrency, and process-level scaling for parallelism. Some corner cases were using threads, but it was for low level things (e.g. low latency file IO on Linux), which were highly scrutinized and stayed clear of the GC AFAIK. Note that accessing TLS *does* have a cost which is higher than accessing a global.
That is a minor cost compared to the actual appending.
 By this reasoning, I would assume that D2 appending 
 would definitely be slower, although I never profiled it.
I tested the performance when I added the feature. D2 was significantly and measurably faster (at least for appending to 2 or more arrays). I searched through my old email: for appending 5M bytes to 2 arrays, the original code took 13.99 seconds (on whatever system I was using in 2009) and 1.53 seconds with the cache. According to that email, I had similar results even with a 1-element cache, so somehow my code was faster, but I didn't know why. Quite possibly it's because the cache in D1 for looking up block info is behind the GC lock.

Literally the only thing that is more expensive in D2 vs. D1 was the truncation of arrays. In D1 this is setting the length to 0; in D2, you needed to call assumeSafeAppend. This is why I suggested a flag that allows you to enable the original behavior.
 What I did 
 profile tho, is `assumeSafeAppend`. The fact that it looks up GC 
 metadata (taking the GC lock in the process) made it quite expensive 
 given how often it was called (in D1 it was simply a no-op, and called 
 defensively).
The cache I referred to is to look up the GC metadata. In essence, when you append, you will look it up anyway. Either assumeSafeAppend or append will get the GC metadata into the cache, and then it is a straight lookup in the cache and this doesn't take a lock or do any expensive searches. The cache is designed to favor the most recent arrays first. This is an 8 element cache, so there are still cases where you will be having issues (like if you round-robin append to 9 arrays). I believe 8 elements was a sweet spot for performance that allowed reasonably fast appending with a reasonable number of concurrent arrays. Where D1 will fall down is if you are switching between more than one array, because the cache in D1 is only one element. Even if you are doing just one array, the cache is not for the array runtime, but for the GC. And it is based on the pointer queried, not the block data. A GC collection, for instance, is going to invalidate the cache.
 
 IIRC Mathias has suggested that it should be possible to tag arrays 
 as intended for this kind of re-use, so that stomping prevention will 
 never trigger, and you don't have to `assumeSafeAppend` each time you 
 reduce the length.
I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.
Well... Here's something I never really quite understood, actually: Mihails *did* introduce a buffer type. See https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130 And we also had a (very old) similar utility here: https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d I always wanted to unify this, but never got to it. But if you look at the first link, it calls `assumeSafeAppend` twice, before and after setting the length. In practice it is only necessary *after* reducing the length, but as I mentioned, this is defensive programming.
Yeah, that is unnecessary. It is not going to be that expensive, especially if you just were appending to that array, but again, more expensive than setting a word to 0.
 
 For reference, most of our applications had a principled buffer use. The 
 buffers would rarely be appended to from more than one, perhaps two 
 places. However, slices to the buffer would be passed around quite 
 liberally. So a buffer type from which one could borrow would indeed 
 have been optimal.
This all actually works better with the new runtime. The old one would reallocate if you appended to a slice that didn't start at the block start. The new version can detect that the slice ends where the used data ends and allow appending.
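For example (a sketch; whether the append happens in place also depends on spare capacity in the block):

    void main()
    {
        auto a = new int[](10);
        auto b = a[5 .. $];   // slice that does not start at the block start
        b ~= 1;               // old runtime: reallocates; new runtime: can append
                              // in place, since b ends where the used data ends
    }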
 
 Note that this was before (I think) destructor calls were added. The 
 destructor calls are something that assumeSafeAppend is going to do, 
 and won't be done with just setting length to 0.

 However, there are other options. We could introduce a druntime 
 configuration option so when this specific situation happens (slice 
 points at start of block and has 0 length), assumeSafeAppend is called 
 automatically on the first append. Jonathan is right that this is not 
  safe, but it could be an opt-in configuration option.

 I don't think configuring specific arrays makes a lot of sense, as 
 this would require yet another optional bit that would have to be 
 checked and allocated for all arrays.
I don't even know if we had a single case where we had arrays of objects with destructors. The vast majority of our buffer were `char[]` and `ubyte[]`. We had some elaborate types, but I think destructors + buffer would have been frowned upon in code review.
Of course! D1 didn't have destructors for structs ;)
 
 Also the reason we didn't modify druntime to just have the D1 behavior 
 (that would have been a trivial change) was because how dependent on the 
 new behavior druntime had become. It was also the motivation for the 
 suggestion Joe mentioned. AFAIR I mentioned it in an internal issue, did 
 a PoC implementation, but never got it to a state were it was mergeable.
Having a flag per array is going to be costly, but actually, there's a lot more junk in the block itself. Perhaps there's a spare bit somewhere that can be a flag for the append behavior.
 
 Also, while a custom type might sound better, it doesn't really interact 
 well with the rest of the runtime, and it's an extra word to pass around 
 (if passed by value).
The "extra value" can be stored elsewhere -- just like the GC you could provide metadata for the capacity in a global AA or something. In any case, there were options. The way druntime is written, it's pretty good performance, in most cases BETTER performance than D1 for idiomatic D code. In fact the Tango folks asked me if I could add the feature to Tango's druntime, but I couldn't because it depends on TLS. For code that was highly focused on optimizing D1 with its idiosyncracies, it probably has worse performance. The frustration is understandable, but without the possibility of adaptation, there's not much one can do. -Steve
Apr 27 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
This is what the D n.g. is about - informative, collegial, and useful! Thanks, 
fellows!
Apr 29 2020
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 25.04.20 12:15, Walter Bright wrote:
 On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly generated with 
 rust is better.
D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
What's an example of such an optimization and why won't it introduce UB to @safe code?
Apr 25 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/25/2020 4:00 PM, Timon Gehr wrote:
 What's an example of such an optimization and why won't it introduce UB to
 safe 
 code?
    @live void test() { int a, b; foo(a, b); }

    @live int foo(ref int a, ref int b) {
        a = 0;
        b = 1;
        return a;
    }

ref a and ref b cannot refer to the same memory object.
Apr 25 2020
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 26.04.20 04:22, Walter Bright wrote:
 On 4/25/2020 4:00 PM, Timon Gehr wrote:
 What's an example of such an optimization and why won't it introduce 
 UB to  safe code?
     @live void test() { int a, b; foo(a, b); }

     @live int foo(ref int a, ref int b) {
         a = 0;
         b = 1;
         return a;
     }

 ref a and ref b cannot refer to the same memory object.
Actually they can, even in @safe @live code.
Apr 26 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2020 12:45 AM, Timon Gehr wrote:
 On 26.04.20 04:22, Walter Bright wrote:
 ref a and ref b cannot refer to the same memory object.
Actually they can, even in safe live code.
Bug reports are welcome. Please tag them with the 'live' keyword in bugzilla.
Apr 26 2020
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 4/26/20 10:19 AM, Walter Bright wrote:
 On 4/26/2020 12:45 AM, Timon Gehr wrote:
 On 26.04.20 04:22, Walter Bright wrote:
 ref a and ref b cannot refer to the same memory object.
Actually they can, even in safe live code.
Bug reports are welcome. Please tag them with the 'live' keyword in bugzilla.
I can't do that because you did not agree it was a bug. According to your DIP and past discussions, the following is *intended* behavior:

    int bar(ref int x, ref int y) @safe @live {
        x = 0;
        y = 1;
        return x;
    }

    void main() @safe {
        int x;
        import std.stdio;
        writeln(bar(x, x)); // 1
    }

I have always criticized this design, but so far you have stuck to it. I have stated many times that the main reason why it is bad is that you don't actually enforce any new invariant, so @live does not enable any new patterns, at least in @safe code.

In particular, if you start optimizing based on non-enforced and undocumented @live assumptions, @safe @live code will not be memory safe.

You can't optimize based on @live and preserve memory safety. Given that you want to preserve interoperability, this is because it is tied to functions instead of types. @live in its current form is useless except perhaps as a linting tool.
Apr 26 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2020 2:52 PM, Timon Gehr wrote:
 I can't do that because you did not agree it was a bug. According to your DIP 
 and past discussions, the following is *intended* behavior:
 
 int bar(ref int x,ref int y) safe  live{
      x=0;
      y=1;
      return x;
 }
 
 void main() safe{
      int x;
      import std.stdio;
      writeln(bar(x,x)); // 1
 }
 
 I have always criticized this design, but so far you have stuck to it. I have 
 stated many times that the main reason why it is bad is that you don't
actually 
 enforce any new invariant, so  live does not enable any new patterns at least
in 
  safe code.
 
 In particular, if you start optimizing based on non-enforced and undocumented 
  live assumptions,  safe  live code will not be memory safe.
 
 You can't optimize based on  live and preserve memory safety. Given that you 
 want to preserve interoperability, this is because it is tied to functions 
 instead of types.  live in its current form is useless except perhaps as a 
 linting tool.
@live's invariants rely on arguments passed to it that conform to its requirements. It's analogous to @safe code relying on its arguments conforming. To get the checking here, main would have to be declared @live, too.
Apr 26 2020
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 27.04.20 07:40, Walter Bright wrote:
 On 4/26/2020 2:52 PM, Timon Gehr wrote:
 I can't do that because you did not agree it was a bug. According to 
 your DIP and past discussions, the following is *intended* behavior:

 int bar(ref int x,ref int y) safe  live{
      x=0;
      y=1;
      return x;
 }

 void main() safe{
      int x;
      import std.stdio;
      writeln(bar(x,x)); // 1
 }

 I have always criticized this design, but so far you have stuck to it. 
 I have stated many times that the main reason why it is bad is that 
 you don't actually enforce any new invariant, so  live does not enable 
 any new patterns at least in  safe code.

 In particular, if you start optimizing based on non-enforced and 
 undocumented  live assumptions,  safe  live code will not be memory safe.

 You can't optimize based on  live and preserve memory safety. Given 
 that you want to preserve interoperability, this is because it is tied 
 to functions instead of types.  live in its current form is useless 
 except perhaps as a linting tool.
live's invariants rely on arguments passed to it that conform to its requirements. It's analogous to safe code relying on its arguments conforming. ...
No, it is not analogous, because only @system or @trusted code can get that wrong, not @safe code. @safe code itself is (supposed to be) verified, not trusted.
 To get the checking here, main would have to be declared  live, too.
I understand the design. It just does not make sense. All of the code is annotated @safe, but if you optimize based on unverified assumptions, it will not be memory safe. Is the goal of @live really to undermine @safe's guarantees?
Apr 27 2020
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/27/2020 7:26 AM, Timon Gehr wrote:
 I understand the design. It just does not make sense. All of the code is 
 annotated  safe, but if you optimize based on unverified assumptions, it will 
 not be memory safe.
It is a good point. The design of @live up to this point did not change the way code was generated. I still want to see how much of a difference it makes, and will implement it but make it an option.
Apr 27 2020
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 27 April 2020 at 14:26:50 UTC, Timon Gehr wrote:
 No, it is not analogous, because only  system or  trusted code 
 can get that wrong, not  safe code.  safe code itself is 
 (supposed to be) verified, not trusted.
the existence of any overly trusting code renders @safe code liable to cause memory safety bugs. While the invalid accesses won't occur inside @safe code, they can definitely be caused by them, even without the buggy @safe code calling any @trusted.

Some day I'll have time to write up all my (many, many pages of) notes for this stuff... would have been for dconf, I guess now for dconf online?
Apr 28 2020
next sibling parent welkam <wwwelkam gmail.com> writes:
On Tuesday, 28 April 2020 at 13:44:05 UTC, John Colvin wrote:
 On Monday, 27 April 2020 at 14:26:50 UTC, Timon Gehr wrote:
 No, it is not analogous, because only @system or @trusted code 
 can get that wrong, not @safe code. @safe code itself is 
 (supposed to be) verified, not trusted.
the existence of any overly trusting code renders @safe code liable to cause memory safety bugs. While the invalid accesses won't occur inside @safe code, they can definitely be caused by them, even without the buggy @safe code calling any @trusted. Some day I'll have time to write up all my (many, many pages of) notes for this stuff... would have been for DConf, I guess now for DConf Online?
Would be eager to listen.
Apr 28 2020
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On 28.04.20 15:44, John Colvin wrote:
 the existence of any overly trusting code renders @safe code liable to 
 cause memory safety bugs. While the invalid accesses won't occur inside 
 @safe code, they can definitely be caused by them, even without the 
 buggy @safe code calling any @trusted.
I don't see how you arrive at "buggy @safe code" here. You say it yourself: When there is "overly trusted code", then that's where the bug is.
Apr 28 2020
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 29.04.20 00:36, ag0aep6g wrote:
 On 28.04.20 15:44, John Colvin wrote:
 the existence of any overly trusting code renders @safe code liable 
 to cause memory safety bugs. While the invalid accesses won't occur 
 inside @safe code, they can definitely be caused by them, even without 
 the buggy @safe code calling any @trusted.
I don't see how you arrive at "buggy @safe code" here. You say it yourself: When there is "overly trusted code", then that's where the bug is.
I guess he is talking about the case where @trusted code calls buggy @safe code and relies on it being correct to ensure memory safety. (However, this is still the fault of the @trusted code. @safe code cannot be blamed for violations of memory safety.)
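A contrived sketch of that case (hypothetical code, purely to illustrate where the blame lands):

@safe size_t pickIndex(int[] a) {
    return a.length;       // bug: off by one, should be a.length - 1
}

@trusted int readLast(int[] a) {
    auto i = pickIndex(a);
    return a.ptr[i];       // skips the bounds check, trusting pickIndex
}

@safe void main() {
    int[] a = [1, 2, 3];
    auto x = readLast(a);  // out-of-bounds read triggered by the buggy @safe helper
}

The invalid access happens because pickIndex is wrong, but the memory-safety fault is still readLast's: a @trusted function may not assume more about its @safe helpers than their signatures guarantee.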
Apr 28 2020
prev sibling next sibling parent drug <drug2004 bk.ru> writes:
24.04.2020 22:27, Arine wrote:
 On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
 And your statement that Rust assembly output is better is wrong.
Yes, your statement that Rust assembly output is better is wrong, because one single optimization applicable in some cases does not make Rust better in general. Period. Once again, Rust assembly output can be better in some cases. But there is a big difference between these two statements - "better in some cases" and "better in general".

Moreover, you are wrong twice, because this optimization is not free at all. You pay for it in the form of the restriction that you cannot have more than one mutable reference. This means that cyclic data structures are unusually difficult compared to almost any other programming language. Also, this optimization has been available in C for a long time. Even more - in some cases a GC-based application can be faster than one with manual memory management because it allows avoiding numerous allocations/deallocations. What you are talking about is, in fact, premature optimization.
 
 There most definitely is a difference and the assembly generated with 
 rust is better. This is just a simple example to illustrate the 
 difference. If you don't know why the difference is significant or why 
 it is happening. There are a lot of great articles out there, sadly 
 there are people such as yourself spreading misinformation that don't 
 know what a borrow checker is and don't know Rust or why it is has gone 
 as far as it has. This is why the borrow checker for D is going to fail. 
 Because the person designing it, such as yourself, doesn't have any idea 
 what they are redoing and have never even bothered to touch Rust or 
 learn about it. Anyways I'm not your babysitter, if you don't understand 
 the above, as most people seem to not bother to learn assembly anymore, 
 you're on your own.
 
Self-importance is written all over your post. Here you make your third mistake - you are very far from being able to be my babysitter. Trying to show your competence, you only show your blind ignorance. The world is much less trivial than a function with two mutable references that does no useful work.
Apr 25 2020
prev sibling parent reply random <random spaml.de> writes:
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
 And your statement that Rust assembly output is better is 
 wrong.
There most definitely is a difference and the assembly generated with rust is better. This is just a simple example to illustrate the difference. If you don't know why the difference is significant or why it is happening. There are a lot of great articles out there, sadly there are people such as yourself spreading misinformation that don't know what a borrow checker is and don't know Rust or why it is has gone as far as it has. This is why the borrow checker for D is going to fail. Because the person designing it, such as yourself, doesn't have any idea what they are redoing and have never even bothered to touch Rust or learn about it. Anyways I'm not your babysitter, if you don't understand the above, as most people seem to not bother to learn assembly anymore, you're on your own.
A competent C programmer could just write something like this. Or use restrict...

int test(int* x, int* y) {
    int result = *x = 0;
    *y = 1;
    return result;
}

This produces the following with gcc -O:

test(int*, int*):
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        mov     eax, 0
        ret

https://godbolt.org/z/rpM_eK

So the statement that Rust produces better assembly is wrong. It's on my todo list to learn Rust. What is really off-putting are those random fanatic Rust fanboys. In your language: "If you don't know why the difference is significant or why it is happening", you should probably learn C before you start insulting people in a programming forum ;)
Apr 29 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:
 A competent C Programmer could just write something like this. 
 Or use restrict...

 int test(int* x, int* y) {
     int result = *x = 0;
     *y = 1;
     return result;
 }
I'm incompetent so I would just write:

int test(int* x, int* y) {
    *x = 0;
    *y = 1;
    return 0;
}
Apr 29 2020
parent reply random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 10:36:59 UTC, IGotD- wrote:
 I'm incompetent so I would just write:

 int test(int* x, int* y) {
      *x = 0;
      *y = 1;
      return 0;
 }
Ok, in this simple case it's obvious. For a real-world example look at the source for strcmp():
https://code.woboq.org/userspace/glibc/string/strcmp.c.html

The trick is to load the value into a variable. The compiler can't optimize multiple pointer reads from the same pointer because the content could already have been changed through another pointer. restrict solves that, but if you know what is happening and why, you can solve it by hand.
Apr 29 2020
parent reply IGotD- <nise nise.com> writes:
On Wednesday, 29 April 2020 at 10:46:57 UTC, random wrote:
 Ok, in this simple case it's obvious.
 For a real-world example look at the source for strcmp():
 https://code.woboq.org/userspace/glibc/string/strcmp.c.html
 
 The trick is to load the value into a variable. The compiler 
 can't optimize multiple pointer reads from the same pointer 
 because the content could already have been changed through another 
 pointer. restrict solves that, but if you know what is happening 
 and why, you can solve it by hand.
In the strcmp example, shouldn't the compiler be able to do the same optimizations as with restrict, because both pointers are declared const and the contents do not change?
Apr 29 2020
parent reply random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 12:37:29 UTC, IGotD- wrote:
 In the strcmp example, shouldn't the compiler be able to do the 
 same optimizations as you would use restrict because both 
 pointers are declared const and the content do not change?
Good question. My strcmp example is actually really bad, because if you never write through any pointer it doesn't make a difference ;)
The way it is written is still interesting.

I made a quick test case to evaluate the influence of const:
https://godbolt.org/z/qRwFa9
https://godbolt.org/z/iEj7LV
https://godbolt.org/z/EMqDDy

int test(int * x, int * y, <const?> int * <restrict?> z)
{
    *y = *z;
    *x = *z;
    return *z;
}

As you can see from the compiler output, const doesn't improve the optimization. I think the compiler can't optimize it because const doesn't give you real guarantees in C. You could just call the function like this:

int a;
test(&a, &a, &a);

"One man's constant is another man's variable."
Apr 29 2020
next sibling parent random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 16:19:55 UTC, random wrote:

And of course the workaround if you don't want to use restrict:

int test(int * x, int * y, int * z)
{
     int tmp = *z;
     *y = tmp;
     *x = tmp;
     return tmp;
}

Produces the same as the restrict version.
https://godbolt.org/z/yJJcMK
Apr 29 2020
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/29/2020 9:19 AM, random wrote:
 I think the compiler can't optimize it because const doesn't give you real 
 guarantees in C.
You'd be right.
Apr 29 2020
prev sibling parent random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:
<

I forgot to add this...
Compile with gcc -O3:

test(int*, int*):
         mov     DWORD PTR [rdi], 0
         xor     eax, eax
         mov     DWORD PTR [rsi], 1
         ret

https://godbolt.org/z/xW6w6W
Apr 29 2020
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Wednesday, 22 April 2020 at 22:34:32 UTC, Arine wrote:
 Not quite. Rust will generate better assembly as it can 
 guarantee that use of an object is unique. Similar to C's 
 "restrict" keyword but you get it for "free" across the entire 
 application.
Cool. Did not know that. I know that different languages have different semantics and code that looks the same might produce different results, so that's why I used the word equivalent instead of same. You can achieve the same goal in D as in Rust, but the code would be different.
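For instance, a minimal sketch of the D-side workaround (the same idea as the C version shown earlier in the thread: cache the load in a local instead of relying on a no-aliasing guarantee):

int test(int* x, int* y, int* z) {
    int tmp = *z;  // load once; the writes through x and y can't force a reload
    *y = tmp;
    *x = tmp;
    return tmp;
}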
Apr 26 2020
prev sibling next sibling parent Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:

[...]

 2) Various Web performance test on  
 https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance of D using vibe.d and hunt framework/libraries. 
Regardless of type of test - json, single query, multiple query and so on.
[...]
 I am researching if D and D web framework whether it can be 
 used as  replacement python/django/flask within our company. 
 Although if D web framework show worse performance than Go then 
 probably it is not right tool for the job.
 Any comments and feedback would be appreciated.
Vibe.d's performance in benchmarks has been discussed before[1]. From what I remember, the limiting factor is developer time allocation and profiling on specific hardware, which means it can probably be solved with money ;-) -- Bastiaan. [1] https://forum.dlang.org/post/qg9dud$hbo$1 digitalmars.com
Apr 22 2020
prev sibling next sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Yes. If you have time to optimize, there is precious little difference speed-wise between native languages. Every native language using the same backend ends up in the same ballpark, with tricks to get the code to the same baseline. The last few percent will be due to different handling of UB, integer overflow, aliasing... but in general the ethos of native languages is to let you reach top native speed, and in the end they will generate the exact same code.

But if your application is barely optimized, or more likely you don't have time to optimize properly, it becomes a bit more interesting. Defaults will matter a lot more, and things like the GC, whether the language encourages copies, and the "idiomatic" style that is accepted will start to bear consequences (and even more so: libraries). This is what ends up in benchmarks, but if the application were worth optimizing (in terms of added value) it would be optimized hard to get to that native ceiling.

In short, the less useful an application is, the more it will display large differences between languages with similar low-level capabilities. It would be much more interesting to compare _backends_, but people keep comparing front-ends because it drives traffic and commentary.
Apr 22 2020
parent reply serge <abc abc.com> writes:
On Wednesday, 22 April 2020 at 16:23:58 UTC, Guillaume Piolat 
wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Yes. If you have time to optimize: there is preciously little difference speed-wise between native languages. Every native language using the same backend end up in the same ballbark, with tricks to get the code to the same baseline. The last percents wille be due to different handling of UB, integer overflow, aliasing... but in general the ethos of native language is to allow you to reach top native speed and in the end they will generate the exact same code. But, if your application is barely optimized, or more likely you don't have time to optimize properly, it becomes a bit more interesting. Defaults will matter a lot more and things like GC, whether the langage encourages copies, and the "idiomatic" style that is accepted will start to bear consequences (and even more so: libraries). This is what end up in benchmarks, but if the application was worth optimizing for it (in terms of added value) it would be optimized hard to get to that native ceiling. In short, the less useful an application is, the more it will display large differences between languages with similar low-level capabilities. It would be much more interesting to compare _backends_, but people keep comparing front-ends because it drives traffic and commentary.
Could you please elaborate on that? What are you referring to as the backend?

I am not interested in comparing one small single operation - the fib test already did that. To me the TechEmpower stats are a pretty good indicator - they show json processing, single/multi-query requests, database access, static content. Overall performance across those stats gives a pretty good idea of how a language and web framework are put together, and of their ecosystem.

For example, if a language is fast on basic operations but two frameworks show less than adequate performance, then obviously something is wrong with the whole ecosystem - it could be difficult to create fast and efficient apps for the average developer. For example Scala - a powerful but very complicated language with tons of problems. Most Scala projects failed. It is very difficult and slow to create efficient applications for the average developer. It kind of requires a rocket scientist to write good code in Scala. Does D exhibit the same problem?
Apr 24 2020
next sibling parent reply Guillaume Piolat <firstname.lastname gmail.com> writes:
On Friday, 24 April 2020 at 13:44:18 UTC, serge wrote:
 Could you please elaborate on that? what are you referring to 
 as backend?
I was mentionning LLVM vs GCC vs Intel compiler backend, the part that converts code to instructions after the original language is out of sight.
 To me techempower stats is pretty good indicator - it shows 
 json processing, single/multiquery requests, database, static. 
 Overall performance across those stats give pretty good idea, 
 how language and web framework is created, its ecosystem.
 For example if language is fast on basic operations but two 
 frameworks show less then adequate performance then obviously 
 something wrong with the whole ecosystem - it could be 
 difficult to create  fast and efficient apps for average 
 developer. For example Scala - powerfull but yet very 
 complicated language with tons of problems. Most of Scala 
 projects failed. It is very difficult and slow to create  
 efficient  applications for  average developer. It kinds 
 requires rocket scientist to write good code in Scala.  Does D 
 exhibit same problem?
Very fair reasoning. I don't think D has as many problems as Scala; D has a very gentle learning curve and it's not difficult to be productive in.

But I'd say most of D's problems are indeed ecosystem-related, possibly because of the kind of personalities that D attracts: the reluctance of D programmers to gather around the same piece of code makes the ecosystem more insular than needed, as is typical with native programming. D code today has a tendency to balkanize based on various requirements such as exceptions or not, runtime or not, @safe or not, -betterC or not... It seems to me languages where DIY is frowned upon (Java) or discouraged by the practice of FFI have better library ecosystems, for better or worse.
Apr 26 2020
parent JN <666total wp.pl> writes:
On Sunday, 26 April 2020 at 12:37:48 UTC, Guillaume Piolat wrote:
 But I'd say most of D's problems are indeed ecosystem-related, 
 possibly because of the kind of personnalities that D attracts 
 : the reluctance from D programmers to gather around the same 
 piece of code makes the ecosystem more insular than needed, as 
 is typical with native programming. D code today has a tendency 
 to balkanize based on various requirements such as exceptions 
 or not, runtime or not,  safe or not, -betterC or not... It 
 seems to me languages where DIY is frowned upon (Java) or 
 discouraged by the practice of FFI have better library 
 ecosystems, for better or worse.
These are connected. Languages like Java don't give you options. You will use the GC, you will use OOP. Imagine an XML library. Any Java XML DOM library will offer an XMLDocument object with a load method (or constructor). This is expected and more or less the same in every library. D doesn't force the paradigm on you. Some people will want to use the GC, some won't; some will want to use OOP, some will avoid it like fire. It's a tradeoff: for higher flexibility and power you trade some composability.
Apr 26 2020
prev sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Fri, Apr 24, 2020 at 3:46 PM serge via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 To me techempower stats is pretty good indicator - it shows json
 processing, single/multiquery requests, database, static. Overall
 performance across those stats give pretty good idea, how
 language and web framework is created, its ecosystem.
Unfortunately there is a big issue with TechEmpower. Because it is so popular, almost every framework [language] tries to have the best score in it. And in many cases this means they use some hacks or tricks to achieve that. So in general TechEmpower results are useless. From my own experience, D performance is really good in real-world scenarios.

Another issue with the TechEmpower benchmark is that there is almost zero complexity. All tests do some basic operations on really small datasets.
Apr 26 2020
next sibling parent reply JN <666total wp.pl> writes:
On Sunday, 26 April 2020 at 16:59:44 UTC, Daniel Kozak wrote:
 Unfortunately there is a big issue with techempower. Because it 
 is so
 popular almost every framework [language] try to have a best 
 score in
 it.
 And in many cases this mean they use some hacks or tricks to 
 achieve
 that. So in general techempower results are useless. From my own
 experience D performance is really good in a real word 
 scenarios.
 Other issue with techempower benchmark is there is almost zero
 complexity. All tests do some basic operations on realy small
 datasets.
It's nice to have a moral victory and claim to be above "those cheaters", but links to these benchmarks are shared in many places. If someone wants to see how fast D is, they will type "programming language benchmark" into their web search of choice, and TechEmpower will be high in the results list. They will click, and go "oh wow, even PHP is faster than that D stuff".

Whether it's cheating or not, perception matters, and people will base their decisions on such benchmarks, even if that is unreasonable and doesn't apply to real-world scenarios.
Apr 26 2020
parent Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Apr 26, 2020 at 9:35 PM JN via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 It's nice to have a moral victory and claim to be above "those
 cheaters", but links to these benchmarks are shared in many
 places. If someone wants to see how fast D is, they will write
 "programming language benchmark" in their websearch of choice,
 and TechEmpower will be high in the results list. He will click,
 and go "oh wow, even PHP is faster than that D stuff".

 Whether it's cheating or not, perception matters and people will
 use such benchmarks to base their decision, even if it's
 unreasonable and doesn't apply to real world scenarios.
Yes I agree, this is the reason why I am improving those benchmarks from time to time, to make D faster than PHP :D
Apr 26 2020
prev sibling parent Wulfklaue <wulfklaue wulfklaue.com> writes:
On Sunday, 26 April 2020 at 16:59:44 UTC, Daniel Kozak wrote:
 On Fri, Apr 24, 2020 at 3:46 PM serge via Digitalmars-d 
 <digitalmars-d puremagic.com> wrote:
 To me techempower stats is pretty good indicator - it shows 
 json processing, single/multiquery requests, database, static. 
 Overall performance across those stats give pretty good idea, 
 how language and web framework is created, its ecosystem.
Unfortunately there is a big issue with techempower. Because it is so popular almost every framework [language] try to have a best score in it. And in many cases this mean they use some hacks or tricks to achieve that. So in general techempower results are useless.
As somebody who implemented the Swoole+PHP and Crystal code at TechEmpower, I can state that this statement is factually wrong. The code is very idiomatic code that anybody writes. Basic database calls, a pool for connections, prepared statements, the standard http module or frameworks. There is no magic in the code that tries to do direct system calls or has stripped-down drivers or any other stuff that people normally will not use.

Where you can see some funny business is in the top 10 or 20, where Rust and co have some extremely optimized code that is not how most people will write the code. But those are the extreme cases, which anybody with half a brain ignores because that is not how you write normal code. I always say: look at the code to see if the results are normal or over-optimized/unrealistic crap.

If we compare normal implementations ( https://www.techempower.com/benchmarks/#section=test&runid=c7152e8f-5b33-4ae7-9e89-630af44bc8de&hw=ph&test=plaintext ) like Fortunes:

vibed-ldc-pgsql: 58k
Crystal: 206k
PHP+Swoole: 289k

D's results are simply abysmal. We are talking basic idiomatic code here. This tells me more that D has an issue with its DB handling on those tests.

We need to look at stuff like "hello world" ( plain text ) and json, where the performance difference drops down to 2x. The plaintext test is literally taking a string and outputting it. We are talking route + echo in PHP and any other language. Or basic JSON encoding and output. A few lines of code, that is it. Yet D still suffers in those tests with a 2x issue. Does that not tell you that D or Vibe.D suffers from an actual performance issue? A point that clearly needs to be looked into.

If the argument is that D is not properly optimized, then what are PHP+Workerman/Swoole/..., Crystal?
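To be concrete about how little code is involved, the D side of a plaintext-style handler is roughly this (a minimal vibe.d sketch written from memory, so exact signatures may differ between versions):

import vibe.vibe;

void main()
{
    auto router = new URLRouter;
    router.get("/plaintext", (HTTPServerRequest req, HTTPServerResponse res) {
        res.writeBody("Hello, World!", "text/plain");
    });

    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    settings.bindAddresses = ["0.0.0.0"];
    listenHTTP(settings, router);
    runApplication();
}

If even something this small trails the equivalent PHP or Crystal handler by 2x, the overhead is presumably in the framework or runtime, not in the benchmark code.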
 Other issue with techempower benchmark is there is almost zero
 complexity. All tests do some basic operations on realy small
 datasets.
Fortunes shows a more realistic real-world web scenario. The rest mostly show weaknesses in each language+framework specific section. If your json score is low, there is a problem with your json library or the way your framework handles the requests. If your plaintext results are low ... you get the drill.

If you simply try to scoff at an issue by stating "in the real world we are faster" but you have benchmarks like this online... The nice thing about TechEmpower is that it really shows whether your language is fast for basic web tasks or not. It does not give a darn that your language can run "real world" fast, if there are underlying issues.

For people who are interested in D for website hosting, it's simply slower than the competitors. Do not like it? Then see where the issues are and fix them. Be it in the TechEmpower code, in D or in Vibe.D. But clearly there is an issue if D can not compete with implementations of other languages ( again, talking normal implementations, stuff that anybody will use ). If given the choice, what will people pick? D, which simply ignores the web market, or other languages/frameworks where the speed out of the door is great.

It's funny seeing comments like this where a simple question by the OP turns into a whole and totally useless technical discussion, followed by some comment that comes down to "ignore it because everybody cheats". And people here wonder why D has issues with popularity. Really! Get out much?

From my point of view, the comment is insulting and tantamount to calling people like me, who implemented a few of the other languages, "cheaters", when it's literally basic code that is used everywhere ( trust me, I am not some magic programmer who knows C++ out of the back of his hand. I barely scrape by on my own with PHP and Ruby ).

If the issue is at D's end, be it D, Vibe.D or the code used, then fix it, but do not insult everybody else ( especially the people who wrote normal code ). As the saying goes: "always clean your own house first, before criticizing your neighbor's house".
Apr 26 2020
prev sibling next sibling parent reply mipri <mipri minimaltype.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals?
Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?":

import std.stdio;

void main(string[] args) {
    int[] v = [1, 2];
    int n = 1 << 30;
    for (int i = 0; i < n - 2; ++i) {
        v ~= i;
    }
    writefln("v len: %s cap: %s\n", v.length, v.capacity);
}

With an average of four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array:

void main() @nogc {
    import core.stdc.stdio : printf;
    import std.container.array;

    Array!int v = Array!int(1, 2);
    foreach (i; 0 .. (1 << 30) - 2)
        v ~= i;
    printf("v len: %d cap: %d\n", v.length, v.capacity);
}

Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster.

If you set out to make a similar benchmark in C++ or Rust you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've exercised some ease of development you might want to transition away from these facilities.

D has other tensions, like "would you like the GC, or no?" or "would you like the whole language with TypeInfo and AAs, or no?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?"

And this is more how I'd characterize the language. Not as "it has this such-and-such performance ballpark and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one and therefore you have to look closer at a given web framework to even say which sublanguage it's written in".

I think the disadvantages of D being like this are obvious. An advantage of it being like this is that if you one day decide that you'd prefer a D application to have C++-style performance, you don't have to laboriously rewrite the application into a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even to just the parts of the application that need them.
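(And, as a sketch of a middle ground between those two - still GC-backed, but with the capacity reserved up front so appending doesn't keep reallocating; I haven't timed this variant:)

void main()
{
    import std.array : appender;
    import std.stdio : writefln;

    enum n = 1 << 30;
    auto v = appender!(int[])();
    v.reserve(n);               // one big allocation instead of repeated regrowth
    foreach (i; 0 .. n - 2)
        v ~= i;
    writefln("v len: %s cap: %s", v.data.length, v.capacity);
}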
Apr 22 2020
parent reply serge <abc abc.com> writes:
On Thursday, 23 April 2020 at 00:00:29 UTC, mipri wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals?
Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?": import std.stdio; void main(string[] args) { int[] v = [1, 2]; int n = 1 << 30; for (int i = 0; i < n - 2; ++i) { v ~= i; } writefln("v len: %s cap: %s\n", v.length, v.capacity); } With an average of four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array: void main() nogc { import core.stdc.stdio : printf; import std.container.array; Array!int v = Array!int(1, 2); foreach (i; 0 .. (1 << 30) - 2) v ~= i; printf("v len: %d cap: %d\n", v.length, v.capacity); } Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster. If you set out to make a similar benchmark in C++ or Rust you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've exercised some ease of development you might want to transition away from these facilities. D has other tensions, like "would you like the GC, or no?" or "would you like the the whole language with TypeInfo and AAs, or no?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?" And this is more how I'd characterize the language. Not as "it has this such-and-such performance ballpark and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one and therefore you have to look closer at a given web framework to even say which sublanguage it's written in". I think the disadvantages of D being like this are obvious. An advantage of it being like this, is that if you one day decide that you'd prefer a D application have C++-style performance, you don't have to laboriously rewrite the application into a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even to just the parts of the application that need them.
I did check the library. My understanding is that the proposal is to use the library with manual memory management, without the GC. My concern was that the D web frameworks performed worse than frameworks in Go and Java, which are GC-only languages. Does it mean that the GC in D is so far from great that we need to avoid it in order to beat Java/Go? It's probably worth stressing that I didn't actually mean "beat"; I would love to see stats on par with Go and Java, but unfortunately D was a few times slower - close to Python and Ruby...

Even if the library can speed things up, I believe the language/runtime should be able to work efficiently for this type of operation. We should not need to develop such libraries in order to have good performance. To me it is a bug, a poor implementation of the GC, or a deficiency in the design of the runtime. The need for that type of library to work around deficiencies in the runtime would not let one focus on writing good code, but instead on looking for gotchas to get an adequate solution.
Apr 24 2020
parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 24 April 2020 at 13:58:53 UTC, serge wrote:
 On Thursday, 23 April 2020 at 00:00:29 UTC, mipri wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals?
Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?": import std.stdio; void main(string[] args) { int[] v = [1, 2]; int n = 1 << 30; for (int i = 0; i < n - 2; ++i) { v ~= i; } writefln("v len: %s cap: %s\n", v.length, v.capacity); } With an average of four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array: void main() nogc { import core.stdc.stdio : printf; import std.container.array; Array!int v = Array!int(1, 2); foreach (i; 0 .. (1 << 30) - 2) v ~= i; printf("v len: %d cap: %d\n", v.length, v.capacity); } Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster. If you set out to make a similar benchmark in C++ or Rust you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've exercised some ease of development you might want to transition away from these facilities. D has other tensions, like "would you like the GC, or no?" or "would you like the the whole language with TypeInfo and AAs, or no?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?" And this is more how I'd characterize the language. Not as "it has this such-and-such performance ballpark and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one and therefore you have to look closer at a given web framework to even say which sublanguage it's written in". I think the disadvantages of D being like this are obvious. An advantage of it being like this, is that if you one day decide that you'd prefer a D application have C++-style performance, you don't have to laboriously rewrite the application into a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even to just the parts of the application that need them.
I did check the library.. My understanding that proposal is to use the library with manual memory management without GC. My concern was that D web frameworks performed worse then frameworks in Go and Java which are GC only languages. Does it mean that GC in D is far from great that we need to avoid it in order to beat Java/Go? Probably would worth to stress - I didn't mean in fact beat, I would love to see stats on par with Go and Java but unfortunately D was few times slower - close to Python and Ruby... Despite the library can speed things up I believe the language/runtime should be able to work efficiently for such type of operations. We should not need to develop such libraries in order to have good performance. To me it is bug or poor implementation of GC or deficiency in design of runtime. The need for that type of libraries to solve deficiencies in runtime would not allow to focus on writing good code but instead to look for gotchas to get adequate solution.
Yes, that does mean that D's GC still needs some improvements, although much has been done during the last year.

Also note that while Java and Go are heavy GC languages, there are ways to do value-based coding, and although Project Valhalla is taking its time due to the engineering issues it addresses, it will eventually be done.

Just like in all GC-enabled languages that offer multiple allocation mechanisms alongside the GC, you should approach it in stages. Use the GC for your initial solution, and only in cases where it is obvious from the start that it might be a problem, or when it is proven that some issues have to be addressed, look for value allocation, @nogc, reference-counted collections and other low-level style tricks.

That is the nice thing about systems languages like D: you don't need to code like C from the start, and when you actually need to do it, the tools are available.
Apr 25 2020
prev sibling next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 D has sparked my interested therefore last week I have started 
 to look into the language and have completed D course on 
 pluralsight. One of area where I would like to apply D in  web 
 application/cloud. Golang  is not bad but I think D seems more 
 powerful. Although during my research I have found interesting 
 facts:

 1) fib test (https://github.com/drujensen/fib) with D  
 (compiled with ldc) showed really good performance results.
 2) Various Web performance test on  
 https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance of D using vibe.d and hunt framework/libraries. 
Regardless of type of test - json, single query, multiple query and so on.

 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals? I would say few times worse. This is not a troll 
 post by any means. I am researching if D and D web framework 
 whether it can be used as  replacement python/django/flask 
 within our company. Although if D web framework show worse 
 performance than Go then probably it is not right tool for the 
 job.
 Any comments and feedback would be appreciated.
I gave a talk at DConf 2018 you may be interested in. The talk goes over a set of benchmark studies I did. The video was lost, but the slides are here: https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf. The slides are probably the easiest way to get an overview. The full details on the benchmark studies on the tsv-utils repo: https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md --Jon
Apr 24 2020
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 2) Various Web performance test on  
 https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance of D using vibe.d and hunt framework/libraries. 
Regardless of type of test - json, single query, multiple query and so on.
I don't have a reference off the top of my head, but IIRC much of the upper end of those benchmarks has less to do with inherent language or framework differences and more to do with which implementations have been strongly tailored for each particular benchmark. (I think this has been discussed on the D forums before.)

That tailoring to the benchmark often means dropping things (e.g. certain kinds of validation or safety measures) that any realistic app would have to do. So the top end of those benchmark tables may be rather misleading when it comes to real-world usage.

There may well be framework design decisions in place that have a stronger impact. For example, I recall that the default "user-friendly" vibe.d tools for handling HTTP requests create a situation where the easy thing to do is generate garbage per request. So unless one addresses that, it will put real constraints on exactly how performant one can get.

Note, this is _not_ a case of "the GC is bad" or "you can't get good performance with a GC". It's a case of: if you use the GC naively, rather than having a good strategy for preallocation and re-use of resources, you will force the GC into doing work that can be avoided.

So leaving aside misleading factors like extreme tailoring to the benchmark, I would suggest the memory allocation strategies in use are probably the first thing to look at in asking why those D implementations might be less performant than they could be. When it comes to the frameworks, the questions might be: (i) are there any cases where that framework _forces_ you into a suboptimal memory (re)allocation strategy? and (ii) even if there aren't, how easy/idiomatic is it to use a performance-oriented allocation strategy?
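To illustrate the kind of reuse strategy I mean (just a framework-agnostic sketch; buildResponse and the buffer layout are made up for the example):

import std.array : Appender;

Appender!(char[]) buf;   // module-level, hence thread-local: one buffer per thread

const(char)[] buildResponse(string name)
{
    buf.clear();         // keeps the capacity allocated by earlier requests
    buf.put("Hello, ");
    buf.put(name);
    buf.put("!");
    return buf.data;     // only valid until the next request reuses the buffer
}

void main()
{
    import std.stdio : writeln;
    foreach (n; ["Alice", "Bob"])
        writeln(buildResponse(n));  // steady state: no per-request GC allocation
}

Whether a framework makes this pattern easy, or quietly pushes you back towards fresh allocations per request, is exactly the kind of thing question (ii) above is getting at.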
Apr 25 2020