
digitalmars.D - D performance

reply serge <abc abc.com> writes:
D has sparked my interest, so last week I started looking into the 
language and completed a D course on Pluralsight. One of the areas 
where I would like to apply D is web applications/cloud. Golang is 
not bad, but I think D seems more powerful. During my research, 
however, I found some interesting facts:

1) The fib test (https://github.com/drujensen/fib) with D (compiled 
with ldc) showed really good performance results.
2) The various web performance tests at 
https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance for D using the vibe.d and hunt frameworks/libraries, 
regardless of the type of test - json, single query, multiple query and so on.

My understanding is that D is in a similar ballpark performance 
league as C, C++, and Rust. Hence the question - why is the web 
performance of those two frameworks so poor compared to rivals? I 
would say a few times worse. This is not a troll post by any means. 
I am researching whether D and a D web framework can be used as a 
replacement for Python/Django/Flask within our company. But if a D 
web framework shows worse performance than Go, then it is probably 
not the right tool for the job.
Any comments and feedback would be appreciated.
Apr 22 2020
next sibling parent reply welkam <wwwelkam gmail.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementations in C, C++, D, Rust and Nim compiled with the same compiler backend should give the exact same machine code. What you see in online language comparisons is mostly a comparison of different implementations and of how much time people spent on optimizing.
 why web performance of those 2 web frameworks is so poor 
 compared to rivals?
Differences in implementation. My guess is that the people writing those servers didn't have time to spend on optimizations.
Apr 22 2020
parent reply Arine <arine1283798123 gmail.com> writes:
On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.
Apr 22 2020
next sibling parent reply drug <drug2004 bk.ru> writes:
23.04.2020 01:34, Arine wrote:
 On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.
You forgot to add "in some cases Rust may generate better assembly than C/C++/D because..." But this is not the answer to the question the OP asked. Rust has an LLVM-based backend like LDC, so nothing prevents LDC from being as fast as any other LLVM-based compiler. Nothing. The question is how much effort you put into it.
Apr 23 2020
parent reply Arine <arine1283798123 gmail.com> writes:
On Thursday, 23 April 2020 at 11:05:35 UTC, drug wrote:
23.04.2020 01:34, Arine wrote:
 On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.
You forget to add "in some cases Rust may generate better assembly than C/C++/D because..." But this is not the answer to the question OP asked. Rust has llvm based backend like ldc so nothing prevents ldc to be as fast as any other llvm based compiler. Nothing. The question is how many efforts you put into it.
I wasn't replying to the author of the thread. I was replying to a misinformed individual in the thread. If that's the way you want to think about it, you can create your own compiler and language - "it's just about how much effort you put into it", even if that means making your own language and compiler. How much effort you have to put into something is a factor in that decision. You'd basically have to remake Rust in D to get the same assembly results and the same guarantee regarding aliasing.
Apr 23 2020
parent reply drug <drug2004 bk.ru> writes:
23.04.2020 18:13, Arine wrote:
 I wasn't replying to the author of the thread. I was replying to a 
 misinformed individual in the thread.
 
 If that's the way you want to think about, you can create your own 
 compiler and language. "It's just about how many efforts you put into 
 it", even if that means making your own language and compiler. How much 
 "efforts" you have to put into something is a factor in that decision. 
 You'd basically have to remake Rust in D to get the same assembly 
 results and guarantee regarding aliasing.
Well, you're right, I used the wrong wording to express my thoughts. I meant that C/C++/Rust/D belong to the same performance league. The difference appears in specific cases, of course, but in general they are equal. And your statement that Rust assembly output is better is wrong.
Apr 23 2020
parent reply Arine <arine1283798123 gmail.com> writes:
On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
 And your statement that Rust assembly output is better is wrong.
https://godbolt.org/z/g_euiT

D:

    int foo(ref int a, ref int b) {
        a = 0;
        b = 1;
        return a;
    }

    int example.foo(ref int, ref int):
            movl    $0, (%rsi)
            movl    $1, (%rdi)
            movl    (%rsi), %eax
            retq

Rust:

    pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
        *x = 0;
        *y = 1;
        *x
    }

    example::foo:
            mov     dword ptr [rdi], 0
            mov     dword ptr [rsi], 1
            xor     eax, eax
            ret

There most definitely is a difference, and the assembly generated for Rust is better. This is just a simple example to illustrate the difference. If you don't know why the difference is significant or why it is happening, there are a lot of great articles out there. Sadly, there are people such as yourself spreading misinformation who don't know what a borrow checker is and don't know Rust or why it has gone as far as it has. This is why the borrow checker for D is going to fail: the person designing it, such as yourself, doesn't have any idea what they are redoing and has never even bothered to touch Rust or learn about it. Anyway, I'm not your babysitter; if you don't understand the above (as most people don't seem to bother to learn assembly anymore), you're on your own.
Apr 24 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 https://godbolt.org/z/g_euiT

 D:

     int foo(ref int a, ref int b) {
         a = 0;
         b = 1;
         return a;
     }

 int example.foo(ref int, ref int):
         movl    $0, (%rsi)
         movl    $1, (%rdi)
         movl    (%rsi), %eax
         retq

 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
Would DIP 1000 enable such optimization possibility in D?
Apr 24 2020
parent reply Les De Ridder <les lesderid.net> writes:
On Friday, 24 April 2020 at 22:52:31 UTC, IGotD- wrote:
 On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 https://godbolt.org/z/g_euiT

 D:

     int foo(ref int a, ref int b) {
         a = 0;
         b = 1;
         return a;
     }

 int example.foo(ref int, ref int):
         movl    $0, (%rsi)
         movl    $1, (%rdi)
         movl    (%rsi), %eax
         retq

 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
Would DIP 1000 enable such optimization possibility in D?
Technically DIP 1021 could.
Apr 24 2020
parent Les De Ridder <les lesderid.net> writes:
On Friday, 24 April 2020 at 23:03:25 UTC, Les De Ridder wrote:
 On Friday, 24 April 2020 at 22:52:31 UTC, IGotD- wrote:
 On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 [...]
Would DIP 1000 enable such optimization possibility in D?
Technically DIP 1021 could.
Actually, nevermind: https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md#limitations
Apr 24 2020
prev sibling next sibling parent reply mipri <mipri minimaltype.com> writes:
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
eh, this isn't Rust; it's "Rust +nightly with an unstable codegen flag." Marginal codegen improvements aren't going to turn heavy usage of dynamic arrays into heavy usage of std.container.array either, so they're not that relevant to expected performance of real-world programs in D vs. other languages.
Apr 24 2020
parent reply SrMordred <patric.dexheimer gmail.com> writes:
On Friday, 24 April 2020 at 23:24:49 UTC, mipri wrote:
 On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 Rust:

     pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
         *x = 0;
         *y = 1;
         *x
     }

 example::foo:
         mov     dword ptr [rdi], 0
         mov     dword ptr [rsi], 1
         xor     eax, eax
         ret
eh, this isn't Rust; it's "Rust +nightly with an unstable codegen flag." Marginal codegen improvements aren't going to turn heavy usage of dynamic arrays into heavy usage of std.container.array either, so they're not that relevant to expected performance of real-world programs in D vs. other languages.
I believe that "-enable-scoped-noalias=true" should be the D equivalent (with LDC), but it didn't change anything. Also, you can achieve the same asm with @llvmAttr("noalias") in front of at least one argument.
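For reference, a minimal sketch of that second approach (assuming LDC with the `ldc.attributes` module; whether the attribute is honoured on a parameter depends on the compiler version):

    import ldc.attributes : llvmAttr;

    // Marking `a` as noalias lets LDC assume it cannot overlap with `b`,
    // so the final read of `a` can be folded into a constant return.
    int foo(@llvmAttr("noalias") ref int a, ref int b)
    {
        a = 0;
        b = 1;
        return a;
    }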
Apr 24 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2020 8:06 PM, SrMordred wrote:
 Also u can achieve the same asm with  llvmAttr("noalias") in front of at least 
 one argument.
The following C code:

    int test(int * __restrict__ x, int * __restrict__ y) {
        *x = 0;
        *y = 1;
        return *x;
    }

compiled with gcc -O:

    test:
            mov     dword ptr [RDI],0
            mov     dword ptr [RSI],1
            mov     EAX,0
            ret

It's not a unique property of Rust; C99 has it too. DMC doesn't implement it, but it probably should.
Apr 25 2020
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly generated with rust is 
 better.
D's @live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
Apr 25 2020
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:
 On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly 
 generated with rust is better.
D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences.

My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down to the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff.

I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference.

There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context-switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed.

BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this:

    // extract array instance from reusable pool,
    // and set its length to zero so that you can
    // write into it from the start
    x = buffer_pool.get();
    x.length = 0;
    assumeSafeAppend(x);   // a cost each time you do this

    // now append stuff into x to
    // create your response

    // now publish your response

    // with the response published, clean
    // up by recycling the buffer back into
    // the pool
    buffer_pool.recycle(x);

This is the kind of pattern that Sociomantic used a lot.  In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending.  But having to call `assumeSafeAppend` each time does carry a performance cost.

IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.
Apr 25 2020
next sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, April 25, 2020 4:34:44 AM MDT Joseph Rushton Wakeling via 
Digitalmars-d wrote:
 IIRC Mathias has suggested that it should be possible to tag
 arrays as intended for this kind of re-use, so that stomping
 prevention will never trigger, and you don't have to
 `assumeSafeAppend` each time you reduce the length.
You could probably do that, but I'm not sure that it could be considered @safe. It would probably make more sense to just use a custom array type if that's what you really needed, though of course, that causes its own set of difficulties (including having to duplicate the array appending logic). - Jonathan M Davis
Apr 25 2020
parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 25 April 2020 at 15:21:03 UTC, Jonathan M Davis 
wrote:
 You could probably do that, but I'm not sure that it could be 
 considered  safe.
I think it would be OK to have it as a non-@safe tool. But ...
 It would probably make more sense to just use a custom array 
 type if that's what you really needed, though of course, that 
 causes its own set of difficulties (including having to 
 duplicate the array appending logic).
... I think that could possibly make more sense. One thing that I really don't like about the original idea of an `alwaysAssumeSafeAppend(x)` is that it makes behaviour dependent on the instance rather than the type. It would probably be better to have a clear type-based separation. OTOH in my experience custom types are often finnicky in terms of how they interact with functions that expect a slice as input. So there could be a convenience in having it as an option for regular dynamic arrays. Or it could just be that the custom type would need a bit more work in its implementation :-)
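To make the comparison concrete, a rough sketch of such a custom type (illustration only, not the ocean implementation) might look like this:

    // A reusable append buffer: truncation never triggers stomping
    // prevention because the type tracks its own "used" length.
    struct ReusableBuffer(T)
    {
        private T[] storage;    // backing allocation, managed by the GC
        private size_t used;

        void reset() { used = 0; }   // cheap: no assumeSafeAppend needed

        void put(scope const(T)[] items)
        {
            immutable newLen = used + items.length;
            if (newLen > storage.length)
                storage.length = newLen;    // grow the backing array
            storage[used .. newLen] = items[];
            used = newLen;
        }

        // Borrow the current contents as an ordinary slice.
        inout(T)[] data() inout { return storage[0 .. used]; }
    }

The catch is exactly the one discussed above: everything downstream has to accept either the slice returned by data() or the custom type itself.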
Apr 26 2020
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/25/2020 3:34 AM, Joseph Rushton Wakeling wrote:
 In any case, I seriously doubt those kinds of optimization have anything to do 
 with the web framework performance differences.
I agree. I also generally structure my code so that optimization wouldn't make a difference. But it's still a worthwhile benefit to add it for live functions.
 I suppose that in a more complicated app there could be some multiplicative 
 impact, but where high-throughput web frameworks are concerned I'm pretty sure 
 that the memory allocation and reuse strategy is going to be what makes 99% of 
 the difference.
My experience is that if the code has never been profiled, there's one obscure function unexpectedly consuming the bulk of the run time, which is easily recoded. Programs that have been runtime profiled tend to have a pretty flat graph of which functions eat the time.
      // extract array instance from reusable pool,
      // and set its length to zero so that you can
      // write into it from the start
      x = buffer_pool.get();
      x.length = 0;
      assumeSafeAppend(x);   // a cost each time you do this
 
      // now append stuff into x to
      // create your response
 
      // now publish your response
 
      // with the response published, clean
      // up by recycling the buffer back into
      // the pool
      buffer_pool.recycle(x);
 
 This is the kind of pattern that Sociomantic used a lot.  In D1 it was easy 
 because there was no array stomping prevention -- you could just set length ==
0 
 and start appending.  But having to call `assumeSafeAppend` each time does
carry 
 a performance cost.
This is why I use OutBuffer for such activities.
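For example (a minimal sketch, assuming OutBuffer.clear() to reset between uses; the request strings and the publishing step are just stand-ins):

    import std.outbuffer : OutBuffer;
    import std.stdio : writeln;

    void main()
    {
        auto buf = new OutBuffer();
        foreach (request; ["a", "b", "c"])   // stand-in for incoming requests
        {
            buf.clear();                     // reset the length, keep the allocation
            buf.write("response for ");
            buf.write(request);
            writeln(cast(string) buf.toBytes());   // "publish" the response
        }
    }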
 IIRC Mathias has suggested that it should be possible to tag arrays as
intended 
 for this kind of re-use, so that stomping prevention will never trigger, and
you 
 don't have to `assumeSafeAppend` each time you reduce the length.
Sounds like an idea worth exploring. How about taking point on that? But I would be concerned about tagging such arrays, and then stomping them unintentionally, leading to memory corruption bugs. OutBuffer is memory safe.
Apr 25 2020
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 25 April 2020 at 22:15:44 UTC, Walter Bright wrote:
 My experience is if the code has never been profiled, there's 
 one obscure function unexpectedly consuming the bulk of the run 
 time, which is easily recoded. A programs that have been 
 runtime profiled tend to have a pretty flat graph of which 
 functions eat the time.
Yes! :-) And in particular, that computational complexity in the real world is very different from theoretical arguments about O(...). One can gain a lot by being clear headed about what the actual problem is and what is optimal for that particular problem with that particular data.
 This is why I use OutBuffer for such activities.
Yes, I have some memory of talking with Dicebot about whether this would be an appropriate tool for the other side of the D2 conversion. I don't remember if any firm conclusions were drawn, though.
 Sounds like an idea worth exploring. How about taking point on 
 that? But I would be concerned about tagging such arrays, and 
 then stomping them unintentionally, leading to memory 
 corruption bugs. OutBuffer is memory safe.
Yes, it's clear (as Jonathan noted) that an always-stompable array could probably not be @safe. That said, what does OutBuffer do that means that it _is_ safe in this context?

Of course, Sociomantic never had @safe to play with. In practice I don't recall there ever being an issue with unintentional stomping (I'm not saying it never happened, but I have no recollection of it being a common issue). That did however rest on a program structure that made it less likely anyone would make such a mistake.

About stepping up with a feature contribution: the idea is lovely but I'm very aware of how limited my time is right now, so I don't want to make offers I can't guarantee to follow up on. There's a reason I post so rarely in the forums these days! But I will ping Mathias L. to let him know, as the idea was his to start with.
Apr 26 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2020 4:30 AM, Joseph Rushton Wakeling wrote:
 That said, what does OutBuffer do that means that it 
 _is_ safe in this context?
It manages its own memory privately, and presents the results as dynamic arrays which do their own bounds checking. It's been a reliable solution for me for maybe 30 years.
Apr 26 2020
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Saturday, 25 April 2020 at 22:15:44 UTC, Walter Bright wrote:
 On 4/25/2020 3:34 AM, Joseph Rushton Wakeling wrote:
 In any case, I seriously doubt those kinds of optimization 
 have anything to do with the web framework performance 
 differences.
I agree. I also generally structure my code so that optimization wouldn't make a difference. But it's still a worthwhile benefit to add it for live functions.
I heard that proving that two pointers do not alias is a big problem in compiler backends, and that some or most auto-vectorization optimizations do not fire because the compiler can't prove there is no aliasing. A new language used in the Unity game engine is designed such that references do not alias by default, for optimization reasons. I haven't looked into this topic further, but I believe it's worth checking out. Data science people would benefit greatly from auto-vectorization.
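A small sketch of the kind of loop this affects (names made up): without a no-aliasing guarantee the compiler has to assume that writing dst[i] could change a later src[j], so it either emits a runtime overlap check or skips vectorizing.

    // The compiler cannot assume dst and src point to disjoint memory.
    void scale(float[] dst, const(float)[] src)
    {
        foreach (i; 0 .. dst.length)
            dst[i] = src[i] * 2.0f;
    }

    void main()
    {
        auto a = new float[](8);
        auto b = [1.0f, 2, 3, 4, 5, 6, 7, 8];
        scale(a, b);
    }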
Apr 26 2020
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Saturday, 25 April 2020 at 10:34:44 UTC, Joseph Rushton 
Wakeling wrote:
 In any case, I seriously doubt those kinds of optimization have 
 anything to do with the web framework performance differences.

 My experience of writing number-crunching stuff in D and Rust 
 is that Rust seems to have a small but consistent performance 
 edge that could quite possibly be down the kind of 
 optimizations that Arine mentions (that's speculation: I 
 haven't verified).  However, it's small differences, not 
 order-of-magnitude stuff.

 I suppose that in a more complicated app there could be some 
 multiplicative impact, but where high-throughput web frameworks 
 are concerned I'm pretty sure that the memory allocation and 
 reuse strategy is going to be what makes 99% of the difference.

 There may also be a bit of an impact from the choice of futures 
 vs. fibers for managing asynchronous tasks (there's a context 
 switching cost for fibers), but I would expect that to only 
 make a difference at the extreme upper end of performance, once 
 other design factors have been addressed.

 BTW, on the memory allocation front, Mathias Lang has pointed 
 out that there is quite a nasty impact from `assumeSafeAppend`.
  Imagine that your request processing looks something like this:

     // extract array instance from reusable pool,
     // and set its length to zero so that you can
     // write into it from the start
     x = buffer_pool.get();
     x.length = 0;
     assumeSafeAppend(x);   // a cost each time you do this

     // now append stuff into x to
     // create your response

     // now publish your response

     // with the response published, clean
     // up by recycling the buffer back into
     // the pool
     buffer_pool.recycle(x);

 This is the kind of pattern that Sociomantic used a lot.  In D1 
 it was easy because there was no array stomping prevention -- 
 you could just set length == 0 and start appending.  But having 
 to call `assumeSafeAppend` each time does carry a performance 
 cost.

 IIRC Mathias has suggested that it should be possible to tag 
 arrays as intended for this kind of re-use, so that stomping 
 prevention will never trigger, and you don't have to 
 `assumeSafeAppend` each time you reduce the length.
I understand that it was an annoying breaking change, but aside from the difficulty of migrating I don't understand why a custom type isn't the appropriate solution for this problem. I think I heard "We want to use the built-in slices", but I never understood the technical argument behind that, or how it stacked up against not getting the desired behaviour. My sense was that the irritation at the breakage was influencing the technical debate.
Apr 26 2020
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:
 I understand that it was an annoying breaking change, but aside 
 from the difficulty of migrating I don't understand why a 
 custom type isn't the appropriate solution for this problem. I 
 think I heard "We want to use the built-in slices", but I never 
 understood the technical argument behind that, or how it 
 stacked up against not getting the desired behaviour.
Can you imagine replacing every usage of slices with a custom type in your code? And making sure programmers joining the company do the same? And having converters that e.g. accept ubyte arrays from libraries and convert them into yours?
Apr 26 2020
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Sunday, 26 April 2020 at 11:59:27 UTC, Stefan Koch wrote:
 On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:
 I understand that it was an annoying breaking change, but 
 aside from the difficulty of migrating I don't understand why 
 a custom type isn't the appropriate solution for this problem. 
 I think I heard "We want to use the built-in slices", but I 
 never understood the technical argument behind that, or how it 
 stacked up against not getting the desired behaviour.
Can you imagine replacing every usage of slices with a custom type in your code? And making sure programmers joining the company do the same? and having converts that e.g. accept ubyte arrays from libraries and convert them into yours?
I suppose nowadays that custom type can use a scoped ubyte slice to expose its temp buffer.
Apr 26 2020
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:
 I understand that it was an annoying breaking change, but aside 
 from the difficulty of migrating I don't understand why a 
 custom type isn't the appropriate solution for this problem. I 
 think I heard "We want to use the built-in slices", but I never 
 understood the technical argument behind that, or how it 
 stacked up against not getting the desired behaviour.

 My sense was that the irritation at the breakage was 
 influencing the technical debate.
That's not entirely unfair, but I think it does help to appreciate the magnitude of the problem:

* there's a very large codebase, including many different applications and a large amount of common library code, all containing a lot of functions that expect slice input (because the concept of a range was never in D1, and because slices were the only use case)

* most of the library functionality shouldn't have to care whether its input is a reusable buffer or any other kind of slice

* you can't rewrite to use range-based generics because that's D2 only and you need to keep D1 compatibility until the last application has migrated

* there are _very_ extreme performance and reliability constraints on some of the key applications, meaning that validating D2 transition efforts is very time consuming

* you can't use any Phobos functionality until the codebase is D2 only, and even then you probably want to limit how much of it you use because it is not written with these extreme performance concerns in mind

* all the time spent on those transitional efforts is time taken away from feature development

It's very easy to look back and say something like, "Well, if you'd written with introspection-based design from the start, you would have had a much easier migration effort", but that in itself would have been trickier to do in D1, and would have carried extra maintenance and development costs (particularly w.r.t. forcing devs to write what would have seemed like very boilerplate-y code compared to the actual set of use cases).

Even with the D1 compatibility requirement dropped, there still remains a big burden to transition all the reusable buffers to a different type. IIRC the focus would probably have been on using `Appender`.

Note that many of these concerns still apply if we want to preserve a future for any of the (very well crafted) library and application code that Sociomantic open-sourced. They are all now D2-only, but the effort required to rewrite around dedicated reusable-buffer types would still be quite substantial.
Apr 26 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/25/20 6:34 AM, Joseph Rushton Wakeling wrote:
 On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:
 On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly generated with 
 rust is better.
D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences. My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff. I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference. There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed. BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this:     // extract array instance from reusable pool,     // and set its length to zero so that you can     // write into it from the start     x = buffer_pool.get();     x.length = 0;     assumeSafeAppend(x);   // a cost each time you do this     // now append stuff into x to     // create your response     // now publish your response     // with the response published, clean     // up by recycling the buffer back into     // the pool     buffer_pool.recycle(x); This is the kind of pattern that Sociomantic used a lot.  In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending.  But having to call `assumeSafeAppend` each time does carry a performance cost.
In terms of performance, depending on the task at hand, D1 code is slower than D2 appending, because there's a thread-local cache for appending in D2, and D1 only has a global one-array cache for the same. However, I'm assuming that since you were focused on D1, your usage naturally was written to take advantage of what D1 has to offer.

The assumeSafeAppend call also uses this cache, and so it should be quite fast. But setting length to 0 is a ton faster, because you aren't calling an opaque function.

So depending on the usage pattern, D2 with assumeSafeAppend can be faster, or it could be slower.
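A small sketch of what that looks like in practice (the exact capacity values depend on the runtime and block size):

    void main()
    {
        import std.stdio : writeln;

        int[] buf = new int[](16);
        buf.length = 0;            // shrink; the old contents might still be referenced
        writeln(buf.capacity);     // typically 0: an append here would reallocate

        assumeSafeAppend(buf);     // promise that nobody needs the old contents
        writeln(buf.capacity);     // non-zero again: appending reuses the block
        buf ~= 42;                 // grows in place, no new allocation
    }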
 
 IIRC Mathias has suggested that it should be possible to tag arrays as 
 intended for this kind of re-use, so that stomping prevention will never 
 trigger, and you don't have to `assumeSafeAppend` each time you reduce 
 the length.
I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or a custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D).

By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.

Note that this was before (I think) destructor calls were added. The destructor calls are something that assumeSafeAppend is going to do, and won't be done with just setting length to 0.

However, there are other options. We could introduce a druntime configuration option so when this specific situation happens (slice points at start of block and has 0 length), assumeSafeAppend is called automatically on the first append. Jonathan is right that this is not @safe, but it could be an opt-in configuration option.

I don't think configuring specific arrays makes a lot of sense, as this would require yet another optional bit that would have to be checked and allocated for all arrays.

-Steve
Apr 26 2020
next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer 
wrote:
 In terms of performance, depending on the task at hand, D1 code 
 is slower than D2 appending, by the fact that there's a 
 thread-local cache for appending for D2, and D1 only has a 
 global one-array cache for the same. However, I'm assuming that 
 since you were focused on D1, your usage naturally was written 
 to take advantage of what D1 has to offer.

 The assumeSafeAppend call also uses this cache, and so it 
 should be quite fast. But setting length to 0 is a ton faster, 
 because you aren't calling an opaque function.

 So depending on the usage pattern, D2 with assumeSafeAppend can 
 be faster, or it could be slower.
That makes sense. I just know that Mathias L. seemed to be quite concerned about the `assumeSafeAppend` performance impact. I think he was not looking for a D1/D2 comparison but in terms of getting the most performant behaviour in future. It's not that it was slower than D1, it's that it was a per-use speed hit.
 I spoke for a while with Dicebot at Dconf 2016 or 17 about this 
 issue. IIRC, I suggested either using a custom type or custom 
 runtime. He was not interested in either of these ideas, and it 
 makes sense (large existing code base, didn't want to stray 
 from mainline D).
Yes. To be fair I think in that context, at that stage of transition, that probably made more sense: it was easier to just mandate that everybody start putting `assumeSafeAppend` into their code (actually we implemented a transitional wrapper, `enableStomping`, which was a no-op in D1 and called `assumeSafeAppend` in D2).
 By far, the best mechanism to use is a custom type. Not only 
 will that fix this problem as you can implement whatever 
 behavior you want, but you also do not need to call opaque 
 functions for appending either. It should outperform everything 
 you could do in a generic runtime.

 Note that this was before (I think) destructor calls were 
 added. The destructor calls are something that assumeSafeAppend 
 is going to do, and won't be done with just setting length to 0.

 However, there are other options. We could introduce a druntime 
 configuration option so when this specific situation happens 
 (slice points at start of block and has 0 length), 
 assumeSafeAppend is called automatically on the first append. 
 Jonathan is right that this is not  safe, but it could be an 
 opt-in configuration option.

 I don't think configuring specific arrays makes a lot of sense, 
 as this would require yet another optional bit that would have 
 to be checked and allocated for all arrays.
The druntime option does sound interesting, although I'm leery about the idea of creating 2 different language behaviours.
Apr 26 2020
prev sibling parent reply Mathias LANG <geod24 gmail.com> writes:
On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer 
wrote:
 In terms of performance, depending on the task at hand, D1 code 
 is slower than D2 appending, by the fact that there's a 
 thread-local cache for appending for D2, and D1 only has a 
 global one-array cache for the same. However, I'm assuming that 
 since you were focused on D1, your usage naturally was written 
 to take advantage of what D1 has to offer.

 The assumeSafeAppend call also uses this cache, and so it 
 should be quite fast. But setting length to 0 is a ton faster, 
 because you aren't calling an opaque function.

 So depending on the usage pattern, D2 with assumeSafeAppend can 
 be faster, or it could be slower.
Well, Sociomantic didn't use any kind of multi-threading in "user code". We had single-threaded fibers for concurrency, and process-level scaling for parallelism. Some corner cases used threads, but that was for low-level things (e.g. low-latency file IO on Linux), which were highly scrutinized and stayed clear of the GC AFAIK.

Note that accessing TLS *does* have a cost which is higher than accessing a global. By this reasoning, I would assume that D2 appending would definitely be slower, although I never profiled it. What I did profile, though, is `assumeSafeAppend`. The fact that it looks up GC metadata (taking the GC lock in the process) made it quite expensive given how often it was called (in D1 it was simply a no-op, and called defensively).
 IIRC Mathias has suggested that it should be possible to tag 
 arrays as intended for this kind of re-use, so that stomping 
 prevention will never trigger, and you don't have to 
 `assumeSafeAppend` each time you reduce the length.
I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.
Well... Here's something I never really quite understood, actually: Mihails *did* introduce a buffer type. See https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130

And we also had a (very old) similar utility here: https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d

I always wanted to unify this, but never got to it. But if you look at the first link, it calls `assumeSafeAppend` twice, before and after setting the length. In practice it is only necessary *after* reducing the length, but as I mentioned, this is defensive programming.

For reference, most of our applications had a principled buffer use. The buffers would rarely be appended to from more than one, perhaps two, places. However, slices to the buffer would be passed around quite liberally. So a buffer type from which one could borrow would indeed have been optimal.
 Note that this was before (I think) destructor calls were 
 added. The destructor calls are something that assumeSafeAppend 
 is going to do, and won't be done with just setting length to 0.

 However, there are other options. We could introduce a druntime 
 configuration option so when this specific situation happens 
 (slice points at start of block and has 0 length), 
 assumeSafeAppend is called automatically on the first append. 
 Jonathan is right that this is not  safe, but it could be an 
 opt-in configuration option.

 I don't think configuring specific arrays makes a lot of sense, 
 as this would require yet another optional bit that would have 
 to be checked and allocated for all arrays.

 -Steve
I don't even know if we had a single case where we had arrays of objects with destructors. The vast majority of our buffers were `char[]` and `ubyte[]`. We had some elaborate types, but I think destructors + buffer would have been frowned upon in code review.

Also, the reason we didn't modify druntime to just have the D1 behavior (that would have been a trivial change) was because of how dependent on the new behavior druntime had become. It was also the motivation for the suggestion Joe mentioned. AFAIR I mentioned it in an internal issue, did a PoC implementation, but never got it to a state where it was mergeable.

Also, while a custom type might sound better, it doesn't really interact well with the rest of the runtime, and it's an extra word to pass around (if passed by value).
Apr 26 2020
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/27/20 1:04 AM, Mathias LANG wrote:
 On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer wrote:
 In terms of performance, depending on the task at hand, D1 code is 
 slower than D2 appending, by the fact that there's a thread-local 
 cache for appending for D2, and D1 only has a global one-array cache 
 for the same. However, I'm assuming that since you were focused on D1, 
 your usage naturally was written to take advantage of what D1 has to 
 offer.

 The assumeSafeAppend call also uses this cache, and so it should be 
 quite fast. But setting length to 0 is a ton faster, because you 
 aren't calling an opaque function.

 So depending on the usage pattern, D2 with assumeSafeAppend can be 
 faster, or it could be slower.
Well, Sociomantic didn't use any kind of multi-threading in "user code". We had single-threaded fibers for concurrency, and process-level scaling for parallelism. Some corner cases were using threads, but it was for low level things (e.g. low latency file IO on Linux), which were highly scrutinized and stayed clear of the GC AFAIK. Note that accessing TLS *does* have a cost which is higher than accessing a global.
That is a minor cost compared to the actual appending.
 By this reasoning, I would assume that D2 appending 
 would definitely be slower, although I never profiled it.
I tested the performance when I added the feature. D2 was significantly and measurably faster (at least for appending to 2 or more arrays). I searched through my old email: for appending 5M bytes to 2 arrays, the original code took 13.99 seconds (on whatever system I was using in 2009) and 1.53 seconds with the cache. According to that email, I had similar results even with a 1-element cache, so somehow my code was faster, but I didn't know why. Quite possibly it's because the cache in D1 for looking up block info is behind the GC lock.

Literally the only thing that is more expensive in D2 vs. D1 was the truncation of arrays. In D1 this is setting the length to 0; in D2, you needed to call assumeSafeAppend. This is why I suggested a flag that allows you to enable the original behavior.
 What I did 
 profile tho, is `assumeSafeAppend`. The fact that it looks up GC 
 metadata (taking the GC lock in the process) made it quite expensive 
 given how often it was called (in D1 it was simply a no-op, and called 
 defensively).
The cache I referred to is to look up the GC metadata. In essence, when you append, you will look it up anyway. Either assumeSafeAppend or append will get the GC metadata into the cache, and then it is a straight lookup in the cache and this doesn't take a lock or do any expensive searches. The cache is designed to favor the most recent arrays first. This is an 8 element cache, so there are still cases where you will be having issues (like if you round-robin append to 9 arrays). I believe 8 elements was a sweet spot for performance that allowed reasonably fast appending with a reasonable number of concurrent arrays. Where D1 will fall down is if you are switching between more than one array, because the cache in D1 is only one element. Even if you are doing just one array, the cache is not for the array runtime, but for the GC. And it is based on the pointer queried, not the block data. A GC collection, for instance, is going to invalidate the cache.
 
 IIRC Mathias has suggested that it should be possible to tag arrays 
 as intended for this kind of re-use, so that stomping prevention will 
 never trigger, and you don't have to `assumeSafeAppend` each time you 
 reduce the length.
I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.
Well... Here's something I never really quite understood, actually: Mihails *did* introduce a buffer type. See https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130 And we also had a (very old) similar utility here: https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d I always wanted to unify this, but never got to it. But if you look at the first link, it calls `assumeSafeAppend` twice, before and after setting the length. In practice it is only necessary *after* reducing the length, but as I mentioned, this is defensive programming.
Yeah, that is unnecessary. It is not going to be that expensive, especially if you just were appending to that array, but again, more expensive than setting a word to 0.
 
 For reference, most of our applications had a principled buffer use. The 
 buffers would rarely be appended to from more than one, perhaps two 
 places. However, slices to the buffer would be passed around quite 
 liberally. So a buffer type from which one could borrow would indeed 
 have been optimal.
This all actually works better with the new runtime. The old one would reallocate if you appended to a slice that didn't start at the block start. The new version can detect that the slice ends where the used data ends and allow appending.
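For example (a sketch; whether the append happens in place also depends on spare capacity in the block):

    void main()
    {
        auto a = new int[](10);
        auto b = a[5 .. $];   // slice that does not start at the block start
        b ~= 1;               // old runtime: reallocates; new runtime: can append
                              // in place, since b ends where the used data ends
    }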
 
 Note that this was before (I think) destructor calls were added. The 
 destructor calls are something that assumeSafeAppend is going to do, 
 and won't be done with just setting length to 0.

 However, there are other options. We could introduce a druntime 
 configuration option so when this specific situation happens (slice 
 points at start of block and has 0 length), assumeSafeAppend is called 
 automatically on the first append. Jonathan is right that this is not 
  safe, but it could be an opt-in configuration option.

 I don't think configuring specific arrays makes a lot of sense, as 
 this would require yet another optional bit that would have to be 
 checked and allocated for all arrays.
I don't even know if we had a single case where we had arrays of objects with destructors. The vast majority of our buffer were `char[]` and `ubyte[]`. We had some elaborate types, but I think destructors + buffer would have been frowned upon in code review.
Of course! D1 didn't have destructors for structs ;)
 
 Also the reason we didn't modify druntime to just have the D1 behavior 
 (that would have been a trivial change) was because how dependent on the 
 new behavior druntime had become. It was also the motivation for the 
 suggestion Joe mentioned. AFAIR I mentioned it in an internal issue, did 
 a PoC implementation, but never got it to a state were it was mergeable.
Having a flag per array is going to be costly, but actually, there's a lot more junk in the block itself. Perhaps there's a spare bit somewhere that can be a flag for the append behavior.
 
 Also, while a custom type might sound better, it doesn't really interact 
 well with the rest of the runtime, and it's an extra word to pass around 
 (if passed by value).
The "extra value" can be stored elsewhere -- just like the GC you could provide metadata for the capacity in a global AA or something. In any case, there were options. The way druntime is written, it's pretty good performance, in most cases BETTER performance than D1 for idiomatic D code. In fact the Tango folks asked me if I could add the feature to Tango's druntime, but I couldn't because it depends on TLS. For code that was highly focused on optimizing D1 with its idiosyncracies, it probably has worse performance. The frustration is understandable, but without the possibility of adaptation, there's not much one can do. -Steve
Apr 27 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
This is what the D n.g. is about - informative, collegial, and useful! Thanks, 
fellows!
Apr 29 2020
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 25.04.20 12:15, Walter Bright wrote:
 On 4/24/2020 12:27 PM, Arine wrote:
 There most definitely is a difference and the assembly generated with 
 rust is better.
D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
What's an example of such an optimization and why won't it introduce UB to @safe code?
Apr 25 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/25/2020 4:00 PM, Timon Gehr wrote:
 What's an example of such an optimization and why won't it introduce UB to
 safe 
 code?
    @live void test() { int a, b; foo(a, b); }

    @live int foo(ref int a, ref int b) {
        a = 0;
        b = 1;
        return a;
    }

ref a and ref b cannot refer to the same memory object.
Apr 25 2020
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 26.04.20 04:22, Walter Bright wrote:
 On 4/25/2020 4:00 PM, Timon Gehr wrote:
 What's an example of such an optimization and why won't it introduce 
 UB to  safe code?
     @live void test() { int a, b; foo(a, b); }

     @live int foo(ref int a, ref int b) {
         a = 0;
         b = 1;
         return a;
     }

 ref a and ref b cannot refer to the same memory object.
Actually they can, even in @safe @live code.
Apr 26 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2020 12:45 AM, Timon Gehr wrote:
 On 26.04.20 04:22, Walter Bright wrote:
 ref a and ref b cannot refer to the same memory object.
Actually they can, even in safe live code.
Bug reports are welcome. Please tag them with the 'live' keyword in bugzilla.
Apr 26 2020
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 4/26/20 10:19 AM, Walter Bright wrote:
 On 4/26/2020 12:45 AM, Timon Gehr wrote:
 On 26.04.20 04:22, Walter Bright wrote:
 ref a and ref b cannot refer to the same memory object.
Actually they can, even in safe live code.
Bug reports are welcome. Please tag them with the 'live' keyword in bugzilla.
I can't do that because you did not agree it was a bug. According to your DIP and past discussions, the following is *intended* behavior:

    int bar(ref int x, ref int y) @safe @live {
        x = 0;
        y = 1;
        return x;
    }

    void main() @safe {
        int x;
        import std.stdio;
        writeln(bar(x, x)); // 1
    }

I have always criticized this design, but so far you have stuck to it. I have stated many times that the main reason why it is bad is that you don't actually enforce any new invariant, so @live does not enable any new patterns, at least in @safe code.

In particular, if you start optimizing based on non-enforced and undocumented @live assumptions, @safe @live code will not be memory safe.

You can't optimize based on @live and preserve memory safety. Given that you want to preserve interoperability, this is because it is tied to functions instead of types. @live in its current form is useless except perhaps as a linting tool.
Apr 26 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/26/2020 2:52 PM, Timon Gehr wrote:
 I can't do that because you did not agree it was a bug. According to your DIP 
 and past discussions, the following is *intended* behavior:
 
 int bar(ref int x,ref int y) safe  live{
      x=0;
      y=1;
      return x;
 }
 
 void main() safe{
      int x;
      import std.stdio;
      writeln(bar(x,x)); // 1
 }
 
 I have always criticized this design, but so far you have stuck to it. I have 
 stated many times that the main reason why it is bad is that you don't
actually 
 enforce any new invariant, so  live does not enable any new patterns at least
in 
  safe code.
 
 In particular, if you start optimizing based on non-enforced and undocumented 
  live assumptions,  safe  live code will not be memory safe.
 
 You can't optimize based on  live and preserve memory safety. Given that you 
 want to preserve interoperability, this is because it is tied to functions 
 instead of types.  live in its current form is useless except perhaps as a 
 linting tool.
@live's invariants rely on arguments passed to it that conform to its requirements. It's analogous to @safe code relying on its arguments conforming. To get the checking here, main would have to be declared @live, too.
Apr 26 2020
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 27.04.20 07:40, Walter Bright wrote:
 On 4/26/2020 2:52 PM, Timon Gehr wrote:
 I can't do that because you did not agree it was a bug. According to 
 your DIP and past discussions, the following is *intended* behavior:

 int bar(ref int x,ref int y) safe  live{
      x=0;
      y=1;
      return x;
 }

 void main() safe{
      int x;
      import std.stdio;
      writeln(bar(x,x)); // 1
 }

 I have always criticized this design, but so far you have stuck to it. 
 I have stated many times that the main reason why it is bad is that 
 you don't actually enforce any new invariant, so  live does not enable 
 any new patterns at least in  safe code.

 In particular, if you start optimizing based on non-enforced and 
 undocumented  live assumptions,  safe  live code will not be memory safe.

 You can't optimize based on  live and preserve memory safety. Given 
 that you want to preserve interoperability, this is because it is tied 
 to functions instead of types.  live in its current form is useless 
 except perhaps as a linting tool.
live's invariants rely on arguments passed to it that conform to its requirements. It's analogous to safe code relying on its arguments conforming. ...
No, it is not analogous, because only @system or @trusted code can get that wrong, not @safe code. @safe code itself is (supposed to be) verified, not trusted.
 To get the checking here, main would have to be declared  live, too.
I understand the design. It just does not make sense. All of the code is annotated @safe, but if you optimize based on unverified assumptions, it will not be memory safe. Is the goal of @live really to undermine @safe's guarantees?
Apr 27 2020
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/27/2020 7:26 AM, Timon Gehr wrote:
 I understand the design. It just does not make sense. All of the code is 
 annotated  safe, but if you optimize based on unverified assumptions, it will 
 not be memory safe.
It is a good point. The design of @live up to this point did not change the way code was generated. I still want to see how much of a difference it makes, and will implement it but make it an option.
Apr 27 2020
prev sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 27 April 2020 at 14:26:50 UTC, Timon Gehr wrote:
 No, it is not analogous, because only  system or  trusted code 
 can get that wrong, not  safe code.  safe code itself is 
 (supposed to be) verified, not trusted.
the existence of any overly trusting code renders @safe code liable to cause memory safety bugs. While the invalid accesses won't occur inside @safe code, they can definitely be caused by them, even without the buggy @safe code calling any @trusted.

Some day I'll have time to write up all my (many, many pages of) notes for this stuff... would have been for dconf, I guess now for dconf online?
Apr 28 2020
next sibling parent welkam <wwwelkam gmail.com> writes:
On Tuesday, 28 April 2020 at 13:44:05 UTC, John Colvin wrote:
 On Monday, 27 April 2020 at 14:26:50 UTC, Timon Gehr wrote:
 No, it is not analogous, because only @system or @trusted code 
 can get that wrong, not @safe code. @safe code itself is 
 (supposed to be) verified, not trusted.
the existence of any overly trusting code renders @safe code liable to cause memory safety bugs. While the invalid accesses won't occur inside @safe code, they can definitely be caused by them, even without the buggy @safe code calling any @trusted. Some day I'll have time to write up all my (many, many pages of) notes for this stuff... would have been for DConf, I guess now for DConf Online?
Would be eager to listen.
Apr 28 2020
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On 28.04.20 15:44, John Colvin wrote:
 the existence of any overly trusting code renders @safe code liable to 
 cause memory safety bugs. While the invalid accesses won't occur inside 
 @safe code, they can definitely be caused by them, even without the 
 buggy @safe code calling any @trusted.
I don't see how you arrive at "buggy @safe code" here. You say it yourself: When there is "overly trusted code", then that's where the bug is.
Apr 28 2020
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 29.04.20 00:36, ag0aep6g wrote:
 On 28.04.20 15:44, John Colvin wrote:
 the existence of any overly trusting code renders @safe code liable 
 to cause memory safety bugs. While the invalid accesses won't occur 
 inside @safe code, they can definitely be caused by them, even without 
 the buggy @safe code calling any @trusted.
I don't see how you arrive at "buggy @safe code" here. You say it yourself: When there is "overly trusted code", then that's where the bug is.
I guess he is talking about the case where @trusted code calls buggy @safe code and relies on it being correct to ensure memory safety. (However, this is still the fault of the @trusted code. @safe code cannot be blamed for violations of memory safety.)
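A contrived sketch of that case (hypothetical code, purely to illustrate where the blame lands):

@safe size_t pickIndex(int[] a) {
    return a.length;       // bug: off by one, should be a.length - 1
}

@trusted int readLast(int[] a) {
    auto i = pickIndex(a);
    return a.ptr[i];       // skips the bounds check, trusting pickIndex
}

@safe void main() {
    int[] a = [1, 2, 3];
    auto x = readLast(a);  // out-of-bounds read triggered by the buggy @safe helper
}

The invalid access happens because pickIndex is wrong, but the memory-safety fault is still readLast's: a @trusted function may not assume more about its @safe helpers than their signatures guarantee.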
Apr 28 2020
prev sibling next sibling parent drug <drug2004 bk.ru> writes:
24.04.2020 22:27, Arine wrote:
 On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
 And your statement that Rust assembly output is better is wrong.
Yes, your statement that Rust assembly output is better is wrong, because one single optimization applicable in some cases does not make Rust better in general. Period. Once again, Rust assembly output can be better in some cases. But there is a big difference between these two statements - "better in some cases" and "better in general".

Moreover, you are wrong twice, because this optimization is not free at all. You pay for it in the form of the restriction that you cannot have more than one mutable reference. This means that cyclic data structures are unusually difficult compared to almost any other programming language. Also, this optimization has been available in C for a long time. Even more - in some cases a GC-based application can be faster than one with manual memory management because it allows avoiding numerous allocations/deallocations. What you are talking about is, in fact, premature optimization.
 
 There most definitely is a difference and the assembly generated with 
 rust is better. This is just a simple example to illustrate the 
 difference. If you don't know why the difference is significant or why 
 it is happening. There are a lot of great articles out there, sadly 
 there are people such as yourself spreading misinformation that don't 
 know what a borrow checker is and don't know Rust or why it is has gone 
 as far as it has. This is why the borrow checker for D is going to fail. 
 Because the person designing it, such as yourself, doesn't have any idea 
 what they are redoing and have never even bothered to touch Rust or 
 learn about it. Anyways I'm not your babysitter, if you don't understand 
 the above, as most people seem to not bother to learn assembly anymore, 
 you're on your own.
 
Self-importance is written all over your post. Here you make your third mistake - you are very far from being able to be my babysitter. Trying to show your competence, you only show your blind ignorance. The world is much less trivial than a function with two mutable references that does no useful work.
Apr 25 2020
prev sibling parent reply random <random spaml.de> writes:
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
 On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
 And your statement that Rust assembly output is better is 
 wrong.
There most definitely is a difference and the assembly generated with rust is better. This is just a simple example to illustrate the difference. If you don't know why the difference is significant or why it is happening. There are a lot of great articles out there, sadly there are people such as yourself spreading misinformation that don't know what a borrow checker is and don't know Rust or why it is has gone as far as it has. This is why the borrow checker for D is going to fail. Because the person designing it, such as yourself, doesn't have any idea what they are redoing and have never even bothered to touch Rust or learn about it. Anyways I'm not your babysitter, if you don't understand the above, as most people seem to not bother to learn assembly anymore, you're on your own.
A competent C programmer could just write something like this. Or use restrict...

int test(int* x, int* y) {
    int result = *x = 0;
    *y = 1;
    return result;
}

This produces the following with gcc -O:

test(int*, int*):
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        mov     eax, 0
        ret

https://godbolt.org/z/rpM_eK

So the statement that Rust produces better assembly is wrong. It's on my todo list to learn Rust. What is really off-putting are those random fanatic Rust fanboys. In your language: "If you don't know why the difference is significant or why it is happening", you should probably learn C before you start insulting people in a programming forum ;)
Apr 29 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:
 A competent C Programmer could just write something like this. 
 Or use restrict...

 int test(int* x, int* y) {
     int result = *x = 0;
     *y = 1;
     return result;
 }
I'm incompetent so I would just write:

int test(int* x, int* y) {
    *x = 0;
    *y = 1;
    return 0;
}
Apr 29 2020
parent reply random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 10:36:59 UTC, IGotD- wrote:
 I'm incompetent so I would just write:

 int test(int* x, int* y) {
      *x = 0;
      *y = 1;
      return 0;
 }
Ok, in this simple case it's obvious. For a real-world example look at the source for strcmp():
https://code.woboq.org/userspace/glibc/string/strcmp.c.html

The trick is to load the value into a variable. The compiler can't optimize multiple pointer reads from the same pointer because the content could already have been changed through another pointer. restrict solves that, but if you know what is happening and why, you can solve it by hand.
Apr 29 2020
parent reply IGotD- <nise nise.com> writes:
On Wednesday, 29 April 2020 at 10:46:57 UTC, random wrote:
 Ok, in this simple case it's obvious.
 For a real-world example look at the source for strcmp():
 https://code.woboq.org/userspace/glibc/string/strcmp.c.html
 
 The trick is to load the value into a variable. The compiler 
 can't optimize multiple pointer reads from the same pointer 
 because the content could already have been changed through another 
 pointer. restrict solves that, but if you know what is happening 
 and why, you can solve it by hand.
In the strcmp example, shouldn't the compiler be able to do the same optimizations as with restrict, because both pointers are declared const and the contents do not change?
Apr 29 2020
parent reply random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 12:37:29 UTC, IGotD- wrote:
 In the strcmp example, shouldn't the compiler be able to do the 
 same optimizations as you would use restrict because both 
 pointers are declared const and the content do not change?
Good question. My strcmp example is actually really bad, because if you never write through any pointer it doesn't make a difference ;)
The way it is written is still interesting.

I made a quick test case to evaluate the influence of const:
https://godbolt.org/z/qRwFa9
https://godbolt.org/z/iEj7LV
https://godbolt.org/z/EMqDDy

int test(int * x, int * y, <const?> int * <restrict?> z)
{
    *y = *z;
    *x = *z;
    return *z;
}

As you can see from the compiler output, const doesn't improve the optimization. I think the compiler can't optimize it because const doesn't give you real guarantees in C. You could just call the function like this:

int a;
test(&a, &a, &a);

"One man's constant is another man's variable."
Apr 29 2020
next sibling parent random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 16:19:55 UTC, random wrote:

And of course the workaround if you don't want to use restrict:

int test(int * x, int * y, int * z)
{
     int tmp = *z;
     *y = tmp;
     *x = tmp;
     return tmp;
}

Produces the same as the restrict version.
https://godbolt.org/z/yJJcMK
Apr 29 2020
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/29/2020 9:19 AM, random wrote:
 I think the compiler can't optimize it because const doesn't give you real 
 guarantees in C.
You'd be right.
Apr 29 2020
prev sibling parent random <random spaml.de> writes:
On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:
<

I forgot to add this...
Compile with gcc -O3:

test(int*, int*):
         mov     DWORD PTR [rdi], 0
         xor     eax, eax
         mov     DWORD PTR [rsi], 1
         ret

https://godbolt.org/z/xW6w6W
Apr 29 2020
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Wednesday, 22 April 2020 at 22:34:32 UTC, Arine wrote:
 Not quite. Rust will generate better assembly as it can 
 guarantee that use of an object is unique. Similar to C's 
 "restrict" keyword but you get it for "free" across the entire 
 application.
Cool. Did not know that. I know that different languages have different semantics and code that looks the same might produce different results, so that's why I used the word equivalent instead of same. You can achieve the same goal in D as in Rust, but the code would be different.
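For instance, a minimal sketch of the D-side workaround (the same idea as the C version shown earlier in the thread: cache the load in a local instead of relying on a no-aliasing guarantee):

int test(int* x, int* y, int* z) {
    int tmp = *z;  // load once; the writes through x and y can't force a reload
    *y = tmp;
    *x = tmp;
    return tmp;
}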
Apr 26 2020
prev sibling next sibling parent Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:

[...]

 2) Various Web performance test on  
 https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance of D using vibe.d and hunt framework/libraries. 
Regardless of type of test - json, single query, multiple query and so on.
[...]
 I am researching if D and D web framework whether it can be 
 used as  replacement python/django/flask within our company. 
 Although if D web framework show worse performance than Go then 
 probably it is not right tool for the job.
 Any comments and feedback would be appreciated.
Vibe.d's performance in benchmarks has been discussed before[1]. From what I remember, the limiting factor is developer time allocation and profiling on specific hardware, which means it can probably be solved with money ;-) -- Bastiaan. [1] https://forum.dlang.org/post/qg9dud$hbo$1 digitalmars.com
Apr 22 2020
prev sibling next sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Yes. If you have time to optimize, there is precious little difference speed-wise between native languages. Every native language using the same backend ends up in the same ballpark, with tricks to get the code to the same baseline. The last few percent will be due to different handling of UB, integer overflow, aliasing... but in general the ethos of native languages is to let you reach top native speed, and in the end they will generate the exact same code.

But if your application is barely optimized, or more likely you don't have time to optimize properly, it becomes a bit more interesting. Defaults will matter a lot more, and things like the GC, whether the language encourages copies, and the "idiomatic" style that is accepted will start to bear consequences (and even more so: libraries). This is what ends up in benchmarks, but if the application were worth optimizing (in terms of added value) it would be optimized hard to get to that native ceiling.

In short, the less useful an application is, the more it will display large differences between languages with similar low-level capabilities. It would be much more interesting to compare _backends_, but people keep comparing front-ends because it drives traffic and commentary.
Apr 22 2020
parent reply serge <abc abc.com> writes:
On Wednesday, 22 April 2020 at 16:23:58 UTC, Guillaume Piolat 
wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust.
Yes. If you have time to optimize: there is preciously little difference speed-wise between native languages. Every native language using the same backend end up in the same ballbark, with tricks to get the code to the same baseline. The last percents wille be due to different handling of UB, integer overflow, aliasing... but in general the ethos of native language is to allow you to reach top native speed and in the end they will generate the exact same code. But, if your application is barely optimized, or more likely you don't have time to optimize properly, it becomes a bit more interesting. Defaults will matter a lot more and things like GC, whether the langage encourages copies, and the "idiomatic" style that is accepted will start to bear consequences (and even more so: libraries). This is what end up in benchmarks, but if the application was worth optimizing for it (in terms of added value) it would be optimized hard to get to that native ceiling. In short, the less useful an application is, the more it will display large differences between languages with similar low-level capabilities. It would be much more interesting to compare _backends_, but people keep comparing front-ends because it drives traffic and commentary.
Could you please elaborate on that? What are you referring to as the backend?

I am not interested in comparing one small single operation - the fib test already did that. To me the TechEmpower stats are a pretty good indicator - they show json processing, single/multi-query requests, database access, static content. Overall performance across those stats gives a pretty good idea of how a language and web framework are put together, and of their ecosystem.

For example, if a language is fast on basic operations but two frameworks show less than adequate performance, then obviously something is wrong with the whole ecosystem - it could be difficult to create fast and efficient apps for the average developer. For example Scala - a powerful but very complicated language with tons of problems. Most Scala projects failed. It is very difficult and slow to create efficient applications for the average developer. It kind of requires a rocket scientist to write good code in Scala. Does D exhibit the same problem?
Apr 24 2020
next sibling parent reply Guillaume Piolat <firstname.lastname gmail.com> writes:
On Friday, 24 April 2020 at 13:44:18 UTC, serge wrote:
 Could you please elaborate on that? what are you referring to 
 as backend?
I was mentionning LLVM vs GCC vs Intel compiler backend, the part that converts code to instructions after the original language is out of sight.
 To me techempower stats is pretty good indicator - it shows 
 json processing, single/multiquery requests, database, static. 
 Overall performance across those stats give pretty good idea, 
 how language and web framework is created, its ecosystem.
 For example if language is fast on basic operations but two 
 frameworks show less then adequate performance then obviously 
 something wrong with the whole ecosystem - it could be 
 difficult to create  fast and efficient apps for average 
 developer. For example Scala - powerfull but yet very 
 complicated language with tons of problems. Most of Scala 
 projects failed. It is very difficult and slow to create  
 efficient  applications for  average developer. It kinds 
 requires rocket scientist to write good code in Scala.  Does D 
 exhibit same problem?
Very fair reasoning. I don't think D has as many problems as Scala; D has a very gentle learning curve and it's not difficult to be productive in.

But I'd say most of D's problems are indeed ecosystem-related, possibly because of the kind of personalities that D attracts: the reluctance of D programmers to gather around the same piece of code makes the ecosystem more insular than needed, as is typical with native programming. D code today has a tendency to balkanize based on various requirements such as exceptions or not, runtime or not, @safe or not, -betterC or not... It seems to me languages where DIY is frowned upon (Java) or discouraged by the practice of FFI have better library ecosystems, for better or worse.
Apr 26 2020
parent JN <666total wp.pl> writes:
On Sunday, 26 April 2020 at 12:37:48 UTC, Guillaume Piolat wrote:
 But I'd say most of D's problems are indeed ecosystem-related, 
 possibly because of the kind of personnalities that D attracts 
 : the reluctance from D programmers to gather around the same 
 piece of code makes the ecosystem more insular than needed, as 
 is typical with native programming. D code today has a tendency 
 to balkanize based on various requirements such as exceptions 
 or not, runtime or not,  safe or not, -betterC or not... It 
 seems to me languages where DIY is frowned upon (Java) or 
 discouraged by the practice of FFI have better library 
 ecosystems, for better or worse.
These are connected. Languages like Java don't give you options. You will use the GC, you will use OOP. Imagine an XML library. Any Java XML DOM library will offer an XMLDocument object with a load method (or constructor). This is expected and more or less the same in every library. D doesn't force the paradigm on you. Some people will want to use the GC, some won't; some will want to use OOP, some will avoid it like fire. It's a tradeoff: for higher flexibility and power you trade some composability.
Apr 26 2020
prev sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Fri, Apr 24, 2020 at 3:46 PM serge via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 To me techempower stats is pretty good indicator - it shows json
 processing, single/multiquery requests, database, static. Overall
 performance across those stats give pretty good idea, how
 language and web framework is created, its ecosystem.
Unfortunately there is a big issue with TechEmpower. Because it is so popular, almost every framework [language] tries to have the best score in it. And in many cases this means they use some hacks or tricks to achieve that. So in general TechEmpower results are useless. From my own experience, D performance is really good in real-world scenarios.

Another issue with the TechEmpower benchmark is that there is almost zero complexity. All tests do some basic operations on really small datasets.
Apr 26 2020
next sibling parent reply JN <666total wp.pl> writes:
On Sunday, 26 April 2020 at 16:59:44 UTC, Daniel Kozak wrote:
 Unfortunately there is a big issue with techempower. Because it 
 is so
 popular almost every framework [language] try to have a best 
 score in
 it.
 And in many cases this mean they use some hacks or tricks to 
 achieve
 that. So in general techempower results are useless. From my own
 experience D performance is really good in a real word 
 scenarios.
 Other issue with techempower benchmark is there is almost zero
 complexity. All tests do some basic operations on realy small
 datasets.
It's nice to have a moral victory and claim to be above "those cheaters", but links to these benchmarks are shared in many places. If someone wants to see how fast D is, they will type "programming language benchmark" into their web search of choice, and TechEmpower will be high in the results list. They will click, and go "oh wow, even PHP is faster than that D stuff".

Whether it's cheating or not, perception matters, and people will base their decisions on such benchmarks, even if that is unreasonable and doesn't apply to real-world scenarios.
Apr 26 2020
parent Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Apr 26, 2020 at 9:35 PM JN via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 It's nice to have a moral victory and claim to be above "those
 cheaters", but links to these benchmarks are shared in many
 places. If someone wants to see how fast D is, they will write
 "programming language benchmark" in their websearch of choice,
 and TechEmpower will be high in the results list. He will click,
 and go "oh wow, even PHP is faster than that D stuff".

 Whether it's cheating or not, perception matters and people will
 use such benchmarks to base their decision, even if it's
 unreasonable and doesn't apply to real world scenarios.
Yes I agree, this is the reason why I am improving those benchmarks from time to time, to make D faster than PHP :D
Apr 26 2020
prev sibling parent Wulfklaue <wulfklaue wulfklaue.com> writes:
On Sunday, 26 April 2020 at 16:59:44 UTC, Daniel Kozak wrote:
 On Fri, Apr 24, 2020 at 3:46 PM serge via Digitalmars-d 
 <digitalmars-d puremagic.com> wrote:
 To me techempower stats is pretty good indicator - it shows 
 json processing, single/multiquery requests, database, static. 
 Overall performance across those stats give pretty good idea, 
 how language and web framework is created, its ecosystem.
Unfortunately there is a big issue with techempower. Because it is so popular almost every framework [language] try to have a best score in it. And in many cases this mean they use some hacks or tricks to achieve that. So in general techempower results are useless.
As somebody who implemented the Swoole+PHP and Crystal code at TechEmpower, I can state that this statement is factually wrong. The code is very idiomatic code that anybody writes. Basic database calls, a pool for connections, prepared statements, the standard http module or frameworks. There is no magic in the code that tries to do direct system calls or has stripped-down drivers or any other stuff that people normally will not use.

Where you can see some funny business is in the top 10 or 20, where Rust and co have some extremely optimized code that is not how most people will write the code. But those are the extreme cases, which anybody with half a brain ignores because that is not how you write normal code. I always say: look at the code to see if the results are normal or over-optimized/unrealistic crap.

If we compare normal implementations ( https://www.techempower.com/benchmarks/#section=test&runid=c7152e8f-5b33-4ae7-9e89-630af44bc8de&hw=ph&test=plaintext ) like Fortunes:

vibed-ldc-pgsql: 58k
Crystal: 206k
PHP+Swoole: 289k

D's results are simply abysmal. We are talking basic idiomatic code here. This tells me more that D has an issue with its DB handling on those tests.

We need to look at stuff like "hello world" ( plain text ) and json, where the performance difference drops down to 2x. The plaintext test is literally taking a string and outputting it. We are talking route + echo in PHP and any other language. Or basic JSON encoding and output. A few lines of code, that is it. Yet D still suffers in those tests with a 2x issue. Does that not tell you that D or Vibe.D suffers from an actual performance issue? A point that clearly needs to be looked into.

If the argument is that D is not properly optimized, then what are PHP+Workerman/Swoole/..., Crystal?
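To be concrete about how little code is involved, the D side of a plaintext-style handler is roughly this (a minimal vibe.d sketch written from memory, so exact signatures may differ between versions):

import vibe.vibe;

void main()
{
    auto router = new URLRouter;
    router.get("/plaintext", (HTTPServerRequest req, HTTPServerResponse res) {
        res.writeBody("Hello, World!", "text/plain");
    });

    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    settings.bindAddresses = ["0.0.0.0"];
    listenHTTP(settings, router);
    runApplication();
}

If even something this small trails the equivalent PHP or Crystal handler by 2x, the overhead is presumably in the framework or runtime, not in the benchmark code.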
 Other issue with techempower benchmark is there is almost zero
 complexity. All tests do some basic operations on realy small
 datasets.
Fortunes shows a more realistic real-world web scenario. The rest mostly show weaknesses in each language+framework specific section. If your json score is low, there is a problem with your json library or the way your framework handles the requests. If your plaintext results are low ... you get the drill.

If you simply try to scoff at an issue by stating "in the real world we are faster" but you have benchmarks like this online... The nice thing about TechEmpower is that it really shows whether your language is fast for basic web tasks or not. It does not give a darn that your language can run "real world" fast, if there are underlying issues.

For people who are interested in D for website hosting, it's simply slower than the competitors. Do not like it? Then see where the issues are and fix them. Be it in the TechEmpower code, in D or in Vibe.D. But clearly there is an issue if D can not compete with implementations of other languages ( again, talking normal implementations, stuff that anybody will use ). If given the choice, what will people pick? D, which simply ignores the web market, or other languages/frameworks where the speed out of the door is great.

It's funny seeing comments like this where a simple question by the OP turns into a whole and totally useless technical discussion, followed by some comment that comes down to "ignore it because everybody cheats". And people here wonder why D has issues with popularity. Really! Get out much?

From my point of view, the comment is insulting and tantamount to calling people like me, who implemented a few of the other languages, "cheaters", when it's literally basic code that is used everywhere ( trust me, I am not some magic programmer who knows C++ out of the back of his hand. I barely scrape by on my own with PHP and Ruby ).

If the issue is at D's end, be it D, Vibe.D or the code used, then fix it, but do not insult everybody else ( especially the people who wrote normal code ). As the saying goes: "always clean your own house first, before criticizing your neighbor's house".
Apr 26 2020
prev sibling next sibling parent reply mipri <mipri minimaltype.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals?
Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?":

import std.stdio;

void main(string[] args) {
    int[] v = [1, 2];
    int n = 1 << 30;
    for (int i = 0; i < n - 2; ++i) {
        v ~= i;
    }
    writefln("v len: %s cap: %s\n", v.length, v.capacity);
}

With an average of four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array:

void main() @nogc {
    import core.stdc.stdio : printf;
    import std.container.array;

    Array!int v = Array!int(1, 2);
    foreach (i; 0 .. (1 << 30) - 2)
        v ~= i;
    printf("v len: %d cap: %d\n", v.length, v.capacity);
}

Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster.

If you set out to make a similar benchmark in C++ or Rust you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've exercised some ease of development you might want to transition away from these facilities.

D has other tensions, like "would you like the GC, or no?" or "would you like the whole language with TypeInfo and AAs, or no?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?"

And this is more how I'd characterize the language. Not as "it has this such-and-such performance ballpark and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one and therefore you have to look closer at a given web framework to even say which sublanguage it's written in".

I think the disadvantages of D being like this are obvious. An advantage of it being like this is that if you one day decide that you'd prefer a D application to have C++-style performance, you don't have to laboriously rewrite the application into a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even to just the parts of the application that need them.
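(And, as a sketch of a middle ground between those two - still GC-backed, but with the capacity reserved up front so appending doesn't keep reallocating; I haven't timed this variant:)

void main()
{
    import std.array : appender;
    import std.stdio : writefln;

    enum n = 1 << 30;
    auto v = appender!(int[])();
    v.reserve(n);               // one big allocation instead of repeated regrowth
    foreach (i; 0 .. n - 2)
        v ~= i;
    writefln("v len: %s cap: %s", v.data.length, v.capacity);
}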
Apr 22 2020
parent reply serge <abc abc.com> writes:
On Thursday, 23 April 2020 at 00:00:29 UTC, mipri wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals?
Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?": import std.stdio; void main(string[] args) { int[] v = [1, 2]; int n = 1 << 30; for (int i = 0; i < n - 2; ++i) { v ~= i; } writefln("v len: %s cap: %s\n", v.length, v.capacity); } With an average of four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array: void main() nogc { import core.stdc.stdio : printf; import std.container.array; Array!int v = Array!int(1, 2); foreach (i; 0 .. (1 << 30) - 2) v ~= i; printf("v len: %d cap: %d\n", v.length, v.capacity); } Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster. If you set out to make a similar benchmark in C++ or Rust you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've exercised some ease of development you might want to transition away from these facilities. D has other tensions, like "would you like the GC, or no?" or "would you like the the whole language with TypeInfo and AAs, or no?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?" And this is more how I'd characterize the language. Not as "it has this such-and-such performance ballpark and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one and therefore you have to look closer at a given web framework to even say which sublanguage it's written in". I think the disadvantages of D being like this are obvious. An advantage of it being like this, is that if you one day decide that you'd prefer a D application have C++-style performance, you don't have to laboriously rewrite the application into a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even to just the parts of the application that need them.
I did check the library. My understanding is that the proposal is to use the library with manual memory management, without the GC. My concern was that the D web frameworks performed worse than frameworks in Go and Java, which are GC-only languages. Does it mean that the GC in D is so far from great that we need to avoid it in order to beat Java/Go? It's probably worth stressing that I didn't actually mean "beat"; I would love to see stats on par with Go and Java, but unfortunately D was a few times slower - close to Python and Ruby...

Even if the library can speed things up, I believe the language/runtime should be able to work efficiently for this type of operation. We should not need to develop such libraries in order to have good performance. To me it is a bug, a poor implementation of the GC, or a deficiency in the design of the runtime. The need for that type of library to work around deficiencies in the runtime would not let one focus on writing good code, but instead on looking for gotchas to get an adequate solution.
Apr 24 2020
parent Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 24 April 2020 at 13:58:53 UTC, serge wrote:
 On Thursday, 23 April 2020 at 00:00:29 UTC, mipri wrote:
 On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals?
Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?": import std.stdio; void main(string[] args) { int[] v = [1, 2]; int n = 1 << 30; for (int i = 0; i < n - 2; ++i) { v ~= i; } writefln("v len: %s cap: %s\n", v.length, v.capacity); } With an average of four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array: void main() nogc { import core.stdc.stdio : printf; import std.container.array; Array!int v = Array!int(1, 2); foreach (i; 0 .. (1 << 30) - 2) v ~= i; printf("v len: %d cap: %d\n", v.length, v.capacity); } Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster. If you set out to make a similar benchmark in C++ or Rust you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've exercised some ease of development you might want to transition away from these facilities. D has other tensions, like "would you like the GC, or no?" or "would you like the the whole language with TypeInfo and AAs, or no?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?" And this is more how I'd characterize the language. Not as "it has this such-and-such performance ballpark and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one and therefore you have to look closer at a given web framework to even say which sublanguage it's written in". I think the disadvantages of D being like this are obvious. An advantage of it being like this, is that if you one day decide that you'd prefer a D application have C++-style performance, you don't have to laboriously rewrite the application into a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even to just the parts of the application that need them.
I did check the library.. My understanding that proposal is to use the library with manual memory management without GC. My concern was that D web frameworks performed worse then frameworks in Go and Java which are GC only languages. Does it mean that GC in D is far from great that we need to avoid it in order to beat Java/Go? Probably would worth to stress - I didn't mean in fact beat, I would love to see stats on par with Go and Java but unfortunately D was few times slower - close to Python and Ruby... Despite the library can speed things up I believe the language/runtime should be able to work efficiently for such type of operations. We should not need to develop such libraries in order to have good performance. To me it is bug or poor implementation of GC or deficiency in design of runtime. The need for that type of libraries to solve deficiencies in runtime would not allow to focus on writing good code but instead to look for gotchas to get adequate solution.
Yes, that does mean that D's GC still needs some improvements, although much has been done during the last year.

Also note that while Java and Go are heavy GC languages, there are ways to do value-based coding, and although Project Valhalla is taking its time due to the engineering issues it addresses, it will eventually be done.

Just like in all GC-enabled languages that offer multiple allocation mechanisms alongside the GC, you should approach it in stages. Use the GC for your initial solution, and only in cases where it is obvious from the start that it might be a problem, or when it is proven that some issues have to be addressed, look for value allocation, @nogc, reference-counted collections and other low-level style tricks.

That is the nice thing about systems languages like D: you don't need to code like C from the start, and when you actually need to do it, the tools are available.
Apr 25 2020
prev sibling next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 D has sparked my interested therefore last week I have started 
 to look into the language and have completed D course on 
 pluralsight. One of area where I would like to apply D in  web 
 application/cloud. Golang  is not bad but I think D seems more 
 powerful. Although during my research I have found interesting 
 facts:

 1) fib test (https://github.com/drujensen/fib) with D  
 (compiled with ldc) showed really good performance results.
 2) Various Web performance test on  
 https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance of D using vibe.d and hunt framework/libraries. 
Regardless of type of test - json, single query, multiple query and so on.

 My understanding that D is the language in similar ballpark 
 performance league as C, C++, Rust. Hence the question - why 
 web performance of those 2 web frameworks is so poor compared 
 to rivals? I would say few times worse. This is not a troll 
 post by any means. I am researching if D and D web framework 
 whether it can be used as  replacement python/django/flask 
 within our company. Although if D web framework show worse 
 performance than Go then probably it is not right tool for the 
 job.
 Any comments and feedback would be appreciated.
I gave a talk at DConf 2018 you may be interested in. The talk goes over a set of benchmark studies I did. The video was lost, but the slides are here: https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf. The slides are probably the easiest way to get an overview. The full details on the benchmark studies on the tsv-utils repo: https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md --Jon
Apr 24 2020
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
 2) Various Web performance test on  
 https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show
pretty poor performance of D using vibe.d and hunt framework/libraries. 
Regardless of type of test - json, single query, multiple query and so on.
I don't have a reference off the top of my head, but IIRC much of the upper end of those benchmarks has less to do with inherent language or framework differences and more to do with which implementations have been strongly tailored for each particular benchmark. (I think this has been discussed on the D forums before.)

That tailoring to the benchmark often means dropping things (e.g. certain kinds of validation or safety measures) that any realistic app would have to do. So the top end of those benchmark tables may be rather misleading when it comes to real-world usage.

There may well be framework design decisions in place that have a stronger impact. For example, I recall that the default "user-friendly" vibe.d tools for handling HTTP requests create a situation where the easy thing to do is generate garbage per request. So unless one addresses that, it will put real constraints on exactly how performant one can get.

Note, this is _not_ a case of "the GC is bad" or "you can't get good performance with a GC". It's a case of: if you use the GC naively, rather than having a good strategy for preallocation and re-use of resources, you will force the GC into doing work that can be avoided.

So leaving aside misleading factors like extreme tailoring to the benchmark, I would suggest the memory allocation strategies in use are probably the first thing to look at in asking why those D implementations might be less performant than they could be. When it comes to the frameworks, the questions might be: (i) are there any cases where that framework _forces_ you into a suboptimal memory (re)allocation strategy? and (ii) even if there aren't, how easy/idiomatic is it to use a performance-oriented allocation strategy?
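To illustrate the kind of reuse strategy I mean (just a framework-agnostic sketch; buildResponse and the buffer layout are made up for the example):

import std.array : Appender;

Appender!(char[]) buf;   // module-level, hence thread-local: one buffer per thread

const(char)[] buildResponse(string name)
{
    buf.clear();         // keeps the capacity allocated by earlier requests
    buf.put("Hello, ");
    buf.put(name);
    buf.put("!");
    return buf.data;     // only valid until the next request reuses the buffer
}

void main()
{
    import std.stdio : writeln;
    foreach (n; ["Alice", "Bob"])
        writeln(buildResponse(n));  // steady state: no per-request GC allocation
}

Whether a framework makes this pattern easy, or quietly pushes you back towards fresh allocations per request, is exactly the kind of thing question (ii) above is getting at.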
Apr 25 2020