digitalmars.D - D performance
- serge (19/19) Apr 22 2020 D has sparked my interested therefore last week I have started to
- welkam (8/12) Apr 22 2020 Equivalent implementation in C, C++, D, Rust, Nim compiled with
- Arine (4/12) Apr 22 2020 Not quite. Rust will generate better assembly as it can guarantee
- drug (6/18) Apr 23 2020 You forget to add "in some cases Rust may generate better assembly than
- Arine (9/30) Apr 23 2020 I wasn't replying to the author of the thread. I was replying to
- drug (5/14) Apr 23 2020 Well, you're right, I used wrong wording to express my thoughts. I meant...
- Arine (37/38) Apr 24 2020 https://godbolt.org/z/g_euiT
- IGotD- (2/25) Apr 24 2020 Would DIP 1000 enable such optimization possibility in D?
- Les De Ridder (2/36) Apr 24 2020 Technically DIP 1021 could.
- Les De Ridder (3/9) Apr 24 2020 Actually, nevermind:
- mipri (7/18) Apr 24 2020 eh, this isn't Rust; it's "Rust +nightly with an unstable codegen
- SrMordred (5/27) Apr 24 2020 I believe that "-enable-scoped-noalias=true" should be the D
- Walter Bright (15/17) Apr 25 2020 The following C code:
- Walter Bright (4/6) Apr 25 2020 D's @live functions can indeed do such optimizations, though I haven't g...
- Joseph Rushton Wakeling (41/47) Apr 25 2020 In any case, I seriously doubt those kinds of optimization have
- Jonathan M Davis (7/11) Apr 25 2020 You could probably do that, but I'm not sure that it could be considered
- Joseph Rushton Wakeling (13/19) Apr 26 2020 I think it would be OK to have it as a non-@safe tool. But ...
- Walter Bright (11/41) Apr 25 2020 I agree. I also generally structure my code so that optimization wouldn'...
- Joseph Rushton Wakeling (25/35) Apr 26 2020 Yes! :-) And in particular, that computational complexity in the
- Walter Bright (4/6) Apr 26 2020 It manages its own memory privately, and presents the results as dynamic...
- welkam (10/17) Apr 26 2020 I heard that proving that two pointers do not alias is a big
- John Colvin (10/52) Apr 26 2020 I understand that it was an annoying breaking change, but aside
- Stefan Koch (6/12) Apr 26 2020 Can you imagine replacing every usage of slices with a custom
- Sebastiaan Koppe (3/17) Apr 26 2020 I suppose nowadays that custom type can use a scoped ubyte slice
- Joseph Rushton Wakeling (46/54) Apr 26 2020 That's not entirely unfair, but I think it does help to
- Steven Schveighoffer (31/89) Apr 26 2020 In terms of performance, depending on the task at hand, D1 code is
- Joseph Rushton Wakeling (16/49) Apr 26 2020 That makes sense. I just know that Mathias L. seemed to be quite
- Mathias LANG (44/82) Apr 26 2020 Well, Sociomantic didn't use any kind of multi-threading in "user
- Steven Schveighoffer (51/146) Apr 27 2020 I tested the performance when I added the feature. D2 was significantly
- Walter Bright (2/2) Apr 29 2020 This is what the D n.g. is about - informative, collegial, and useful! T...
- Timon Gehr (3/9) Apr 25 2020 What's an example of such an optimization and why won't it introduce UB
- Walter Bright (8/10) Apr 25 2020 @live void test() { int a,b; foo(a, b); }
- Timon Gehr (2/15) Apr 26 2020 Actually they can, even in @safe @live code.
- Walter Bright (2/5) Apr 26 2020 Bug reports are welcome. Please tag them with the 'live' keyword in bugz...
- Timon Gehr (23/30) Apr 26 2020 I can't do that because you did not agree it was a bug. According to
- Walter Bright (4/31) Apr 26 2020 @live's invariants rely on arguments passed to it that conform to its
- Timon Gehr (8/42) Apr 27 2020 No, it is not analogous, because only @system or @trusted code can get
- Walter Bright (4/7) Apr 27 2020 It is a good point. The design of @live up to this point did not change ...
- John Colvin (8/11) Apr 28 2020 the existence of any overly @trusting code renders @safe code
- welkam (2/14) Apr 28 2020 Would be eager to listen.
- ag0aep6g (4/8) Apr 28 2020 I don't see how you arrive at "buggy @safe code" here. You say it
- Timon Gehr (5/14) Apr 28 2020 I guess he is talking about the case where @trusted code calls buggy
- drug (21/38) Apr 25 2020 Yes, your statement that Rust assembly output is better is wrong,
- random (23/39) Apr 29 2020 A competent C Programmer could just write something like this. Or
- IGotD- (7/14) Apr 29 2020 I'm incompetent so I would just write:
- random (9/15) Apr 29 2020 Ok in this simple case it's obvius.
- IGotD- (4/12) Apr 29 2020 In the strcmp example, shouldn't the compiler be able to do the
- random (22/25) Apr 29 2020 Good question. My strcmp example is actually really bad, because
- random (11/11) Apr 29 2020 On Wednesday, 29 April 2020 at 16:19:55 UTC, random wrote:
- Walter Bright (2/4) Apr 29 2020 You'd be right.
- random (10/10) Apr 29 2020 On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:
- welkam (6/10) Apr 26 2020 Cool. Did not knew that. I know that different languages have
- Bastiaan Veelo (9/16) Apr 22 2020 [...]
- Guillaume Piolat (24/26) Apr 22 2020 Yes.
- serge (18/45) Apr 24 2020 Could you please elaborate on that? what are you referring to as
- Guillaume Piolat (17/33) Apr 26 2020 I was mentionning LLVM vs GCC vs Intel compiler backend, the part
- JN (10/20) Apr 26 2020 These are connected. Languages like Java don't give you options.
- Daniel Kozak (11/15) Apr 26 2020 Unfortunately there is a big issue with techempower. Because it is so
- JN (10/23) Apr 26 2020 It's nice to have a moral victory and claim to be above "those
- Daniel Kozak (5/14) Apr 26 2020 Yes I agree, this is the reason why I am improving those benchmarks
- Wulfklaue (69/87) Apr 26 2020 As somebody who implemented the Swoole+PHP and Crystal code at
- mipri (51/55) Apr 22 2020 Consider this benchmark from the thread next door, "Memory
- serge (18/73) Apr 24 2020 I did check the library.. My understanding that proposal is to
- Paulo Pinto (18/103) Apr 25 2020 Yes, that does mean that D's GC still needs some improvements,
- Jon Degenhardt (9/29) Apr 24 2020 I gave a talk at DConf 2018 you may be interested in. The talk
- Joseph Rushton Wakeling (32/34) Apr 25 2020 I don't have a reference off the top of my head, but IIRC much of
D has sparked my interest, therefore last week I started to look into the language and completed the D course on Pluralsight. One area where I would like to apply D is web applications/cloud. Golang is not bad, but I think D seems more powerful. During my research I found some interesting facts: 1) the fib test (https://github.com/drujensen/fib) with D (compiled with ldc) showed really good performance results. 2) Various web performance tests on https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show pretty poor performance of D using the vibe.d and hunt frameworks/libraries, regardless of the type of test - json, single query, multiple query and so on. My understanding is that D is in a similar ballpark performance league as C, C++, and Rust. Hence the question - why is the web performance of those 2 web frameworks so poor compared to rivals? I would say a few times worse. This is not a troll post by any means. I am researching whether D and a D web framework can be used as a replacement for Python/Django/Flask within our company. But if D web frameworks show worse performance than Go, then probably D is not the right tool for the job. Any comments and feedback would be appreciated.
Apr 22 2020
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:My understanding that D is the language in similar ballpark performance league as C, C++, Rust.Equivalent implementations in C, C++, D, Rust, and Nim compiled with the same compiler backend should give the exact same machine code. What you see in online language comparisons is mostly a comparison of different implementations and of how much time people spent on optimizing.why web performance of those 2 web frameworks is so poor compared to rivals?Difference in implementation. My guess is that the people writing those servers didn't have time to spend on optimizations.
Apr 22 2020
On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.My understanding that D is the language in similar ballpark performance league as C, C++, Rust.Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Apr 22 2020
23.04.2020 01:34, Arine writes:On Wednesday, 22 April 2020 at 15:24:29 UTC, welkam wrote:You forgot to add "in some cases Rust may generate better assembly than C/C++/D because..." But this is not the answer to the question the OP asked. Rust has an LLVM-based backend like LDC, so nothing prevents LDC from being as fast as any other LLVM-based compiler. Nothing. The question is how many efforts you put into it.On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:Not quite. Rust will generate better assembly as it can guarantee that use of an object is unique. Similar to C's "restrict" keyword but you get it for "free" across the entire application.My understanding that D is the language in similar ballpark performance league as C, C++, Rust.Equivalent implementation in C, C++, D, Rust, Nim compiled with same compiler backend should give exact same machine code. What you see in online language comparisons is mostly comparing different implementations and how much time people spent on optimizing.
Apr 23 2020
On Thursday, 23 April 2020 at 11:05:35 UTC, drug wrote:23.04.2020 01:34, Arine writes:You forgot to add "in some cases Rust may generate better assembly than C/C++/D because..." But this is not the answer to the question the OP asked. Rust has an LLVM-based backend like LDC, so nothing prevents LDC from being as fast as any other LLVM-based compiler. Nothing. The question is how many efforts you put into it.I wasn't replying to the author of the thread. I was replying to a misinformed individual in the thread. If that's the way you want to think about it, you can create your own compiler and language. "It's just about how many efforts you put into it", even if that means making your own language and compiler. How much "efforts" you have to put into something is a factor in that decision. You'd basically have to remake Rust in D to get the same assembly results and guarantees regarding aliasing.
Apr 23 2020
23.04.2020 18:13, Arine writes:I wasn't replying to the author of the thread. I was replying to a misinformed individual in the thread. If that's the way you want to think about it, you can create your own compiler and language. "It's just about how many efforts you put into it", even if that means making your own language and compiler. How much "efforts" you have to put into something is a factor in that decision. You'd basically have to remake Rust in D to get the same assembly results and guarantees regarding aliasing.Well, you're right, I used the wrong wording to express my thoughts. I meant that C/C++/Rust/D belong to the same performance league. The difference appears in specific cases of course, but in general they are equal. And your statement that Rust assembly output is better is wrong.
Apr 23 2020
On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:And your statement that Rust assembly output is better is wrong.https://godbolt.org/z/g_euiT

D:

    int foo(ref int a, ref int b) {
        a = 0;
        b = 1;
        return a;
    }

    int example.foo(ref int, ref int):
            movl $0, (%rsi)
            movl $1, (%rdi)
            movl (%rsi), %eax
            retq

Rust:

    pub fn foo(x: &mut i32, y: &mut i32) -> i32 {
        *x = 0;
        *y = 1;
        *x
    }

    example::foo:
            mov dword ptr [rdi], 0
            mov dword ptr [rsi], 1
            xor eax, eax
            ret

There most definitely is a difference, and the assembly generated with Rust is better. This is just a simple example to illustrate the difference. If you don't know why the difference is significant or why it is happening, there are a lot of great articles out there. Sadly there are people such as yourself spreading misinformation who don't know what a borrow checker is and don't know Rust or why it has gone as far as it has. This is why the borrow checker for D is going to fail: because the person designing it, such as yourself, doesn't have any idea what they are redoing and has never even bothered to touch Rust or learn about it. Anyways I'm not your babysitter; if you don't understand the above, as most people seem to not bother to learn assembly anymore, you're on your own.
Apr 24 2020
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:https://godbolt.org/z/g_euiT D: int foo(ref int a, ref int b) { a = 0; b = 1; return a; } int example.foo(ref int, ref int): movl $0, (%rsi) movl $1, (%rdi) movl (%rsi), %eax retq Rust: pub fn foo(x: &mut i32, y: &mut i32) -> i32 { *x = 0; *y = 1; *x } example::foo: mov dword ptr [rdi], 0 mov dword ptr [rsi], 1 xor eax, eax retWould DIP 1000 enable such optimization possibility in D?
Apr 24 2020
On Friday, 24 April 2020 at 22:52:31 UTC, IGotD- wrote:On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:Technically DIP 1021 could.https://godbolt.org/z/g_euiT D: int foo(ref int a, ref int b) { a = 0; b = 1; return a; } int example.foo(ref int, ref int): movl $0, (%rsi) movl $1, (%rdi) movl (%rsi), %eax retq Rust: pub fn foo(x: &mut i32, y: &mut i32) -> i32 { *x = 0; *y = 1; *x } example::foo: mov dword ptr [rdi], 0 mov dword ptr [rsi], 1 xor eax, eax retWould DIP 1000 enable such optimization possibility in D?
Apr 24 2020
On Friday, 24 April 2020 at 23:03:25 UTC, Les De Ridder wrote:On Friday, 24 April 2020 at 22:52:31 UTC, IGotD- wrote:Actually, nevermind: https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md#limitationsOn Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:Technically DIP 1021 could.[...]Would DIP 1000 enable such optimization possibility in D?
Apr 24 2020
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:Rust: pub fn foo(x: &mut i32, y: &mut i32) -> i32 { *x = 0; *y = 1; *x } example::foo: mov dword ptr [rdi], 0 mov dword ptr [rsi], 1 xor eax, eax reteh, this isn't Rust; it's "Rust +nightly with an unstable codegen flag." Marginal codegen improvements aren't going to turn heavy usage of dynamic arrays into heavy usage of std.container.array either, so they're not that relevant to expected performance of real-world programs in D vs. other languages.
Apr 24 2020
On Friday, 24 April 2020 at 23:24:49 UTC, mipri wrote:On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:Rust: pub fn foo(x: &mut i32, y: &mut i32) -> i32 { *x = 0; *y = 1; *x } example::foo: mov dword ptr [rdi], 0 mov dword ptr [rsi], 1 xor eax, eax reteh, this isn't Rust; it's "Rust +nightly with an unstable codegen flag." Marginal codegen improvements aren't going to turn heavy usage of dynamic arrays into heavy usage of std.container.array either, so they're not that relevant to expected performance of real-world programs in D vs. other languages.I believe that "-enable-scoped-noalias=true" should be the D equivalent (with LDC), but it didn't change anything. Also you can achieve the same asm with @llvmAttr("noalias") in front of at least one argument.
Apr 24 2020
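For illustration, here is roughly what SrMordred's LDC-only suggestion looks like in code. The attribute comes from ldc.attributes; the exact spelling and accepted placement may differ between LDC versions, so treat this as a sketch rather than a recipe:

    import ldc.attributes : llvmAttr;

    // Marking `a` with LLVM's "noalias" lets the backend assume the two
    // refs never overlap, so the final read of `a` can be folded to 0.
    int foo(@llvmAttr("noalias") ref int a, ref int b)
    {
        a = 0;
        b = 1;
        return a;
    }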
On 4/24/2020 8:06 PM, SrMordred wrote:Also you can achieve the same asm with @llvmAttr("noalias") in front of at least one argument.The following C code:

    int test(int * __restrict__ x, int * __restrict__ y) {
        *x = 0;
        *y = 1;
        return *x;
    }

compiled with gcc -O:

    test:
            mov dword ptr [RDI],0
            mov dword ptr [RSI],1
            mov EAX,0
            ret

It's not a unique property of Rust; C99 has it too. DMC doesn't implement it, but it probably should.
Apr 25 2020
On 4/24/2020 12:27 PM, Arine wrote:There most definitely is a difference, and the assembly generated with Rust is better.D's @live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
Apr 25 2020
On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:On 4/24/2020 12:27 PM, Arine wrote:In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences. My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down to the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff. I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference. There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context-switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed. BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this:

    // extract array instance from reusable pool,
    // and set its length to zero so that you can
    // write into it from the start
    x = buffer_pool.get();
    x.length = 0;
    assumeSafeAppend(x);   // a cost each time you do this

    // now append stuff into x to create your response

    // now publish your response

    // with the response published, clean up by
    // recycling the buffer back into the pool
    buffer_pool.recycle(x);

This is the kind of pattern that Sociomantic used a lot. In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending. But having to call `assumeSafeAppend` each time does carry a performance cost. IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.There most definitely is a difference, and the assembly generated with Rust is better.D's @live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
Apr 25 2020
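For concreteness, a minimal, runnable version of the reuse pattern sketched above. The pool type and function names here are hypothetical, not Sociomantic's actual API:

    char[][] pool;  // free list of retired buffers

    char[] getBuffer()
    {
        if (pool.length == 0)
            return null;           // first append will allocate a block
        auto buf = pool[$ - 1];
        pool = pool[0 .. $ - 1];
        buf.length = 0;            // rewind to the start
        assumeSafeAppend(buf);     // the per-request cost discussed above
        return buf;
    }

    void recycleBuffer(char[] buf)
    {
        pool ~= buf;
    }

Without the assumeSafeAppend call, the first append to the rewound buffer would reallocate instead of reusing the block, defeating the point of the pool.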
On Saturday, April 25, 2020 4:34:44 AM MDT Joseph Rushton Wakeling via Digitalmars-d wrote:IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.You could probably do that, but I'm not sure that it could be considered @safe. It would probably make more sense to just use a custom array type if that's what you really needed, though of course, that causes its own set of difficulties (including having to duplicate the array appending logic). - Jonathan M Davis
Apr 25 2020
On Saturday, 25 April 2020 at 15:21:03 UTC, Jonathan M Davis wrote:You could probably do that, but I'm not sure that it could be considered @safe.I think it would be OK to have it as a non-@safe tool. But ...It would probably make more sense to just use a custom array type if that's what you really needed, though of course, that causes its own set of difficulties (including having to duplicate the array appending logic).... I think that could possibly make more sense. One thing that I really don't like about the original idea of an `alwaysAssumeSafeAppend(x)` is that it makes behaviour dependent on the instance rather than the type. It would probably be better to have a clear type-based separation. OTOH in my experience custom types are often finicky in terms of how they interact with functions that expect a slice as input. So there could be a convenience in having it as an option for regular dynamic arrays. Or it could just be that the custom type would need a bit more work in its implementation :-)
Apr 26 2020
On 4/25/2020 3:34 AM, Joseph Rushton Wakeling wrote:In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences.I agree. I also generally structure my code so that optimization wouldn't make a difference. But it's still a worthwhile benefit to add it for @live functions.I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference.My experience is that if the code has never been profiled, there's one obscure function unexpectedly consuming the bulk of the run time, which is easily recoded. Programs that have been runtime profiled tend to have a pretty flat graph of which functions eat the time.

    // extract array instance from reusable pool,
    // and set its length to zero so that you can
    // write into it from the start
    x = buffer_pool.get();
    x.length = 0;
    assumeSafeAppend(x);   // a cost each time you do this

    // now append stuff into x to create your response

    // now publish your response

    // with the response published, clean up by
    // recycling the buffer back into the pool
    buffer_pool.recycle(x);

This is the kind of pattern that Sociomantic used a lot. In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending. But having to call `assumeSafeAppend` each time does carry a performance cost.This is why I use OutBuffer for such activities.IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.Sounds like an idea worth exploring. How about taking point on that? But I would be concerned about tagging such arrays, and then stomping them unintentionally, leading to memory corruption bugs. OutBuffer is memory safe.
Apr 25 2020
On Saturday, 25 April 2020 at 22:15:44 UTC, Walter Bright wrote:My experience is that if the code has never been profiled, there's one obscure function unexpectedly consuming the bulk of the run time, which is easily recoded. Programs that have been runtime profiled tend to have a pretty flat graph of which functions eat the time.Yes! :-) And in particular, computational complexity in the real world is very different from theoretical arguments about O(...). One can gain a lot by being clear-headed about what the actual problem is and what is optimal for that particular problem with that particular data.This is why I use OutBuffer for such activities.Yes, I have some memory of talking with Dicebot about whether this would be an appropriate tool for the other side of the D2 conversion. I don't remember if any firm conclusions were drawn, though.Sounds like an idea worth exploring. How about taking point on that? But I would be concerned about tagging such arrays, and then stomping them unintentionally, leading to memory corruption bugs. OutBuffer is memory safe.Yes, it's clear (as Jonathan noted) that an always-stompable array could probably not be @safe. That said, what does OutBuffer do that means that it _is_ @safe in this context? Of course, Sociomantic never had @safe to play with. In practice I don't recall there ever being an issue with unintentional stomping (I'm not saying it never happened, but I have no recollection of it being a common issue). That did however rest on a program structure that made it less likely anyone would make such a mistake. About stepping up with a feature contribution: the idea is lovely but I'm very aware of how limited my time is right now, so I don't want to make offers I can't guarantee to follow up on. There's a reason I post so rarely in the forums these days! But I will ping Mathias L. to let him know, as the idea was his to start with.
Apr 26 2020
On 4/26/2020 4:30 AM, Joseph Rushton Wakeling wrote:That said, what does OutBuffer do that means that it _is_ @safe in this context?It manages its own memory privately, and presents the results as dynamic arrays which do their own bounds checking. It's been a reliable solution for me for maybe 30 years.
Apr 26 2020
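For readers who haven't used it, a minimal sketch of the std.outbuffer approach (the response text is made up):

    import std.outbuffer : OutBuffer;

    void main()
    {
        auto buf = new OutBuffer();
        buf.reserve(4096);                    // grow the private storage once
        buf.write("HTTP/1.1 200 OK\r\n\r\n"); // append without stomping concerns
        ubyte[] response = buf.toBytes();     // bounds-checked slice of the data
        // ... publish `response`, then rewind for reuse:
        buf.offset = 0;                       // offset is a public field
    }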
On Saturday, 25 April 2020 at 22:15:44 UTC, Walter Bright wrote:On 4/25/2020 3:34 AM, Joseph Rushton Wakeling wrote:I heard that proving that two pointers do not alias is a big problem in compiler backends and that some or most autovectorization optimizations do not fire because the compiler can't prove no aliasing. A new language used in the Unity game engine is designed such that references do not alias by default, for optimization reasons. I haven't looked into this topic further, but I believe it's worth checking out. Data science people would benefit greatly from autovectorizationIn any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences.I agree. I also generally structure my code so that optimization wouldn't make a difference. But it's still a worthwhile benefit to add it for @live functions.
Apr 26 2020
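A small sketch of the kind of loop welkam is talking about (it assumes b.length >= a.length; the comment states general optimizer behavior, not a measurement of any particular compiler):

    // If `a` and `b` may refer to overlapping memory, every store to a[i]
    // can change what b[i] holds, so the optimizer cannot freely vectorize
    // the loop without proving, or guarding against, overlap at runtime.
    void axpy(float[] a, const(float)[] b)
    {
        foreach (i; 0 .. a.length)
            a[i] = 2.0f * b[i] + a[i];
    }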
On Saturday, 25 April 2020 at 10:34:44 UTC, Joseph Rushton Wakeling wrote:In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences. My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff. I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference. There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed. BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this: // extract array instance from reusable pool, // and set its length to zero so that you can // write into it from the start x = buffer_pool.get(); x.length = 0; assumeSafeAppend(x); // a cost each time you do this // now append stuff into x to // create your response // now publish your response // with the response published, clean // up by recycling the buffer back into // the pool buffer_pool.recycle(x); This is the kind of pattern that Sociomantic used a lot. In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending. But having to call `assumeSafeAppend` each time does carry a performance cost. IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.I understand that it was an annoying breaking change, but aside from the difficulty of migrating I don't understand why a custom type isn't the appropriate solution for this problem. I think I heard "We want to use the built-in slices", but I never understood the technical argument behind that, or how it stacked up against not getting the desired behaviour. My sense was that the irritation at the breakage was influencing the technical debate.
Apr 26 2020
On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:I understand that it was an annoying breaking change, but aside from the difficulty of migrating I don't understand why a custom type isn't the appropriate solution for this problem. I think I heard "We want to use the built-in slices", but I never understood the technical argument behind that, or how it stacked up against not getting the desired behaviour.Can you imagine replacing every usage of slices with a custom type in your code? And making sure programmers joining the company do the same? And having converters that e.g. accept ubyte arrays from libraries and convert them into yours?
Apr 26 2020
On Sunday, 26 April 2020 at 11:59:27 UTC, Stefan Koch wrote:On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:I suppose nowadays that custom type can use a scoped ubyte slice to expose its temp buffer.I understand that it was an annoying breaking change, but aside from the difficulty of migrating I don't understand why a custom type isn't the appropriate solution for this problem. I think I heard "We want to use the built-in slices", but I never understood the technical argument behind that, or how it stacked up against not getting the desired behaviour.Can you imagine replacing every usage of slices with a custom type in your code? And making sure programmers joining the company do the same? and having converts that e.g. accept ubyte arrays from libraries and convert them into yours?
Apr 26 2020
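A bare-bones sketch of such a type (hypothetical, not ocean's Buffer): it owns its storage privately, so rewinding never fights the runtime's stomping prevention, and it lends out its contents as a scope slice:

    struct ReusableBuffer
    {
        private ubyte[] storage;
        private size_t used;

        void reset() { used = 0; }  // no assumeSafeAppend needed

        void put(scope const(ubyte)[] bytes)
        {
            auto end = used + bytes.length;
            if (end > storage.length)
                storage.length = end * 2;  // grow the privately owned storage
            storage[used .. end] = bytes[];
            used = end;
        }

        // Borrow the written contents; under -preview=dip1000 the
        // `return scope` ties the slice's lifetime to the buffer.
        inout(ubyte)[] data() inout return scope { return storage[0 .. used]; }
    }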
On Sunday, 26 April 2020 at 11:40:49 UTC, John Colvin wrote:I understand that it was an annoying breaking change, but aside from the difficulty of migrating I don't understand why a custom type isn't the appropriate solution for this problem. I think I heard "We want to use the built-in slices", but I never understood the technical argument behind that, or how it stacked up against not getting the desired behaviour. My sense was that the irritation at the breakage was influencing the technical debate.That's not entirely unfair, but I think it does help to appreciate the magnitude of the problem: * there's a very large codebase, including many different applications and a large amount of common library code, all containing a lot of functions that expect slice input (because the concept of a range was never in D1, and because slices were the only use case) * most of the library functionality shouldn't have to care whether its input is a reusable buffer or any other kind of slice * you can't rewrite to use range-based generics because that's D2 only and you need to keep D1 compatibility until the last application has migrated * there are _very_ extreme performance and reliability constraints on some of the key applications, meaning that validating D2 transition efforts is very time consuming * you can't use any Phobos functionality until the codebase is D2 only, and even then you probably want to limit how much of it you use because it is not written with these extreme performance concerns in mind * all the time spent on those transitional efforts is time taken away from feature development It's very easy to look back and say something like, "Well, if you'd written with introspection-based design from the start, you would have had a much easier migration effort", but that in itself would have been trickier to do in D1, and would have carried extra maintenance and development costs (particularly w.r.t. forcing devs to write what would have seemed like very boilerplate-y code compared to the actual set of use cases). Even with the D1 compatibility requirement dropped, there still remains a big burden to transition all the reusable buffers to a different type. IIRC the focus would probably have been on using `Appender`. Note that many of these concerns still apply if we want to preserve a future for any of the (very well crafted) library and application code that Sociomantic open-sourced. They are all now D2-only, but the effort required to rewrite around dedicated reusable-buffer types would still be quite substantial.
Apr 26 2020
On 4/25/20 6:34 AM, Joseph Rushton Wakeling wrote:On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:In terms of performance, depending on the task at hand, D1 code is slower than D2 appending, by the fact that there's a thread-local cache for appending for D2, and D1 only has a global one-array cache for the same. However, I'm assuming that since you were focused on D1, your usage naturally was written to take advantage of what D1 has to offer. The assumeSafeAppend call also uses this cache, and so it should be quite fast. But setting length to 0 is a ton faster, because you aren't calling an opaque function. So depending on the usage pattern, D2 with assumeSafeAppend can be faster, or it could be slower.On 4/24/2020 12:27 PM, Arine wrote:In any case, I seriously doubt those kinds of optimization have anything to do with the web framework performance differences. My experience of writing number-crunching stuff in D and Rust is that Rust seems to have a small but consistent performance edge that could quite possibly be down the kind of optimizations that Arine mentions (that's speculation: I haven't verified). However, it's small differences, not order-of-magnitude stuff. I suppose that in a more complicated app there could be some multiplicative impact, but where high-throughput web frameworks are concerned I'm pretty sure that the memory allocation and reuse strategy is going to be what makes 99% of the difference. There may also be a bit of an impact from the choice of futures vs. fibers for managing asynchronous tasks (there's a context switching cost for fibers), but I would expect that to only make a difference at the extreme upper end of performance, once other design factors have been addressed. BTW, on the memory allocation front, Mathias Lang has pointed out that there is quite a nasty impact from `assumeSafeAppend`. Imagine that your request processing looks something like this: // extract array instance from reusable pool, // and set its length to zero so that you can // write into it from the start x = buffer_pool.get(); x.length = 0; assumeSafeAppend(x); // a cost each time you do this // now append stuff into x to // create your response // now publish your response // with the response published, clean // up by recycling the buffer back into // the pool buffer_pool.recycle(x); This is the kind of pattern that Sociomantic used a lot. In D1 it was easy because there was no array stomping prevention -- you could just set length == 0 and start appending. But having to call `assumeSafeAppend` each time does carry a performance cost.There most definitely is a difference and the assembly generated with rust is better.D's live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. 
Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime. Note that this was before (I think) destructor calls were added. The destructor calls are something that assumeSafeAppend is going to do, and won't be done with just setting length to 0. However, there are other options. We could introduce a druntime configuration option so that when this specific situation happens (slice points at start of block and has 0 length), assumeSafeAppend is called automatically on the first append. Jonathan is right that this is not @safe, but it could be an opt-in configuration option. I don't think configuring specific arrays makes a lot of sense, as this would require yet another optional bit that would have to be checked and allocated for all arrays. -Steve
Apr 26 2020
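To make the mechanics concrete, a small self-contained example of the stomping prevention and the assumeSafeAppend escape hatch being discussed:

    void main()
    {
        auto a = new int[10];

        auto b = a[0 .. 5];
        b ~= 42;                 // appending here would overwrite a[5],
        assert(b.ptr != a.ptr);  // so the runtime reallocates b instead

        auto c = a[0 .. 5];
        c.assumeSafeAppend();    // promise: nothing uses a[5 .. $] anymore
        c ~= 42;                 // now the append happens in place...
        assert(c.ptr == a.ptr);
        assert(a[5] == 42);      // ...stomping the old contents, as promised
    }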
On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer wrote:In terms of performance, depending on the task at hand, D1 code is slower than D2 appending, by the fact that there's a thread-local cache for appending for D2, and D1 only has a global one-array cache for the same. However, I'm assuming that since you were focused on D1, your usage naturally was written to take advantage of what D1 has to offer. The assumeSafeAppend call also uses this cache, and so it should be quite fast. But setting length to 0 is a ton faster, because you aren't calling an opaque function. So depending on the usage pattern, D2 with assumeSafeAppend can be faster, or it could be slower.That makes sense. I just know that Mathias L. seemed to be quite concerned about the `assumeSafeAppend` performance impact. I think he was not looking for a D1/D2 comparison but in terms of getting the most performant behaviour in future. It's not that it was slower than D1, it's that it was a per-use speed hit.I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D).Yes. To be fair I think in that context, at that stage of transition, that probably made more sense: it was easier to just mandate that everybody start putting `assumeSafeAppend` into their code (actually we implemented a transitional wrapper, `enableStomping`, which was a no-op in D1 and called `assumeSafeAppend` in D2).By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime. Note that this was before (I think) destructor calls were added. The destructor calls are something that assumeSafeAppend is going to do, and won't be done with just setting length to 0. However, there are other options. We could introduce a druntime configuration option so when this specific situation happens (slice points at start of block and has 0 length), assumeSafeAppend is called automatically on the first append. Jonathan is right that this is not safe, but it could be an opt-in configuration option. I don't think configuring specific arrays makes a lot of sense, as this would require yet another optional bit that would have to be checked and allocated for all arrays.The druntime option does sound interesting, although I'm leery about the idea of creating 2 different language behaviours.
Apr 26 2020
On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer wrote:In terms of performance, depending on the task at hand, D1 code is slower than D2 appending, by the fact that there's a thread-local cache for appending for D2, and D1 only has a global one-array cache for the same. However, I'm assuming that since you were focused on D1, your usage naturally was written to take advantage of what D1 has to offer. The assumeSafeAppend call also uses this cache, and so it should be quite fast. But setting length to 0 is a ton faster, because you aren't calling an opaque function. So depending on the usage pattern, D2 with assumeSafeAppend can be faster, or it could be slower.Well, Sociomantic didn't use any kind of multi-threading in "user code". We had single-threaded fibers for concurrency, and process-level scaling for parallelism. Some corner cases were using threads, but it was for low level things (e.g. low latency file IO on Linux), which were highly scrutinized and stayed clear of the GC AFAIK. Note that accessing TLS *does* have a cost which is higher than accessing a global. By this reasoning, I would assume that D2 appending would definitely be slower, although I never profiled it. What I did profile tho, is `assumeSafeAppend`. The fact that it looks up GC metadata (taking the GC lock in the process) made it quite expensive given how often it was called (in D1 it was simply a no-op, and called defensively).Well... Here's something I never really quite understood actually: Mihails *did* introduce a buffer type. See https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130 And we also had a (very old) similar utility here: https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d I always wanted to unify this, but never got to it. But if you look at the first link, it calls `assumeSafeAppend` twice, before and after setting the length. In practice it is only necessary *after* reducing the length, but as I mentioned, this is defensive programming.IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.For reference, most of our applications had a principled buffer use. The buffers would rarely be appended to from more than one, perhaps two places. However, slices to the buffer would be passed around quite liberally. So a buffer type from which one could borrow would indeed have been optimal.Note that this was before (I think) destructor calls were added. The destructor calls are something that assumeSafeAppend is going to do, and won't be done with just setting length to 0. However, there are other options. 
We could introduce a druntime configuration option so that when this specific situation happens (slice points at start of block and has 0 length), assumeSafeAppend is called automatically on the first append. Jonathan is right that this is not @safe, but it could be an opt-in configuration option. I don't think configuring specific arrays makes a lot of sense, as this would require yet another optional bit that would have to be checked and allocated for all arrays. -Steve I don't even know if we had a single case where we had arrays of objects with destructors. The vast majority of our buffers were `char[]` and `ubyte[]`. We had some elaborate types, but I think destructors + buffer would have been frowned upon in code review. Also the reason we didn't modify druntime to just have the D1 behavior (that would have been a trivial change) was because of how dependent on the new behavior druntime had become. It was also the motivation for the suggestion Joe mentioned. AFAIR I mentioned it in an internal issue, did a PoC implementation, but never got it to a state where it was mergeable. Also, while a custom type might sound better, it doesn't really interact well with the rest of the runtime, and it's an extra word to pass around (if passed by value).
Apr 26 2020
On 4/27/20 1:04 AM, Mathias LANG wrote:On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer wrote:That is a minor cost compared to the actual appending.In terms of performance, depending on the task at hand, D1 code is slower than D2 appending, by the fact that there's a thread-local cache for appending for D2, and D1 only has a global one-array cache for the same. However, I'm assuming that since you were focused on D1, your usage naturally was written to take advantage of what D1 has to offer. The assumeSafeAppend call also uses this cache, and so it should be quite fast. But setting length to 0 is a ton faster, because you aren't calling an opaque function. So depending on the usage pattern, D2 with assumeSafeAppend can be faster, or it could be slower.Well, Sociomantic didn't use any kind of multi-threading in "user code". We had single-threaded fibers for concurrency, and process-level scaling for parallelism. Some corner cases were using threads, but it was for low level things (e.g. low latency file IO on Linux), which were highly scrutinized and stayed clear of the GC AFAIK. Note that accessing TLS *does* have a cost which is higher than accessing a global.By this reasoning, I would assume that D2 appending would definitely be slower, although I never profiled it.I tested the performance when I added the feature. D2 was significantly and measurably faster (at least for the appending 2 or more arrays). I searched through my old email, for appending 5M bytes to 2 arrays the original code was 13.99 seconds (on whatever system I was using in 2009), and 1.53 seconds with the cache. According to that email, I had similar results even with a 1-element cache, so somehow my code was faster, but I didn't know why. Quite possibly it's because the cache in D1 for looking up block info is behind the GC lock. Literally the only thing that is more expensive in D2 vs. D1 was the truncation of arrays. In D1 this is setting the length to 0, in D2, you needed to call assumeSafeAppend. This is why I suggested a flag that allows you to enable the original behavior.What I did profile tho, is `assumeSafeAppend`. The fact that it looks up GC metadata (taking the GC lock in the process) made it quite expensive given how often it was called (in D1 it was simply a no-op, and called defensively).The cache I referred to is to look up the GC metadata. In essence, when you append, you will look it up anyway. Either assumeSafeAppend or append will get the GC metadata into the cache, and then it is a straight lookup in the cache and this doesn't take a lock or do any expensive searches. The cache is designed to favor the most recent arrays first. This is an 8 element cache, so there are still cases where you will be having issues (like if you round-robin append to 9 arrays). I believe 8 elements was a sweet spot for performance that allowed reasonably fast appending with a reasonable number of concurrent arrays. Where D1 will fall down is if you are switching between more than one array, because the cache in D1 is only one element. Even if you are doing just one array, the cache is not for the array runtime, but for the GC. And it is based on the pointer queried, not the block data. A GC collection, for instance, is going to invalidate the cache.Yeah, that is unnecessary. It is not going to be that expensive, especially if you just were appending to that array, but again, more expensive than setting a word to 0.Well... 
Here's something I never really quite understood actually: Mihails *did* introduce a buffer type. See https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130 And we also had a (very old) similar utility here: https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d I always wanted to unify this, but never got to it. But if you look at the first link, it calls `assumeSafeAppend` twice, before and after setting the length. In practice it is only necessary *after* reducing the length, but as I mentioned, this is defensive programming.IIRC Mathias has suggested that it should be possible to tag arrays as intended for this kind of re-use, so that stomping prevention will never trigger, and you don't have to `assumeSafeAppend` each time you reduce the length.I spoke for a while with Dicebot at Dconf 2016 or 17 about this issue. IIRC, I suggested either using a custom type or custom runtime. He was not interested in either of these ideas, and it makes sense (large existing code base, didn't want to stray from mainline D). By far, the best mechanism to use is a custom type. Not only will that fix this problem as you can implement whatever behavior you want, but you also do not need to call opaque functions for appending either. It should outperform everything you could do in a generic runtime.For reference, most of our applications had a principled buffer use. The buffers would rarely be appended to from more than one, perhaps two places. However, slices to the buffer would be passed around quite liberally. So a buffer type from which one could borrow would indeed have been optimal.This all works actually better with the new runtime. The old one would reallocate if you appended to a slice that didn't start at the block start. The new version can detect it's at the end and allow appending.Of course! D1 didn't have destructors for structs ;)Note that this was before (I think) destructor calls were added. The destructor calls are something that assumeSafeAppend is going to do, and won't be done with just setting length to 0. However, there are other options. We could introduce a druntime configuration option so that when this specific situation happens (slice points at start of block and has 0 length), assumeSafeAppend is called automatically on the first append. Jonathan is right that this is not @safe, but it could be an opt-in configuration option. I don't think configuring specific arrays makes a lot of sense, as this would require yet another optional bit that would have to be checked and allocated for all arrays.I don't even know if we had a single case where we had arrays of objects with destructors. The vast majority of our buffers were `char[]` and `ubyte[]`. We had some elaborate types, but I think destructors + buffer would have been frowned upon in code review.Also the reason we didn't modify druntime to just have the D1 behavior (that would have been a trivial change) was because of how dependent on the new behavior druntime had become. It was also the motivation for the suggestion Joe mentioned. AFAIR I mentioned it in an internal issue, did a PoC implementation, but never got it to a state where it was mergeable.Having a flag per array is going to be costly, but actually, there's a lot more junk in the block itself. 
Perhaps there's a spare bit somewhere that can be a flag for the append behavior.Also, while a custom type might sound better, it doesn't really interact well with the rest of the runtime, and it's an extra word to pass around (if passed by value).The "extra value" can be stored elsewhere -- just like the GC, you could provide metadata for the capacity in a global AA or something. In any case, there were options. The way druntime is written, it has pretty good performance, in most cases BETTER performance than D1 for idiomatic D code. In fact the Tango folks asked me if I could add the feature to Tango's druntime, but I couldn't because it depends on TLS. For code that was highly focused on optimizing D1 with its idiosyncrasies, it probably has worse performance. The frustration is understandable, but without the possibility of adaptation, there's not much one can do. -Steve
Apr 27 2020
This is what the D n.g. is about - informative, collegial, and useful! Thanks, fellows!
Apr 29 2020
On 25.04.20 12:15, Walter Bright wrote:On 4/24/2020 12:27 PM, Arine wrote:What's an example of such an optimization and why won't it introduce UB to @safe code?There most definitely is a difference, and the assembly generated with Rust is better.D's @live functions can indeed do such optimizations, though I haven't got around to implementing them in DMD's optimizer. There's nothing particularly difficult about it.
Apr 25 2020
On 4/25/2020 4:00 PM, Timon Gehr wrote:What's an example of such an optimization and why won't it introduce UB to @safe code?

    @live void test() { int a, b; foo(a, b); }

    @live int foo(ref int a, ref int b) {
        a = 0;
        b = 1;
        return a;
    }

ref a and ref b cannot refer to the same memory object.
Apr 25 2020
On 26.04.20 04:22, Walter Bright wrote:On 4/25/2020 4:00 PM, Timon Gehr wrote:Actually they can, even in @safe @live code.What's an example of such an optimization and why won't it introduce UB to @safe code?@live void test() { int a, b; foo(a, b); } @live int foo(ref int a, ref int b) { a = 0; b = 1; return a; } ref a and ref b cannot refer to the same memory object.
Apr 26 2020
On 4/26/2020 12:45 AM, Timon Gehr wrote:On 26.04.20 04:22, Walter Bright wrote:Bug reports are welcome. Please tag them with the 'live' keyword in bugzilla.ref a and ref b cannot refer to the same memory object.Actually they can, even in @safe @live code.
Apr 26 2020
On 4/26/20 10:19 AM, Walter Bright wrote:On 4/26/2020 12:45 AM, Timon Gehr wrote:I can't do that because you did not agree it was a bug. According to your DIP and past discussions, the following is *intended* behavior:

    int bar(ref int x, ref int y) @safe @live {
        x = 0;
        y = 1;
        return x;
    }

    void main() @safe {
        int x;
        import std.stdio;
        writeln(bar(x, x)); // 1
    }

I have always criticized this design, but so far you have stuck to it. I have stated many times that the main reason why it is bad is that you don't actually enforce any new invariant, so @live does not enable any new patterns, at least in @safe code. In particular, if you start optimizing based on non-enforced and undocumented @live assumptions, @safe @live code will not be memory safe. You can't optimize based on @live and preserve memory safety. Given that you want to preserve interoperability, this is because it is tied to functions instead of types. @live in its current form is useless except perhaps as a linting tool.
Apr 26 2020
On 4/26/2020 2:52 PM, Timon Gehr wrote:I can't do that because you did not agree it was a bug. According to your DIP and past discussions, the following is *intended* behavior: int bar(ref int x, ref int y) @safe @live { x = 0; y = 1; return x; } void main() @safe { int x; import std.stdio; writeln(bar(x, x)); // 1 } I have always criticized this design, but so far you have stuck to it. I have stated many times that the main reason why it is bad is that you don't actually enforce any new invariant, so @live does not enable any new patterns, at least in @safe code. In particular, if you start optimizing based on non-enforced and undocumented @live assumptions, @safe @live code will not be memory safe. You can't optimize based on @live and preserve memory safety. Given that you want to preserve interoperability, this is because it is tied to functions instead of types. @live in its current form is useless except perhaps as a linting tool.@live's invariants rely on arguments passed to it that conform to its requirements. It's analogous to @safe code relying on its arguments conforming. To get the checking here, main would have to be declared @live, too.
Apr 26 2020
On 27.04.20 07:40, Walter Bright wrote:On 4/26/2020 2:52 PM, Timon Gehr wrote:No, it is not analogous, because only @system or @trusted code can get that wrong, not @safe code. @safe code itself is (supposed to be) verified, not trusted.I can't do that because you did not agree it was a bug. According to your DIP and past discussions, the following is *intended* behavior: int bar(ref int x, ref int y) @safe @live { x = 0; y = 1; return x; } void main() @safe { int x; import std.stdio; writeln(bar(x, x)); // 1 } I have always criticized this design, but so far you have stuck to it. I have stated many times that the main reason why it is bad is that you don't actually enforce any new invariant, so @live does not enable any new patterns, at least in @safe code. In particular, if you start optimizing based on non-enforced and undocumented @live assumptions, @safe @live code will not be memory safe. You can't optimize based on @live and preserve memory safety. Given that you want to preserve interoperability, this is because it is tied to functions instead of types. @live in its current form is useless except perhaps as a linting tool.@live's invariants rely on arguments passed to it that conform to its requirements. It's analogous to @safe code relying on its arguments conforming. ...To get the checking here, main would have to be declared @live, too.I understand the design. It just does not make sense. All of the code is annotated @safe, but if you optimize based on unverified assumptions, it will not be memory safe. Is the goal of @live really to undermine @safe's guarantees?
Apr 27 2020
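To make the concern concrete, here is a minimal sketch (the optimization itself is hypothetical) of what code generation based on unverified @live assumptions could do to the example above:

int bar(ref int x, ref int y) @safe @live
{
    x = 0;
    y = 1;
    // If the compiler assumed x and y never alias, it could fold this
    // to `return 0;`. Under the aliasing call bar(x, x) the correct
    // result is 1, so the "optimized" program would silently compute
    // the wrong value -- and with pointers instead of ints, the same
    // assumption can turn into out-of-bounds or dangling accesses.
    return x;
}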
On 4/27/2020 7:26 AM, Timon Gehr wrote:
> I understand the design. It just does not make sense. All of the code is annotated @safe, but if you optimize based on unverified assumptions, it will not be memory safe.

It is a good point. The design of @live up to this point did not change the way code was generated. I still want to see how much of a difference it makes, so I will implement it, but make it an option.
Apr 27 2020
On Monday, 27 April 2020 at 14:26:50 UTC, Timon Gehr wrote:
> No, it is not analogous, because only @system or @trusted code can get that wrong, not @safe code. @safe code itself is (supposed to be) verified, not trusted.

The existence of any overly trusting code renders @safe code liable to cause memory safety bugs. While the invalid accesses won't occur inside @safe code, they can definitely be caused by it, even without the buggy @safe code calling anything @trusted. Some day I'll have time to write up all my (many, many pages of) notes on this stuff... it would have been for DConf; I guess now it's for DConf Online?
Apr 28 2020
On Tuesday, 28 April 2020 at 13:44:05 UTC, John Colvin wrote:
> The existence of any overly trusting code renders @safe code liable to cause memory safety bugs. [...] Some day I'll have time to write up all my (many, many pages of) notes on this stuff...

Would be eager to listen.
Apr 28 2020
On 28.04.20 15:44, John Colvin wrote:
> The existence of any overly trusting code renders @safe code liable to cause memory safety bugs. While the invalid accesses won't occur inside @safe code, they can definitely be caused by it, even without the buggy @safe code calling anything @trusted.

I don't see how you arrive at "buggy @safe code" here. You say it yourself: when there is "overly trusting code", then that's where the bug is.
Apr 28 2020
On 29.04.20 00:36, ag0aep6g wrote:
> On 28.04.20 15:44, John Colvin wrote:
>> [...]
> I don't see how you arrive at "buggy @safe code" here. You say it yourself: when there is "overly trusting code", then that's where the bug is.

I guess he is talking about the case where @trusted code calls buggy @safe code and relies on it being correct to ensure memory safety. (However, this is still the fault of the @trusted code. @safe code cannot be blamed for violations of memory safety.)
Apr 28 2020
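A minimal sketch of the scenario being debated here (the function names are made up for illustration): the incorrect logic lives in @safe code, but the memory-safety blame rests with the @trusted code that relied on it.

// Buggy but @safe: returns an out-of-bounds index. No memory
// unsafety occurs *inside* this function.
size_t pickIndex(int[] a) @safe
{
    return a.length; // bug: valid indices are 0 .. a.length - 1
}

// Overly trusting: assumes pickIndex's result is in bounds and does
// an unchecked read based on that assumption.
int readAt(int[] a) @trusted
{
    return a.ptr[pickIndex(a)]; // out-of-bounds read
}

void main() @safe
{
    import std.stdio : writeln;
    writeln(readAt([1, 2, 3]));
}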
24.04.2020 22:27, Arine wrote:
> On Thursday, 23 April 2020 at 15:57:01 UTC, drug wrote:
>> And your statement that Rust assembly output is better is wrong.
> There most definitely is a difference, and the assembly generated with Rust is better. This is just a simple example to illustrate the difference. [...] Anyways, I'm not your babysitter; if you don't understand the above, as most people seem not to bother to learn assembly anymore, you're on your own.

Yes, your statement that Rust assembly output is better is wrong, because one single optimization, applicable in some cases, does not make Rust better in general. Period. Once again: Rust assembly output can be better in some cases. But there is a big difference between these two statements, "better in some cases" and "better in general".

Moreover, you are wrong twice, because this optimization is not free at all. You pay for it in the form of the restriction that you cannot have more than one mutable reference, which makes cyclic data structures unusually difficult compared to almost any other programming language. The same optimization has also been available in C for a long time. And in some cases a GC-based application can even be faster than one with manual memory management, because the GC lets it avoid numerous allocations and deallocations. What you are talking about is, in fact, premature optimization.

Self-importance is written all over your post. Here you make your third mistake: you are very far from being able to be my babysitter. Trying to show your competence, you show only your blind ignorance. The world is much less trivial than a function with two mutable references that performs no useful work.
Apr 25 2020
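For contrast, a small D sketch of the trade-off described above: D (like C and C++) freely allows two mutable references to alias, which is exactly what Rust's borrow checker rejects in exchange for its aliasing-based optimizations.

void touch(ref int a, ref int b)
{
    a = 1;
    b = 2; // if a and b alias, this overwrites the first store
}

void main()
{
    int x;
    touch(x, x); // legal in D; the equivalent Rust call with
                 // (&mut x, &mut x) is rejected at compile time
}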
On Friday, 24 April 2020 at 19:27:40 UTC, Arine wrote:
> There most definitely is a difference, and the assembly generated with Rust is better. [...] Anyways, I'm not your babysitter; if you don't understand the above, as most people seem not to bother to learn assembly anymore, you're on your own.

A competent C programmer could just write something like this (or use restrict):

int test(int* x, int* y)
{
    int result = *x = 0;
    *y = 1;
    return result;
}

This produces the following with gcc -O:

test(int*, int*):
    mov DWORD PTR [rdi], 0
    mov DWORD PTR [rsi], 1
    mov eax, 0
    ret

https://godbolt.org/z/rpM_eK

So the statement that Rust produces better assembly is wrong. It's on my todo list to learn Rust; what is really off-putting are those random fanatic Rust fanboys. In your own language: "If you don't know why the difference is significant or why it is happening", you should probably learn C before you start insulting people in a programming forum ;)
Apr 29 2020
On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:
> A competent C programmer could just write something like this (or use restrict):
>
> int test(int* x, int* y)
> {
>     int result = *x = 0;
>     *y = 1;
>     return result;
> }

I'm incompetent, so I would just write:

int test(int* x, int* y)
{
    *x = 0;
    *y = 1;
    return 0;
}
Apr 29 2020
On Wednesday, 29 April 2020 at 10:36:59 UTC, IGotD- wrote:
> I'm incompetent, so I would just write:
>
> int test(int* x, int* y)
> {
>     *x = 0;
>     *y = 1;
>     return 0;
> }

OK, in this simple case it's obvious. For a real-world example, look at the source for strcmp():

https://code.woboq.org/userspace/glibc/string/strcmp.c.html

The trick is to load the value into a variable first. The compiler can't optimize away multiple reads through the same pointer, because the contents could have been changed through another pointer in the meantime. restrict solves that, but if you know what is happening and why, you can solve it by hand.
Apr 29 2020
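The same hand-caching trick carries over to D, which has no restrict keyword in the standard language. A sketch (note the cached version is only equivalent when the pointers don't alias):

// Naive version: *z must be reloaded after each store, since x or y
// might point at the same memory as z.
int test(int* x, int* y, int* z)
{
    *y = *z;
    *x = *z;
    return *z;
}

// Hand-cached version: one load, done before the stores. This
// changes behavior if the pointers actually alias -- it encodes the
// programmer's knowledge that they don't.
int testCached(int* x, int* y, int* z)
{
    int tmp = *z;
    *y = tmp;
    *x = tmp;
    return tmp;
}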
On Wednesday, 29 April 2020 at 10:46:57 UTC, random wrote:
> [...] The trick is to load the value into a variable first. The compiler can't optimize away multiple reads through the same pointer, because the contents could have been changed through another pointer in the meantime. restrict solves that, but if you know what is happening and why, you can solve it by hand.

In the strcmp example, shouldn't the compiler be able to do the same optimizations as with restrict, given that both pointers are declared const and the contents do not change?
Apr 29 2020
On Wednesday, 29 April 2020 at 12:37:29 UTC, IGotD- wrote:
> In the strcmp example, shouldn't the compiler be able to do the same optimizations as with restrict, given that both pointers are declared const and the contents do not change?

Good question. My strcmp example is actually really bad, because if you never write through any pointer, it doesn't make a difference ;) The way it is written is still interesting, though. I made a quick test case to evaluate the influence of const:

https://godbolt.org/z/qRwFa9
https://godbolt.org/z/iEj7LV
https://godbolt.org/z/EMqDDy

int test(int* x, int* y, <const?> int* <restrict?> z)
{
    *y = *z;
    *x = *z;
    return *z;
}

As you can see from the compiler output, const doesn't improve the optimization. I think the compiler can't optimize it because const doesn't give you real guarantees in C. You could just call the function like this:

int a;
test(&a, &a, &a);

"One man's constant is another man's variable."
Apr 29 2020
On Wednesday, 29 April 2020 at 16:19:55 UTC, random wrote:

And of course, the workaround if you don't want to use restrict:

int test(int* x, int* y, int* z)
{
    int tmp = *z;
    *y = tmp;
    *x = tmp;
    return tmp;
}

This produces the same assembly as the restrict version.

https://godbolt.org/z/yJJcMK
Apr 29 2020
On 4/29/2020 9:19 AM, random wrote:
> I think the compiler can't optimize it because const doesn't give you real guarantees in C.

You'd be right.
Apr 29 2020
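Worth noting for comparison: D can express the guarantee that C's const lacks. Data typed immutable can never legally be mutated through any reference, so a compiler is in principle entitled to reuse the first load (whether a given compiler exploits this today is a separate question). A sketch:

int test(int* x, int* y, immutable(int)* z)
{
    *y = *z;
    *x = *z;   // *z cannot have changed: mutating immutable data
    return *z; // through any path is undefined behavior in D
}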
On Wednesday, 29 April 2020 at 10:32:33 UTC, random wrote:

I forgot to add this... Compiled with gcc -O3:

test(int*, int*):
    mov DWORD PTR [rdi], 0
    xor eax, eax
    mov DWORD PTR [rsi], 1
    ret

https://godbolt.org/z/xW6w6W
Apr 29 2020
On Wednesday, 22 April 2020 at 22:34:32 UTC, Arine wrote:
> Not quite. Rust will generate better assembly, as it can guarantee that the use of an object is unique. Similar to C's "restrict" keyword, but you get it for "free" across the entire application.

Cool, I did not know that. I know that different languages have different semantics, and code that looks the same might produce different results; that's why I used the word "equivalent" instead of "same". You can achieve the same goal in D as in Rust, but the code would be different.
Apr 26 2020
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
> [...]
> 2) Various web performance tests on https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show pretty poor performance of D using the vibe.d and hunt frameworks/libraries, regardless of the type of test - json, single query, multiple query and so on.
> [...]
> I am researching whether D and a D web framework can be used as a replacement for python/django/flask within our company. Although if a D web framework shows worse performance than Go, then it is probably not the right tool for the job. Any comments and feedback would be appreciated.

Vibe.d's performance in benchmarks has been discussed before[1]. From what I remember, the limiting factor is developer time allocation and profiling on specific hardware, which means it can probably be solved with money ;-)

-- Bastiaan.

[1] https://forum.dlang.org/post/qg9dud$hbo$1@digitalmars.com
Apr 22 2020
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
> My understanding is that D is in a similar ballpark performance league as C, C++ and Rust.

Yes. If you have time to optimize, there is precious little difference speed-wise between native languages: every native language using the same backend ends up in the same ballpark, with tricks to get the code to the same baseline. The last few percent will be due to different handling of UB, integer overflow, aliasing... but in general the ethos of native languages is to let you reach top native speed, and in the end they will generate the exact same code.

But if your application is barely optimized, or, more likely, you don't have time to optimize properly, it becomes a bit more interesting. Defaults will matter a lot more, and things like the GC, whether the language encourages copies, and the accepted "idiomatic" style will start to have consequences (and even more so: libraries). This is what ends up in benchmarks; but if the application were worth optimizing (in terms of added value), it would be optimized hard, up to that native ceiling.

In short: the less useful an application is, the larger the differences it will display between languages with similar low-level capabilities. It would be much more interesting to compare _backends_, but people keep comparing front-ends because it drives traffic and commentary.
Apr 22 2020
On Wednesday, 22 April 2020 at 16:23:58 UTC, Guillaume Piolat wrote:
> [...] It would be much more interesting to compare _backends_, but people keep comparing front-ends because it drives traffic and commentary.

Could you please elaborate on that? What are you referring to as the backend?

I am not interested in comparing one small single operation - the fib test already did that. To me the techempower stats are a pretty good indicator: they cover json processing, single/multi-query requests, database access and static content. Overall performance across those stats gives a pretty good idea of how a language and its web frameworks are built, and of its ecosystem. For example, if a language is fast on basic operations but two of its frameworks show less than adequate performance, then obviously something is wrong with the whole ecosystem: it could be difficult to create fast and efficient apps for the average developer. Take Scala, a powerful but very complicated language with tons of problems; most Scala projects failed. It is very difficult and slow to create efficient applications for the average developer - it practically requires a rocket scientist to write good code in Scala. Does D exhibit the same problem?
Apr 24 2020
On Friday, 24 April 2020 at 13:44:18 UTC, serge wrote:
> Could you please elaborate on that? What are you referring to as the backend?

I was mentioning LLVM vs GCC vs the Intel compiler backend: the part that converts code to instructions after the original language is out of sight.

> To me the techempower stats are a pretty good indicator [...] Does D exhibit the same problem?

Very fair reasoning. I don't think D has as many problems as Scala: D has a very gentle learning curve, and it's not difficult to be productive in. But I'd say most of D's problems are indeed ecosystem-related, possibly because of the kind of personalities that D attracts: the reluctance of D programmers to gather around the same piece of code makes the ecosystem more insular than needed, as is typical with native programming. D code today has a tendency to balkanize based on various requirements, such as exceptions or not, runtime or not, @safe or not, -betterC or not... It seems to me that languages where DIY is frowned upon (Java) or discouraged by the practice of FFI have better library ecosystems, for better or worse.
Apr 26 2020
On Sunday, 26 April 2020 at 12:37:48 UTC, Guillaume Piolat wrote:
> But I'd say most of D's problems are indeed ecosystem-related [...] It seems to me that languages where DIY is frowned upon (Java) or discouraged by the practice of FFI have better library ecosystems, for better or worse.

These are connected. Languages like Java don't give you options: you will use the GC, you will use OOP. Imagine an XML library. Any Java XML DOM library will offer an XMLDocument object with a load method (or constructor). This is expected, and more or less the same in every library. D doesn't force a paradigm on you: some people will want to use the GC, some won't; some will want to use OOP, some will avoid it like fire. It's a tradeoff: for higher flexibility and power you trade some composability.
Apr 26 2020
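To make that split concrete, here is a minimal D sketch (the functions are invented for illustration) of the same task in the two styles a D library might choose between: a GC-convenient flavor and a @nogc flavor.

// GC style: returns a freshly allocated array. Convenient, Java-like.
int[] doubledGC(const int[] input) @safe
{
    int[] result;
    foreach (x; input)
        result ~= x * 2; // GC allocations
    return result;
}

// @nogc style: the caller supplies the buffer. More ceremony, but
// composable with manual memory management.
void doubledInto(const int[] input, int[] buffer) @safe @nogc
{
    foreach (i, x; input)
        buffer[i] = x * 2;
}

Two libraries that pick different flavors don't compose cleanly, which is one way the balkanization described above plays out.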
On Fri, Apr 24, 2020 at 3:46 PM serge via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> To me the techempower stats are a pretty good indicator: they cover json processing, single/multi-query requests, database access and static content. Overall performance across those stats gives a pretty good idea of how a language and its web frameworks are built, and of its ecosystem.

Unfortunately, there is a big issue with techempower. Because it is so popular, almost every framework [language] tries to get the best score in it, and in many cases that means using hacks or tricks. So in general the techempower results are useless. From my own experience, D performance is really good in real-world scenarios.

The other issue with the techempower benchmark is that there is almost zero complexity: all the tests do basic operations on really small datasets.
Apr 26 2020
On Sunday, 26 April 2020 at 16:59:44 UTC, Daniel Kozak wrote:
> Unfortunately, there is a big issue with techempower. [...] All the tests do basic operations on really small datasets.

It's nice to have a moral victory and claim to be above "those cheaters", but links to these benchmarks are shared in many places. If someone wants to see how fast D is, they will type "programming language benchmark" into their web search of choice, and TechEmpower will be high in the results list. They will click and go "oh wow, even PHP is faster than that D stuff". Whether it's cheating or not, perception matters, and people will base their decisions on such benchmarks, even if that's unreasonable and doesn't apply to real-world scenarios.
Apr 26 2020
On Sun, Apr 26, 2020 at 9:35 PM JN via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> It's nice to have a moral victory and claim to be above "those cheaters" [...] people will base their decisions on such benchmarks, even if that's unreasonable and doesn't apply to real-world scenarios.

Yes, I agree. This is the reason why I improve those benchmarks from time to time - to make D faster than PHP :D
Apr 26 2020
On Sunday, 26 April 2020 at 16:59:44 UTC, Daniel Kozak wrote:
> Unfortunately, there is a big issue with techempower. Because it is so popular, almost every framework [language] tries to get the best score in it, and in many cases that means using hacks or tricks. So in general the techempower results are useless.

As somebody who implemented the Swoole+PHP and Crystal code at TechEmpower, I can state that this is factually wrong. The code is very idiomatic code that anybody writes: basic database calls, a pool for connections, prepared statements, the standard http module or frameworks. There is no magic in the code that tries to do direct system calls, or uses stripped-down drivers, or any other stuff that people normally would not use.

Where you can see some funny business is in the top 10 or 20, where Rust and co. have some extremely optimized code that is not how most people would write it. But those are the extreme cases, which anybody with half a brain ignores, because that is not how you write normal code. I always say: look at the code to see whether the results are normal or over-optimized, unrealistic crap.

If we compare normal implementations (https://www.techempower.com/benchmarks/#section=test&runid=c7152e8f-5b33-4ae7-9e89-630af44bc8de&hw=ph&test=plaintext) on Fortunes:

vibed-ldc-pgsql: 58k
Crystal: 206k
PHP+Swoole: 289k

D's results are simply abysmal, and we are talking basic idiomatic code here. This tells me that D has an issue with its DB handling in those tests. So look at tests like "hello world" (plaintext) and json, where the performance difference drops to 2x. The plaintext test is literally "take a string and output it" - route + echo in PHP or any other language. Or basic encoding of a json object and outputting it. A few lines of code, that is it. Yet D still suffers a 2x gap in those tests. Does that not tell you that D or vibe.d has an actual performance issue? A point that clearly needs to be looked at. If the argument is that the D entry is not properly optimized, then what are PHP+Workerman/Swoole/... and Crystal?

> The other issue with the techempower benchmark is that there is almost zero complexity: all the tests do basic operations on really small datasets.

Fortunes shows a more realistic real-world web scenario. The rest mostly expose weaknesses in each language+framework's specific areas: if your json score is low, there is a problem with your json library or the way your framework handles requests; if your plaintext results are low... you get the drill. It doesn't help to scoff at the issue by stating "in the real world we are faster" while benchmarks like this are online...

The nice thing about TechEmpower is that it really shows whether your language is fast at basic web tasks or not. It does not give a darn that your language can run "real-world" code fast if there are underlying issues. For people who are interested in D for website hosting, it is simply slower than the competitors. Don't like it? Then see where the issues are and fix them, be it in the TechEmpower code, in D, or in vibe.d.

But clearly there is an issue if D cannot compete with the implementations of other languages (again, talking about normal implementations - stuff that anybody would use). Given the choice, what will people pick: D, which simply ignores the web market, or other languages/frameworks where the speed out of the door is great?

It's funny seeing comments like this, where a simple question by the OP turns into a whole and totally useless technical discussion, followed by a comment that comes down to "ignore it because everybody cheats". And people here wonder why D has issues with popularity. Really! Get out much? From my point of view, the comment is insulting and tantamount to calling people like me, who implemented a few of the other languages' entries, "cheaters", when it is literally basic code that is used everywhere (trust me, I am not some magic programmer who knows C++ like the back of his hand; I barely scrape by on my own with PHP and Ruby). If the issue is on D's end - be it D, vibe.d, or the code used - then fix it, but do not insult everybody else (especially the people who wrote normal code). As the saying goes: always clean your own house first before criticizing your neighbor's.
Apr 26 2020
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
> My understanding is that D is in a similar ballpark performance league as C, C++ and Rust. Hence the question: why is the web performance of those two web frameworks so poor compared to rivals?

Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?":

import std.stdio;

void main(string[] args)
{
    int[] v = [1, 2];
    int n = 1 << 30;
    for (int i = 0; i < n - 2; ++i)
    {
        v ~= i;
    }
    writefln("v len: %s cap: %s\n", v.length, v.capacity);
}

Averaged over four runs, compiled with gdc -O3, this takes 40s and has a max RSS of 7.9 GB. Here's the same benchmark changed to use std.container.array:

void main() @nogc
{
    import core.stdc.stdio : printf;
    import std.container.array;

    Array!int v = Array!int(1, 2);
    foreach (i; 0 .. (1 << 30) - 2)
        v ~= i;
    printf("v len: %zu cap: %zu\n", v.length, v.capacity);
}

Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster.

If you set out to write a similar benchmark in C++ or Rust, you'll naturally see performance more like the second example than the first. So there's some extra tension here: D has high-convenience facilities like this that let it compete with scripting languages for ease of development, but after you've leaned on that ease of development, you might want to transition away from those facilities.

D has other tensions, like "would you like the GC, or not?", or "would you like the whole language, with TypeInfo and AAs, or not?", or "would you like speed-of-light compile times, or would you like to do a lot of CTFE and static reflection?" And this is more how I'd characterize the language: not "it sits in such-and-such a performance ballpark, and I should be very surprised if a particular web framework doesn't match that", but "it's a bunch of sublanguages in one, so you have to look closer at a given web framework to even say which sublanguage it's written in".

I think the disadvantages of D being like this are obvious. One advantage is that if you decide one day that you'd prefer a D application to have C++-style performance, you don't have to laboriously rewrite it in a completely different language. The D-to-D FFI, as it were, is really good, so you can make transitions like that as needed, even for just the parts of the application that need them.
Apr 22 2020
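For what it's worth, there is also a middle ground between those two programs: pre-reserving capacity keeps the convenient GC slice while avoiding most of the reallocation and copying cost. A sketch (not benchmarked here):

import std.stdio;

void main()
{
    int[] v = [1, 2];
    v.reserve(1 << 30); // one up-front allocation for ~2^30 ints
    foreach (i; 0 .. (1 << 30) - 2)
        v ~= i;
    writefln("v len: %s cap: %s", v.length, v.capacity);
}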
On Thursday, 23 April 2020 at 00:00:29 UTC, mipri wrote:
> Consider this benchmark from the thread next door, "Memory issues. GC not giving back memory to OS?": [...] Same treatment: 3.3s and a max RSS of 4.01 GB. More than ten times faster. [...]

I did check the library. My understanding is that the proposal is to use that library, with manual memory management and without the GC. My concern was that the D web frameworks performed worse than frameworks in Go and Java, which are GC-only languages. Does that mean the GC in D is so far from great that we need to avoid it in order to beat Java/Go? It's probably worth stressing that I didn't actually mean "beat": I would love to see stats on par with Go and Java, but unfortunately D was a few times slower - close to Python and Ruby...

Even though such a library can speed things up, I believe the language/runtime should be able to work efficiently for this type of operation. We should not need to develop such libraries in order to get good performance. To me this is a bug, a poor implementation of the GC, or a deficiency in the design of the runtime. Needing that type of library to work around runtime deficiencies means not being able to focus on writing good code, and instead hunting for gotchas to get an adequate solution.
Apr 24 2020
On Friday, 24 April 2020 at 13:58:53 UTC, serge wrote:
> I did check the library. [...] Needing that type of library to work around runtime deficiencies means not being able to focus on writing good code, and instead hunting for gotchas to get an adequate solution.

Yes, that does mean that D's GC still needs some improvements, although much has been done during the last year. Also note that while Java and Go are heavily GC-based languages, there are ways to do value-based coding in them; and although Project Valhalla is taking its time due to the engineering issues it addresses, it will eventually be done.

As in all GC-enabled languages that offer multiple allocation mechanisms alongside the GC, you should approach this in stages. Use the GC for your initial solution; then, only where it is obvious from the start that it might be a problem, or where profiling proves that an issue has to be addressed, look into value allocation, @nogc, reference-counted collections and other low-level tricks.

That is the nice thing about systems languages like D: you don't need to code like C from the start, and when you do need to, the tools are available.
Apr 25 2020
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
> D has sparked my interest, therefore last week I started to look into the language and completed the D course on Pluralsight. [...] I am researching whether D and a D web framework can be used as a replacement for python/django/flask within our company. [...] Any comments and feedback would be appreciated.

I gave a talk at DConf 2018 you may be interested in. The talk goes over a set of benchmark studies I did. The video was lost, but the slides are here: https://github.com/eBay/tsv-utils/blob/master/docs/dconf2018.pdf. The slides are probably the easiest way to get an overview. The full details of the benchmark studies are in the tsv-utils repo: https://github.com/eBay/tsv-utils/blob/master/docs/Performance.md

--Jon
Apr 24 2020
On Wednesday, 22 April 2020 at 14:00:10 UTC, serge wrote:
> 2) Various web performance tests on https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=query show pretty poor performance of D using the vibe.d and hunt frameworks/libraries, regardless of the type of test - json, single query, multiple query and so on.

I don't have a reference off the top of my head, but IIRC much of the upper end of those benchmarks is less the result of inherent language or framework differences and more of which implementations have been strongly tailored to each particular benchmark. (I think this has been discussed on the D forums before.) That tailoring often means dropping things (e.g. certain kinds of validation or safety measures) that any realistic app would have to do. So the top end of those benchmark tables may be rather misleading when it comes to real-world usage.

There may well be framework design decisions in place that have a stronger impact. For example, I recall that the default "user-friendly" vibe.d tools for handling HTTP requests create a situation where the easy thing to do is to generate garbage per request. So unless one addresses that, it will put real constraints on exactly how performant one can get.

Note, this is _not_ a case of "the GC is bad" or "you can't get good performance with a GC". It's a case of: if you use the GC naively, rather than with a good strategy for preallocation and re-use of resources, you will force the GC into doing work that could be avoided.

So, leaving aside misleading factors like extreme tailoring to the benchmark, I would suggest that the memory allocation strategies in use are probably the first thing to look at when asking why those D implementations might be less performant than they could be. When it comes to the frameworks, the questions might be: (i) are there any cases where the framework _forces_ you into a suboptimal memory (re)allocation strategy? and (ii) even if there aren't, how easy/idiomatic is it to use a performance-oriented allocation strategy?
Apr 25 2020
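To illustrate the preallocation-and-reuse strategy described above, a minimal sketch (the handler shape is hypothetical, not vibe.d's actual API): keep a thread-local buffer and recycle it across requests instead of generating fresh garbage each time.

import std.array : Appender;

// Module-level variables are thread-local by default in D;
// this buffer is reused across all requests on this thread.
Appender!(char[]) responseBuf;

// Hypothetical handler signature, for illustration only.
void handleRequest(const char[] requestPath)
{
    responseBuf.clear(); // drops old contents, keeps the capacity
    responseBuf ~= "Hello, ";
    responseBuf ~= requestPath;
    // ...write responseBuf.data to the socket. After warm-up, no
    // per-request GC allocation is needed.
}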