digitalmars.D.announce - On the D Blog: Lomuto's Comeback
- Mike Parker (10/10) May 14 2020 After reading a paper that grabbed his curiosity and wouldn't let
- Andrei Alexandrescu (6/20) May 14 2020 Thanks, Mike. HN has possibly categorized it as spam already. One thing
- Mike Parker (15/19) May 14 2020 Okay everyone, please use this link or search for "Lomuto's
- jmh530 (10/11) May 14 2020 Really interesting. Thanks for sharing.
- Andrei Alexandrescu (3/16) May 17 2020 Yes. topN also uses partitioning.
- SashaGreat (6/10) May 14 2020 If possible could you please next time share link with "old"
- JN (3/6) May 17 2020 There is a Chrome extension that automatically redirects to the
- Francesco Mecca (3/13) May 15 2020 Wow, very interesting article.
- Dmitry Olshansky (6/10) May 15 2020 Fantastic work and great result. Having privately done a very
- Joseph Rushton Wakeling (7/13) May 15 2020 Nice stuff!
- kinke (5/10) May 15 2020 Wrt. the LDC results, LDC v1.17 was shipped with LLVM 8.0.1,
- Iain Buclaw (7/16) Aug 03 2020 Sorry for the belated response, as far as I can see, gdc and g++ only di...
- Iain Buclaw (3/28) Aug 03 2020 I doubt Andrei will re-run the benchmarks now, but here's the PR (proble...
- Iain Buclaw (66/68) Aug 04 2020 Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000...
- Andrei Alexandrescu (21/27) Aug 04 2020 Awesome, thanks! That does solve a puzzler I had while benchmarking.
- Andrei Alexandrescu (3/35) Aug 04 2020 Oh, and there are a few comments to the original blog post I'd be glad
- Les De Ridder (7/13) May 15 2020 Great post, and nice to have another example for how bad branches
- Steven Schveighoffer (3/17) May 15 2020 Fantastic article!
- Jon Degenhardt (3/13) May 17 2020 Got posted again to Hacker News earlier today. Currently at
- Andrei Alexandrescu (3/17) May 17 2020 Looks like the blog post is enjoying a second wind after being posted by...
- Arjan (4/14) May 31 2020 A follow up article on this:
After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160
May 14 2020
On 5/14/20 9:26 AM, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_qui ksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160Thanks, Mike. HN has possibly categorized it as spam already. One thing they do is they detect (by using the "Referrer" header) whether the post has been shared via a direct link. They do so to prevent manipulation. The right way to share something on hackernews is to send people to https://news.ycombinator.com/newest and mention the time of sharing.
May 14 2020
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote:On 5/14/20 9:26 AM, Mike Parker wrote:The right way to share something on hackernews is to send people to https://news.ycombinator.com/newest and mention the time of sharing.Okay everyone, please use this link or search for "Lomuto's Comeback" in the search field. I've been hearing conflicting accounts of this for a while, with more people telling me it doesn't happen anymore. However, it seems posts were never flagged as spam for this. Instead, any upvotes from people coming through direct links *do not count*. Coupled with the fact that the FAQ still says posts are penalized for "asking for votes", I'm no longer going to share direct links to HN articles. Found multiple sources about it, but this 2015 post lays it all out and I assume it's still mostly relevant: https://wiredcraft.com/blog/how-to-post-on-hacker-news/ https://news.ycombinator.com/newsfaq.html
May 14 2020
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote:[snip]Really interesting. Thanks for sharing. I have recently been spending some spare time learning more about D's topN and pivotPartition implementation, which led me to your paper on fast deterministic selection. Would you consider changing the pivotPartition implementation based on this? Would the insights gleamed from this paper mean that a branchless version of topN could be faster?
May 14 2020
On 5/14/20 11:57 AM, jmh530 wrote:On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote:Yes, and I encourage you to look into putting together a PR.[snip]Really interesting. Thanks for sharing. I have recently been spending some spare time learning more about D's topN and pivotPartition implementation, which led me to your paper on fast deterministic selection. Would you consider changing the pivotPartition implementation based on this?Would the insights gleamed from this paper mean that a branchless version of topN could be faster?Yes. topN also uses partitioning.
May 17 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:... Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ ...If possible could you please next time share link with "old" instead of "www"? Like: https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ Thanks, SG.
May 14 2020
On Thursday, 14 May 2020 at 14:11:57 UTC, SashaGreat wrote:If possible could you please next time share link with "old" instead of "www"? Like: https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/There is a Chrome extension that automatically redirects to the old version of Reddit
May 17 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160Wow, very interesting article. Thanks for sharing.
May 15 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results.Fantastic work and great result. Having privately done a very heavy critique of the narrow niche the article had chosen to explore I still recognize and love the results. — Dmitry Olshansky
May 15 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
May 15 2020
On Friday, 15 May 2020 at 10:28:41 UTC, Joseph Rushton Wakeling wrote:One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.Wrt. the LDC results, LDC v1.17 was shipped with LLVM 8.0.1, while the used clang is v10 based on LLVM 10, so that might account for some slight diffs.
May 15 2020
On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:Sorry for the belated response, as far as I can see, gdc and g++ only differ on one line. auto delta = smaller & (read - first); This is lowered as: delta = smaller & (read - first) / 8; However, g++ uses an exact divide operator (semantically that ignores rounding), whereas gdc uses a truncating divide operator (semantically rounds the quotient towards zero). I'm willing to bet a beer on tweaking pointer subtraction will get gdc in lockstep with g++.After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
Aug 03 2020
On 03/08/2020 13:08, Iain Buclaw via Digitalmars-d-announce wrote:On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:I doubt Andrei will re-run the benchmarks now, but here's the PR (problem reference) with patch attached. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:Sorry for the belated response, as far as I can see, gdc and g++ only differ on one line. auto delta = smaller & (read - first); This is lowered as: delta = smaller & (read - first) / 8; However, g++ uses an exact divide operator (semantically that ignores rounding), whereas gdc uses a truncating divide operator (semantically rounds the quotient towards zero). I'm willing to bet a beer on tweaking pointer subtraction will get gdc in lockstep with g++.After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
Aug 03 2020
On 04/08/2020 03:14, Andrei Alexandrescu wrote:Interesting, thanks!Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000... gdc-baseline: min_milliseconds=53.2922 median_milliseconds=56.8761 min_milliseconds=111.2512 median_milliseconds=115.5812 min_milliseconds=168.8659 median_milliseconds=174.9421 min_milliseconds=228.9230 median_milliseconds=235.9950 min_milliseconds=291.4758 median_milliseconds=302.2652 min_milliseconds=354.6598 median_milliseconds=360.3230 min_milliseconds=420.6221 median_milliseconds=424.0275 min_milliseconds=490.9294 median_milliseconds=505.0082 min_milliseconds=535.9912 median_milliseconds=556.0680 min_milliseconds=619.8160 median_milliseconds=629.6446 gdc-pr96429: min_milliseconds=44.1008 median_milliseconds=46.2956 min_milliseconds=96.0864 median_milliseconds=99.4665 min_milliseconds=146.2498 median_milliseconds=151.5661 min_milliseconds=201.9479 median_milliseconds=207.0937 min_milliseconds=253.2555 median_milliseconds=265.6178 min_milliseconds=309.5941 median_milliseconds=317.8178 min_milliseconds=364.9312 median_milliseconds=381.9105 min_milliseconds=427.2581 median_milliseconds=437.9506 min_milliseconds=464.2838 median_milliseconds=482.9127 min_milliseconds=537.3167 median_milliseconds=550.9250 g++: min_milliseconds=44.0164 median_milliseconds=46.5057 min_milliseconds=91.2730 median_milliseconds=97.8507 min_milliseconds=142.8459 median_milliseconds=153.4782 min_milliseconds=196.4765 median_milliseconds=202.1877 min_milliseconds=251.5876 median_milliseconds=261.6350 min_milliseconds=299.8530 median_milliseconds=311.0719 min_milliseconds=351.9959 median_milliseconds=370.0437 min_milliseconds=412.4231 median_milliseconds=420.1336 min_milliseconds=462.4810 median_milliseconds=474.2444 min_milliseconds=539.3359 median_milliseconds=541.5321 Looks good, so committing patch. :-)
Aug 04 2020
On 8/4/20 4:19 AM, Iain Buclaw wrote:On 04/08/2020 03:14, Andrei Alexandrescu wrote:[snip]Interesting, thanks!Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000...Looks good, so committing patch. :-)Awesome, thanks! That does solve a puzzler I had while benchmarking. I'm thinking the story of discovering and fixing this would be a great follow-up in the blog. It doesn't quite mesh with Mike's current introductory series, but it could be done as an intermezzo a la "Now For Something Completely Different (And Much Lower Level)". Sketch of an intro: Upon reading "Lomuto's Comeback" in the D blog, I noticed the performance were consistently juuust a bit worse for the D version than for the C++ version for the same source code. My own measurements confirmed the same. That bothered me at two levels. First, people unfamiliar with the D language would form the opinion that D cannot reach the efficiency of C++. Second, as the gdc creator and maintainer, I knew for a fact the produced code must be literally identical. Any difference would pin point a bug somewhere in the code generation pipeline. So I set out to find it and fix it. This is the story of that investigation, which will take us through looking through disassembly, finding the culprit, devising a fix, confirming with measurements, and patching the open-source gdc compiler. ...
Aug 04 2020
On 8/4/20 9:49 AM, Andrei Alexandrescu wrote:On 8/4/20 4:19 AM, Iain Buclaw wrote:Oh, and there are a few comments to the original blog post I'd be glad to respond to in an appendix.On 04/08/2020 03:14, Andrei Alexandrescu wrote:[snip]Interesting, thanks!Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}000000...Looks good, so committing patch. :-)Awesome, thanks! That does solve a puzzler I had while benchmarking. I'm thinking the story of discovering and fixing this would be a great follow-up in the blog. It doesn't quite mesh with Mike's current introductory series, but it could be done as an intermezzo a la "Now For Something Completely Different (And Much Lower Level)". Sketch of an intro: Upon reading "Lomuto's Comeback" in the D blog, I noticed the performance were consistently juuust a bit worse for the D version than for the C++ version for the same source code. My own measurements confirmed the same. That bothered me at two levels. First, people unfamiliar with the D language would form the opinion that D cannot reach the efficiency of C++. Second, as the gdc creator and maintainer, I knew for a fact the produced code must be literally identical. Any difference would pin point a bug somewhere in the code generation pipeline. So I set out to find it and fix it. This is the story of that investigation, which will take us through looking through disassembly, finding the culprit, devising a fix, confirming with measurements, and patching the open-source gdc compiler. ...
Aug 04 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/Great post, and nice to have another example for how bad branches can really be for performance! One note: The clang/ldc compiler explorer links for lomuto_partition_branchfree are wrong.
May 15 2020
On 5/14/20 9:26 AM, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_qui ksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160Fantastic article! -Steve
May 15 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160Got posted again to Hacker News earlier today. Currently at position 5.
May 17 2020
On 5/14/20 9:26 AM, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_qui ksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160Looks like the blog post is enjoying a second wind after being posted by soneone else on hackernews. It's in top 10 right now.
May 17 2020
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160A follow up article on this: https://news.ycombinator.com/item?id=23363165 https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html
May 31 2020