www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - More suggestions for find()

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Got this link from the reddit discussion around the blog article: 
http://effbot.org/zone/stringlib.htm. The Bloom-filter-style trick looks 
quite cool. Anyone interested in running some more experiments? Thx! -- 
Andrei
Jun 18 2016
parent reply qznc <qznc web.de> writes:
On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu 
wrote:
 Got this link from the reddit discussion around the blog 
 article: http://effbot.org/zone/stringlib.htm. The 
 Bloom-filter-style trick looks quite cool. Anyone interested in 
 running some more experiments? Thx! -- Andrei
Compare with memmem. That is 4x faster than the current stuff. I guess vector instructions are key. There is a branch in my repo. Unfortunately, Im currently traveling and lost my bag with laptop at the airport. It will probably take a while before I can look into it again.
Jun 19 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/19/2016 06:38 AM, qznc wrote:
 Unfortunately, Im currently traveling and lost my bag with laptop at the
 airport.
Bummer. I hope you find it! -- Andrei
Jun 19 2016
parent deadalnix <deadalnix gmail.com> writes:
On Sunday, 19 June 2016 at 11:49:06 UTC, Andrei Alexandrescu 
wrote:
 On 06/19/2016 06:38 AM, qznc wrote:
 Unfortunately, Im currently traveling and lost my bag with 
 laptop at the
 airport.
Bummer. I hope you find it! -- Andrei
Not in D, but: http://clang.llvm.org/doxygen/Lexer_8cpp_source.html Line 2284, function Lexer::SkipBlockComment.
Jun 19 2016
prev sibling parent reply qznc <qznc web.de> writes:
On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
 On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu 
 wrote:
 Got this link from the reddit discussion around the blog 
 article: http://effbot.org/zone/stringlib.htm. The 
 Bloom-filter-style trick looks quite cool. Anyone interested 
 in running some more experiments? Thx! -- Andrei
Compare with memmem. That is 4x faster than the current stuff. I guess vector instructions are key. There is a branch in my repo.
More like 2x after looking again. For some more concrete numbers: LDC: Search in Alice in Wonderland std: 438 ±12 manual: 315 ±8 A2Phobos: 254 ±7 Chris: 325 ±9 Andrei: 434 ±11 Andrei2: 259 ±6 memmem: 100 ±0 Search in random short strings std: 127 ±14 manual: 124 ±12 A2Phobos: 107 ±8 Chris: 130 ±15 Andrei: 110 ±7 Andrei2: 106 ±7 memmem: 108 ±8 Mismatch in random long strings std: 534 ±432 manual: 587 ±428 A2Phobos: 353 ±274 Chris: 605 ±448 Andrei: 550 ±399 Andrei2: 360 ±280 memmem: 136 ±50 Search random haystack with random needle std: 293 ±211 manual: 265 ±141 A2Phobos: 212 ±163 Chris: 271 ±133 Andrei: 285 ±212 Andrei2: 215 ±164 memmem: 118 ±20 (avg slowdown vs fastest; absolute deviation) CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU 3.40GHz DMD: Search in Alice in Wonderland std: 554 ±17 manual: 382 ±11 A2Phobos: 346 ±10 Chris: 501 ±18 Andrei: 498 ±14 Andrei2: 342 ±9 memmem: 100 ±0 Search in random short strings std: 153 ±31 manual: 131 ±19 A2Phobos: 108 ±9 Chris: 146 ±29 Andrei: 109 ±10 Andrei2: 106 ±7 memmem: 108 ±8 Mismatch in random long strings std: 674 ±542 manual: 651 ±469 A2Phobos: 445 ±362 Chris: 821 ±604 Andrei: 618 ±459 Andrei2: 436 ±354 memmem: 134 ±50 Search random haystack with random needle std: 379 ±307 manual: 284 ±175 A2Phobos: 233 ±177 Chris: 370 ±245 Andrei: 299 ±238 Andrei2: 238 ±188 memmem: 110 ±14 (avg slowdown vs fastest; absolute deviation) CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU 3.40GHz
Jun 20 2016
next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Monday, 20 June 2016 at 12:34:55 UTC, qznc wrote:
 On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
 On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu 
 wrote:
 Got this link from the reddit discussion around the blog 
 article: http://effbot.org/zone/stringlib.htm. The 
 Bloom-filter-style trick looks quite cool. Anyone interested 
 in running some more experiments? Thx! -- Andrei
Compare with memmem. That is 4x faster than the current stuff. I guess vector instructions are key. There is a branch in my repo.
More like 2x after looking again
Cool :) What are the chances of getting this in Phobos?
Jun 20 2016
parent reply qznc <qznc web.de> writes:
On Monday, 20 June 2016 at 13:27:59 UTC, Jack Stouffer wrote:
 On Monday, 20 June 2016 at 12:34:55 UTC, qznc wrote:
 On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
 On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei 
 Alexandrescu wrote:
 Got this link from the reddit discussion around the blog 
 article: http://effbot.org/zone/stringlib.htm. The 
 Bloom-filter-style trick looks quite cool. Anyone interested 
 in running some more experiments? Thx! -- Andrei
Compare with memmem. That is 4x faster than the current stuff. I guess vector instructions are key. There is a branch in my repo.
More like 2x after looking again
Cool :) What are the chances of getting this in Phobos?
Low. It requires the GNU libc to link against. We don't want that dependency. We cannot port it directly since it is GPL code. It is even more of a special case, since it only works for the == predicate. I'm not sure about the vector instructions it requires. What we need is a clean room implementation of the two way string matching algorithm with vector instructions?
Jun 20 2016
next sibling parent Guillaume Chatelet <chatelet.guillaume gmail.com> writes:
On Monday, 20 June 2016 at 16:09:21 UTC, qznc wrote:
 On Monday, 20 June 2016 at 13:27:59 UTC, Jack Stouffer wrote:
 On Monday, 20 June 2016 at 12:34:55 UTC, qznc wrote:
 On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
 On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei 
 Alexandrescu wrote:
 [...]
Compare with memmem. That is 4x faster than the current stuff. I guess vector instructions are key. There is a branch in my repo.
More like 2x after looking again
Cool :) What are the chances of getting this in Phobos?
Low. It requires the GNU libc to link against. We don't want that dependency. We cannot port it directly since it is GPL code. It is even more of a special case, since it only works for the == predicate. I'm not sure about the vector instructions it requires. What we need is a clean room implementation of the two way string matching algorithm with vector instructions?
Maybe there's some inspiration to get from the source code: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/memmem.c;hb=HEAD https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/str-two-way.h;hb=HEAD
Jun 21 2016
prev sibling parent Wyatt <wyatt.epp gmail.com> writes:
On Monday, 20 June 2016 at 16:09:21 UTC, qznc wrote:
 We cannot port it directly since it is GPL code.
Would it work to port from Musl instead? -Wyatt
Jun 21 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/20/16 8:34 AM, qznc wrote:
 On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
 On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu wrote:
 Got this link from the reddit discussion around the blog article:
 http://effbot.org/zone/stringlib.htm. The Bloom-filter-style trick
 looks quite cool. Anyone interested in running some more experiments?
 Thx! -- Andrei
Compare with memmem. That is 4x faster than the current stuff. I guess vector instructions are key. There is a branch in my repo.
[snip] Super cool! One thought - the current algorithm computes a "skip". In the case of Alice in Wonderland, that skip is relatively small, I recall like 4. If the skip is large enough, the current algorithm is faster than memmem. So it would be best to compute the skip and then switch to memmem if the skip falls below a cutoff. -- Andrei
Jun 20 2016