digitalmars.D - std.algorithm.copy could be better optimized

TommiT (15/15) Jun 17 2013 std.algorithm.copy adheres to the specification of the C++

Andrei Alexandrescu (4/18) Jun 17 2013 Bugzilla. FWIW I did notice Duff can sometimes beat memmove, but that

TommiT (15/45) Jun 17 2013 Another thing that comes to mind is that...

Brad Roberts (10/41) Jun 17 2013 In my experience, this is an area that ends up being deferred over to th...

TommiT (6/6) Jun 18 2013 I added these two enhancement requests:

Andrei Alexandrescu (4/10) Jun 18 2013 You're on the right track. Specializing for types with certain traits is...

"TommiT" <tommitissari hotmail.com> writes:

std.algorithm.copy adheres to the specification of the C++ 
standard library function std::copy. That specification states 
that the source and target ranges may not overlap. Yet, all the 
major C++ standard libraries (libc++, libstdc++, Dinkum C++ 
Library) implement std::copy so that it becomes a memmove if the 
source and target element types are the same and the element type 
is std::is_trivially_copy_assignable. Thus, the current C++ 
specification seems to be too strict.

The current std.algorithm.copy implementation doesn't get 
optimized into a memmove when it would be safe to do so. It 
really should, because it's the fastest way to get the job done, 
and it allows the ranges to overlap. Also, the documentation 
should be changed to notify of this relaxation.

There's some benchmarks of memmove vs other methods:
http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly

Jun 17 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/17/13 1:36 PM, TommiT wrote:
 std.algorithm.copy adheres to the specification of the C++ standard
 library function std::copy. That specification states that the source
 and target ranges may not overlap. Yet, all the major C++ standard
 libraries (libc++, libstdc++, Dinkum C++ Library) implement std::copy so
 that it becomes a memmove if the source and target element types are the
 same and the element type is std::is_trivially_copy_assignable. Thus,
 the current C++ specification seems to be too strict.

 The current std.algorithm.copy implementation doesn't get optimized into
 a memmove when it would be safe to do so. It really should, because it's
 the fastest way to get the job done, and it allows the ranges to
 overlap. Also, the documentation should be changed to notify of this
 relaxation.

 There's some benchmarks of memmove vs other methods:
 http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly

Bugzilla. FWIW I did notice Duff can sometimes beat memmove, but that 
was some 10 years ago.

Andrei

Jun 17 2013

"TommiT" <tommitissari hotmail.com> writes:

On Monday, 17 June 2013 at 17:43:25 UTC, Andrei Alexandrescu 
wrote:
 On 6/17/13 1:36 PM, TommiT wrote:
 std.algorithm.copy adheres to the specification of the C++ 
 standard
 library function std::copy. That specification states that the 
 source
 and target ranges may not overlap. Yet, all the major C++ 
 standard
 libraries (libc++, libstdc++, Dinkum C++ Library) implement 
 std::copy so
 that it becomes a memmove if the source and target element 
 types are the
 same and the element type is 
 std::is_trivially_copy_assignable. Thus,
 the current C++ specification seems to be too strict.

 The current std.algorithm.copy implementation doesn't get 
 optimized into
 a memmove when it would be safe to do so. It really should, 
 because it's
 the fastest way to get the job done, and it allows the ranges 
 to
 overlap. Also, the documentation should be changed to notify 
 of this
 relaxation.

 There's some benchmarks of memmove vs other methods:
 http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly

 Bugzilla. FWIW I did notice Duff can sometimes beat memmove, 
 but that was some 10 years ago.

 Andrei

Another thing that comes to mind is that...
R find(alias pred = "a == b", R, E)(R haystack, E needle)
...could be optimized, when pred is "a == b", R is an array of 
E's, and E is either char, byte or ubyte, to call the c function 
memchr instead. It's a bit tricky though: quite recently a bug 
was found in this particular optimization in the standard library 
that ships with Visual Studio, as described in this interesting 
tutorial:

http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n

More generally, what I'm saying is that we should probably just 
go through some good C++ standard library implementation and copy 
all the optimizations that they have done. I'm sure there's much 
more.

Jun 17 2013

Brad Roberts <braddr puremagic.com> writes:

On 6/17/13 6:14 PM, TommiT wrote:
 On Monday, 17 June 2013 at 17:43:25 UTC, Andrei Alexandrescu wrote:
 On 6/17/13 1:36 PM, TommiT wrote:
 std.algorithm.copy adheres to the specification of the C++ standard
 library function std::copy. That specification states that the source
 and target ranges may not overlap. Yet, all the major C++ standard
 libraries (libc++, libstdc++, Dinkum C++ Library) implement std::copy so
 that it becomes a memmove if the source and target element types are the
 same and the element type is std::is_trivially_copy_assignable. Thus,
 the current C++ specification seems to be too strict.

 The current std.algorithm.copy implementation doesn't get optimized into
 a memmove when it would be safe to do so. It really should, because it's
 the fastest way to get the job done, and it allows the ranges to
 overlap. Also, the documentation should be changed to notify of this
 relaxation.

 There's some benchmarks of memmove vs other methods:
 http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly

 Bugzilla. FWIW I did notice Duff can sometimes beat memmove, but that was some
10 years ago.

 Andrei

 Another thing that comes to mind is that...
 R find(alias pred = "a == b", R, E)(R haystack, E needle)
 ...could be optimized, when pred is "a == b", R is an array of E's, and E is
either char, byte or
 ubyte, to call the c function memchr instead. It's a bit tricky though: quite
recently a bug was
 found in this particular optimization in the standard library that ships with
Visual Studio, as
 described in this interesting tutorial:

 http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n

 More generally, what I'm saying is that we should probably just go through
some good C++ standard
 library implementation and copy all the optimizations that they have done. I'm
sure there's much more.

In my experience, this is an area that ends up being deferred over to the
compiler (via a builtin or 
direct use of mem* that the compiler is just smart enough to recognize and
implement itself) to let 
it generate the optimal code.  The thresholds for inline vs function call and
myriad of optimization 
techniques end up being super platform specific and duplicating all that
complication essentially 
guarantees that you're not going to do as good a job.  As an example, there's
people at intel that 
are paid to do little more than improve the code generation in gcc and often
submit new versions of 
the mem* routines both to gcc and to glibc.  It's actually amazing how often
that code is tweaked 
for both older generations of cpus and to add new ones.

- Brad

Jun 17 2013

"TommiT" <tommitissari hotmail.com> writes:

I added these two enhancement requests:

1) memmove optimization for std.algorithm.copy:
http://d.puremagic.com/issues/show_bug.cgi?id=10402

2) memchr optimization for std.algorithm.find:
http://d.puremagic.com/issues/show_bug.cgi?id=10403

...but I suspect they're just a tip of the ice berg.

Jun 18 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/18/13 8:56 AM, TommiT wrote:
 I added these two enhancement requests:

 1) memmove optimization for std.algorithm.copy:
 http://d.puremagic.com/issues/show_bug.cgi?id=10402

 2) memchr optimization for std.algorithm.find:
 http://d.puremagic.com/issues/show_bug.cgi?id=10403

 ...but I suspect they're just a tip of the ice berg.

You're on the right track. Specializing for types with certain traits is 
par for the course in generic programming.

Andrei

Jun 18 2013

D Programming

C/C++ Programming

Other

digitalmars.D - std.algorithm.copy could be better optimized