www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.algorithm.copy could be better optimized

reply "TommiT" <tommitissari hotmail.com> writes:
std.algorithm.copy adheres to the specification of the C++ 
standard library function std::copy. That specification states 
that the source and target ranges may not overlap. Yet, all the 
major C++ standard libraries (libc++, libstdc++, Dinkum C++ 
Library) implement std::copy so that it becomes a memmove if the 
source and target element types are the same and the element type 
is std::is_trivially_copy_assignable. Thus, the current C++ 
specification seems to be too strict.

The current std.algorithm.copy implementation doesn't get 
optimized into a memmove when it would be safe to do so. It 
really should, because it's the fastest way to get the job done, 
and it allows the ranges to overlap. Also, the documentation 
should be changed to notify of this relaxation.

There's some benchmarks of memmove vs other methods:
http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly
Jun 17 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/17/13 1:36 PM, TommiT wrote:
 std.algorithm.copy adheres to the specification of the C++ standard
 library function std::copy. That specification states that the source
 and target ranges may not overlap. Yet, all the major C++ standard
 libraries (libc++, libstdc++, Dinkum C++ Library) implement std::copy so
 that it becomes a memmove if the source and target element types are the
 same and the element type is std::is_trivially_copy_assignable. Thus,
 the current C++ specification seems to be too strict.

 The current std.algorithm.copy implementation doesn't get optimized into
 a memmove when it would be safe to do so. It really should, because it's
 the fastest way to get the job done, and it allows the ranges to
 overlap. Also, the documentation should be changed to notify of this
 relaxation.

 There's some benchmarks of memmove vs other methods:
 http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly
Bugzilla. FWIW I did notice Duff can sometimes beat memmove, but that was some 10 years ago. Andrei
Jun 17 2013
next sibling parent reply "TommiT" <tommitissari hotmail.com> writes:
On Monday, 17 June 2013 at 17:43:25 UTC, Andrei Alexandrescu 
wrote:
 On 6/17/13 1:36 PM, TommiT wrote:
 std.algorithm.copy adheres to the specification of the C++ 
 standard
 library function std::copy. That specification states that the 
 source
 and target ranges may not overlap. Yet, all the major C++ 
 standard
 libraries (libc++, libstdc++, Dinkum C++ Library) implement 
 std::copy so
 that it becomes a memmove if the source and target element 
 types are the
 same and the element type is 
 std::is_trivially_copy_assignable. Thus,
 the current C++ specification seems to be too strict.

 The current std.algorithm.copy implementation doesn't get 
 optimized into
 a memmove when it would be safe to do so. It really should, 
 because it's
 the fastest way to get the job done, and it allows the ranges 
 to
 overlap. Also, the documentation should be changed to notify 
 of this
 relaxation.

 There's some benchmarks of memmove vs other methods:
 http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly
Bugzilla. FWIW I did notice Duff can sometimes beat memmove, but that was some 10 years ago. Andrei
Another thing that comes to mind is that... R find(alias pred = "a == b", R, E)(R haystack, E needle) ...could be optimized, when pred is "a == b", R is an array of E's, and E is either char, byte or ubyte, to call the c function memchr instead. It's a bit tricky though: quite recently a bug was found in this particular optimization in the standard library that ships with Visual Studio, as described in this interesting tutorial: http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n More generally, what I'm saying is that we should probably just go through some good C++ standard library implementation and copy all the optimizations that they have done. I'm sure there's much more.
Jun 17 2013
parent Brad Roberts <braddr puremagic.com> writes:
On 6/17/13 6:14 PM, TommiT wrote:
 On Monday, 17 June 2013 at 17:43:25 UTC, Andrei Alexandrescu wrote:
 On 6/17/13 1:36 PM, TommiT wrote:
 std.algorithm.copy adheres to the specification of the C++ standard
 library function std::copy. That specification states that the source
 and target ranges may not overlap. Yet, all the major C++ standard
 libraries (libc++, libstdc++, Dinkum C++ Library) implement std::copy so
 that it becomes a memmove if the source and target element types are the
 same and the element type is std::is_trivially_copy_assignable. Thus,
 the current C++ specification seems to be too strict.

 The current std.algorithm.copy implementation doesn't get optimized into
 a memmove when it would be safe to do so. It really should, because it's
 the fastest way to get the job done, and it allows the ranges to
 overlap. Also, the documentation should be changed to notify of this
 relaxation.

 There's some benchmarks of memmove vs other methods:
 http://nadeausoftware.com/articles/2012/05/c_c_tip_how_copy_memory_quickly
Bugzilla. FWIW I did notice Duff can sometimes beat memmove, but that was some 10 years ago. Andrei
Another thing that comes to mind is that... R find(alias pred = "a == b", R, E)(R haystack, E needle) ...could be optimized, when pred is "a == b", R is an array of E's, and E is either char, byte or ubyte, to call the c function memchr instead. It's a bit tricky though: quite recently a bug was found in this particular optimization in the standard library that ships with Visual Studio, as described in this interesting tutorial: http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n More generally, what I'm saying is that we should probably just go through some good C++ standard library implementation and copy all the optimizations that they have done. I'm sure there's much more.
In my experience, this is an area that ends up being deferred over to the compiler (via a builtin or direct use of mem* that the compiler is just smart enough to recognize and implement itself) to let it generate the optimal code. The thresholds for inline vs function call and myriad of optimization techniques end up being super platform specific and duplicating all that complication essentially guarantees that you're not going to do as good a job. As an example, there's people at intel that are paid to do little more than improve the code generation in gcc and often submit new versions of the mem* routines both to gcc and to glibc. It's actually amazing how often that code is tweaked for both older generations of cpus and to add new ones. - Brad
Jun 17 2013
prev sibling parent reply "TommiT" <tommitissari hotmail.com> writes:
I added these two enhancement requests:

1) memmove optimization for std.algorithm.copy:
http://d.puremagic.com/issues/show_bug.cgi?id=10402

2) memchr optimization for std.algorithm.find:
http://d.puremagic.com/issues/show_bug.cgi?id=10403

...but I suspect they're just a tip of the ice berg.
Jun 18 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/18/13 8:56 AM, TommiT wrote:
 I added these two enhancement requests:

 1) memmove optimization for std.algorithm.copy:
 http://d.puremagic.com/issues/show_bug.cgi?id=10402

 2) memchr optimization for std.algorithm.find:
 http://d.puremagic.com/issues/show_bug.cgi?id=10403

 ...but I suspect they're just a tip of the ice berg.
You're on the right track. Specializing for types with certain traits is par for the course in generic programming. Andrei
Jun 18 2013