digitalmars.D - Dynamic array comparison optimization: check for equal pointers?
- Johan (19/19) Jun 02 2020 Hi all,
- Stefan Koch (3/22) Jun 02 2020 I would have expected this optimization to be present as well ...
- Stanislav Blinov (15/26) Jun 02 2020 That's, worst case, two branches and a scan, and best case one
- Johan (15/42) Jun 02 2020 The "actual scan" is a function call, which will be super slow
- Stanislav Blinov (12/25) Jun 02 2020 LDC (and I'm assuming GDC as well) is well aware of memcmp, it
- Steven Schveighoffer (7/19) Jun 02 2020 Let's assume LDC knows how to optimize this. How do you think it would
- Stanislav Blinov (18/25) Jun 02 2020 That actually is irrelevant. Whether it gets inlined, whether
- Steven Schveighoffer (7/27) Jun 02 2020 I'm saying the compiler can unwrite druntime, if it knows a better way.
- Stanislav Blinov (18/26) Jun 02 2020 ...Yes? And that will fall under a hypothetical case when the
- Steven Schveighoffer (4/21) Jun 02 2020 The worst case is opaque memcmp is called in all cases. No thanks. I'll
- Stanislav Blinov (11/18) Jun 02 2020 With the check in place, the worst case would be a useless
- Steven Schveighoffer (6/26) Jun 02 2020 I thought I remembered this being true, but maybe it was for class
- Jonathan M Davis (6/34) Jun 02 2020 The free function opEquals for classes does check the references first a...
- Steven Schveighoffer (5/15) Jun 02 2020 Sorry, to be clear I meant the array implementation prior to 2017
- Dominikus Dittes Scherkl (10/26) Jun 02 2020 I also would expect this kind of optimization in memcmp and not
- Johan (9/38) Jun 02 2020 Pointers are nothing to be scared of.
- NaN (5/11) Jun 02 2020 Getting two pointers to the same memory location may be such a
- Stanislav Blinov (3/6) Jun 02 2020 How do you pass pointers to memcmp if you're not aware of them? ;)
- Patrick Schluter (6/23) Jun 02 2020 memcmp() doesn't implement the check to avoid memcmp(NULL, NULL,
- Patrick Schluter (2/9) Jun 02 2020 https://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-t...
- Johan (5/15) Jun 03 2020 Interesting argument!
- Andrei Alexandrescu (10/36) Jun 03 2020 Affirmative:
- Johan (14/36) Jun 03 2020 It's such a simple optimization idea that I think it must have
Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory? cheers, Johan
Jun 02 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory? cheers, JohanI would have expected this optimization to be present as well ... It seems like a no-brainer.
Jun 02 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ```That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?There can't be a definitive answer for this, it's dependent on architecture as well as size of data. As far as array comparison goes, if a compiler (talking about ldc/gdc here) can infer pointer/length information, it'll be free to throw away the scan. In all other cases, I'd rather see the comparison function, well, do the comparison, and leave branches up to the user. It's much better than to inflict the branches on everyone unconditionally (pun intended): if your data shows that you can benefit from the short-circuit, you can always do just that. Short of rethinking the design, of course ;) I mean, I can't imagine a scenario where a program would have to routinely compare identical memory ranges. Doesn't mean there aren't such scenarios though.
Jun 02 2020
On Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:The "actual scan" is a function call, which will be super slow (note that in druntime's implementation it cannot be optimized to a tail call) compared to the comparison and early return.I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ```That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.You are assuming that an if-statement costs significant time, but that doesn't have to be the case. The pointer values already must be read from memory (otherwise you cannot do the memory scan), and with speculative execution the comparison+branch only costs very little (codegen could be coaxed to make if(false) be the default execution path) and potentially saves many many memory reads. Could someone take the effort and check the performance difference (for example the compiler itself) with this change? Thanks! JohanI am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?There can't be a definitive answer for this, it's dependent on architecture as well as size of data. As far as array comparison goes, if a compiler (talking about ldc/gdc here) can infer pointer/length information, it'll be free to throw away the scan. In all other cases, I'd rather see the comparison function, well, do the comparison, and leave branches up to the user. It's much better than to inflict the branches on everyone unconditionally (pun intended): if your data shows that you can benefit from the short-circuit, you can always do just that. Short of rethinking the design, of course ;) I mean, I can't imagine a scenario where a program would have to routinely compare identical memory ranges. Doesn't mean there aren't such scenarios though.
Jun 02 2020
On Tuesday, 2 June 2020 at 22:01:20 UTC, Johan wrote:On Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:LDC (and I'm assuming GDC as well) is well aware of memcmp, it should not be a function call. At the very least in -Ox builds.That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.The "actual scan" is a function call, which will be super slow (note that in druntime's implementation it cannot be optimized to a tail call) compared to the comparison and early return.You are assuming that an if-statement costs significant time, but that doesn't have to be the case. The pointer values already must be read from memory (otherwise you cannot do the memory scan), and with speculative execution the comparison+branch only costs very little (codegen could be coaxed to make if(false) be the default execution path) and potentially saves many many memory reads.I'm not assuming anything. The branch(es) may, or may not, be more costly than just comparing some words. Depending on architecture and the amount of data to compare. That's literally what I wrote :) May, or may not - implementation has no business assuming either way. That's why such a check absolutely should *not* be there. It should be where it belongs - in user code, backed up by profiling as necessary. Including it there may be beneficial for you specifically, but detrimental for everyone else.
Jun 02 2020
On 6/2/20 6:20 PM, Stanislav Blinov wrote:On Tuesday, 2 June 2020 at 22:01:20 UTC, Johan wrote:Let's assume LDC knows how to optimize this. How do you think it would do so? I'd imagine very similarly to how Johan wrote it. What makes you think LDC can't optimize out the check if there's a better way? Then the check is there in case the optimizer you are dealing with is going to call an opaque memcmp in all cases. -SteveOn Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:LDC (and I'm assuming GDC as well) is well aware of memcmp, it should not be a function call. At the very least in -Ox builds.That's, worst case, two branches and a scan, and best case one branch which may or may not be slower than the actual scan.The "actual scan" is a function call, which will be super slow (note that in druntime's implementation it cannot be optimized to a tail call) compared to the comparison and early return.
Jun 02 2020
On Tuesday, 2 June 2020 at 22:44:24 UTC, Steven Schveighoffer wrote:Let's assume LDC knows how to optimize this. How do you think it would do so? I'd imagine very similarly to how Johan wrote it. What makes you think LDC can't optimize out the check if there's a better way? Then the check is there in case the optimizer you are dealing with is going to call an opaque memcmp in all cases.That actually is irrelevant. Whether it gets inlined, whether even it's loop is unrolled by the compiler - that is all beside the point. If you hardcode the check in the library, that check is there for everyone using that library. On all hardware and for all data sizes. Irrespective of whether it can, on a given platform with a given array size, be actually faster to just do the call/loop/etc. than to do a check first. Irrespective of what a given compiler is capable of. There cannot be a universal optimization here. Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime. Same goes for actual memcmp. So let them do the minimal work required, and let users make higher-level decisions at higher level. If a program is attempting to "compare" identical memory ranges so often as to warrant a test - it's up to that program to deal with it, not the whole community.
Jun 02 2020
On 6/2/20 7:35 PM, Stanislav Blinov wrote:On Tuesday, 2 June 2020 at 22:44:24 UTC, Steven Schveighoffer wrote:I'm saying the compiler can unwrite druntime, if it knows a better way. Happens all the time with LDC, it will remove loops that it knows don't do anything. And the upside is that compilers that don't do the "smarter" thing will not needlessly call memcmp. -SteveLet's assume LDC knows how to optimize this. How do you think it would do so? I'd imagine very similarly to how Johan wrote it. What makes you think LDC can't optimize out the check if there's a better way? Then the check is there in case the optimizer you are dealing with is going to call an opaque memcmp in all cases.That actually is irrelevant. Whether it gets inlined, whether even it's loop is unrolled by the compiler - that is all beside the point. If you hardcode the check in the library, that check is there for everyone using that library. On all hardware and for all data sizes. Irrespective of whether it can, on a given platform with a given array size, be actually faster to just do the call/loop/etc. than to do a check first. Irrespective of what a given compiler is capable of. There cannot be a universal optimization here. Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime.
Jun 02 2020
On Wednesday, 3 June 2020 at 00:04:43 UTC, Steven Schveighoffer wrote:...Yes? And that will fall under a hypothetical case when the check just gets removed (or heck, inserted if it's faster). Leaving it in for all other cases when the compiler *can't* remove it. Make the best case better but the worst case worse? No thanks. For the third time, and let me be a little more explicit this time - if a program gives memcmp identical pointers so much as to warrant a test (which is found out by measuring) - the program is doing something wrong. *It* should do a test and not bother with memcmp at all. Or its logic needs to change so as to not have this problem in the first place. Same translates onto druntime's arrays. There's no definitive answer to Johan's original question. It depends. On hardware. On data. Therefore, that check shouldn't be there, it should be delegated to someone capable of making such decision - to the user.Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime.I'm saying the compiler can unwrite druntime, if it knows a better way. Happens all the time with LDC, it will remove loops that it knows don't do anything. And the upside is that compilers that don't do the "smarter" thing will not needlessly call memcmp.
Jun 02 2020
On 6/2/20 8:35 PM, Stanislav Blinov wrote:On Wednesday, 3 June 2020 at 00:04:43 UTC, Steven Schveighoffer wrote:The worst case is opaque memcmp is called in all cases. No thanks. I'll take a few extra cycles over that any day. -Steve....Yes? And that will fall under a hypothetical case when the check just gets removed (or heck, inserted if it's faster). Leaving it in for all other cases when the compiler *can't* remove it. Make the best case better but the worst case worse? No thanks.Users can always write a compare_arrays_ptrcheck if so required, but they cannot "unwrite" druntime.I'm saying the compiler can unwrite druntime, if it knows a better way. Happens all the time with LDC, it will remove loops that it knows don't do anything. And the upside is that compilers that don't do the "smarter" thing will not needlessly call memcmp.
Jun 02 2020
On Wednesday, 3 June 2020 at 01:10:55 UTC, Steven Schveighoffer wrote:With the check in place, the worst case would be a useless comparison plus the memcmp, or a failed branch on the off chance that you do try to compare identical ranges. Or a useless comparison and a failed branch + memcmp. Depends on your program. Whether the effect would be significant or immeasurable - also depends on your program, and your data, and your hardware. "I'm not sure what my program is doing, so let's make everyone else pay for it". That's what including that test, in memcmp or in druntime, effectively would be.....Yes? And that will fall under a hypothetical case when the check just gets removed (or heck, inserted if it's faster). Leaving it in for all other cases when the compiler *can't* remove it. Make the best case better but the worst case worse? No thanks.The worst case is opaque memcmp is called in all cases. No thanks. I'll take a few extra cycles over that any day.
Jun 02 2020
On 6/2/20 3:38 PM, Johan wrote:Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?I thought I remembered this being true, but maybe it was for class instances? Looking back in history, it seems prior to 2017, it was done by the compiler generating the code instead of a template, but seems like it did not include the pointer check. -Steve
Jun 02 2020
On Tuesday, June 2, 2020 3:18:40 PM MDT Steven Schveighoffer via Digitalmars-d wrote:On 6/2/20 3:38 PM, Johan wrote:The free function opEquals for classes does check the references first and has for years (as well doing checks like whether either reference is null). I've never looked at what the array implementation does though. - Jonathna M DavisHi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?I thought I remembered this being true, but maybe it was for class instances? Looking back in history, it seems prior to 2017, it was done by the compiler generating the code instead of a template, but seems like it did not include the pointer check.
Jun 02 2020
On 6/2/20 6:48 PM, Jonathan M Davis wrote:On Tuesday, June 2, 2020 3:18:40 PM MDT Steven Schveighoffer via Digitalmars-d wrote:Sorry, to be clear I meant the array implementation prior to 2017 implemented opEquals by generating code with the compiler. After that, it used the linked template. -SteveI thought I remembered this being true, but maybe it was for class instances? Looking back in history, it seems prior to 2017, it was done by the compiler generating the code instead of a template, but seems like it did not include the pointer check.The free function opEquals for classes does check the references first and has for years (as well doing checks like whether either reference is null). I've never looked at what the array implementation does though.
Jun 02 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check.I also would expect this kind of optimization in memcmp and not in the functions using it. Anything at higher level should be unaware of pointers wherever possible. And how do you know this optimization is NOT in memcmp? It's extern C and the source may be not available - have you examined it? There are many implementations of memcmp available - maybe on you're platform the cost for this extra check is too high? (especially given the very low probability that it is ever called with src==dst).
Jun 02 2020
On Tuesday, 2 June 2020 at 21:27:44 UTC, Dominikus Dittes Scherkl wrote:On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Pointers are nothing to be scared of.Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check.I also would expect this kind of optimization in memcmp and not in the functions using it. Anything at higher level should be unaware of pointers wherever possible.And how do you know this optimization is NOT in memcmp? It's extern C and the source may be not available - have you examined it?That's what I wrote, yes.There are many implementations of memcmp available - maybe on you're platform the cost for this extra check is too high? (especially given the very low probability that it is ever called with src==dst).glibc is a very commonly used library for memcmp. It does many other checks before starting the comparison (e.g. optimizing for pointer alignment), so the cost of the extra check is perhaps not even measurable. -Johan
Jun 02 2020
On Tuesday, 2 June 2020 at 22:05:06 UTC, Johan wrote:On Tuesday, 2 June 2020 at 21:27:44 UTC, Dominikus Dittes Scherkl wrote: glibc is a very commonly used library for memcmp. It does many other checks before starting the comparison (e.g. optimizing for pointer alignment), so the cost of the extra check is perhaps not even measurable.Getting two pointers to the same memory location may be such a rare occurrence that it's not worth having the check for it. IE, Adding a tiny overhead on 1000 calls can still cost more than a very large overhead on 1 call.
Jun 02 2020
On Tuesday, 2 June 2020 at 21:27:44 UTC, Dominikus Dittes Scherkl wrote:I also would expect this kind of optimization in memcmp and not in the functions using it. Anything at higher level should be unaware of pointers wherever possible.How do you pass pointers to memcmp if you're not aware of them? ;)
Jun 02 2020
On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL. It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.
Jun 02 2020
On Wednesday, 3 June 2020 at 05:48:56 UTC, Patrick Schluter wrote:On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:https://stackoverflow.com/questions/16362925/can-i-pass-a-null-pointer-to-memcmp[...]memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL. It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.
Jun 02 2020
On Wednesday, 3 June 2020 at 05:48:56 UTC, Patrick Schluter wrote:On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Interesting argument! But "undefined behavior" does not mean the program has to segfault. It's perfectly fine to return 0 for the UB case. -JohanI am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL. It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.
Jun 03 2020
On 6/3/20 1:48 AM, Patrick Schluter wrote:On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Affirmative: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#301 My take on the original matter is - adding an easily-predictable branch is unlikely to affect speed in the most naive speculative execution hardware. However, prediction does take resources that would be otherwise available for other branches, and the branch adds to code size. Both of these are important in large applications. So inserting a test will look good in microbenchmarks but may be actually a negative in real-world use.Hi all, I thought that we optimized dynamic array comparison with a check to see if the pointers are equal, but I don't see it in our array comparison implementation in druntime. Quick checking of glibc memcmp also showed that there is no short circuit for when the pointers of the slices are equal. I was expecting something like: ``` if ( sizes are not equal ) return false; if ( pointers are equal ) return true; return memcmp; ``` I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL. It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.
Jun 03 2020
On Wednesday, 3 June 2020 at 13:46:00 UTC, Andrei Alexandrescu wrote:On 6/3/20 1:48 AM, Patrick Schluter wrote:It's such a simple optimization idea that I think it must have been tested by many many people already; presumably the outcome was that the gain is not worth the cost. In case the data is _not_ equal, I guess the inequality actually shows up quite early so only a fraction of memory is scanned. In case the data _is_ equal, that's where pointer comparison can cut a significant amount of memory read, but perhaps in most cases the data being equal is still stored in different memory locations. Again, I'm very interested to see a perf comparison in a real application. Perhaps I can test this at Weka. -JohanOn Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:Affirmative: https://code.woboq.org/userspace/glibc/string/memcmp.c.html#301 My take on the original matter is - adding an easily-predictable branch is unlikely to affect speed in the most naive speculative execution hardware. However, prediction does take resources that would be otherwise available for other branches, and the branch adds to code size. Both of these are important in large applications. So inserting a test will look good in microbenchmarks but may be actually a negative in real-world use.I am surprised that memcmp itself does not implement the pointers are equal check. Is there a big impact of adding the shortcircuit check to prevent scanning memory?memcmp() doesn't implement the check to avoid memcmp(NULL, NULL, length) returning with 0 instead of segfaulting as it does if any of the 2 pointers in NULL. It's a case of consistant behaviour, either you tolerate NULL pointers in all cases or you segfault in all cases.
Jun 03 2020