digitalmars.D - Review of Andrei's std.benchmark
- Jens Mueller (33/33) Sep 17 2012 Hi,
- Andrei Alexandrescu (6/9) Sep 17 2012 One note - I moved the benchmark-related stuff from std.datetime
- Øivind (19/19) Sep 18 2012 I think the std.benchmark is definitely a useful library
- Andrei Alexandrescu (11/28) Sep 18 2012 After extensive tests with a variety of aggregate functions, I can say
- Thiez (5/8) Sep 19 2012 What if one tries to benchmark a nondeterministic function? In
- Andrei Alexandrescu (4/11) Sep 21 2012 I agree. Currently std.benchmark is not geared for measuring
- Manu (12/21) Sep 19 2012 The fastest execution time is rarely useful to me, I'm almost always muc...
- Peter Alexander (4/13) Sep 19 2012 The problem with slowest is that you end up with the occasional
- Jacob Carlborg (4/8) Sep 19 2012 That's way the average is good to have as well.
- Andrei Alexandrescu (5/11) Sep 21 2012 The occasional hiccup is often orders of magnitude slower than the rest,...
- Manu (3/14) Sep 20 2012 So then we need to start getting tricky, and choose the slowest one that...
- Andrei Alexandrescu (11/28) Sep 20 2012 The "best way" according to some of the people who've advised my
- Manu (11/45) Sep 20 2012 If you do more than a single iteration, the minimum will virtually alway...
- Andrei Alexandrescu (18/34) Sep 20 2012 To measure performance against cold cache, you could always clear the
- Manu (7/24) Sep 21 2012 Okay, I can buy this distinction in terminology.
- Andrei Alexandrescu (10/14) Sep 21 2012 That's a good angle. Profiling is currently done by the -profile switch,...
- Jacob Carlborg (7/14) Sep 21 2012 If you have an API that is fairly open and provides more of the raw
- Jonathan M Davis (3/8) Sep 21 2012 dmd has the -profile flag.
- Jacob Carlborg (4/7) Sep 20 2012 Why not min, max and average?
- Andrei Alexandrescu (3/8) Sep 20 2012 Because max and average are misleading and uninformative, as I explained...
- Andrei Alexandrescu (15/20) Sep 20 2012 For a very simple reason: unless the algorithm under benchmark is very
- Jacob Carlborg (9/22) Sep 21 2012 I may have completely misunderstood this but aren't we talking about
- Andrei Alexandrescu (9/30) Sep 21 2012 I disagree. I won't include something in my design just so people don't
- Jens Mueller (6/45) Sep 21 2012 You mean like extending StopWatch and allowing the user to provide the
- Simen Kjaeraas (7/14) Sep 22 2012 I certainly think the two use cases are similar enough to warrant their
- Manu (5/7) Sep 21 2012 This is only true for systems with a comprehensive pre-emptive OS runnin...
- Andrei Alexandrescu (4/14) Sep 21 2012 I think embedded systems that run e.g. Linux will be affected by task
- foobar (9/53) Sep 20 2012 From the responses on the thread clearly there isn't a "best way".
- Andrei Alexandrescu (16/29) Sep 20 2012 I don't quite agree. This is a domain in which intuition is having a
- Manu (11/16) Sep 21 2012 I don't think this is a 'fair' claim, the situation is that different
- Andrei Alexandrescu (5/22) Sep 21 2012 That's exactly where it all starts getting unprincipled. Just use the
- Graham Fawcett (11/38) Sep 19 2012 For comparison's sake, the Criterion benchmarking package for
- Andrei Alexandrescu (3/11) Sep 21 2012 Will look into it, thanks.
- Nick Sabalausky (33/43) Sep 20 2012 *Ahem*: http://zedshaw.com/essays/programmer_stats.html
- Andrei Alexandrescu (25/69) Sep 20 2012 What would be the orthodoxy? If orthodoxy is what google finds, it's
- Tove (7/12) Sep 21 2012 I reached the same conclusion and use the same method at work.
- Andrei Alexandrescu (3/6) Sep 21 2012 I think that's a great idea!
- Johannes Pfau (5/8) Sep 21 2012 A perfect use case for user defined attributes ;-)
- Jacob Carlborg (4/7) Sep 21 2012 Yes, we need user defined attributes and AST macros ASAP :)
- Manu (10/24) Sep 21 2012 Facebook isn't exactly 'realtime' software. Obviously, faster is always
- Andrei Alexandrescu (5/30) Sep 21 2012 You'd be surprised at how much we care about e.g. 90 percentile time to
- Nick Sabalausky (7/20) Sep 21 2012 (Emphasis added for proper context.)
- David Piepgrass (74/77) Sep 21 2012 Like others, I must also disagree in princple. The minimum sounds
- jerro (3/4) Sep 21 2012 It is possible to use a sampling profiler on D executables
- Dmitry Olshansky (24/59) Sep 21 2012 Real good profilers are the ones served by CPU vendor. See AMD's
- Andrei Alexandrescu (32/105) Sep 21 2012 I understand. What we currently do at Facebook is support benchmark
- David Piepgrass (6/14) Sep 21 2012 Huh? It's not nearly as confusing as reading the documentation
- Andrei Alexandrescu (9/35) Sep 21 2012 I misunderstood. I agree that it's a good thing to specify how
- David Piepgrass (11/38) Sep 22 2012 I am puzzled about what you think my suggestion meant. I am
- Nick Sabalausky (16/16) Sep 21 2012 Stepping back for a moment, I think we're facing two key issues here:
- Joseph Rushton Wakeling (15/26) Sep 23 2012 My feeling is that even with a good explanation in the docs, you're _sti...
- Øivind (8/8) Sep 19 2012 New question for you :)
- Andrei Alexandrescu (3/11) Sep 21 2012 I think you have discovered a major issue. Ideas on how to attack this?
- Jonathan M Davis (18/32) Sep 21 2012 Some of us have been asking for ages for the ability to mark a static
- Øivind (7/58) Sep 22 2012 Is there a way to solve the dependency issue without forbidding
- Andrei Alexandrescu (12/14) Sep 22 2012 I think an idea just occurred to me. The rules for static ctors and
- Øivind (6/21) Sep 22 2012 It gets a bit ugly maybe, but we could do a mix of the proposals
- Øivind (4/32) Sep 22 2012 We would probably not call it '@nycycliccheck' since you propose
- Simen Kjaeraas (7/14) Sep 22 2012
- Simen Kjaeraas (9/22) Sep 22 2012
- Jonathan M Davis (20/37) Sep 22 2012
- Dmitry Olshansky (12/25) Sep 21 2012 Not ideal but...
- Jacob Carlborg (8/9) Sep 22 2012 The standard way to solve this would be to move the initialization code
- Jacob Carlborg (16/19) Sep 19 2012 * Why is "scheduleForBenchmarking" a string? Can't it be a template mixi...
- Jonathan M Davis (11/14) Sep 19 2012 util is one of the worst package names ever, because it means basically
- Jacob Carlborg (5/14) Sep 19 2012 I just think we have too many top level modules.
- Andrei Alexandrescu (10/28) Sep 22 2012 A default value of n would depend on the speed of the function and the
- Peter Alexander (14/14) Sep 19 2012 I don't see why `benchmark` takes (almost) all of its parameters
- Andrei Alexandrescu (4/7) Sep 21 2012 That is intentional - indirect calls would add undue overhead to the
- Peter Alexander (5/13) Sep 21 2012 I accept that it adds undue overhead. I just think that the
- Dmitry Olshansky (21/21) Sep 21 2012 I'd throw in a request to address the following.
Hi,

it's my pleasure to announce the beginning of the formal review of Andrei's std.benchmark. The review will start today and end in two weeks, on the 1st of October. The review is followed by a week of voting, which ends on the 8th of October.

Quoting Andrei from his request for formal review:

"I reworked the benchmarking framework for backward compatibility, flexibility, and convenience. There are a few enhancement possibilities (such as tracking system/user time separately etc), but there is value in keeping things simple and convenient. Right now it really takes only one line of code and observing a simple naming convention to hook a module into the benchmarking framework."

Code: https://github.com/D-Programming-Language/phobos/pull/794
Docs: http://dlang.org/phobos-prerelease/std_benchmark.html

If std.benchmark is accepted it will likely lead to a deprecation of std.datetime's benchmark facilities.

The code is provided as a pull request and is being (as usual) integrated by the auto tester for Mac OS X, FreeBSD, Linux and Windows (see http://d.puremagic.com/test-results/pull-history.ghtml?repoid=3&pullid=794).

In your comments you can/should address the
* design
* implementation
* documentation
* usefulness
of the library.

Provide information regarding the depth (ranging from very brief to in-depth) of your review and conclude explicitly whether std.benchmark should or shouldn't be included in Phobos.

Post all feedback to this thread. Constructive feedback is very much appreciated.

To conclude in more Andrei-like words: Happy destruction!

Jens
Sep 17 2012
On 9/17/12 5:13 PM, Jens Mueller wrote:
> If std.benchmark is accepted it will likely lead to a deprecation of std.datetime's benchmark facilities.

One note - I moved the benchmark-related stuff from std.datetime unmodified into std.benchmark and left public aliases in place, so no code breakage is imminent. We may deprecate the aliases themselves later.

> To conclude in more Andrei like words: Happy destruction!

Sounds about right! :o)

Andrei
Sep 17 2012
I think the std.benchmark is definitely a useful library addition, but in my mind it is currently a bit too limited.

* All tests are run 1000 times. Depending on the length of the test to benchmark, this can be too much. In some cases it would be good to be able to trade the number of runs against accuracy.

* For all tests, the best run is selected, but would it not be reasonable in some cases to get the average value? Maybe excluding the runs that are more than a couple std. deviations away from the mean value..

* Is there a way of specifying a test name other than the function-name when using the 'mixin(scheduleForBenchmarking)' approach to register benchmarks?

* I would also like to be able (if possible) to register the two mentioned things (number of runs and result strategy) with the mixin approach (or similar).

* It seems like the baseline for subtraction from subsequent test runs is taken from a call to the test function, passing 1 to it. Shouldn't 0 be passed for this value?

If these can be addressed, I would like it added to the library!
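For readers unfamiliar with the API being reviewed, here is a minimal usage sketch pieced together from names mentioned in this thread (the benchmark_ and benchmark_relative_ prefixes, mixin(scheduleForBenchmarking), printBenchmarks, and the iterations-count parameter); the exact signatures and entry point in the pull request may differ.

    // Hypothetical sketch of the proposed std.benchmark registration convention.
    import std.benchmark;
    import std.file;

    // Picked up automatically thanks to the naming convention. The uint
    // parameter is the iterations count chosen by the framework (it passes 1
    // when measuring the baseline overhead, per the question above).
    void benchmark_fileWrite(uint n)
    {
        foreach (i; 0 .. n)
            std.file.write("/tmp/deleteme", "hello, world!");
    }

    // The benchmark_relative_ prefix reports this result relative to the
    // preceding benchmark.
    void benchmark_relative_fileRead(uint n)
    {
        foreach (i; 0 .. n)
            std.file.read("/tmp/deleteme");
    }

    // The "one line of code" that hooks the module into the framework.
    mixin(scheduleForBenchmarking);

    void main()
    {
        // Guess at the entry point; the module may run scheduled benchmarks
        // in some other way.
        printBenchmarks();
    }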
Sep 18 2012
On 9/18/12 5:07 PM, "Øivind" wrote:
> I think the std.benchmark is definitely a useful library addition, but in my mind it is currently a bit too limited.
>
> * All tests are run 1000 times. Depending on the length of the test to benchmark, this can be too much. In some cases it would be good to be able to trade the number of runs against accuracy.

It would be a good idea to make that a configurable parameter.

> * For all tests, the best run is selected, but would it not be reasonable in some cases to get the average value? Maybe excluding the runs that are more than a couple std. deviations away from the mean value..

After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

> * Is there a way of specifying a test name other than the function-name when using the 'mixin(scheduleForBenchmarking)' approach to register benchmarks?

Not currently. Probably a manual registration of an individual benchmark would make sense.

> * I would also like to be able (if possible) to register the two mentioned things (number of runs and result strategy) with the mixin approach (or similar).

Makes sense.

> * It seems like the baseline for subtraction from subsequent test runs is taken from a call to the test function, passing 1 to it. Shouldn't 0 be passed for this value?

I'll look into that.

Thanks,

Andrei
Sep 18 2012
On Tuesday, 18 September 2012 at 22:01:30 UTC, Andrei Alexandrescu wrote:
> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

What if one tries to benchmark a nondeterministic function? In such a case one might well be interested in the best run, worst run, and the average.
Sep 19 2012
On 9/19/12 4:12 AM, Thiez wrote:
> On Tuesday, 18 September 2012 at 22:01:30 UTC, Andrei Alexandrescu wrote:
>> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.
>
> What if one tries to benchmark a nondeterministic function? In such a case one might well be interested in the best run, worst run, and the average.

I agree. Currently std.benchmark is not geared for measuring non-deterministic functions.

Andrei
Sep 21 2012
On 19 September 2012 01:02, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 9/18/12 5:07 PM, "Øivind" wrote:
>> * For all tests, the best run is selected, but would it not be reasonable in some cases to get the average value? Maybe excluding the runs that are more than a couple std. deviations away from the mean value..
>
> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.

Side question: Running a test over and over pre-populates the cache with all associated data after the first cycle... The cache needs to be randomised between each cycle to get realistic results.
Sep 19 2012
> The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.

The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.
Sep 19 2012
On 2012-09-19 11:38, Peter Alexander wrote:
> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.

That's why the average is good to have as well.

--
/Jacob Carlborg
Sep 19 2012
On 9/19/12 3:54 PM, Jacob Carlborg wrote:
> On 2012-09-19 11:38, Peter Alexander wrote:
>> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.
>
> That's why the average is good to have as well.

The occasional hiccup is often orders of magnitude slower than the rest, which means it will ruin the average. You may have meant "median", which has more merit, but then I'd say why bother - just use the minimum.

Andrei
Sep 21 2012
On 19 September 2012 12:38, Peter Alexander <peter.alexander.au gmail.com> wrote:
>> The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.
>
> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.

So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?
Sep 20 2012
On 9/20/12 2:42 AM, Manu wrote:
> On 19 September 2012 12:38, Peter Alexander <peter.alexander.au gmail.com> wrote:
>>> The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.
>>
>> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.
>
> So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?

The "best way" according to some of the people who've advised my implementation of the framework at Facebook is to take the mode of the measurements distribution, i.e. the time at the maximum density. I implemented that (and it's not easy). It yielded numbers close to the minimum, but less stable and needing more iterations to become stable (when they do get indeed close to the minimum).

Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.

Andrei
Sep 20 2012
On 20 September 2012 15:36, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 9/20/12 2:42 AM, Manu wrote:
>> On 19 September 2012 12:38, Peter Alexander <peter.alexander.au gmail.com> wrote:
>>>> The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.
>>>
>>> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.
>>
>> So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?
>
> The "best way" according to some of the people who've advised my implementation of the framework at Facebook is to take the mode of the measurements distribution, i.e. the time at the maximum density. I implemented that (and it's not easy). It yielded numbers close to the minimum, but less stable and needing more iterations to become stable (when they do get indeed close to the minimum).
>
> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.

If you do more than a single iteration, the minimum will virtually always be influenced by ideal cache pre-population, which is unrealistic. Memory locality is often the biggest contributing performance hazard in many algorithms, and usually the most unpredictable. I want to know about that in my measurements. Reproducibility is not important to me as accuracy. And I'd rather be conservative(/pessimistic) with the error.

What guideline would you apply to estimate 'real-world' time spent when always working with hyper-optimistic measurements?
Sep 20 2012
On 9/20/12 10:05 AM, Manu wrote:
> On 20 September 2012 15:36, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.
>
> If you do more than a single iteration, the minimum will virtually always be influenced by ideal cache pre-population, which is unrealistic.

To measure performance against cold cache, you could always clear the cache using one of the available methods, see http://stackoverflow.com/questions/1756825/cpu-cache-flush. Probably std.benchmark could include a routine that does that. But performance on cold would actually be most unrealistic and uninformative, as loading the memory into cache will dominate the work that the algorithm is doing, so essentially the benchmark would evaluate the memory bandwidth against the working set of the algorithm. That may be occasionally useful, but I'd argue that most often the interest in benchmarking is to measure repeated application of a function, not occasional use of it.

> Memory locality is often the biggest contributing performance hazard in many algorithms, and usually the most unpredictable. I want to know about that in my measurements. Reproducibility is not important to me as accuracy. And I'd rather be conservative(/pessimistic) with the error.
>
> What guideline would you apply to estimate 'real-world' time spent when always working with hyper-optimistic measurements?

The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling.) Instead, benchmarking measures and provides a good proxy of that time for purposes of optimizing the algorithm. If work is done on improving the minimum time given by the benchmark framework, it is reasonable to expect that performance in-situ will also improve.

Andrei
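For illustration, one common way to approximate a cold-cache measurement (in the spirit of the Stack Overflow question linked above) is simply to walk a buffer larger than the last-level cache between iterations. This is a sketch only; the 64 MB size is an assumption and would need to match the target CPU, and nothing here is part of std.benchmark.

    // Sketch: evict (most of) the CPU data cache between iterations by
    // touching a buffer larger than the last-level cache.
    enum size_t evictionBufferSize = 64 * 1024 * 1024; // assumed LLC upper bound
    ubyte[] evictionBuffer;

    static this()
    {
        evictionBuffer = new ubyte[evictionBufferSize];
    }

    void evictCaches()
    {
        foreach (ref b; evictionBuffer)
            ++b; // writes force the cache lines to be brought in and dirtied
    }

    void main()
    {
        import std.datetime : StopWatch, AutoStart;

        void work() { /* code under test */ }

        // Hypothetical cold-cache timing of `work`.
        evictCaches();
        auto sw = StopWatch(AutoStart.yes);
        work();
        sw.stop();
    }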
Sep 20 2012
On 21 September 2012 07:17, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 9/20/12 10:05 AM, Manu wrote:
>> Memory locality is often the biggest contributing performance hazard in many algorithms, and usually the most unpredictable. I want to know about that in my measurements. Reproducibility is not important to me as accuracy. And I'd rather be conservative(/pessimistic) with the error.
>>
>> What guideline would you apply to estimate 'real-world' time spent when always working with hyper-optimistic measurements?
>
> The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling.) Instead, benchmarking measures and provides a good proxy of that time for purposes of optimizing the algorithm. If work is done on improving the minimum time given by the benchmark framework, it is reasonable to expect that performance in-situ will also improve.

Okay, I can buy this distinction in terminology.

What I'm typically more interested in is profiling. I do occasionally need to do some benchmarking by your definition, so I'll find this useful, but should there then be another module to provide a 'profiling' API? Also worked into this API?
Sep 21 2012
On 9/21/12 10:58 AM, Manu wrote:
> What I'm typically more interested in is profiling. I do occasionally need to do some benchmarking by your definition, so I'll find this useful, but should there then be another module to provide a 'profiling' API? Also worked into this API?

That's a good angle. Profiling is currently done by the -profile switch, and there are a couple of library functions associated with it. To my surprise, that documentation page has not been ported to the dlang.org style:

http://digitalmars.com/ctg/trace.html

I haven't yet thought whether std.benchmark should add more profiling-related primitives. I'd opine for releasing it without such for the time being.

Thanks,

Andrei
Sep 21 2012
On 2012-09-21 18:21, Andrei Alexandrescu wrote:
> That's a good angle. Profiling is currently done by the -profile switch, and there are a couple of library functions associated with it. To my surprise, that documentation page has not been ported to the dlang.org style:
>
> http://digitalmars.com/ctg/trace.html
>
> I haven't yet thought whether std.benchmark should add more profiling-related primitives. I'd opine for releasing it without such for the time being.

If you have an API that is fairly open and provides more of the raw results, then one can build a more profiling-like solution on top of that. This can later be used to create a specific profiling module if we choose to do so.

--
/Jacob Carlborg
Sep 21 2012
On Friday, September 21, 2012 17:58:05 Manu wrote:
> Okay, I can buy this distinction in terminology.
>
> What I'm typically more interested in is profiling. I do occasionally need to do some benchmarking by your definition, so I'll find this useful, but should there then be another module to provide a 'profiling' API? Also worked into this API?

dmd has the -profile flag.

- Jonathan M Davis
Sep 21 2012
On 2012-09-20 14:36, Andrei Alexandrescu wrote:
> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.

Why not min, max and average?

--
/Jacob Carlborg
Sep 20 2012
On 9/20/12 1:37 PM, Jacob Carlborg wrote:
> On 2012-09-20 14:36, Andrei Alexandrescu wrote:
>> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.
>
> Why not min, max and average?

Because max and average are misleading and uninformative, as I explained.

Andrei
Sep 20 2012
On 9/20/12 2:37 PM, Jacob Carlborg wrote:
> On 2012-09-20 14:36, Andrei Alexandrescu wrote:
>> Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers.
>
> Why not min, max and average?

For a very simple reason: unless the algorithm under benchmark is very long-running, max is completely useless, and it ruins average as well.

For virtually all benchmarks I've run, the distribution of timings is a half-Gaussian very concentrated around the minimum. Say you have a minimum of e.g. 73 us. Then there would be a lot of results close to that; the mode of the distribution would be very close, e.g. 75 us, and the more measurements you take, the closer the mode is to the minimum. Then you have a few timings up to e.g. 90 us. And finally you will inevitably have a few outliers at some milliseconds. Those are orders of magnitude larger than anything of interest and are caused by system interrupts that happened to fall in the middle of the measurement. Taking those into consideration and computing the average with those outliers simply brings useless noise into the measurement process.

Andrei
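A tiny D illustration of this argument; the timing numbers below are invented to mimic the half-Gaussian-plus-outlier shape described above, and show how one interrupt-sized outlier wrecks the average while leaving the minimum (and median) intact.

    import std.algorithm : min, reduce, sort;
    import std.stdio : writefln;

    void main()
    {
        // Made-up sample of per-run timings in microseconds; one outlier.
        double[] timingsUs = [75, 74, 77, 73, 76, 90, 74, 75, 3200, 73];

        auto minimum = reduce!min(timingsUs);
        auto average = reduce!"a + b"(0.0, timingsUs) / timingsUs.length;

        auto sorted = timingsUs.dup;
        sort(sorted);
        auto median = sorted[$ / 2];

        writefln("min = %s us, median = %s us, average = %s us",
                 minimum, median, average);
        // Prints: min = 73, median = 75, average = 388.7 --
        // the average says little about the algorithm itself.
    }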
Sep 20 2012
On 2012-09-21 06:23, Andrei Alexandrescu wrote:
> For a very simple reason: unless the algorithm under benchmark is very long-running, max is completely useless, and it ruins average as well.

I may have completely misunderstood this but aren't we talking about what to include in the output of the benchmark? In that case, if you don't like max and average just don't look at it.

> For virtually all benchmarks I've run, the distribution of timings is a half-Gaussian very concentrated around the minimum. Say you have a minimum of e.g. 73 us. Then there would be a lot of results close to that; the mode of the distribution would be very close, e.g. 75 us, and the more measurements you take, the closer the mode is to the minimum. Then you have a few timings up to e.g. 90 us. And finally you will inevitably have a few outliers at some milliseconds. Those are orders of magnitude larger than anything of interest and are caused by system interrupts that happened to fall in the middle of the measurement. Taking those into consideration and computing the average with those outliers simply brings useless noise into the measurement process.

After your reply to one of Manu's posts, I think I misunderstood the std.benchmark module. I was thinking more of profiling. But are these quite similar tasks, couldn't std.benchmark work for both?

--
/Jacob Carlborg
Sep 21 2012
On 9/21/12 5:39 AM, Jacob Carlborg wrote:On 2012-09-21 06:23, Andrei Alexandrescu wrote:I disagree. I won't include something in my design just so people don't look at it most of the time. Min and average are most of the time an awful thing to include, and will throw off people with bizarre results. If it's there, it's worth looking at. Note how all columns are directly comparable (I might add, unlike other approaches to benchmarking).For a very simple reason: unless the algorithm under benchmark is very long-running, max is completely useless, and it ruins average as well.I may have completely misunderstood this but aren't we talking about what do include in the output of the benchmark? In that case, if you don't like max and average just don't look at it.This is an interesting idea. It would delay release quite a bit because I'd need to design and implement things like performance counters and such. AndreiFor virtually all benchmarks I've run, the distribution of timings is a half-Gaussian very concentrated around the minimum. Say you have a minimum of e.g. 73 us. Then there would be a lot of results close to that; the mode of the distribution would be very close, e.g. 75 us, and the more measurements you take, the closer the mode is to the minimum. Then you have a few timings up to e.g. 90 us. And finally you will inevitably have a few outliers at some milliseconds. Those are orders of magnitude larger than anything of interest and are caused by system interrupts that happened to fall in the middle of the measurement. Taking those into consideration and computing the average with those outliers simply brings useless noise into the measurement process.After your replay to one of Manu's post, I think I misunderstood the std.benchmark module. I was thinking more of profiling. But are these quite similar tasks, couldn't std.benchmark work for both?
Sep 21 2012
Andrei Alexandrescu wrote:On 9/21/12 5:39 AM, Jacob Carlborg wrote:You mean like extending StopWatch and allowing the user to provide the measuring code, i.e. counting the number of instructions. This would be very useful. Is it possible to make sure that these changes can be introduced later without breaking the API? JensOn 2012-09-21 06:23, Andrei Alexandrescu wrote:I disagree. I won't include something in my design just so people don't look at it most of the time. Min and average are most of the time an awful thing to include, and will throw off people with bizarre results. If it's there, it's worth looking at. Note how all columns are directly comparable (I might add, unlike other approaches to benchmarking).For a very simple reason: unless the algorithm under benchmark is very long-running, max is completely useless, and it ruins average as well.I may have completely misunderstood this but aren't we talking about what do include in the output of the benchmark? In that case, if you don't like max and average just don't look at it.This is an interesting idea. It would delay release quite a bit because I'd need to design and implement things like performance counters and such.For virtually all benchmarks I've run, the distribution of timings is a half-Gaussian very concentrated around the minimum. Say you have a minimum of e.g. 73 us. Then there would be a lot of results close to that; the mode of the distribution would be very close, e.g. 75 us, and the more measurements you take, the closer the mode is to the minimum. Then you have a few timings up to e.g. 90 us. And finally you will inevitably have a few outliers at some milliseconds. Those are orders of magnitude larger than anything of interest and are caused by system interrupts that happened to fall in the middle of the measurement. Taking those into consideration and computing the average with those outliers simply brings useless noise into the measurement process.After your replay to one of Manu's post, I think I misunderstood the std.benchmark module. I was thinking more of profiling. But are these quite similar tasks, couldn't std.benchmark work for both?
Sep 21 2012
On 2012-21-09 22:58:36, Andrei Alexandrescu wrote:
> On 9/21/12 5:39 AM, Jacob Carlborg wrote:
>> After your reply to one of Manu's posts, I think I misunderstood the std.benchmark module. I was thinking more of profiling. But are these quite similar tasks, couldn't std.benchmark work for both?
>
> This is an interesting idea. It would delay release quite a bit because I'd need to design and implement things like performance counters and such.

I certainly think the two use cases are similar enough to warrant their inclusion in a common module. That does not preclude std.benchmark being included as is now, and extended with profiling features at some later point.

--
Simen
Sep 22 2012
On 21 September 2012 07:23, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> For a very simple reason: unless the algorithm under benchmark is very long-running, max is completely useless, and it ruins average as well.

This is only true for systems with a comprehensive pre-emptive OS running on the same core. Most embedded systems will only be affected by cache misses and bus contention; in that situation, max is perfectly acceptable.
Sep 21 2012
On 9/21/12 11:14 AM, Manu wrote:
> On 21 September 2012 07:23, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> For a very simple reason: unless the algorithm under benchmark is very long-running, max is completely useless, and it ruins average as well.
>
> This is only true for systems with a comprehensive pre-emptive OS running on the same core. Most embedded systems will only be affected by cache misses and bus contention; in that situation, max is perfectly acceptable.

I think embedded systems that run e.g. Linux will be affected by task switching.

Andrei
Sep 21 2012
On Thursday, 20 September 2012 at 12:35:15 UTC, Andrei Alexandrescu wrote:On 9/20/12 2:42 AM, Manu wrote:From the responses on the thread clearly there isn't a "best way". There are different use-cases with different tradeoffs so why not allow the user to choose the policy best suited for their use-case? I'd suggest to provide a few reasonable common choices to choose from, as well as a way to provide a user defined calculation (function pointer/delegate?)On 19 September 2012 12:38, Peter Alexander <peter.alexander.au gmail.com <mailto:peter.alexander.au gmail.com>> wrote: The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case. The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless. So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?The "best way" according to some of the people who've advised my implementation of the framework at Facebook is to take the mode of the measurements distribution, i.e. the time at the maximum density. I implemented that (and it's not easy). It yielded numbers close to the minimum, but less stable and needing more iterations to become stable (when they do get indeed close to the minimum). Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers. Andrei
Sep 20 2012
On 9/20/12 3:01 PM, foobar wrote:On Thursday, 20 September 2012 at 12:35:15 UTC, Andrei Alexandrescu wrote:I don't quite agree. This is a domain in which intuition is having a hard time, and at least some of the responses come from an intuitive standpoint, as opposed from hard data. For example, there's this opinion that taking the min, max, and average is the "fair" thing to do and the most informative. However, all noise in measuring timing is additive. Unless you talk about performance of entire large systems with networking, I/O, and the such, algorithms running in memory are inevitably spending time doing work, to which various sources of noise (system interrupts, clock quantization, benchmarking framework) just _add_ some time. Clearly these components do affect the visible duration of the algorithm, but if you want to improve it you need to remove the noise.Let's use the minimum. It is understood it's not what you'll see in production, but it is an excellent proxy for indicative and reproducible performance numbers. AndreiFrom the responses on the thread clearly there isn't a "best way".There are different use-cases with different tradeoffs so why not allow the user to choose the policy best suited for their use-case? I'd suggest to provide a few reasonable common choices to choose from, as well as a way to provide a user defined calculation (function pointer/delegate?)Reasonable choices are great, but in this case it's a bit difficult to figure what's reasonable. Andrei
Sep 20 2012
On 21 September 2012 07:30, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> I don't quite agree. This is a domain in which intuition is having a hard time, and at least some of the responses come from an intuitive standpoint, as opposed from hard data. For example, there's this opinion that taking the min, max, and average is the "fair" thing to do and the most informative.

I don't think this is a 'fair' claim, the situation is that different people are looking for different statistical information, and you can distinguish it with whatever terminology you prefer. You are only addressing a single use case; 'benchmarking', by your definition. I'm more frequently interested in profiling than 'benchmark'ing, and I think both are useful to have.

The thing is, the distinction between 'benchmarking' and 'profiling' is effectively implemented via nothing more than the sampling algorithm; min vs avg, so is it sensible to expose the distinction in the API in this way?
Sep 21 2012
On 9/20/12 3:42 AM, Manu wrote:
> On 19 September 2012 12:38, Peter Alexander <peter.alexander.au gmail.com> wrote:
>>> The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.
>>
>> The problem with slowest is that you end up with the occasional OS hiccup or GC collection which throws the entire benchmark off. I see your point, but unless you can prevent the OS from interrupting, the time would be meaningless.
>
> So then we need to start getting tricky, and choose the slowest one that is not beyond an order of magnitude or so outside the average?

That's exactly where it all starts getting unprincipled. Just use the minimum. Just. Use. The. Minimum.

Andrei
Sep 21 2012
On Wednesday, 19 September 2012 at 08:28:36 UTC, Manu wrote:On 19 September 2012 01:02, Andrei Alexandrescu < SeeWebsiteForEmail erdani.org> wrote:For comparison's sake, the Criterion benchmarking package for Haskell is worth a look: http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-library-for-haskell/ Criterion accounts for clock-call costs, displays various central tendencies, reports outliers (and their significance --- whether the variance is significantly affected by the outliers), etc., etc. It's a very well conceived benchmarking system, and might well be worth stealing from. Best, GrahamOn 9/18/12 5:07 PM, "Øivind" wrote:The fastest execution time is rarely useful to me, I'm almost always much more interested in the slowest execution time. In realtime software, the slowest time is often the only important factor, everything must be designed to tolerate this possibility. I can also imagine other situations where multiple workloads are competing for time, the average time may be more useful in that case.* For all tests, the best run is selected, but would it not bereasonable in some cases to get the average value? Maybe excluding theruns that are more than a couple std. deviations away from the mean value..After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.
Sep 19 2012
On 9/19/12 3:59 PM, Graham Fawcett wrote:
> For comparison's sake, the Criterion benchmarking package for Haskell is worth a look:
>
> http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-library-for-haskell/
>
> Criterion accounts for clock-call costs, displays various central tendencies, reports outliers (and their significance --- whether the variance is significantly affected by the outliers), etc., etc. It's a very well conceived benchmarking system, and might well be worth stealing from.

Will look into it, thanks.

Andrei
Sep 21 2012
On Tue, 18 Sep 2012 18:02:10 -0400
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

> On 9/18/12 5:07 PM, "Øivind" wrote:
>> * For all tests, the best run is selected, but would it not be reasonable in some cases to get the average value? Maybe excluding the runs that are more than a couple std. deviations away from the mean value..
>
> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

*Ahem*: http://zedshaw.com/essays/programmer_stats.html

Your claim that the minimum time is sufficient is...ummm...extremely unorthodox, to say the least. As such, you're going to need a far more convincing argument than "It worked well for me." I assume I don't need to preach that "Extraordinary claims require extraordinary evidence". But, condensing benchmarks and statistics down to "take the minimum" and saying that's sufficient is one heck of an extraordinary claim. If you feel that you can provide sufficiently extraordinary justification, then please do.

Otherwise, I think we'll need richer results. At the very least there should be an easy way to get at the raw results programmatically so we can run whatever stats/plots/visualizations/output-formats we want. I didn't see anything like that browsing through the docs, but it's possible I may have missed it.

That brings up another question too: I like the idea of a one-stop-benchmarking-shop, much like we have for unittests, but maybe reporting shouldn't be so tightly integrated and left more open for integration with a proper statistics lib and more generalized output abilities? But of course, that doesn't preclude having a nice built-in, but optional, default report. (Again though, maybe I'm overlooking something already in the module?)

One other nitpick: My initial impression is that the "benchmark_relative_file read" stuff seems a bit kludgey (and confusing to visually parse). Is there maybe a better way to handle that? For example, inspired by getopt:

    printBenchmarks!(
        "file write", { std.file.write("/tmp/deleteme", "hello, world!"); },
        BenchmarkOption.relative,
        "file read", { std.file.read("/tmp/deleteme"); },
        "array creation", { new char[32]; })
        ();
Sep 20 2012
On 9/20/12 11:03 PM, Nick Sabalausky wrote:On Tue, 18 Sep 2012 18:02:10 -0400 Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:I'm not sure I figure how this applies to the discussion at hand.On 9/18/12 5:07 PM, "Øivind" wrote:*Ahem*: http://zedshaw.com/essays/programmer_stats.html* For all tests, the best run is selected, but would it not be reasonable in some cases to get the average value? Maybe excluding the runs that are more than a couple std. deviations away from the mean value..After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.Your claim that the minimum time is sufficient is...ummm...extremely unorthodox, to say the least.What would be the orthodoxy? If orthodoxy is what google finds, it's good we're not orthodox.As such, you're going to need a far more convincing argument than "It worked well for me."Sure. I have just detailed the choices made by std.benchmark in a couple of posts. At Facebook we measure using the minimum, and it's working for us. We've tried other approaches (such as taking the mode of the distribution). Turns out the minimum is better every time. Take a look at the early return in estimateTime(): https://github.com/facebook/folly/blob/master/folly/Benchmark.cpp#L136I assume I don't need to preach that "Extraordinary claims require extraordinary evidence". But, condensing benchmarks and statistics down to "take the minimum" and saying that's sufficient is one heck of an extraordinary claim. If you feel that you can provide sufficiently extraordinary justification, then please do.My claim is unremarkable. All I'm saying is the minimum running time of an algorithm on a given input is a stable and indicative proxy for the behavior of the algorithm in general. So it's a good target for optimization. There might be some confusion that std.benchmark does profiling. That's not part of its charter.Otherwise, I think we'll need richer results. At the very least there should be an easy way to get at the raw results programmatically so we can run whatever stats/plots/visualizations/output-formats we want. I didn't see anything like that browsing through the docs, but it's possible I may have missed it.Currently std.benchmark does not expose raw results for the sake of simplicity. It's easy to expose such, but I'd need a bit more convincing about their utility.That brings up another question too: I like the idea of a one-stop-benchmarking-shop, much like we have for unittests, but maybe reporting shouldn't be so tightly integrated and left more open for integration with a proper statistics lib and more generalized output abilities? But of course, that doesn't preclude having a nice built-in, but optional, default report. (Again though, maybe I'm overlooking something already in the module?)That's pretty much what's happening. There's an API for collecting timings, and then there's an API for printing those with a default format.One other nitpick: My initial impression is that the "benchmark_relative_file read" stuff seems a bit kludgey (and confusing to visually parse). Is there maybe a better way to handle that? 
For example, inspired by getopt:

    printBenchmarks!(
        "file write", { std.file.write("/tmp/deleteme", "hello, world!"); },
        BenchmarkOption.relative,
        "file read", { std.file.read("/tmp/deleteme"); },
        "array creation", { new char[32]; })
        ();

The issue here is automating the benchmark of a module, which would require some naming convention anyway.

Andrei
Sep 20 2012
On Friday, 21 September 2012 at 04:44:58 UTC, Andrei Alexandrescu wrote:
> My claim is unremarkable. All I'm saying is the minimum running time of an algorithm on a given input is a stable and indicative proxy for the behavior of the algorithm in general. So it's a good target for optimization.

I reached the same conclusion and use the same method at work.

Considering min will converge towards a stable value quite quickly... would it not be a reasonable default to auto detect when the min is stable with some degree of statistical certainty...?
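One possible shape for what Tove suggests, assuming the framework can call the measured code repeatedly: keep sampling and stop once the observed minimum has not improved for some number of consecutive trials. The stopping rule, the constants, and the function name below are illustrative assumptions, not anything from std.benchmark.

    import std.datetime : StopWatch, AutoStart;

    // Illustrative only: sample run times until the minimum stops improving
    // for `patience` consecutive trials (or a hard cap is reached).
    ulong stableMinimumHnsecs(void delegate() run,
                              size_t patience = 50,
                              size_t maxTrials = 10_000)
    {
        ulong best = ulong.max;
        size_t sinceImprovement = 0;

        foreach (trial; 0 .. maxTrials)
        {
            auto sw = StopWatch(AutoStart.yes);
            run();
            sw.stop();
            immutable t = cast(ulong) sw.peek().hnsecs;

            if (t < best)
            {
                best = t;
                sinceImprovement = 0;
            }
            else if (++sinceImprovement >= patience)
                break; // minimum looks stable enough
        }
        return best;
    }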
Sep 21 2012
On 9/21/12 5:46 AM, Tove wrote:
> Considering min will converge towards a stable value quite quickly... would it not be a reasonable default to auto detect when the min is stable with some degree of statistical certainty...?

I think that's a great idea!

Andrei
Sep 21 2012
On Fri, 21 Sep 2012 00:45:44 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> The issue here is automating the benchmark of a module, which would require some naming convention anyway.

A perfect use case for user defined attributes ;-)

@benchmark void foo(){}
@benchmark("File read test") void foo(){}
Sep 21 2012
On 2012-09-21 19:45, Johannes Pfau wrote:
> A perfect use case for user defined attributes ;-)
>
> @benchmark void foo(){}
> @benchmark("File read test") void foo(){}

Yes, we need user defined attributes and AST macros ASAP :)

--
/Jacob Carlborg
Sep 21 2012
On 21 September 2012 07:45, Andrei Alexandrescu < SeeWebsiteForEmail erdani.org> wrote:As such, you're going to need a far moreFacebook isn't exactly 'realtime' software. Obviously, faster is always better, but it's not in a situation where if you slip a sync point by 1ms in an off case, it's all over. You can lose 1ms here, and make it up at a later time, and the result is the same. But again, this feeds back to your distinction between benchmarking and profiling. Otherwise, I think we'll need richer results. At the very least thereconvincing argument than "It worked well for me."Sure. I have just detailed the choices made by std.benchmark in a couple of posts. At Facebook we measure using the minimum, and it's working for us.Custom visualisation, realtime charting/plotting, user supplied reduce function?should be an easy way to get at the raw results programmatically so we can run whatever stats/plots/visualizations/**output-formats we want. I didn't see anything like that browsing through the docs, but it's possible I may have missed it.Currently std.benchmark does not expose raw results for the sake of simplicity. It's easy to expose such, but I'd need a bit more convincing about their utility.
Sep 21 2012
On 9/21/12 11:12 AM, Manu wrote:On 21 September 2012 07:45, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> wrote: As such, you're going to need a far more convincing argument than "It worked well for me." Sure. I have just detailed the choices made by std.benchmark in a couple of posts. At Facebook we measure using the minimum, and it's working for us. Facebook isn't exactly 'realtime' software. Obviously, faster is always better, but it's not in a situation where if you slip a sync point by 1ms in an off case, it's all over. You can lose 1ms here, and make it up at a later time, and the result is the same. But again, this feeds back to your distinction between benchmarking and profiling.You'd be surprised at how much we care about e.g. 90 percentile time to interaction.Otherwise, I think we'll need richer results. At the very least there should be an easy way to get at the raw results programmatically so we can run whatever stats/plots/visualizations/__output-formats we want. I didn't see anything like that browsing through the docs, but it's possible I may have missed it. Currently std.benchmark does not expose raw results for the sake of simplicity. It's easy to expose such, but I'd need a bit more convincing about their utility. Custom visualisation, realtime charting/plotting, user supplied reduce function?Hrm, that sounds like an entire new project. Andrei
Sep 21 2012
On Fri, 21 Sep 2012 17:00:29 -0400 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 9/21/12 11:12 AM, Manu wrote:(Emphasis added for proper context.)On 21 September 2012 07:45, Andrei Alexandrescu Currently std.benchmark does not expose raw results for the sake of simplicity. It's easy to expose such, but I'd need a bit ***more convincing about their utility***.That doesn't diminish their utility. Keep in mind, nobody's suggesting putting all of that into std.benchmark (certainly not initially anyway), but the idea is to at least have the door open for them.Custom visualisation, realtime charting/plotting, user supplied reduce function?Hrm, that sounds like an entire new project.
Sep 21 2012
After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.Like others, I must also disagree in princple. The minimum sounds like a useful metric for functions that (1) do the same amount of work in every test and (2) are microbenchmarks, i.e. they measure a small and simple task. If the benchmark being measured either (1) varies the amount of work each time (e.g. according to some approximation of real-world input, which obviously may vary)* or (2) measures a large system, then the average and standard deviation and even a histogram may be useful (or perhaps some indicator whether the runtimes are consistent with a normal distribution or not). If the running-time is long then the max might be useful (because things like task-switching overhead probably do not contribute that much to the total). * I anticipate that you might respond "so, only test a single input per benchmark", but if I've got 1000 inputs that I want to try, I really don't want to write 1000 functions nor do I want 1000 lines of output from the benchmark. An average, standard deviation, min and max may be all I need, and if I need more detail, then I might break it up into 10 groups of 100 inputs. In any case, the minimum runtime is not the desired output when the input varies. It's a little surprising to hear "The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling)"... Firstly, of COURSE I would want to estimate real-world time with some of my benchmarks. For some benchmarks I just want to know which of two or three approaches is faster, or to get a coarse ball-park sense of performance, but for others I really want to know the wall-clock time used for realistic inputs. Secondly, what D profiler actually helps you answer the question "where does the time go in the real-world?"? The D -profile switch creates an instrumented executable, which in my experience (admittedly not experience with DMD) severely distorts running times. I usually prefer sampling-based profiling, where the executable is left unchanged and a sampling program interrupts the program at random and grabs the call stack, to avoid the distortion effect of instrumentation. Of course, instrumentation is useful to find out what functions are called the most and whether call frequencies are in line with expectations, but I wouldn't trust the time measurements that much. As far as I know, D doesn't offer a sampling profiler, so one might indeed use a benchmarking library as a (poor) substitute. So I'd want to be able to set up some benchmarks that operate on realistic data, with perhaps different data in different runs in order to learn about how the speed varies with different inputs (if it varies a lot then I might create more benchmarks to investigate which inputs are processed quickly, and which slowly.) Some random comments about std.benchmark based on its documentation: - It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done.... I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums right? Øivind says tests are run 1000 times... 
but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual.) Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count".) - The "benchmark_relative_" feature looks quite useful. I'm also happy to see benchmarkSuspend() and benchmarkResume(), though benchmarkSuspend() seems redundant in most cases: I'd like to just call one function, say, benchmarkStart() to indicate "setup complete, please start measuring time now." - I'm glad that StopWatch can auto-start; but the documentation should be clearer: does reset() stop the timer or just reset the time to zero? does stop() followed by start() start from zero or does it keep the time on the clock? I also think there should be a method that returns the value of peek() and restarts the timer at the same time (perhaps stop() and reset() should just return peek()?) - After reading the documentation of comparingBenchmark and measureTime, I have almost no idea what they do.
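For readers trying the existing std.datetime StopWatch while these questions are open, a small usage sketch consistent with Dmitry's description further down the thread (stop pauses, start resumes, reset zeroes); the exact semantics should still be verified against the documentation.

    import std.datetime : StopWatch, AutoStart;
    import std.stdio : writeln;

    void main()
    {
        // Auto-start measuring immediately.
        auto sw = StopWatch(AutoStart.yes);

        // ... work being timed ...

        sw.stop();                 // pause; accumulated time is kept
        writeln(sw.peek().msecs, " ms so far");

        sw.start();                // resume; keeps the time already on the clock
        // ... more work ...
        sw.stop();

        sw.reset();                // zero the accumulated time
    }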
Sep 21 2012
> As far as I know, D doesn't offer a sampling profiler,

It is possible to use a sampling profiler on D executables though. I usually use perf on Linux and AMD CodeAnalyst on Windows.
Sep 21 2012
On 21-Sep-12 22:49, David Piepgrass wrote:
>> After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.
>
> As far as I know, D doesn't offer a sampling profiler, so one might indeed use a benchmarking library as a (poor) substitute. So I'd want to be able to set up some benchmarks that operate on realistic data, with perhaps different data in different runs in order to learn about how the speed varies with different inputs (if it varies a lot then I might create more benchmarks to investigate which inputs are processed quickly, and which slowly.)

Real good profilers are the ones served by CPU vendor. See AMD's CodeAnalyst or Intel's VTune. They could even count number of branch predictions, cache misses etc. It is certainly out of the charter of this module or for that matter any standard library code.

> Some random comments about std.benchmark based on its documentation:
>
> - It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done.... I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums right?

See the algorithm in action here:
https://github.com/D-Programming-Language/phobos/pull/794/files#L2R381

In other words, a function is run 10^n times, with n picked so that the total time is big enough to be a trustworthy measurement. Then run-time is time/10^n.

> Øivind says tests are run 1000 times...

The above 1000 times, picking the minimum as the best. Obviously it'd be good to be configurable.

> but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual.) Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count".)
>
> - The "benchmark_relative_" feature looks quite useful. I'm also happy to see benchmarkSuspend() and benchmarkResume(), though benchmarkSuspend() seems redundant in most cases: I'd like to just call one function, say, benchmarkStart() to indicate "setup complete, please start measuring time now."
>
> - I'm glad that StopWatch can auto-start; but the documentation should be clearer: does reset() stop the timer or just reset the time to zero? does stop() followed by start() start from zero or does it keep the time on the clock? I also think there should be a method that returns the value of peek() and restarts the timer at the same time (perhaps stop() and reset() should just return peek()?)

It's the same as the usual stopwatch (as in the real hardware thingy). Thus:
- reset just resets numbers to zeros
- stop just stops counting
- start just starts counting
- peek imitates taking a look at numbers on a device ;)

> - After reading the documentation of comparingBenchmark and measureTime, I have almost no idea what they do.

I think that comparingBenchmark was present in std.datetime and is carried over as is.

--
Dmitry Olshansky
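A rough D sketch of the measurement loop Dmitry describes (scale the repetition count until the total run time is long enough to trust, then divide); the 10 ms threshold and the outer repeat-and-take-the-minimum step are illustrative assumptions, not the exact code in the pull request.

    import std.datetime : StopWatch, AutoStart;

    // Sketch of "run the function 10^n times, with n picked so the total
    // time is big enough to be trustworthy; per-call time is total / 10^n".
    double perCallHnsecs(void delegate() fun)
    {
        enum trustworthyHnsecs = 100_000; // ~10 ms, an assumed threshold

        for (ulong reps = 1; ; reps *= 10)
        {
            auto sw = StopWatch(AutoStart.yes);
            foreach (i; 0 .. reps)
                fun();
            sw.stop();

            immutable total = sw.peek().hnsecs;
            if (total >= trustworthyHnsecs)
                return cast(double) total / reps;
        }
    }

    // The framework then repeats this measurement (e.g. up to 1000 times,
    // as mentioned above) and reports the minimum of the per-call estimates.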
Sep 21 2012
On 9/21/12 2:49 PM, David Piepgrass wrote:That is correct.After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.Like others, I must also disagree in principle. The minimum sounds like a useful metric for functions that (1) do the same amount of work in every test and (2) are microbenchmarks, i.e. they measure a small and simple task.If the benchmark being measured either (1) varies the amount of work each time (e.g. according to some approximation of real-world input, which obviously may vary)* or (2) measures a large system, then the average and standard deviation and even a histogram may be useful (or perhaps some indicator of whether the runtimes are consistent with a normal distribution or not). If the running time is long then the max might be useful (because things like task-switching overhead probably do not contribute that much to the total). * I anticipate that you might respond "so, only test a single input per benchmark", but if I've got 1000 inputs that I want to try, I really don't want to write 1000 functions nor do I want 1000 lines of output from the benchmark. An average, standard deviation, min and max may be all I need, and if I need more detail, then I might break it up into 10 groups of 100 inputs. In any case, the minimum runtime is not the desired output when the input varies.I understand. What we currently do at Facebook is support benchmark functions with two parameters (see https://github.com/facebook/folly/blob/master/folly/docs/Benchmark.md). One is the number of iterations, the second is "problem size", akin to what you're discussing. I chose to not support that in this version of std.benchmark because it can be tackled later easily, but I probably need to add it now, sigh.It's a little surprising to hear "The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling)"... Firstly, of COURSE I would want to estimate real-world time with some of my benchmarks. For some benchmarks I just want to know which of two or three approaches is faster, or to get a coarse ball-park sense of performance, but for others I really want to know the wall-clock time used for realistic inputs.I would contend that a benchmark without a baseline is very often misguided. I've seen tons and tons and TONS of nonsensical benchmarks lacking a baseline. "I created one million smart pointers, it took me only one millisecond!" Well, how long did it take you to create one million dumb pointers? Choosing good baselines and committing to good comparisons instead of un-based absolutes is what makes the difference between a professional and a well-intended dilettante.Secondly, what D profiler actually helps you answer the question "where does the time go in the real world?"? The D -profile switch creates an instrumented executable, which in my experience (admittedly not experience with DMD) severely distorts running times. I usually prefer sampling-based profiling, where the executable is left unchanged and a sampling program interrupts the program at random and grabs the call stack, to avoid the distortion effect of instrumentation. Of course, instrumentation is useful to find out what functions are called the most and whether call frequencies are in line with expectations, but I wouldn't trust the time measurements that much.
As far as I know, D doesn't offer a sampling profiler, so one might indeed use a benchmarking library as a (poor) substitute. So I'd want to be able to set up some benchmarks that operate on realistic data, with perhaps different data in different runs in order to learn about how the speed varies with different inputs (if it varies a lot then I might create more benchmarks to investigate which inputs are processed quickly, and which slowly.)I understand there's a good case to be made for profiling. If this turns out to be an acceptance condition for std.benchmark (which I think it shouldn't), I'll define one.Some random comments about std.benchmark based on its documentation: - It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done....Because all of those are irrelevant and confusing. We had an older framework at Facebook that reported those numbers, and they were utterly and completely meaningless. Besides the trials column contained numbers that were not even comparable. Everybody was happy when I removed them with today's simple and elegant numbers.I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual.)I don't think that's a good idea.Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count".)The documentation could include that, but I don't want to overspecify.- The "benchmark_relative_" feature looks quite useful. I'm also happy to see benchmarkSuspend() and benchmarkResume(), though benchmarkSuspend() seems redundant in most cases: I'd like to just call one function, say, benchmarkStart() to indicate "setup complete, please start measuring time now."Good point. I think this is a minor encumbrance, so it's good to keep generality.- I'm glad that StopWatch can auto-start; but the documentation should be clearer: does reset() stop the timer or just reset the time to zero? does stop() followed by start() start from zero or does it keep the time on the clock? I also think there should be a method that returns the value of peek() and restarts the timer at the same time (perhaps stop() and reset() should just return peek()?) - After reading the documentation of comparingBenchmark and measureTime, I have almost no idea what they do.Yah, these are moved over from std.datetime. I'll need to make a couple more passes through the dox. Andrei
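For readers unfamiliar with the folly scheme Andrei refers to above, here is a hedged sketch of how benchmark functions in that style might look on the D side. The bodies, names, and the two-parameter signature are assumptions for illustration based on this thread and the folly docs, not the reviewed std.benchmark API:

// Single-parameter form discussed in this thread: the framework picks the
// iteration count, the function just loops that many times.
void benchmark_append(uint n)                     // baseline
{
    uint[] a;
    foreach (i; 0 .. n)
        a ~= i;
}

// The "benchmark_relative_" prefix would report this one relative to the
// baseline above - the kind of comparison Andrei argues for.
void benchmark_relative_appendReserved(uint n)
{
    uint[] a;
    a.reserve(n);
    foreach (i; 0 .. n)
        a ~= i;
}

// Folly-style two-parameter form: iteration count plus a "problem size".
// This is the extension Andrei says may need to be added.
void benchmark_sort(uint n, size_t problemSize)
{
    import std.algorithm : sort;
    import std.random : uniform;
    foreach (i; 0 .. n)
    {
        auto data = new int[problemSize];
        foreach (ref x; data)
            x = uniform(0, 1_000_000);   // note: this setup is measured too -
                                         // which is where benchmarkSuspend/
                                         // benchmarkResume would come in
        sort(data);
    }
}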
Sep 21 2012
Huh? It's not nearly as confusing as reading the documentation and not having the faintest idea what it will do. The way the benchmarker works is somehow 'irrelevant'? The documentation doesn't even indicate that the functions are to be run more than once!!Some random comments about std.benchmark based on its documentation: - It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done....Because all of those are irrelevant and confusing.I don't think that's a good idea.I have never seen you make such vague arguments, Andrei.
Sep 21 2012
On 9/21/12 5:36 PM, David Piepgrass wrote:I misunderstood. I agree that it's a good thing to specify how benchmarking proceeds.Huh? It's not nearly as confusing as reading the documentation and not having the faintest idea what it will do. The way the benchmarker works is somehow 'irrelevant'? The documentation doesn't even indicate that the functions are to be run more than once!!Some random comments about std.benchmark based on its documentation: - It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done....Because all of those are irrelevant and confusing.I had expanded my point elsewhere. Your suggestion was:I don't think that's a good idea.I have never seen you make such vague arguments, Andrei.- It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done.... I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual.) Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count".)I don't think it's a good idea because the "for 1000 ms" doesn't say anything except how good the clock resolution was on the system. I'm as strongly convinced we shouldn't print useless information as I am we should print useful information. Andrei
Sep 21 2012
I am puzzled about what you think my suggestion meant. I am suggesting allowing the user to configure how long benchmarking takes. Some users might want to run their benchmark for an hour to get stable and reliable numbers; others don't want to wait and want to see results ASAP. Perhaps the *same* user will want to run benchmarks quickly while developing them and then do a "final run" with more trials once their benchmark suite is complete. Also, some individual benchmark functions will take microseconds to complete; others may take seconds to complete. All I'm suggesting are simple ways to avoid wasting users' time, without making std.benchmark overly complicated.- It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done.... I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual.) Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count".)I don't think it's a good idea because the "for 1000 ms" doesn't say anything except how good the clock resolution was on the system. I'm as strongly convinced we shouldn't print useless information as I am we should print useful information.
Sep 22 2012
Stepping back for a moment, I think we're facing two key issues here: The first key issue is that the docs for std.benchmark don't adequately explain Andrei's intended charter/scope for it, its methodology, or the rationale for that methodology. So people see "benchmark" and they think "oh, ok, for timing stuff", but it appears to be intended for very specific use-cases. I think this entire discussion serves as evidence that, at the very least, it needs to communicate that scope/methodology/rationale better than it currently does. If all of us are having trouble "getting it", then others certainly will too. Aside from that, there's the second key issue: whether the current intended scope is sufficient. Should it be more general in scope and not so specialized? Personally, I would tend to think so, and that seems to be the popular notion. But I don't know for sure. If it should be more generalized, then does it need to be so for the first iteration, or can it be done later after being added to Phobos? That, I have no idea.
Sep 21 2012
On 22/09/12 07:10, Nick Sabalausky wrote:I think this entire discussion serves as evidence that, at the very least, it needs to communicate that scope/methodology/rationale better that it currently does. If all of us are having trouble "getting it", then others certainly will too.My feeling is that even with a good explanation in the docs, you're _still_ going to have a regular stream of people showing up on the mailing lists going, "Hey, why can't I get my preferred metric with std.benchmark??!!" So, even if there's good reason to think their preferences are daft, it might be worth supporting what they want to do, just to avoid that continuous stream of requests.Aside from that, there's the second key issue: whether the current intended scope is sufficient. Should it be more general in scope and not so specialized? Personally, I would tend to think do, and I think that seems to the the popular notion. But I don't know for sure. If it should be more generalized, than does it need to be so for the first iteration, or can it be done later after being added to phobos? That, I have no idea.This is what I was wondering, whether it's possible to take the current functionality but leave the door open to extending it with a different choice of metrics. Under the extended version, the default metric would be as it is currently, the docs would explain why this default makes sense and the caveats related to other metrics, but ultimately if the user wanted to use them, they'd be available. But I'd only like to see that happening if there is a "natural" path to the extended version rather than breaking changes or significant rewrites.
Sep 23 2012
New question for you :) To register benchmarks, the 'scheduleForBenchmarking' mixin inserts a shared static initializer into the module. If I have a module A and a module B that both depend on each other, then this will probably not work...? The runtime will detect the init cycle and fail with the following error: "Cycle detected between modules with ctors/dtors" Or am I wrong now?
Sep 19 2012
On 9/19/12 4:11 PM, "Øivind" wrote:New question for you :) To register benchmarks, the 'scheduleForBenchmarking' mixin inserts a shared static initializer into the module. If I have a module A and a module B, that both depend on eachother, than this will probably not work..? The runtime will detect the init cycle and fail with the following error: "Cycle detected between modules with ctors/dtors" Or am I wrong now?I think you have discovered a major issue. Ideas on how to attack this? Andrei
Sep 21 2012
On Friday, September 21, 2012 15:59:31 Andrei Alexandrescu wrote:On 9/19/12 4:11 PM, "Øivind" wrote:Some of us have been asking for ages for the ability to mark a static constructor as not depending on anything so that the runtime _doesn't_ think that there's a circular dependency, but Walter has been against the idea when it's been brought up. That would _really_ help here. Without redesigning std.benchmark so that it doesn't use static constructors, I don't know how you can fix that. Normally, if you really need a static constructor, you go through the pain of creating a separate module which does the initialization for you (like std.stdio does). But that won't work in this case, because you're mixing it in. So, unless you can redesign it so that std.benchmark doesn't require static constructors, it may have to be a limitation of std.benchmark that it can't be used where it would create a circular dependency. Unfortunately, the circular dependency issue makes static constructors almost useless outside of isolated cases, even though they rarely actually have circular dependencies. It's one of the few places in D that I'd say that there's a major design flaw. - Jonathan M DavisNew question for you :) To register benchmarks, the 'scheduleForBenchmarking' mixin inserts a shared static initializer into the module. If I have a module A and a module B, that both depend on eachother, than this will probably not work..? The runtime will detect the init cycle and fail with the following error: "Cycle detected between modules with ctors/dtors" Or am I wrong now?I think you have discovered a major issue. Ideas on how to attack this?
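For concreteness, a minimal sketch of the failure mode under discussion (module names invented): each module mixes in its registration, which expands to a shared static this(), and the modules import each other.

// --- a.d ---
module a;
import b;                       // a depends on b
shared static this() { /* register a's benchmarks */ }

// --- b.d ---
module b;
import a;                       // b depends on a
shared static this() { /* register b's benchmarks */ }

// druntime cannot order the two constructors and aborts at startup with
// "Cycle detected between modules with ctors/dtors".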
Sep 21 2012
On Friday, 21 September 2012 at 20:43:13 UTC, Jonathan M Davis wrote:On Friday, September 21, 2012 15:59:31 Andrei Alexandrescu wrote:Is there a way to solve the dependency issue without forbidding static constructors in modules with cyclic dependencies? I.e. for two modules with static init referencing each other, could an analysis of the static init code be used to detect circular dependencies instead of the import statements?On 9/19/12 4:11 PM, "Øivind" wrote:Some of us have been asking for ages for the ability to mark a static constructor as not depending on anything so that the runtime _doesn't_ think that there's a circular dependency, but Walter has been against the idea when it's been brought up. That would _really_ help here. Without redesigning std.benchmark so that it doesn't use static constructors, I don't know how you can fix that. Normally, if you really need a static constructor, you go through the pain of creating a separate module which does the initialization for you (like std.stdio does). But that won't work in this case, because you're mixing it in. So, unless you can redesign it so that std.benchmark doesn't require static constructors, it may have to be a limitation of std.benchmark that it can't be used where it would create a circular dependency. Unfortunately, the circular dependency issue makes static constructors almost useless outside of isolated cases, even though they rarely actually have circular dependencies. It's one of the few places in D that I'd say that there's a major design flaw. - Jonathan M DavisNew question for you :) To register benchmarks, the 'scheduleForBenchmarking' mixin inserts a shared static initializer into the module. If I have a module A and a module B that both depend on each other, then this will probably not work...? The runtime will detect the init cycle and fail with the following error: "Cycle detected between modules with ctors/dtors" Or am I wrong now?I think you have discovered a major issue. Ideas on how to attack this?
Sep 22 2012
On 9/22/12 8:28 AM, "Øivind" wrote:Is there a way to solve the dependency issue without forbidding static constructors in modules with cyclic dependencies?I think an idea just occurred to me. The rules for static ctors and dtors were invented before "import" was allowed inside a scope. We could have taken advantage of that. Say we restrict symbol visibility inside static cdtors to ONLY symbols within the current module. If some static cdtor needs a symbol from a different module, it must import it explicitly (even if the current module already imports it). In this setup it should be possible to compute, in a fine-grained manner, the dependencies of static cdtors. Unfortunately that would be a breaking change. Andrei
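A sketch of how that rule would look in code; this is a proposed language change, not current D semantics, and makeTable is an invented name:

module a;

import b;                  // module-level import: under the proposal this would
                           // NOT count as a dependency of the static ctor

int[string] table;

shared static this()
{
    import b : makeTable;  // explicit local import: only this creates an
                           // ordering dependency on b's static ctors
    table = makeTable();
}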
Sep 22 2012
On Saturday, 22 September 2012 at 13:03:06 UTC, Andrei Alexandrescu wrote:On 9/22/12 8:28 AM, "Øivind" wrote:It gets a bit ugly maybe, but we could do a mix of the proposals that have come before and this one, e.g. add a nocycliccheck (or similar) to the static constructor, and in that case only allow access to the current module and those imported inside the ctor scope...Is there a way to solve the dependency issue without forbidding static constructors in modules with cyclic dependencies?I think an idea just occurred to me. The rules for static ctors and dtors were invented before "import" was allowed inside a scope. We could have taken advantage of that. Say we restrict symbol visibility inside static cdtors to ONLY symbols within the current module. If some static cdtor needs a symbol from a different module, it must import it explicitly (even if the current module already imports it). In this setup it should be possible to compute, in a fine-grained manner, the dependencies of static cdtors. Unfortunately that would be a breaking change. Andrei
Sep 22 2012
On Saturday, 22 September 2012 at 13:25:47 UTC, Øivind wrote:On Saturday, 22 September 2012 at 13:03:06 UTC, Andrei Alexandrescu wrote:We would probably not call it 'nocycliccheck' since you propose to still do these checks, but only local imports :) Would need another name for it.On 9/22/12 8:28 AM, "Øivind" wrote:It gets a bit ugly maybe, but we could do a mix of the proposals that have come before and this one, e.g. add a nocycliccheck (or similar) to the static constructor, and in that case only allow access to the current module and those imported inside the ctor scope...Is there a way to solve the dependency issue without forbidding static constructors in modules with cyclic dependencies?I think an idea just occurred to me. The rules for static ctors and dtors were invented before "import" was allowed inside a scope. We could have taken advantage of that. Say we restrict symbol visibility inside static cdtors to ONLY symbols within the current module. If some static cdtor needs a symbol from a different module, it must import it explicitly (even if the current module already imports it). In this setup it should be possible to compute, in a fine-grained manner, the dependencies of static cdtors. Unfortunately that would be a breaking change. Andrei
Sep 22 2012
On 2012-09-22, 15:28, Øivind wrote:It gets a bit ugly maybe, but we could do a mix of the proposals that have come before and this one, e.g. add a nocycliccheck (or similar) to the static constructor, and in that case only allow access to the current module and those imported inside the ctor scope...We would probably not call it 'nocycliccheck' since you propose to still do these checks, but only local imports :) Would need another name for it.noglobalimports. :p -- Simen
Sep 22 2012
On 2012-09-22, 15:04, Andrei Alexandrescu wrote:On 9/22/12 8:28 AM, "Øivind" wrote:Is there a way to solve the dependency issue without forbidding static constructors in modules with cyclic dependencies?I think an idea just occurred to me. The rules for static ctors and dtors were invented before "import" was allowed inside a scope. We could have taken advantage of that. Say we restrict symbol visibility inside static cdtors to ONLY symbols within the current module. If some static cdtor needs a symbol from a different module, it must import it explicitly (even if the current module already imports it). In this setup it should be possible to compute, in a fine-grained manner, the dependencies of static cdtors. Unfortunately that would be a breaking change.That *is* neat. I guess putting it on the deprecation path could work. This is a change we'd really, *really* like to see. -- Simen
Sep 22 2012
On Saturday, September 22, 2012 09:04:09 Andrei Alexandrescu wrote:On 9/22/12 8:28 AM, "Øivind" wrote:Is there a way to solve the dependency issue without forbidding static constructors in modules with cyclic dependencies?I think an idea just occurred to me. The rules for static ctors and dtors were invented before "import" was allowed inside a scope. We could have taken advantage of that. Say we restrict symbol visibility inside static cdtors to ONLY symbols within the current module. If some static cdtor needs a symbol from a different module, it must import it explicitly (even if the current module already imports it). In this setup it should be possible to compute, in a fine-grained manner, the dependencies of static cdtors. Unfortunately that would be a breaking change.It's a nice thought, but it wouldn't work. If nothing else .di files completely ruin it. 1. I don't think that it's actually required that static constructors be in a .di file. So, the compiler couldn't know for sure whether the modules being imported had static constructors. 2. Even if all static constructors had to be in a .di file, local imports ruin it, because the function bodies (which can definitely be elided from .di files) could contain local imports to modules which have static constructors and cause circular dependencies. - Jonathan M Davis
Sep 22 2012
On 21-Sep-12 23:59, Andrei Alexandrescu wrote:On 9/19/12 4:11 PM, "Øivind" wrote:Not ideal, but... Make scheduleForBenchmarking mix in something other than code - say, a global templated struct with a certain name. Then it should be possible to do: benchmarkModules!(module1, module2, ...); That would search for this specific anchor at the top scope of the given modules and collect all the info. I'm not sure we can pass module names as alias parameters, but I think our meta-programming tricksters certainly did something along these lines. -- Dmitry OlshanskyNew question for you :) To register benchmarks, the 'scheduleForBenchmarking' mixin inserts a shared static initializer into the module. If I have a module A and a module B that both depend on each other, then this will probably not work...? The runtime will detect the init cycle and fail with the following error: "Cycle detected between modules with ctors/dtors" Or am I wrong now?I think you have discovered a major issue. Ideas on how to attack this?
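A rough sketch of the anchor approach Dmitry describes above. Every name here is invented; whether a module can always be passed as an alias argument is exactly the open question he raises, though __traits(allMembers, ...) does accept a module symbol.

// The mixin declares only an anchor - no static constructor - so druntime
// never sees a module ctor and no cycle can be reported.
enum scheduleForBenchmarking = q{ enum _benchmarkAnchor = true; };

// An explicit call then walks the listed modules, checks for the anchor,
// and collects the benchmark_* functions declared next to it.
string[] collectBenchmarks(Modules...)()
{
    string[] names;
    foreach (mod; Modules)                        // compile-time foreach
    {
        bool anchored = false;
        foreach (member; __traits(allMembers, mod))
            static if (member == "_benchmarkAnchor")
                anchored = true;
        if (anchored)
            foreach (member; __traits(allMembers, mod))
                static if (member.length > 10 && member[0 .. 10] == "benchmark_")
                    names ~= member;
    }
    return names;
}

// usage, with the modules imported at the call site:
//     auto found = collectBenchmarks!(module1, module2)();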
Sep 21 2012
On 2012-09-21 21:59, Andrei Alexandrescu wrote:I think you have discovered a major issue. Ideas on how to attack this?The standard way to solve this would be to move the initialization code from a static constructor to a function that is called later, doing the initialization lazily. But since this goes through several layers of string mixins, that will make it more complicated. Can you give a short example of what the mixed-in code will look like? -- /Jacob Carlborg
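A minimal sketch of the lazy alternative Jacob suggests (names invented; the trade-off is that registration is no longer automatic - something has to call it before the benchmarks run):

// Instead of expanding to a shared static this(), the mixin only defines a
// plain function, so druntime's ctor-cycle detection never gets involved.
mixin template scheduleForBenchmarkingLazily()
{
    void registerBenchmarksOfThisModule()
    {
        // enumerate this module's benchmark_* functions and add them to the
        // runner's registry here
    }
}

// e.g. somewhere in main(), before printing results:
//     moduleA.registerBenchmarksOfThisModule();
//     moduleB.registerBenchmarksOfThisModule();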
Sep 22 2012
On 2012-09-17 23:13, Jens Mueller wrote:Post all feedback to this thread. Constructive feedback is very much appreciated. To conclude in more Andrei like words: Happy destruction!* Why is "scheduleForBenchmarking" a string? Can't it be a template mixin? * What's the most appropriate way of just timing a block of code? Something like this: auto time = benchmark!({ /* some code */ })(1); If that's the case then I suggest setting a default value of "1" for the "n" parameter. * If I want to format the printed result differently, say in HTML, how would I do that? Should I use the "benchmark" function and iterate the BenchmarkResult array? * BTW why doesn't benchmark return the BenchmarkResult array? * Is this module so important to keep it as a top level module? I'm thinking something like a utility package or a time/date package. How about std.util.benchmark? -- /Jacob Carlborg
Sep 19 2012
On Wednesday, September 19, 2012 09:13:40 Jacob Carlborg wrote:* Is this module so important to keep it as a top level module? I'm thinking something like a utility package or a time/date package. How about std.util.benchmark?util is one of the worst package names ever, because it means basically nothing. Any function could go in there. As for a time/date package, we already have std.datetime (which will hopefully be split into the package std.datetime at some point, but we need something like DIP 15 or 16 before we can do that), and we're moving the benchmarking _out_ of there. If std.datetime were already a package, then maybe putting it in there would make some sense, but benchmarking is arguably fundamentally different from what the rest of std.datetime does. I really see no problem with benchmarking being its own thing, and std.benchmark works just fine for that. - Jonathan M Davis
Sep 19 2012
On 2012-09-19 09:58, Jonathan M Davis wrote:util is one of the worst package names ever, because it means basically nothing. Any function could go in there.Well, the "util" package in Phobos is called "std".As for a time/date package, we already have std.datetime (which will hopefully be split into the package std.datetime at some point, but we need something like DIP 15 or 16 before we can do that), and we're moving the benchmarking _out_ of there. If std.datetime were already a package, then maybe putting it in there would make some sense, but benchmarking is arguably fundamentally different from what the rest of std.datetime does. I really so no problem with benchmarking being its own thing, and std.benchmark works just fine for that.I just think we have too many top level modules. -- /Jacob Carlborg
Sep 19 2012
On 9/19/12 3:13 AM, Jacob Carlborg wrote:On 2012-09-17 23:13, Jens Mueller wrote:Good point, I'll look into it.Post all feedback to this thread. Constructive feedback is very much appreciated. To conclude in more Andrei like words: Happy destruction!* Why is "scheduleForBenchmarking" a string? Can't it be a template mixin?* What's the most appropriate way of just timing a block of code? Something like this: auto time = benchmark!({ /* some code */ })(1); If that's the case then I suggest setting a default value of "1" for the "n" parameter.A default value of n would depend on the speed of the function and the granularity of the system's timer. That overload of benchmark is imperfect (rather rigid) but we must keep it for backwards compatibility.* If I want to format the printed result differently, say in HTML, how would I do that? Should I use the "benchmark" function and iterate the BenchmarkResult array?That is correct. printBenchmarks() does not use any magic - it just looks at that same array.* BTW why doesn't benchmark return the BenchmarkResult array?Will look into it.* Is this module so important to keep it as a top level module? I'm thinking something like a utility package or a time/date package. How about std.util.benchmark?Not sure. Andrei
Sep 22 2012
I don't see why `benchmark` takes (almost) all of its parameters as template parameters. It looks quite odd, seems unnecessary, and (if I'm not mistaken) makes certain use cases quite difficult. For example, suppose I want to benchmark a function several times with different parameters and names, how would I do that? foreach (i; 0..10) { printBenchmark!( format("Test %d", i), { someFunc(i); } )(); } This won't work because i isn't known at compile time, and for some use cases it can't be known at compile time. I wouldn't mind if there was some real benefit to taking these as template arguments, but there doesn't seem to be any value at all -- it just limits usage.
Sep 19 2012
On 9/19/12 4:06 AM, Peter Alexander wrote:I don't see why `benchmark` takes (almost) all of its parameters as template parameters. It looks quite odd, seems unnecessary, and (if I'm not mistaken) makes certain use cases quite difficult.That is intentional - indirect calls would add undue overhead to the measurements. Andrei
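A small illustration of the overhead point (not std.benchmark's code): with an alias template parameter the call in the loop is direct and can be inlined, while the delegate version dereferences a pointer on every iteration - inside the very loop being measured.

void benchDirect(alias fun)(uint n)
{
    foreach (i; 0 .. n)
        fun();                    // direct, inlinable call
}

void benchIndirect(void delegate() fun, uint n)
{
    foreach (i; 0 .. n)
        fun();                    // indirect call through the delegate
}

// usage (someWork is any void function you want to time):
//     benchDirect!someWork(1_000_000);
//     import std.functional : toDelegate;
//     benchIndirect(toDelegate(&someWork), 1_000_000);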
Sep 21 2012
On Friday, 21 September 2012 at 19:54:12 UTC, Andrei Alexandrescu wrote:On 9/19/12 4:06 AM, Peter Alexander wrote:I accept that it adds undue overhead. I just think that the function would be more usable with non-template parameters (as per my example). I also think the overhead would be negligible.I don't see why `benchmark` takes (almost) all of its parameters as template parameters. It looks quite odd, seems unnecessary, and (if I'm not mistaken) makes certain use cases quite difficult.That is intentional - indirect calls would add undue overhead to the measurements.
Sep 21 2012
I'd throw in a request to address the following. Suppose we have a function F and a set of inputs S that represent different scenarios we optimize for. What is interesting is to benchmark all of F(S[i]) as |S| separate functions, greatly saving on boilerplate (and helping readability). One way would be to allow passing an input range of argument tuples to F, say via a benchmark_data_ prefix: void benchmark_f(int a, double b, string s){ ... } enum benchmark_data_f = [ tuple(1, 2.0, "hi"), tuple(2, 3.0, "bye") ]; Then in the results it'd look like: f(1, 2.0, "hi") <ns/iter> <iter/s> f(2, 3.0, "bye") <ns/iter> <iter/s> Using any input range is interestingly flexible, e.g.: enum benchmark_data_x = cartesianProduct(iota(1, 3), iota(1, 3)); //we should probably have it in std.range somewhere void benchmark_x(int a, int b){ ... } That being said, I don't really get the benefit of passing the iteration count to the function being benched. Is it to allow it to do an initialization step once, then call resumeBenchmark() and run some inner loop n times? -- Dmitry Olshansky
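To make the proposal concrete, a sketch of how a driver could expand such data tuples; this mirrors Dmitry's snippet above, but the driver and its reporting are assumptions, not an existing std.benchmark feature:

import std.typecons : tuple;

void benchmark_f(int a, double b, string s) { /* code under test */ }

enum benchmark_data_f = [ tuple(1, 2.0, "hi"), tuple(2, 3.0, "bye") ];

void runDataDriven()
{
    foreach (args; benchmark_data_f)
        benchmark_f(args.expand);   // each tuple becomes one reported row,
                                    // e.g. f(1, 2.0, "hi")  <ns/iter>  <iter/s>
}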
Sep 21 2012