digitalmars.D - std.benchmark ready for review. Manager sought after
- Andrei Alexandrescu (16/16) Apr 07 2012 Hello,
- Vladimir Panteleev (11/17) Apr 07 2012 Nice, some comments:
- Andrei Alexandrescu (7/23) Apr 07 2012 To me too :o).
- Vladimir Panteleev (17/24) Apr 08 2012 Hmm, maybe there's a misunderstanding, but what I meant was:
- Andrei Alexandrescu (3/15) Apr 08 2012 I understand, thanks.
- Somedude (3/26) Apr 08 2012 Like it. Would it be a good idea to add a column with an average memory
- Andrei Alexandrescu (6/8) Apr 08 2012 Interesting idea. I saw
- Marco Leise (5/15) Apr 08 2012 Oh please, I use these functions already and with the similarities among...
- Somedude (8/24) Apr 08 2012 I thought about monitoring as well.
- Dmitry Olshansky (9/11) Apr 08 2012 In general it's next to impossible and/or entirely OS-specific.
- Somedude (2/15) Apr 08 2012 That's what I thought, thank you.
- Piotr Szturmaj (7/13) Apr 08 2012 For algorithms that process sequences, it would be nice to have results
- Andrei Alexandrescu (4/18) Apr 08 2012 The framework aims at generality and simplicity. I'd rather have it
- Denis Shelomovskij (11/11) Apr 08 2012 Very good but minimum isn't a best guess. Personally I (and there will
- Andrei Alexandrescu (16/24) Apr 08 2012 I've analyzed this quite a bit at work and the average and median are
- Manfred Nowak (10/13) Apr 08 2012 On the contrary:
- Andrei Alexandrescu (20/32) Apr 08 2012 That sounds quite tenuous to me. How do you measure it, and what
- Manfred Nowak (28/47) Apr 09 2012 This is in doubt, because you yourself wrote "the machine itself has
- Andrei Alexandrescu (13/45) Apr 09 2012 Which is great, unless the program wants to measure the cache memory
- Francois Chabot (5/8) Apr 09 2012 I disagree. If a regression suddenly causes a function to become
- Andrei Alexandrescu (5/11) Apr 09 2012 But cache binding depends on a variety of cache characteristics, i.e.
- Manfred Nowak (23/31) Apr 09 2012 I did not mean to run two or more benchmarks in parallel but only
- Andrei Alexandrescu (9/30) Apr 14 2012 I doubt that's the case. Once we discuss running several instances in
- Denis Shelomovskij (21/44) Apr 09 2012 Yes of course. I mean than an algorithm itself should follow some
- Andrei Alexandrescu (10/17) Apr 09 2012 As I explained, the average takes noise and outliers (some very large,
- Denis Shelomovskij (8/18) Apr 10 2012 Looks like a misunderstanding because of my bad English: "Recording the
- Martin Nowak (4/10) Apr 10 2012 The benchmarked function itself could have a relevant variance, e.g.
- Dmitry Olshansky (16/24) Apr 08 2012 +1
- Andrei Alexandrescu (4/8) Apr 08 2012 This would be interesting to explore, could you give some more detail
- Dmitry Olshansky (9/16) Apr 08 2012 I thought to just throw it in. But if were to do it in under an hour:
- Caligo (4/8) Apr 08 2012 I probably missed this somewhere, but what happens to `std.datetime :
- Andrei Alexandrescu (4/14) Apr 08 2012 Yes, I forgot to mention that. Part of the proposal is that
- Steven Schveighoffer (2/2) Apr 09 2012 Added to trello.
- Francois Chabot (12/12) Apr 09 2012 Why is there so much emphasis on printBenchmarks()?
- Andrei Alexandrescu (10/15) Apr 09 2012 The functionality is to make benchmark easy to use, meaningful, and easy...
- Somedude (5/20) Apr 09 2012 The printBenchmark facility is cool and should be included imho.
- Andrei Alexandrescu (5/8) Apr 09 2012 Yes, I had unittest in mind when writing the library. If one needs more
- Jens Mueller (33/54) Apr 10 2012 I come to think the same.
- Andrei Alexandrescu (16/45) Apr 14 2012 It's in the names. If the name of a benchmark starts with
- Jens Mueller (11/64) Apr 15 2012 I know. That wasn't my question. How do I choose between percentage vs.
- Andrei Alexandrescu (8/8) Apr 14 2012 There have been quite a few good comments, but no review manager offer.
- Jacob Carlborg (6/14) Apr 15 2012 As far as I know, there are already several other projects/modules in
- Jonathan M Davis (6/22) Apr 16 2012 There are, but none of them are being reviewed at the moment or being pu...
- Jens Mueller (6/8) Apr 15 2012 I will do this.
- Lars T. Kyllingstad (4/13) Apr 16 2012 The review process is based on Boost's:
- Jens Mueller (5/20) Apr 16 2012 The page is shorter than expected. Last time I checked I found something
- Dmitry Olshansky (24/43) Apr 17 2012 It's peer review followed by voting and that's all about it.
- Jens Mueller (4/53) Apr 17 2012 Andrei is already preparing. Review will start soon.
- Andrei Alexandrescu (6/7) Apr 17 2012 I'd like to gather a bit of confidence that it's okay with people that
- SomeDude (7/15) Apr 17 2012 I agree, there is no good reason for the queue to be stopped like
- Jonathan M Davis (6/12) Apr 17 2012 I have no problem with it. None of the others is being pushed for review...
- H. S. Teoh (6/12) Apr 17 2012 [...]
- Jonathan M Davis (4/15) Apr 17 2012 Check it in? You mean start the review on it? I hope that you don't mean...
- Marco Leise (5/15) Apr 18 2012 In a supermarket, there are three people in front of you in the queue. T...
- SomeDude (2/8) Apr 18 2012 Well, we are not in a supermarket. It was correct to ask.
- Famous (47/47) Apr 15 2012 Instead of the originally proposed layout
- Famous (1/10) Apr 15 2012
- "Nordlöw" (236/255) Sep 27 2013 Where have std.benchmark gone?
Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. Code is in the form of a pull request at https://github.com/D-Programming-Language/phobos/pull/529. (There's some noise in there caused by my git n00biness). Documentation is at http://erdani.com/d/web/phobos-prerelease/std_benchmark.html. If reasonable and at all possible, I'd like to bump the priority of this proposal. Clearly D's user base is highly interested in efficiency, and many of the upcoming libraries have efficiency as a vital part of their design. So we should get std.benchmark in soon and require that new additions come with benchmarks for their essential functionality. My vision is that in the future Phobos will have a significant battery of benchmarks, which will help improve Phobos and ease porting to new platforms. Andrei
Apr 07 2012
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code.Nice, some comments: 1) Is it possible to do something about the "kilonanoseconds", and print the two-letter SI time abbreviation instead (μs/ns/ms/s/...)? 2) The meaning of "epoch" in this context is unfamiliar to me:Measurement is done in epochs. For each function benchmarked, the smallest time is taken over all epochs.3) "benchmark_relative_file read" should be replaced with a language construct. E.g. a function call like relativeBenchmark("file read"), or an enum value like getopt's. 4) There is a TODO in benchmarkSuspend's example.
Apr 07 2012
On 4/7/12 11:45 PM, Vladimir Panteleev wrote:On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:Cool.Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code.Nice, some comments: 1) Is it possible to do something about the "kilonanoseconds", and print the two-letter SI time abbreviation instead (μs/ns/ms/s/...)?2) The meaning of "epoch" in this context is unfamiliar to me:To me too :o).Measurement is done in epochs. For each function benchmarked, the smallest time is taken over all epochs.3) "benchmark_relative_file read" should be replaced with a language construct. E.g. a function call like relativeBenchmark("file read"), or an enum value like getopt's.No can do. Need a function name-based convention so we can automate scheduleForBenchmarking.4) There is a TODO in benchmarkSuspend's example.Will destroy. Andrei
Apr 07 2012
On Sunday, 8 April 2012 at 05:41:11 UTC, Andrei Alexandrescu wrote:Hmm, maybe there's a misunderstanding, but what I meant was: The "benchmark_relative_" prefix makes sense for function names (for scheduleForBenchmarking), but not so much for string literals for benchmark names. The string literal "benchmark_relative_file read" looks like the words "benchmark relative file" are grouped together, with "read" added on. So, my suggestion would be to wrap the "benchmark_relative_" prefix - when used with benchmark name strings - into a semantical function / enum / etc. In my example above, relativeBenchmark would be: string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; } I suppose it can be summed up as a tradeoff between complexity (you need to explain both the function name usage and the relativeBenchmark wrapper usage) vs. code prettiness.3) "benchmark_relative_file read" should be replaced with a language construct. E.g. a function call like relativeBenchmark("file read"), or an enum value like getopt's.No can do. Need a function name-based convention so we can automate scheduleForBenchmarking.
Apr 08 2012
On 4/8/12 2:02 AM, Vladimir Panteleev wrote:The "benchmark_relative_" prefix makes sense for function names (for scheduleForBenchmarking), but not so much for string literals for benchmark names. The string literal "benchmark_relative_file read" looks like the words "benchmark relative file" are grouped together, with "read" added on. So, my suggestion would be to wrap the "benchmark_relative_" prefix - when used with benchmark name strings - into a semantical function / enum / etc. In my example above, relativeBenchmark would be: string relativeBenchmark(string s) { return "benchmark_relative_" ~ s; } I suppose it can be summed up as a tradeoff between complexity (you need to explain both the function name usage and the relativeBenchmark wrapper usage) vs. code prettiness.I understand, thanks. Andrei
Apr 08 2012
On 08/04/2012 05:25, Andrei Alexandrescu wrote:Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. Code is in the form of a pull request at https://github.com/D-Programming-Language/phobos/pull/529. (There's some noise in there caused by my git n00biness). Documentation is at http://erdani.com/d/web/phobos-prerelease/std_benchmark.html. If reasonable and at all possible, I'd like to bump the priority of this proposal. Clearly D's user base is highly interested in efficiency, and many of the upcoming libraries have efficiency as a vital part of their design. So we should get std.benchmark in soon and require that new additions come with benchmarks for their essential functionality. My vision is that in the future Phobos will have a significant battery of benchmarks, which will help improve Phobos and ease porting to new platforms. AndreiLike it. Would it be a good idea to add a column with an average memory used ?
Apr 08 2012
On 4/8/12 3:16 AM, Somedude wrote:Like it. Would it be a good idea to add a column with an average memory used ?Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage-api-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Apr 08 2012
On Sun, 08 Apr 2012 09:35:14 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 4/8/12 3:16 AM, Somedude wrote:Oh please, I use these functions already and with the similarities amongst operating systems they should be somewhere in Phobos! My thought was more along the lines of std.procmgmt/std.process though. Since they are process management functions useful for server monitoring as well, not just benchmarking. -- MarcoLike it. Would it be a good idea to add a column with an average memory used ?Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage-api-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Apr 08 2012
On 08/04/2012 18:21, Marco Leise wrote:On Sun, 08 Apr 2012 09:35:14 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I thought about monitoring as well. But then again, following your remark, monitoring may in fact be a completely different matter. This would need a little bit more thought I guess. Adding monitoring capabilities to processes would be interesting. But I guess we are already quite far from benchmarking now. Unless std.benchmark makes use of these monitoring capabilities.On 4/8/12 3:16 AM, Somedude wrote:Oh please, I use these functions already and with the similarities amongst operating systems they should be somewhere in Phobos! My thought was more along the lines of std.procmgmt/std.process though. Since they are process management functions useful for server monitoring as well, not just benchmarking.Like it. Would it be a good idea to add a column with an average memory used ?Interesting idea. I saw http://stackoverflow.com/questions/1674652/c-c-memory-usage-api-in-linux-windows and it looks like it's not an easy problem. Should we make this part of the initial release? Andrei
Apr 08 2012
On 08.04.2012 12:16, Somedude wrote: [snip]Like it. Would it be a good idea to add a column with an average memory used ?In general it's next to impossible and/or entirely OS-specific. What can be done, I think, is adding a query function to the GC interface that returns the amount of RAM currently allocated on its heap. So adding GC heap usage can be done, albeit it's not really a stable metric. This way one gets a nice hint on why something is slow ;) -- Dmitry Olshansky
Apr 08 2012
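Dmitry's suggestion — query the allocator around the benchmarked call rather than asking the OS for process memory — can be sketched concretely. The snippet below is only an illustration of the idea (in Python, using `tracemalloc` as a stand-in for the hypothetical D GC heap-usage query; none of this is std.benchmark API):

```python
import tracemalloc

def run_with_heap_peak(fun):
    """Run fun() and report the peak heap allocated during the call.

    This mirrors the thread's idea: measure heap usage at the allocator
    level, which is portable, instead of OS-specific process metrics.
    """
    tracemalloc.start()
    try:
        result = fun()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

result, peak = run_with_heap_peak(lambda: [0] * 100_000)
print(f"peak heap during call: {peak} bytes")
```

As noted in the thread, the number is a hint rather than a stable metric: allocator caching and collection timing make it vary from run to run.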
On 08/04/2012 17:48, Dmitry Olshansky wrote:On 08.04.2012 12:16, Somedude wrote: [snip]That's what I thought, thank you.Like it. Would it be a good idea to add a column with an average memory used ?In general it's next to impossible and/or entirely OS-specific. What can be done, I think, is adding a query function to the GC interface that returns the amount of RAM currently allocated on its heap. So adding GC heap usage can be done, albeit it's not really a stable metric. This way one gets a nice hint on why something is slow ;)
Apr 08 2012
Andrei Alexandrescu wrote:I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. Code is in the form of a pull request at https://github.com/D-Programming-Language/phobos/pull/529. (There's some noise in there caused by my git n00biness). Documentation is at http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.For algorithms that process sequences, it would be nice to have results represented in cycles per item. This should give more consistent results across different CPU families and different clock speeds. Specifically, I think about cycles per byte, see http://en.wikipedia.org/wiki/Cycles_per_byte Example: http://www.cryptopp.com/benchmarks.html
Apr 08 2012
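The conversion Piotr describes is a one-liner once the clock frequency is known — and obtaining that frequency portably is exactly the machine-specific burden the framework would take on. A sketch, in Python for illustration (the numbers are hypothetical, not measurements):

```python
def cycles_per_byte(seconds: float, nbytes: int, clock_hz: float) -> float:
    """Convert a wall-clock measurement into cycles per processed byte.

    clock_hz must be supplied by the caller (or read from the OS); a
    portable framework cannot reliably know the CPU frequency, which is
    one argument for reporting plain time instead.
    """
    return seconds * clock_hz / nbytes

# Hypothetical numbers: 0.5 ms to hash 1 MiB on a 3 GHz machine.
cpb = cycles_per_byte(0.5e-3, 1 << 20, 3.0e9)
print(f"{cpb:.2f} cycles/byte")
```

Dynamic frequency scaling makes the divisor itself a moving target, which is part of why Andrei's reply favors keeping the framework simple.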
On 4/8/12 5:46 AM, Piotr Szturmaj wrote:Andrei Alexandrescu wrote:The framework aims at generality and simplicity. I'd rather have it simple and universal rather than subtle. AndreiI finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. Code is in the form of a pull request at https://github.com/D-Programming-Language/phobos/pull/529. (There's some noise in there caused by my git n00biness). Documentation is at http://erdani.com/d/web/phobos-prerelease/std_benchmark.html.For algorithms that process sequences, it would be nice to have results represented in cycles per item, This should give more consistent results across different CPU familes and different clock speeds. Specifically, I think about cycles per byte, see http://en.wikipedia.org/wiki/Cycles_per_byte Example: http://www.cryptopp.com/benchmarks.html
Apr 08 2012
Very good, but the minimum isn't the best guess. Personally I (and there will be a lot of such maniacs, I suppose) will think that this (minimum) time can be significantly smaller than the average time. So a parameter (probably with a default value) should be added, something like an enum of flags telling what we want to know. At least these look usable: minTime, <some mean time>, maxTime, standardDeviation, graph (yes, good old ASCII art). Yes, a graph is needed. -- Денис В. Шеломовский Denis V. Shelomovskij
Apr 08 2012
On 4/8/12 11:59 AM, Denis Shelomovskij wrote:Very good but minimum isn't a best guess. Personally I (and there will be a lot of such maniacs I suppose) will think that this (minimum) time can be significantly smaller than average time.I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better. In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise. Clearly there is noise during normal use as well, but incorporating it in benchmarks as a matter of course reduces the usefulness of benchmarks as a means to improve performance of the benchmarked code.So a parameter (probably with a default value) should be added. Something like enum of flags telling what we want to know. At least these looks usable: minTime, <some mean time>, maxTime, standardDeviation, graph (yes, good old ASCII art).Standard deviation is also not very useful because it includes all outliers (some of which are very far away from the mode).Yes, graph is needed.I am not sure about that. We may provide the raw measurement data for programs that want to plot things, but plotting is beyond the charter of std.benchmark. Andrei
Apr 08 2012
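Andrei's claim — that with additive, outlier-prone noise the minimum is a better estimator than the average — is easy to check with a simulation. The sketch below (Python; the noise model and all numbers are invented purely for illustration) draws timings as a true value plus nonnegative jitter plus rare large outliers:

```python
import random

random.seed(42)

TRUE_TIME = 100.0  # the quantity A we want to estimate

def one_measurement():
    # All noise is additive and nonnegative: nothing makes a run faster.
    jitter = random.expovariate(1 / 5.0)           # routine noise, mean 5
    outlier = random.choice([0.0] * 99 + [500.0])  # rare scheduler hiccup
    return TRUE_TIME + jitter + outlier

samples = [one_measurement() for _ in range(10_000)]
mean = sum(samples) / len(samples)
best = min(samples)

# The mean absorbs the jitter's expected value plus the outliers;
# the minimum converges to the true time from above.
print(f"mean = {mean:.1f}, min = {best:.3f} (true value {TRUE_TIME})")
```

Under this model the mean sits well above the true value no matter how many samples are taken, while the minimum approaches it, which is the behavior the post describes.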
Andrei Alexandrescu wrote:Clearly there is noise during normal use as well, but incorporating it in benchmarks as a matter of course reduces the usefulness of benchmarksOn the contrary: 1) The "noise during normal use" has to be measured in order to detect the sensitivity of the benchmarked program to that noise. 2) The noise the benchmarked program produces has to be measured too, because the running benchmarked program probably increases the noise for all other running programs. In addition: the noise produced by a machine under heavy load might bring the performance of the benchmarked program down to zero. -manfred
Apr 08 2012
On 4/8/12 3:03 PM, Manfred Nowak wrote:Andrei Alexandrescu wrote:That sounds quite tenuous to me. How do you measure it, and what conclusions do you draw other than that there's more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N where: A > 0 is actual benchmark time Q > 0 quantization noise (uniform distribution) N > 0 various other noises (interrupts, task switching, networking, CPU dynamically changing frequency, etc). Many people jump on Gaussian as an approximation, but my tests suggest it's hardly so because it has a lot of jerky outliers. How do we estimate A given T?Clearly there is noise during normal use as well, but incorporating it in benchmarks as a matter of course reduces the usefulness of benchmarksOn the contrary: 1) The "noise during normal use" has to be measured in order to detect the sensitivity of the benchmarked program to that noise.2) The noise the benchmarked program produces has to be measured too, because the running benchmarked program probably increases the noise for all other running programs.How to measure that? Also, that noise does not need to be measured as much as eliminated to the extent possible. This is because the benchmark app noise is a poor model of the application-induced noise.In addition: the noise produced by a machine under heavy load might bring the performance of the benchmarked program down to zero.Of course. That's why the documentation emphasizes the necessity of baselines. A measurement without baselines is irrelevant. Andrei
Apr 08 2012
Andrei Alexandrescu wrote:all noise is additive (there's no noise that may make a benchmark appear to run faster)This is in doubt, because you yourself wrote "the machine itself has complex interactions". These complex interactions might lower the time needed for an operation of the benchmarked program. Examples that come to mind: a) needed data is already in a (faster) cache because it belongs to a memory block, from which some data is needed by some program not belonging to the benchmarked set---and that block isn't replaced yet. b) needed data is stored in an HDD whose I/O scheduler uses the elevator algorithm and serves the request by pure chance instantly, because the position of the needed data is between two positions accessed by some programs not belonging to the benchmarked set. Especially an HDD, if used, will be responsible for a lot of noise you define as "quantization noise (uniform distribution)" even if the head stays at the same cylinder. Not recognizing this noise would only mean that the data is cached, and interpreting the only true read from the HDD as a jerky outlier seems quite wrong.For example by running more than one instance of the benchmarked program in parallel and use the thereby gathered statistical routines to spread T into the additive components A, Q and N.1) The "noise during normal use" has to be measured in order to detect the sensitivity of the benchmarked program to that noise.How do you measure it, and what conclusions do you draw other than that there's more or less other stuff going on on the machine, and the machine itself has complex interactions? 
Far as I can tell a time measurement result is: T = A + Q + NSimilar to the above note.2) The noise the benchmarked program produces has to be measured too, because the running benchmarked program probably increases the noise for all other running programs.How to measure that?Also, that noise does not need to be measured as much as eliminated to the extent possible.I wouldn't define two programs to be equivalent based on the time until completion only. That time might be identical for both programs, but if only one of the programs increases the answering time of the machine to an unacceptable level, I would choose the other. -manfred
Apr 09 2012
On 4/9/12 9:25 AM, Manfred Nowak wrote:Andrei Alexandrescu wrote:Which is great, unless the program wants to measure the cache memory itself, in which case it would use special assembler instructions or large memset()s. (We do such at Facebook.)all noise is additive (there's no noise that may make a benchmark appear to run faster)This is in doubt, because you yourself wrote "the machine itself has complex interactions". This complex interactions might lower the time needed for an operation of the benchmarked program. Examples that come to mind: a) needed data is already in a (faster) cache because it belongs to a memory block, from which some data is needed by some program not belonging to the benchmarked set---and that block isnt replaced yet.b) needed data is stored in a hdd whose I/O scheduler uses the elevator algorithm and serves the request by pure chance instantly, because the position of the needed data is between two positions accessed by some programs not belonging to the benchmarked set. Especially a hdd, if used, will be responsible for a lot of noise you define as "quantization noise (uniform distribution)" even if the head stays at the same cylinder. Not recognizing this noise would only mean that the data is cached and interpreting the only true read from the hdd as a jerky outlier sems quite wrong.If the goal is to measure the seek time of the HDD, the benchmark itself should make sure the HDD cache is cleared. (What I recall they do on Linux is unmounting and remounting the drive.) Otherwise, it adds a useless component to the timing.I disagree with running two benchmarks in parallel because that exposes them to even more noise (scheduling, CPU count, current machine load etc). I don't understand the part of the sentence starting with "...use the thereby...", I'd be grateful if you elaborated. 
AndreiFor example by running more than one instance of the benchmarked program in paralell and use the thereby gathered statistical routines to spread T into the additiv components A, Q and N.1) The "noise during normal use" has to be measured in order to detect the sensibility of the benchmarked program to that noise.How do you measure it, and what conclusions do you draw other than there's a more or less other stuff going on on the machine, and the machine itself has complex interactions? Far as I can tell a time measurement result is: T = A + Q + N
Apr 09 2012
Which is great, unless the program wants to measure the cache memory itself, in which case it would use special assembler instructions or large memset()s. (We do such at Facebook.)I disagree. If a regression suddenly causes a function to become heavily cache-bound, it should show up in benchmarks somehow, regardless of the previous expected behavior of the function. -- Francois
Apr 09 2012
On 4/9/12 11:29 AM, Francois Chabot wrote:But cache binding depends on a variety of cache characteristics, i.e. the machine. The question is whether we accept a heavy dependence of the benchmark on the machine. AndreiWhich is great, unless the program wants to measure the cache memory itself, in which case it would use special assembler instructions or large memset()s. (We do such at Facebook.)I disagree. If a regression suddenly causes a function to become heavily cache-bound, it should show up in benchmarks somehow, regardless of the previous expected behavior of the function.
Apr 09 2012
Andrei Alexandrescu wrote:For example by running more than one instance of the benchmarked program in parallel and use the thereby gathered statistical routines to spread T into the additive components A, Q and N.I disagree with running two benchmarks in parallel because that exposes them to even more noise (scheduling, CPU count, current machine load etc).I did not mean to run two or more benchmarks in parallel but only more instances of the benchmarked program _without_ the environment supplied by the benchmark. Of course there may be more noise. But if so, that noise is dependent on the additional running instances and on nothing else. Now let T(n) be the time needed to run n instances in parallel on a single processor. According to your definition and your remark above: T(n) = n*A + Q + N + P(n) where P(n)>=0 and P(1)==0 is the additional noise with which the benchmarked program disturbs itself. Please observe that Q and N eliminate themselves on subtraction: T(2) - T(1) = A + P(2) - P(1) T(3) - T(2) = A + P(3) - P(2) ... P(n+1)-P(n) measures the sensitivity of the benchmarked program to its own noise. As you wrote yourself, its value should be close to zero until the machine is close to falling flat.I don't understand the part of the sentence starting with "...use the thereby...", I'd be grateful if you elaborated.Ohh..., an unrecognized deletion. I have written: | use the thereby gathered _data to feed_ statistical routines to | spread T as elaborated above. -manfred
Apr 09 2012
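Taking Manfred's model at face value, the subtraction argument is pure algebra and easy to replay numerically. The sketch below uses made-up numbers and only verifies that the constant terms Q and N drop out of consecutive differences; it says nothing about whether Q and N really stay constant as n grows, which is the physical assumption under dispute:

```python
# Manfred's model: T(n) = n*A + Q + N + P(n), with P(1) == 0, where
# P(n) is the extra noise the n instances inflict on one another.
# All values below are hypothetical, chosen only to exercise the algebra.
A, Q, N = 100.0, 3.0, 7.0
P = {1: 0.0, 2: 2.0, 3: 5.0}  # invented self-interference terms

def T(n):
    return n * A + Q + N + P[n]

d21 = T(2) - T(1)  # = A + P(2) - P(1): Q and N cancel
d32 = T(3) - T(2)  # = A + P(3) - P(2): Q and N cancel again
print(d21, d32)

assert d21 == A + P[2] - P[1]
assert d32 == A + P[3] - P[2]
```

Each difference isolates A plus a self-noise increment, so if P(n+1) - P(n) is near zero, the differences estimate A directly.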
(Resuming a few past discussions about std.benchmark.) On 4/9/12 1:26 PM, Manfred Nowak wrote:Andrei Alexandrescu wrote:I doubt that's the case. Once we discuss running several instances in parallel, I predict the vagaries of scheduling become much more prevalent.I disagree with running two benchmarks in parallel because that exposes them to even more noise (scheduling, CPU count, current machine load etc).I did not mean to run two or more benchmarks in parallel but only more instances of the benchmarked program _without_ the environment supplied by the benchmakr. Of course may there be more noise. But if so, that noise is dependent on the additional running instances and to nothing else.Now let T(n) be the time needed to run n instances in parallel on a single processor. According to your definition and your remark above: T(n) = n*A + Q + N + P(n) where P(n)>=0 and P(1)==0 is the additional noise with which the benchmarked program disturbs itself. Please observe that Q and N eliminate themselves on subtraction: T(2) - T(1) = A + P(2) - P(1) T(3) - T(2) = A + P(3) - P(2) ... P(n+1)-P(n) measures the sensitivity of the benchmarked program to its one noise. As you wrote yourself its value should be close to zero until the machine is close to falling flat.It would be great if that applied, but I'm afraid things get significantly more complex. For example, the noise will have a per-processor component, and the processes will compete for cache among themselves. Andrei
Apr 14 2012
On 08.04.2012 21:31, Andrei Alexandrescu wrote:On 4/8/12 11:59 AM, Denis Shelomovskij wrote:Yes of course. I mean that an algorithm itself should follow some restrictions for such measurement. It should run with the same speed every time. Many algorithms follow this convention, but not all. Why will recording the average produce so much noise? As I see, floating point arithmetic is now used without a strong reason so it looks like a time of this part isn't valuable. Or is it just a temporary solution? Anyway it should be configurable using a CT parameter, so it will not abuse one who doesn't need it.Very good but minimum isn't a best guess. Personally I (and there will be a lot of such maniacs I suppose) will think that this (minimum) time can be significantly smaller than average time.I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better. In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise.Clearly there is noise during normal use as well, but incorporating it in benchmarks as a matter of course reduces the usefulness of benchmarks as a means to improve performance of the benchmarked code.A graph is needed exactly because of that. Without a graph it really gives very little.So a graph is needed.So a parameter (probably with a default value) should be added. Something like an enum of flags telling what we want to know. At least these look usable: minTime, <some mean time>, maxTime, standardDeviation, graph (yes, good old ASCII art).Standard deviation is also not very useful because it includes all outliers (some of which are very far away from the mode).
And it is needed because it's the easiest way to show benchmarked program behaviour (and noise behaviour). It also requires only about 80 integers to store information and shouldn't produce much noise. IMHO, a histogram gives lots of information and will be a good addition. -- Денис В. Шеломовский Denis V. ShelomovskijYes, graph is needed.I am not sure about that. We may provide the raw measurement data for programs that want to plot things, but plotting is beyond the charter of std.benchmark.
Apr 09 2012
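The console histogram Denis describes — a fixed number of bucket counters (the "about 80 integers") rendered as bars — fits in a few lines. A sketch, in Python for illustration; bucket count and bar width are arbitrary, and this is not proposed std.benchmark API:

```python
def ascii_histogram(samples, bins=20, width=50):
    """Bucket samples into `bins` counters and render one bar per bucket.

    The storage cost is exactly `bins` integers, matching the point
    made in the thread about how cheap the extra bookkeeping is.
    """
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / span * bins), bins - 1)
        counts[i] += 1
    top = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + span * i / bins  # lower edge of this bucket
        bar = "#" * (c * width // top)
        lines.append(f"{left:10.3f} | {bar} {c}")
    return "\n".join(lines)

import random
random.seed(1)
data = [100 + random.expovariate(1 / 5.0) for _ in range(1000)]
print(ascii_histogram(data))
```

For timing data with additive noise the mass piles up in the leftmost buckets, so the shape (and the outlier tail) is visible at a glance even at console resolution.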
On 4/9/12 2:06 AM, Denis Shelomovskij wrote:Why will recording the average produce so much noise?As I explained, the average takes noise and outliers (some very large, e.g. milliseconds in a benchmark that takes microseconds) into account. The minimum is shielded from this issue. In the limit, the minimum for infinitely many measurements is the sought-after result.As I see, floating point arithmetic is now used without a strong reason so it looks like a time of this part isn't valuable. Or is it just a temporary solution?I don't understand "time of this part".Anyway it should be configurable using a CT parameter, so it will not abuse one who doesn't need it.We'd like the framework to do the right thing, and the average does not seem to be it.IMHO, a histogram gives lots of information and will be a good addition.I disagree. Andrei
Apr 09 2012
09.04.2012 17:26, Andrei Alexandrescu writes:On 4/9/12 2:06 AM, Denis Shelomovskij wrote:Why will recording the average produce so much noise?As I explained, the average takes noise and outliers (some very large, e.g. milliseconds in a benchmark that takes microseconds) into account. The minimum is shielded from this issue. In the limit, the minimum for infinitely many measurements is the sought-after result.As I see, floating point arithmetic is now used without a strong reason so it looks like a time of this part isn't valuable. Or is it just a temporary solution?I don't understand "time of this part".Looks like a misunderstanding because of my bad English: "Recording the average will include a fair amount of noise" means for me that a process of "recording the average" produces "a fair amount of noise" itself, not that "a result includes noise". -- Денис В. Шеломовский Denis V. Shelomovskij
Apr 10 2012
I've analyzed this quite a bit at work and the average and median are not very informative. You need the mode, but in most benchmarks the mode is very close to the minimum, so using the minimum is even better.How is it better?In speed measurements, all noise is additive (there's no noise that may make a benchmark appear to run faster). There are also quite a few outliers. Recording the average will include a fair amount of noise.The benchmarked function itself could have a relevant variance, e.g. when using randomized algorithms. You already get that from the LRU table in the array append cache.
Apr 10 2012
On 08.04.2012 20:59, Denis Shelomovskij wrote:Very good but minimum isn't a best guess. Personally I (and there will be a lot of such maniacs I suppose) will think that this (minimum) time can be significantly smaller than average time.Prime example is networking.So a parameter (probably with a default value) should be added. Something like enum of flags telling what we want to know. At least these look usable: minTime, <some mean time>, maxTime, standardDeviation,+1 graph (yes, good old ASCII art).Yes, graph is needed.No, not ASCII art. It's too coarse-grained, and the only good thing it can do is look nice in a terminal (or give an overall picture, but it's limited to say ~25 blocks in a histogram). What would be nice is an option to output bare-bones info so that one can easily adapt it and feed the result to e.g. gnuplot. I think CSV or its version with tabs is good enough in this case. Another cool addition IMHO would be parametric benchmarks, so there is a function and a set of parameters (one parameter is fine too) to benchmark on. It makes that much more sense with graphs, as an algorithm's profile plotted for various input sizes can be very revealing. -- Dmitry Olshansky
Apr 08 2012
On 4/8/12 12:35 PM, Dmitry Olshansky wrote:Another cool addition IMHO would be parametric benchmarks, so there is a function and a set of parameters (one parameter is fine too) to benchmark on. It makes that much more sense with graphs as algorithm profile plotted for various inputs (sizes) can be very revealing.This would be interesting to explore, could you give some more detail please? Andrei
Apr 08 2012
On 08.04.2012 21:40, Andrei Alexandrescu wrote:On 4/8/12 12:35 PM, Dmitry Olshansky wrote:I thought to just throw it in. But if I were to do it in under an hour: benchmark_func_blah(int problem_size); auto benchmark_params_func_blah = [ 1, 10, 100, 1_000, 10_000 ...]; or slightly more general - use the range protocol: auto benchmark_params_func_blah = getRangeOfValues(); I'm not sure the prefix is the place to put it, but here's the rough idea. -- Dmitry OlshanskyAnother cool addition IMHO would be parametric benchmarks, so there is a function and a set of parameters (one parameter is fine too) to benchmark on. It makes that much more sense with graphs as algorithm profile plotted for various inputs (sizes) can be very revealing.This would be interesting to explore, could you give some more detail please?
Apr 08 2012
On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. AndreiI probably missed this somewhere, but what happens to `std.datetime : benchmark` ? Is it going to be deprecated ?
Apr 08 2012
On 4/8/12 1:03 PM, Caligo wrote:On Sat, Apr 7, 2012 at 10:25 PM, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Yes, I forgot to mention that. Part of the proposal is that std.datetime.StopWatch and std.datetime.benchmark are to be deprecated. AndreiHello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. AndreiI probably missed this somewhere, but what happens to `std.datetime : benchmark` ? Is it going to be deprecated ?
Apr 08 2012
Why is there so much emphasis on printBenchmarks()? benchmark() and runBenchmarks() are clearly the core of this library, and yet they are relegated to second-class citizen: "Oh, I guess you can use this". Normally, I wouldn't be so picky, but this is a standard library. Focus should be on functionality. Providing formatted output is a nice bonus, but to me, it's just a bonus. Any benchmarking part of a large project is bound to format the output itself (to log benchmark results against revisions in a database or something like that). Also, benchmark() and runBenchmarks() are kind of confusing at first glance. Something along the lines of benchmarkModule() and benchmarkAllModules() would be more sensible.
Apr 09 2012
On 4/9/12 10:23 AM, Francois Chabot wrote:Why is there so much emphasis on printBenchmarks()? benchmark() and runBenchmarks() are clearly the core of this library, and yet they are relegated to second-class citizen: "Oh, I guess you can use this". Normally, I wouldn't be so picky, but this is a standard library. Focus should be on functionality.The functionality is to make benchmarking easy to use, meaningful, and easy to interpret. I don't want to add a complicated library for postprocessing benchmarks because almost nobody will use it. The first function in the documentation is what most people will want to use. The functions that provide the data are eminently available, so I disagree with the "second-class citizen" characterization. You want to use them, use them. They don't need to be given rockstar billing. Andrei
Apr 09 2012
On 09/04/2012 17:23, Francois Chabot wrote:Why is there so much emphasis on printBenchmarks()? benchmark() and runBenchmarks() are clearly the core of this library, and yet they are relegated to second-class citizen: "Oh, I guess you can use this". Normally, I wouldn't be so picky, but this is a standard library. Focus should be on functionality. Providing formatted output is a nice bonus, but to me, it's just a bonus. Any benchmarking part of a large project is bound to format the output itself (to log benchmark results against revisions in a database or something like that). Also, benchmark() and runBenchmarks() are kind of confusing at first glance. Something along the lines of benchmarkModule() and benchmarkAllModules() would be more sensible.The printBenchmark facility is cool and should be included imho. It helps make benchmarking as standard as unit testing. We don't want to have to write again and again the same boilerplate code for such trivial uses.
Apr 09 2012
On 4/9/12 11:44 AM, Somedude wrote:It helps make benchmarking as standard as unit testing. We don't want to have to write again and again the same boilerplate code for such trivial uses.Yes, I had unittest in mind when writing the library. If one needs more than one statement to get an informative benchmark off the ground, we failed. Andrei
Apr 09 2012
Andrei Alexandrescu wrote:Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. Code is in the form of a pull request at https://github.com/D-Programming-Language/phobos/pull/529. (There's some noise in there caused by my git n00biness). Documentation is at http://erdani.com/d/web/phobos-prerelease/std_benchmark.html. If reasonable and at all possible, I'd like to bump the priority of this proposal. Clearly D's user base is highly interested in efficiency, and many of the upcoming libraries have efficiency a virtual part of their design. So we should get std.benchmark in soon and require that new addition come with benchmarks for their essential functionality. My vision is that in the future Phobos will have a significant benchmarks battery, which will help improving Phobos and porting to new platforms.I come to think the same. How come that the times based relative report and the percentage based relative report are mixed in one result? And how do I choose which one I'd like to see in the output. When benchmarking you can measure different things at the same time. In this regard the current proposal is limited. It just measures wall clock time. I believe extending the StopWatch to measure e.g. user CPU time is a useful addition. In general, allowing user defined measurements would be great. E.g. to measure the time spend in user mode. () => { tms t; times(&t); return t.tms_utime; } Note, that this code does not need to be portable. You can also use version() else static assert. Things that come to mind that I'd like to measure. Time measurements: * User CPU time * System CPU time * Time spent in memory allocations Count measurements: * Memory usage * L1/L2/L3 cache misses * Number of executed instructions * Number of memory allocations Of course wall clock time is the ultimate measure when benchmarking. But often you need to investigate further (doing more measurements). 
Do you think adding this is worthwhile? All in all the user interface has been greatly simplified. Jens
Apr 10 2012
On 4/10/12 5:40 AM, Jens Mueller wrote:How come that the times based relative report and the percentage based relative report are mixed in one result? And how do I choose which one I'd like to see in the output.It's in the names. If the name of a benchmark starts with benchmark_relative_, then that benchmark is considered relative to the last non-relative benchmark. Using a naming convention allows complete automation in benchmarking a module. I figure it's fine that all results appear together because the absence of data in the relative column clarifies which is which.When benchmarking you can measure different things at the same time. In this regard the current proposal is limited. It just measures wall clock time. I believe extending the StopWatch to measure e.g. user CPU time is a useful addition.Generally I fear piling too much on StopWatch because every feature adds its own noise. But there's value in collecting the result of times(). What would be the Windows equivalent?In general, allowing user defined measurements would be great. E.g. to measure the time spend in user mode. () => { tms t; times(&t); return t.tms_utime; } Note, that this code does not need to be portable. You can also use version() else static assert. Things that come to mind that I'd like to measure. Time measurements: * User CPU time * System CPU time * Time spent in memory allocations Count measurements: * Memory usage * L1/L2/L3 cache misses * Number of executed instructions * Number of memory allocations Of course wall clock time is the ultimate measure when benchmarking. But often you need to investigate further (doing more measurements). Do you think adding this is worthwhile?Absolutely. I just fear about expanding the charter of the framework too much. Let's see: * Memory usage is, I think, difficult in Windows. * Estimating cache misses and executed instructions is significant research * Number of memory allocations requires instrumenting druntime Andrei
Apr 14 2012
Andrei Alexandrescu wrote:On 4/10/12 5:40 AM, Jens Mueller wrote:I know. That wasn't my question. How do I choose between percentage vs. factors for a relative benchmark, e.g. 200% vs. 2?How come that the times based relative report and the percentage based relative report are mixed in one result? And how do I choose which one I'd like to see in the output.It's in the names. If the name of a benchmark starts with benchmark_relative_, then that benchmark is considered relative to the last non-relative benchmark. Using a naming convention allows complete automation in benchmarking a module. I figure it's fine that all results appear together because the absence of data in the relative column clarifies which is which.I'm not a Windows user but GetProcessTimes seems to be the Windows equivalent. http://msdn.microsoft.com/en-us/library/windows/desktop/ms683223%28v=vs.85%29.aspxWhen benchmarking you can measure different things at the same time. In this regard the current proposal is limited. It just measures wall clock time. I believe extending the StopWatch to measure e.g. user CPU time is a useful addition.Generally I fear piling too much on StopWatch because every feature adds its own noise. But there's value in collecting the result of times(). What would be the Windows equivalent?I think that std.benchmark shouldn't do this. But I think we should figure out whether adding a user-defined measurement function is possible. Making sure that be we have a design that is flexible enough to capture different measurements. JensIn general, allowing user defined measurements would be great. E.g. to measure the time spend in user mode. () => { tms t; times(&t); return t.tms_utime; } Note, that this code does not need to be portable. You can also use version() else static assert. Things that come to mind that I'd like to measure. 
Time measurements: * User CPU time * System CPU time * Time spent in memory allocations Count measurements: * Memory usage * L1/L2/L3 cache misses * Number of executed instructions * Number of memory allocations Of course wall clock time is the ultimate measure when benchmarking. But often you need to investigate further (doing more measurements). Do you think adding this is worthwhile?Absolutely. I just fear about expanding the charter of the framework too much. Let's see: * Memory usage is, I think, difficult in Windows. * Estimating cache misses and executed instructions is significant research * Number of memory allocations requires instrumenting druntime
Apr 15 2012
There have been quite a few good comments, but no review manager offer. Could someone please take this role? Again, it would be great to get std.benchmark in sooner rather than later because it can be used by subsequent submissions (many of which allege efficiency as a major advantage) to show they improve over the state of the art. Thanks, Andrei
Apr 14 2012
On 2012-04-15 04:58, Andrei Alexandrescu wrote:There have been quite a few good comments, but no review manager offer. Could someone please take this role? Again, it would be great to get std.benchmark in sooner rather than later because it can be used by subsequent submissions (many of which allege efficiency as a major advantage) to show they improve over the state of the art. Thanks, AndreiAs far as I know, there are already several other projects/modules in the pipeline before std.benchmark, if we are to review them in the order they were submitted. -- /Jacob Carlborg
Apr 15 2012
On Sunday, April 15, 2012 13:41:28 Jacob Carlborg wrote:On 2012-04-15 04:58, Andrei Alexandrescu wrote:There are, but none of them are being reviewed at the moment or being pushed for a review. And there _is_ an argument for std.benchmark being high priority based on how it could affect future reviews. We should probably start reviewing _something_ soon though. We haven't had very good momentum on that of late. - Jonathan M DavisThere have been quite a few good comments, but no review manager offer. Could someone please take this role? Again, it would be great to get std.benchmark in sooner rather than later because it can be used by subsequent submissions (many of which allege efficiency as a major advantage) to show they improve over the state of the art. Thanks, AndreiAs far as I know, there are already several other projects/modules in the pipeline before std.benchmark. If we are to review them in the order they are submitted.
Apr 16 2012
Andrei Alexandrescu wrote:There have been quite a few good comments, but no review manager offer. Could someone please take this role?I will do this. But I will need to get more familiar with the process. And I will add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers. Jens
Apr 15 2012
On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:Andrei Alexandrescu wrote:The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_Manager -LarsThere have been quite a few good comments, but no review manager offer. Could someone please take this role?I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
Apr 16 2012
Lars T. Kyllingstad wrote:On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:The page is shorter than expected. Last time I checked I found something way longer. Since Phobos' review process is based on Boost's, where does it deviate? JensAndrei Alexandrescu wrote:The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_ManagerThere have been quite a few good comments, but no review manager offer. Could someone please take this role?I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
Apr 16 2012
On 17.04.2012 1:00, Jens Mueller wrote:Lars T. Kyllingstad wrote:It's peer review followed by voting, and that's about it. I don't think you should seek any formal standards, documents, guidelines, etc. The manager role is rather simple: 1. The Manager picks a module from the review queue and posts an announcement that formal review of it starts today. The post should contain relevant information about the module and links to source/documentation. Importantly, it also sets the exact time the review ends and the exact time voting ends (usually 2 weeks for review, 1 week for voting). 2. When the review ends, the Manager either opens a vote thread or, if it's obvious that the module needs further work (or on explicit request from the author), may choose to prolong or postpone the review, putting it back into the queue. 3. When the vote period ends, count up the votes and declare the result. And that's it. There is no hard rule on which module has priority in the queue. It is loosely decided on the basis of how long it has been ready for review and how important the functionality is. P.S. If you are not sure you are up to it, just say the word and I'll volunteer instead. We need to push this forward as the review queue is going to overflow real soon ;) -- Dmitry OlshanskyOn Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:The page is shorter than expected. Last time I checked I found something way longer. Since Phobos' review process is based on Boost's, where does it deviate?Andrei Alexandrescu wrote:The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_ManagerThere have been quite a few good comments, but no review manager offer. Could someone please take this role?I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.
Apr 17 2012
Dmitry Olshansky wrote:On 17.04.2012 1:00, Jens Mueller wrote:Many thanks for your explanation.Lars T. Kyllingstad wrote:It's peer review followed by voting and that's all about it. I don't think you should seek any formal standards, documents, guidelines, etc. The manager role is rather simple: 1. Manger picks module from review queue and posts an announcement that formal review of it starts today. Post should contain relevant information about module and links to source/documentation. Also importantly it sets the exact time the review ends and the exact time voting ends. (usually 2 weeks review, 1 week for voting) 2. When review ends Manager ether opens a vote thread. If it's obvious that module needs futher work (on explicit request from author) Manager may choose to prolong or postpone review thus putting it back into queue. 3. When vote period ends count up the votes and declare the result. And that's it. There are no hard rule on which module has the priority in queue. It is loosely calculated on basis of how long it was ready for review and how important the functionality is.On Sunday, 15 April 2012 at 16:23:32 UTC, Jens Mueller wrote:The page is shorter than expected. Last time I checked I found something way longer. Since the Phobos' review process is based on Boost's where does deviate?Andrei Alexandrescu wrote:The review process is based on Boost's: http://www.boost.org/community/reviews.html#Review_ManagerThere have been quite a few good comments, but no review manager offer. Could someone please take this role?I will do this. But I will need to get more familiar with the process. And add it to http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue for future review managers.P.S. If you are not sure are up to it just say the word and I'll volunteer instead. We need to push this forward as the review queue is going to overflow real soon ;)Andrei is already preparing. Review will start soon. Jens
Apr 17 2012
On 4/17/12 2:53 PM, Jens Mueller wrote:Andrei is already preparing. Review will start soon.I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
Apr 17 2012
On Tuesday, 17 April 2012 at 21:15:31 UTC, Andrei Alexandrescu wrote:On 4/17/12 2:53 PM, Jens Mueller wrote:I agree. There is no good reason for the queue to be stopped like it is now, but since no one seems to step up for the other modules, let's not waste time and review your module. Hopefully, the other modules will be reviewed ASAP. We gain nothing by waiting.Andrei is already preparing. Review will start soon.I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
Apr 17 2012
On Tuesday, April 17, 2012 16:15:43 Andrei Alexandrescu wrote:On 4/17/12 2:53 PM, Jens Mueller wrote:I have no problem with it. None of the others is being pushed for review or promoted at present. I'd just as soon have the std.benchmark review get under way rather than sit around waiting for one of those earlier in the queue to get started. - Jonathan M DavisAndrei is already preparing. Review will start soon.I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.
Apr 17 2012
On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:On 4/17/12 2:53 PM, Jens Mueller wrote:[...] Let's just check it in since it's ready. No point in waiting. T -- Some days you win; most days you lose.Andrei is already preparing. Review will start soon.I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.
Apr 17 2012
On Tuesday, April 17, 2012 18:55:07 H. S. Teoh wrote:On Tue, Apr 17, 2012 at 04:15:43PM -0500, Andrei Alexandrescu wrote:Check it in? You mean start the review on it? I hope that you don't mean "check it in" as in commit it. It needs to be reviewed and voted on first. - Jonathan M DavisOn 4/17/12 2:53 PM, Jens Mueller wrote:[...] Let's just check it in since it's ready. No point in waiting.Andrei is already preparing. Review will start soon.I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree.
Apr 17 2012
Am Tue, 17 Apr 2012 16:15:43 -0500 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:On 4/17/12 2:53 PM, Jens Mueller wrote:In a supermarket, there are three people in front of you in the queue. Then they all decide they forgot something and disappear in the back. You exchange some looks with the cashier and decide that you'll be long finished before any of them come back. Go ahead. -- MarcoAndrei is already preparing. Review will start soon.I'd like to gather a bit of confidence that it's okay with people that std.benchmark skips to the front of the queue instead of waiting in line. Please reply to this if agree/disagree. Thanks, Andrei
Apr 18 2012
On Wednesday, 18 April 2012 at 09:44:12 UTC, Marco Leise wrote:Am Tue, 17 Apr 2012 16:15:43 -0500 In a supermarket, there are three people in front of you in the queue. Then they all decide they forgot something and disappear in the back. You exchange some looks with the cashier and decide that you'll be long finished before any of them come back. Go ahead.Well, we are not in a supermarket. It was correct to ask.
Apr 18 2012
Instead of the originally proposed layout

================================================================
Benchmark                              relative ns/iter  iter/s
================================================================
---[ module_one ]-----------------------------------------------
file write                                      140.2K    7.1K
file read                               517.8%   27.1K   36.9K
array creation                           1.2Kx   116.0    8.6M
================================================================

I would like to see something more similar to the following:

================================================================
Benchmark                        Time              Performance
---------------
name                           s/iter    iter/s        factor
-----------------------[ module_one ]---
                              -------   -------       -------
file write                    140.20µ     7.13k
file read                      27.10µ    36.90k          5.17
array creation                116.00n     8.62M         1.21k
================================================================

Transition to the new layout can be achieved as follows:

* Consider the layout guidelines given in
  http://mirrors.ctan.org/info/german/tabsatz/tabsatz.pdf
* Reorder columns considering their semantics and introduce groups.
* Rename 'relative' to 'factor' and avoid usage of 'x' for 'times'.
* Divide column 'ns/iter' by a billion (see Vladimir's post).
* Right-align the module name (abbreviate if it is too long).
* Avoid '%' but rely on the prefixes specified in the SI:
  http://en.wikipedia.org/w/index.php?title=International_System_of_Units&oldid=487057838#Units_and_prefixes

Then, using a fixed width of 7 characters, we have at least 3 significant digits:

123450000 -> 123.45M
 12345000 ->  12.35M
  1234500 ->   1.23M
   123450 -> 123.45k
    12345 ->  12.35k
   1234.5 ->   1.23k
   123.45 -> 123.45
   12.345 ->  12.35
   1.2345 ->   1.23
  0.12345 -> 123.45m
 0.012345 ->  12.35m
0.0012345 ->   1.23m

Cheers, Famous
Apr 15 2012
Maybe the layout is not destroyed, this time:

================================================================
Benchmark                        Time              Performance
---------------
name                           s/iter    iter/s        factor
-----------------------[ module_one ]---
                              -------   -------       -------
file write                    140.20µ     7.13k
file read                      27.10µ    36.90k          5.17
array creation                116.00n     8.62M         1.21k
================================================================
Apr 15 2012
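The fixed-width SI formatting Famous proposes could be implemented roughly as below. This is a sketch; the exact prefix set and the half-up rounding of the examples above are assumptions (snprintf rounds to nearest):

```cpp
#include <cstdio>
#include <string>

// Scale a value into [1, 1000) against a descending table of SI
// prefixes and print it with two decimals, at most 7 characters for
// values that need at most 3 integer digits.
std::string si_format(double v) {
    static const struct { double scale; const char* prefix; } units[] = {
        {1e9, "G"}, {1e6, "M"}, {1e3, "k"}, {1.0, ""}, {1e-3, "m"}, {1e-6, "µ"},
    };
    char buf[32];
    for (const auto& u : units)
        if (v >= u.scale) {
            std::snprintf(buf, sizeof buf, "%.2f%s", v / u.scale, u.prefix);
            return buf;
        }
    std::snprintf(buf, sizeof buf, "%.2fn", v / 1e-9); // below 1µ
    return buf;
}
```

For example, `si_format(2000000)` yields `"2.00M"` and `si_format(0.25)` yields `"250.00m"`, matching the column shape in the proposed layout.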
On Sunday, 8 April 2012 at 03:25:16 UTC, Andrei Alexandrescu wrote:Hello, I finally found the time to complete std.benchmark. I got to a very simple API design, starting where I like it: one line of code. Code is in the form of a pull request at https://github.com/D-Programming-Language/phobos/pull/529. (There's some noise in there caused by my git n00biness). Documentation is at http://erdani.com/d/web/phobos-prerelease/std_benchmark.html. If reasonable and at all possible, I'd like to bump the priority of this proposal. Clearly D's user base is highly interested in efficiency, and many of the upcoming libraries have efficiency a virtual part of their design. So we should get std.benchmark in soon and require that new addition come with benchmarks for their essential functionality. My vision is that in the future Phobos will have a significant benchmarks battery, which will help improving Phobos and porting to new platforms. AndreiWhere has std.benchmark gone? I've written a benchmarking framework in C++ that I would like to port to D. If you've already implemented some cool stuff it's better to reuse/merge it with mine. My C++ code has a bunch of external deps (not included in this post). If you're interested I'll send you the rest. It's inspired by Haskell's powerful benchmarking framework(s) because it can automatically figure out how to randomize various kinds of types, including recursive instantiations of builtins, STL pairs, tuples and even containers, using a randomization framework that uses an STL-container type trait (rand.hpp, which is not included). I'm guessing implementing such a framework in D is an order of magnitude easier than in C++. An innovative thing about timed.hpp is constant-time benchmarking, meaning that the loop will generate a larger and larger instance of the structure until a specified amount of time has passed. I believe this is a more user-friendly way of benchmarking than most other frameworks offer.
And with D's supercool struct/class tupleof even more automation of instance generation is possible. Destroy!

Code:

/*! \file timed.hpp
 * \brief Benchmark Functions on Auto-Randomized Arguments.
 * \author Per Nordlöw (per.nordlow gmail.com)
 * \license Boost
 *
 * \see http://en.cppreference.com/w/cpp/thread/async
 * \see http://stackoverflow.com/questions/10059239/gccs-behaviour-with-stdasyncstdlaunchasync-vs-clangs-behaviour
 * \see http://stackoverflow.com/questions/8895350/quickcheck-like-template-function-benching-in-c
 * \see http://functionalcpp.wordpress.com/2013/08/03/generalized-function-evaluation/
 * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
 * \see https://limcore.wordpress.com/2008/04/13/8/
 * \see http://www.randombit.net/bitbashing/programming/parallel_function_invocation_using_std_async.html
 * \todo Use const boost::posix_time::time_duration & dur = boost::posix_time::seconds(1)
 */

#pragma once

#include <utility>
#include <vector>
#include <future>
#include <chrono>
#include <thread>
#include <type_traits>
#include "rand.hpp"
#include "type_traits_x.hpp"

namespace pnw {

struct timed {
    typedef std::chrono::high_resolution_clock hrc;
    typedef std::chrono::milliseconds millis;
    typedef std::chrono::microseconds micros;
public:
    timed(size_t max_length = 1024*256,
          millis total_duration_max = millis(1000),
          millis call_duration_max = millis(0),
          uint n_tries = 1)
        : m_max_length(max_length),
          m_tot_dur(total_duration_max),
          m_call_dur(call_duration_max),
          m_n_tries(n_tries) {}

    /*! Benchmark Function/Lambda/Functor \p f with non-void return value.
     * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
     */
    template<class Func>
    auto call(Func&& f)
        -> typename std::enable_if<!std::is_void<decltype(f())>::value, // if non-void return
                                   std::pair<decltype(f()), typename hrc::duration> >::type
    {
        decltype(f()) result;
        auto tA = hrc::now();
        for (uint i = 0; i < m_n_tries; i++) { result = f(); } // call it \c m_n_tries times
        auto span = hrc::now() - tA;
        return std::make_pair(result, span);
    }

    /*! Benchmark Function/Lambda/Functor \p f with \c void return value.
     * \see http://bryanstamour.com/2012/05/13/timing-functions-with-chrono.html
     */
    template<class Func>
    auto call(Func&& f)
        -> typename std::enable_if<std::is_void<decltype(f())>::value, // if void return
                                   typename hrc::duration>::type
    {
        auto tA = hrc::now();
        for (uint i = 0; i < m_n_tries; i++) { f(); } // call it \c m_n_tries times
        return hrc::now() - tA; // reverse order?
    }

#if 0
    /*! Call Function \p f after \p us microseconds.
     * \see http://stackoverflow.com/questions/10804683/delayed-function-call
     */
    template<class Func, class... Args>
    inline std::future<typename std::result_of<Func(Args...)>::type>
    delayed_async(Func&& f, Args&&... args, micros us)
    {
        return std::async(std::launch::async,
                          [f, us, args...] () {
                              std::this_thread::sleep_for(us);
                              return f(args...); // call
                          });
    }
#endif

    /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p C.
     * \tparam C Container Type.
     * \tparam B Benchmarks (Times) Container.
     * \param[in] f Function.
     * \return vector of pairs of \c C-instance size and execution time.
     * \todo Estimate Time Complexity Model: O(c), O(n), ..., O(n^p), O(log(n)), O(exp(n))
     *       and predict cost of next iteration to halt within time-limit.
     * \see http://stackoverflow.com/questions/7245235/syntax-of-c-templates-with-function-type-parameters
     * \see http://en.cppreference.com/w/cpp/chrono/steady_clock
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call_s(void (*f)(C&))
    {
        const auto deadline = hrc::now() + m_tot_dur;
        B bm; // benchmarks
        C c;  // test data
        for (size_t n = 1; n <= m_max_length; n *= 2) {
            // efficient way of adding more random values to a container
            auto c2 = rand_n<C>(n);
            c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710
            auto tA = hrc::now(); // log start
            for (uint i = 0; i < m_n_tries; i++) { f(c); } // call it \c m_n_tries times
            auto span = hrc::now() - tA;
            if (m_call_dur.count() != 0 and span >= m_call_dur) { break; } // if function call took too long
            auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count();
            bm.emplace_back(n, static_cast<double>(count)/1e6); // store
            if (hrc::now() + span >= deadline) { break; } // if no time for yet another measurement taking at least \c span
        }
        return bm;
    }

    /*! Benchmark the \em Single Argument Function \p f using Randomized Instances of type \p C.
     * \see Previous overload of \c call_s
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call_s(void (*f)(const C&))
    {
        return call_s(reinterpret_cast<void (*)(C&)>(f));
    }

    /*! Benchmark Function \p f using Randomized Instances of type \p C.
     * \tparam C Container Type.
     * \tparam B Benchmarks (Times) Container.
     * \param[in] f Function.
     * \return vector of pairs of \c C-instance size and execution time.
     */
    template<class C, class B = std::vector<std::pair<size_t, double> > >
    typename std::enable_if<is_container<C>::value, B>::type
    call(void (*f)(typename C::iterator, typename C::iterator))
    {
        const auto deadline = hrc::now() + m_tot_dur;
        B bm; // benchmarks
        C c;  // test data
        for (size_t n = 1; n <= m_max_length; n *= 2) {
            // efficient way of adding more random values to a container
            auto c2 = rand_n<C>(n);
            c.insert(begin(c), begin(c2), end(c2)); // \see http://stackoverflow.com/a/2551785/683710
            auto tA = hrc::now(); // log start
            for (uint i = 0; i < m_n_tries; i++) { f(c.begin(), c.end()); } // call it \c m_n_tries times
            auto span = hrc::now() - tA;
            if (m_call_dur.count() != 0 and span >= m_call_dur) { break; } // if function call took too long
            auto count = std::chrono::duration_cast<std::chrono::microseconds>(span).count();
            bm.emplace_back(n, static_cast<double>(count)/1e6); // store
            if (hrc::now() + span >= deadline) { break; } // if there's no time for more measurements
        }
        return bm;
    }

public:
    size_t m_max_length = 1024*256;  ///< Maximum container instance length of \c T.
    millis m_tot_dur = millis(1000); ///< Maximum time to wait for the whole benchmark run to complete.
    millis m_call_dur = millis(0);   ///< Maximum time to wait for a call to \p f to complete.
    uint m_n_tries = 1;              ///< Number of tries per size of randomized instance.
    std::launch m_launch_policy = std::launch::async | std::launch::deferred; ///< Thread launch policy.
};

}
Sep 27 2013