digitalmars.D.announce - TSV Utilities release with LTO and PGO enabled

Jon Degenhardt <jond noreply.com> writes:

I just released a new version of eBay's TSV Utilities. The cool 
thing about the release is not changes to the toolkit, but that 
it was possible to build everything using LDC's support for Link 
Time Optimization (LTO) and Profile Guided Optimization (PGO). 
This includes running the optimizations on both the application 
code and the D standard libraries (druntime and phobos). Further, 
it was all doable on Travis-CI (Linux and MacOS), including 
building release binaries available from the GitHub release page.

Combined, LTO and PGO resulted in performance improvements 
greater than 25% on three of my standard six benchmarks, and five 
of the six improved at least 8%.

Release info: 
https://github.com/eBay/tsv-utils-dlang/releases/tag/v1.1.16

Jan 14 2018

Martin Nowak <code dawg.eu> writes:

On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt wrote:
 Combined, LTO and PGO resulted in performance improvements 
 greater than 25% on three of my standard six benchmarks, and 
 five of the six improved at least 8%.

Yay, I'm usually seeing double digit improvements for PGO alone, 
and single digit improvements for LTO. Meaning PGO has more 
effect even though LTO seems to be the more hyped one.
Have you bothered benchmarking them separately?

Jan 15 2018

Jon Degenhardt <jond noreply.com> writes:

On Tuesday, 16 January 2018 at 00:19:24 UTC, Martin Nowak wrote:
 On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt 
 wrote:
 Combined, LTO and PGO resulted in performance improvements 
 greater than 25% on three of my standard six benchmarks, and 
 five of the six improved at least 8%.

 Yay, I'm usually seeing double digit improvements for PGO 
 alone, and single digit improvements for LTO. Meaning PGO has 
 more effect even though LTO seems to be the more hyped one.
 Have you bothered benchmarking them separately?

Last spring I made a few quick tests of both separately. That was 
just against the app code, without druntime/phobos. Saw some 
benefit from LTO, mainly one of the tools, and not much from PGO.

More recently I tried LTO standalone and LTO plus PGO, both 
against app code and druntime/phobos, but not PGO standalone. The 
LTO benchmarks are here: 
https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-m
etup-14dec2017.pdf. I haven't published the LTO + PGO benchmarks.

The takeaway from my tests is that LTO and PGO will benefit 
different apps differently, perhaps in ways not easily predicted. 
One of my tools benefited primarily from PGO, two primarily from 
LTO, and one materially from both. So, it is worth trying both.

For both, the big win was from optimizing across app code and 
libs (druntime/phobos in my case). It'd be interesting to see if 
other apps see similar behavior, either with phobos/druntime or 
other libraries, perhaps libraries from dub dependencies.

Jan 15 2018

Johan Engelen <j j.nl> writes:

On Tuesday, 16 January 2018 at 02:45:39 UTC, Jon Degenhardt wrote:
 On Tuesday, 16 January 2018 at 00:19:24 UTC, Martin Nowak wrote:
 On Sunday, 14 January 2018 at 23:18:42 UTC, Jon Degenhardt 
 wrote:
 Combined, LTO and PGO resulted in performance improvements 
 greater than 25% on three of my standard six benchmarks, and 
 five of the six improved at least 8%.

 Yay, I'm usually seeing double digit improvements for PGO 
 alone, and single digit improvements for LTO. Meaning PGO has 
 more effect even though LTO seems to be the more hyped one.
 Have you bothered benchmarking them separately?

 Last spring I made a few quick tests of both separately. That 
 was just against the app code, without druntime/phobos. Saw 
 some benefit from LTO, mainly one of the tools, and not much 
 from PGO.

Because PGO optimizes for the given profile, it would help a lot 
if you clarified how you do your PGO benchmarking. What kind of 
test load profile you used for optimization and what test load 
you use for the time measurement.

Regardless, it's fun to hear your test results :-)
   Johan

Jan 16 2018

Jon Degenhardt <jond noreply.com> writes:

On Tuesday, 16 January 2018 at 22:04:52 UTC, Johan Engelen wrote:
 Because PGO optimizes for the given profile, it would help a 
 lot if you clarified how you do your PGO benchmarking. What 
 kind of test load profile you used for optimization and what 
 test load you use for the time measurement.

The profiling used is checked into the repo and run as part of a 
PGO build, so it is available for inspection. The benchmarks used 
for deltas are also documented; they are the ones used in the 
benchmark comparison to similar tools done in March 2017. This 
report is in the repo 
(https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md).

However, it's hard to imagine anyone perusing the repo for this 
stuff, so I'll try to summarize what I did below.

Benchmarks - Six different tests of rather different but common 
operations run on large data files. The six tests were chosen 
because for each I was able to find at least three other tools, 
written in native compiled languages, with similar functionality. 
There are other valuable benchmarks, but I haven't published them.

Profiling - Profiling was developed separately for each tool. For 
each I generated several data files with data representative of 
typical use cases. Generally numeric or text data in several 
forms and distributions. The data was unrelated to the data used 
in benchmarks, which is from publicly available machine learning 
data sets. However, personal judgement was used in the generation 
of the data sets, so it's not free from bias.

After generating the data, I generated a set of run options 
specific to each tool. As an example, tsv-filter selects data 
file lines based on various numeric and text criteria (e.g. 
less-than). There are a bit over 50 comparison operations, plus a 
few meta operations. The profiling runs ensure all the operations 
are run at least once, but that the most important are overweighted. 
The ldc.profile.resetAll call was used to exclude all the initial 
setup code (command line argument processing). This was nice 
because it meant the data files could be small relative to 
real-world sets, and it runs fast enough to do as part of the 
build step (i.e. on Travis-CI).
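
For readers who haven't used it, here is a minimal sketch of the resetAll 
technique described above. It is not the code from the repo: the program, 
the countMatches helper, and the PGOProfile version identifier are all 
made up for illustration. The assumption is that the instrumented build 
passes -d-version=PGOProfile along with LDC's -fprofile-instr-generate 
flag, so that regular builds never reference ldc.profile.

```d
import std.algorithm : canFind;
import std.stdio : File, writeln;

// Hypothetical version identifier (an assumption, not an LDC built-in).
// It is meant to be set only on instrumented builds, e.g. something like:
//   ldc2 -O -fprofile-instr-generate=profile.raw -d-version=PGOProfile app.d
version (PGOProfile) import ldc.profile : resetAll;

/// The "hot" work the profile should describe: count lines containing needle.
size_t countMatches(Range)(Range lines, const(char)[] needle)
{
    size_t n;
    foreach (line; lines)
        if (line.canFind(needle)) ++n;
    return n;
}

int main(string[] args)
{
    // Setup code: argument handling that should not influence the profile.
    if (args.length != 3)
    {
        writeln("usage: ", args[0], " <needle> <file>");
        return 1;
    }

    // Discard the counters accumulated so far (startup, argument processing),
    // so the recorded profile reflects only the processing that follows.
    version (PGOProfile) resetAll();

    writeln(countMatches(File(args[2]).byLine, args[1]));
    return 0;
}
```

After running the instrumented binary on the profiling data, the raw 
profile would typically be merged with ldc-profdata and fed back into a 
second build via -fprofile-instr-use. Check the LDC documentation for 
the exact invocation on your version.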

Look at 
https://github.com/eBay/tsv-utils-dlang/tree/master/tsv-filter/profile_data to
see a concrete example (tsv-filter). In that directory are five data files and
a shell script that runs the commands and collects the data.

This was done for four of the tools covering five of the 
benchmarks. I skipped one of the tools (tsv-join), as it's harder 
to come up with a concise set of profile operations for it.

I then ran the standard benchmarks I usually report on in various 
D venues.

Clearly personal judgment played a role. However, the tools are 
reasonably task focused, and I did take basic steps to ensure the 
benchmark data and tests were separate from the training 
data/tests. For these reasons, my confidence is good that the 
results are reasonable and well founded.

--Jon

Jan 16 2018

Johan Engelen <j j.nl> writes:

On Wednesday, 17 January 2018 at 04:37:04 UTC, Jon Degenhardt 
wrote:
 Clearly personal judgment played a role. However, the tools are 
 reasonably task focused, and I did take basic steps to ensure 
 the benchmark data and tests were separate from the training 
 data/tests. For these reasons, my confidence is good that the 
 results are reasonable and well founded.

Great, thanks for the details, I agree.
Hope it's useful for others to see these details.

(btw, did you also check the performance gains when using the 
profile of the benchmark itself, to learn about the upper-bound 
of PGO for your program?)

I'll merge the IR PGO addition into LDC master soon. Don't know 
what difference it'll make.

-Johan

Jan 17 2018

Jon Degenhardt <jond noreply.com> writes:

On Wednesday, 17 January 2018 at 21:49:52 UTC, Johan Engelen 
wrote:
 On Wednesday, 17 January 2018 at 04:37:04 UTC, Jon Degenhardt 
 wrote:
 Clearly personal judgment played a role. However, the tools 
 are reasonably task focused, and I did take basic steps to 
 ensure the benchmark data and tests were separate from the 
 training data/tests. For these reasons, my confidence is good 
 that the results are reasonable and well founded.

 Great, thanks for the details, I agree.
 Hope it's useful for others to see these details.

Thanks Johan, much appreciated. :)

 (btw, did you also check the performance gains when using the 
 profile of the benchmark itself, to learn about the upper-bound 
 of PGO for your program?)

 I'll merge the IR PGO addition into LDC master soon. Don't know 
 what difference it'll make.

No, I didn't do an upper-bounds check, that's a good idea. I plan 
to test the IR based PGO when it's available, I'll run an 
upper-bounds check as part of it.

Jan 17 2018
