digitalmars.D.learn - apache spark - not disk or network bound but CPU bound
"Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
http://radar.oreilly.com/2015/04/investigating-sparks-performance.html "For many who use and deploy Apache Spark, knowing how to find critical bottlenecks is extremely important. In a recent O’Reilly webcast, Making Sense of Spark Performance, Spark committer and PMC member Kay Ousterhout gave a brief overview of how Spark works, and dove into how she measured performance bottlenecks using new metrics, including block-time analysis. Ousterhout walked through high-level takeaways from her in-depth analysis of several workloads, and offered a live demo of a new performance analysis tool and explained how you can use it to improve your Spark performance. Her research uncovered surprising insights into Spark’s performance on two benchmarks (TPC-DS and the Big Data Benchmark), and one production workload. As part of our overall series of webcasts on big data, data science, and engineering, this webcast debunked commonly held ideas surrounding network performance, showing that CPU — not I/O — is often a critical bottleneck, and demonstrated how to identify and fix stragglers."
May 01 2015
"monty" <monty python.org> writes:
On Friday, 1 May 2015 at 11:15:01 UTC, Laeeth Isharc wrote:
> http://radar.oreilly.com/2015/04/investigating-sparks-performance.html

paper: http://www.eecs.berkeley.edu/~keo/publications/nsdi15-final147.pdf
May 01 2015