Advanced Apache Spark Meetup 10-07-2015 Chris Fregly – Spark Beats Hadoop Sorting Challenge
6:30-7pm: Arrive and Mingle
7-7:15pm: Announcements, Quick Recap of Last Meetup
7:15pm-8:30pm: Deep Dive into How Spark Beat Hadoop @ 100TB Daytona GraySort Challenge.
8:30pm-9pm: Q&A, De-mingle, and Leave
I’ll be giving a quick preview of my Oct 12th London Spark Meetup talk on Project Tungsten. I’m giving this talk on Nov 12th in SF, as well as down the peninsula shortly after, assuming we can find a host down that way. Please email me at firstname.lastname@example.org if you’re interested in hosting!
We’ll cover Tungsten’s “bare metal” approach to performance optimizations, including mechanical sympathy, CPU cache hierarchy awareness, Direct Cache Access (DCA), MESI for multi-processor/multi-core/multi-thread CPU cache synchronization, Linux perf for CPU data cache miss analysis, optimizing matrix multiplication to minimize CPU cache line misses, and a bunch of other low-level sweetness.
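As a taste of the matrix multiplication topic, here is a minimal sketch of the access-pattern idea, not the code from the talk. The function names `matmul_naive` and `matmul_transposed` are made up for illustration. The naive version walks the second matrix column by column, which in a row-major layout jumps between rows on every step; the second version transposes that matrix first so both operands are read row by row. Python lists of floats are not contiguous primitive arrays, so this only shows the loop structure; the cache effect itself shows up with primitive arrays on the JVM or in C.

```python
def matmul_naive(a, b):
    """Textbook triple loop: b is read column-wise (strided access)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # b[k][j] touches a different row of b on every step of k
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_transposed(a, b):
    """Transpose b once so the inner loop reads both operands sequentially."""
    bt = [list(col) for col in zip(*b)]  # bt[j] is column j of b, stored as a row
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]
```

Both functions compute the same product; only the order in which memory is visited differs.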
This will be a hard-core session with demos and lots of audience participation, so please come ready with questions and comedy.
Code-level Deep Dive into the optimizations that allowed Spark to win the Daytona GraySort Challenge.
We’ll discuss the following at a code level:
1) Sort-based Shuffle (fewer OS resources)
2) Netty-based Network module (epoll, async, ByteBuffer reuse)
3) External Shuffle Service (also allows for auto-scaling of Worker nodes)
4) AlphaSort style cache locality optimizations
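To make item 4 concrete, here is a minimal sketch of the AlphaSort-style idea, assuming byte-string keys; it is not Spark's actual implementation. Instead of moving whole records around, the sort works on a compact array of (key prefix, record index) entries, compares prefixes first, and only follows the index back to the full record when two prefixes tie. `PREFIX_LEN` and `alphasort_style_sort` are hypothetical names chosen for this example.

```python
from functools import cmp_to_key

PREFIX_LEN = 4  # assumed prefix size for this sketch


def alphasort_style_sort(records):
    """Sort (key, value) records by key using compact (prefix, index) entries.

    records: list of (bytes, bytes) tuples.
    """
    # Small entries: a short key prefix plus a "pointer" (index) to the record.
    entries = [(key[:PREFIX_LEN], idx) for idx, (key, _) in enumerate(records)]

    def compare(e1, e2):
        # Fast path: prefixes differ, so no need to touch the full records.
        if e1[0] != e2[0]:
            return -1 if e1[0] < e2[0] else 1
        # Slow path: prefixes tie, so dereference and compare the full keys.
        k1, k2 = records[e1[1]][0], records[e2[1]][0]
        return (k1 > k2) - (k1 < k2)

    entries.sort(key=cmp_to_key(compare))
    # Materialize the records in sorted order by following the indices.
    return [records[idx] for _, idx in entries]
```

In a language with primitive arrays, the entries would be packed contiguously so most comparisons stay within cache; the Python version only illustrates the prefix-then-pointer comparison logic.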