Strata + Hadoop World 2016 - San Jose, California: Video Compilation

Video description

“Make data work,” a simple phrase a mile deep, was the theme of Strata + Hadoop World 2016 in San Jose. The conference delivered on the theme by offering inspiration, guidance, and practical know-how from 363 experts on virtually every aspect of the big data compendium, including:

Long-form tutorials on Hadoop operations, machine learning, visualization, and data platform architecture taught by pros at Cloudera and Silicon Valley Data Science
Skill-building sessions on Python, R, Apache Spark, Kafka, Kudu, and Cassandra delivered by Hadoop specialists at Dato, Google, Continuum Analytics, and Confluent
Explorations of data innovations like Druid, NoLambda, Ground, Apache Drill, and the Azure Data Factory conducted by visionaries from Yahoo, Tuplejump, UC Berkeley AMP Lab, MapR Technologies, and Microsoft
MBA intensives on the approaches data-oriented startups and venture capitalists use to boost innovation and disrupt incumbent business models delivered by thought leaders at Zetta Venture Partners, H2O.ai, 3D Robotics, and Orbital Insight

Download the videos or view them through our HD player. It’s a big river.

All access pass to 16 keynotes, 18 tutorials, and 174 individual sessions
The future of Hadoop by Doug Cutting and Mike Cafarella, cofounders of Apache Hadoop
Specialized content tracks in security, finance, media, retail, transportation, and health care
51 sessions covering real time analytics; 20 on data innovations; 38 on machine learning and AI
Case studies from Cigna, LinkedIn, Eventbrite, Quora, Twitter, Google, and more
Plus a round of great jokes about data science by comedian Paula Poundstone

Keynotes

Apache Hadoop at 10 - Doug Cutting (Cloudera)

Driving the on-demand economy with predictive analytics - Eric Frenkiel (MemSQL)

Machine learning for human rights advocacy: Big benefits, serious consequences - Megan Price (Human Rights Data Analysis Group)

Let’s get real: Acting on data in real time - Jack Norris (MapR Technologies)

Delivering information in context - Ian Andrews (Pivotal)

Using commerce data to fuel innovation - Bruce Andrews (US Department of Commerce)

Summoning the demon: My perspective from the belly of the beast of AI - Jana Eggers (Nara Logics)

Using computer vision to understand big visual data - Alyosha Efros (UC Berkeley)

Apache Hadoop meets cybersecurity - Tom Reilly (Cloudera) and Alan Ross (Intel Corporation)

Thinking like a Bayesian - Julia Galef (Center for Applied Rationality)

Connected brains - Joseph Sirosh (Microsoft)

Building practical AI systems - Adam Cheyer (Viv)

Advanced analytics and the mystery of the missing jeans - Bob Rogers (Intel)

What’s next for BDAS (the Berkeley Data Analytics Stack)? - Michael Franklin (AMPLab/UC Berkeley)

Open by design, open for data - Adam Kocoloski (IBM)

Nonsense science - Paula Poundstone (Star of NPR’s #1 radio show, “Wait Wait…Don’t Tell Me”)

Cultivate

The 21st century leader: Shaping the future - Eric McNulty

Build to lead: Solve leadership challenges using the Lego Serious Play methodology - Dieter Reuther, Donna Denio (Team Dynamics Boston)

Culture is your company’s operating system - Dave Gray (XPLANE)

Cross-functional leadership for high-performance product teams - Dan Olsen (The Lean Product Playbook)

The proven ROI of designing culture - Kristi Woolsey (MAYA)

There’s nothing basic about the basics of leadership - Michael Lopp (Pinterest)

The importance of technical onboarding, training, and mentoring - Kate Heddleston (Kate Heddleston LLC)

Radical candor: Be a better boss - Kim Scott (Radical Candor, Inc.)

Accomplish big goals with objectives and key results - Christina Wodtke (Wodtke Consulting)

Hiring engineers shouldn’t hurt - Erin Ptacek (Starfighters.io)

Scaling Teams - David Loftesness

Ask the CTO: Hard questions, honest answers - Camille Fournier (Formerly Rent the Runway), Michael Lopp (Pinterest)

How to eat change for breakfast: Building an experimental enterprise - Sanjay Mathur (Silicon Valley Data Science)

Data Innovations

Analyzing billions of users with Druid and Theta Sketches - Eric Tschetter (Yahoo)

Grounding big data: A meta-imperative - Joe Hellerstein (UC Berkeley), Vikram Sreekanti (Berkeley AMP Lab)

Unified namespace and tiered storage in Alluxio - Calvin Jia (Alluxio), Jiri Simsa (Alluxio)

Building the data infrastructure of the future with persistent memory - Derrick Harris (Mesosphere), Rob Peglar (Micron Technology, Inc), Milind Bhandarkar (Ampool, Inc.), Anil Goel (SAP), Todd Lipcon (Cloudera, Inc.)

Just-in-time optimizing a database - Ted Dunning (MapR Technologies)

Putting Kafka into overdrive - Todd Palino (LinkedIn), Gwen Shapira (Confluent)

Streaming architecture: Why flow instead of state? - Ted Dunning (MapR Technologies)

Elasticsearch and Apache Lucene for Apache Spark and MLlib - Costin Leau (Elastic)

Deploying Hadoop on user namespace containers - Abin Shahab (Altiscale)

Netflix: Making big data small - Daniel Weeks (Netflix)

Lessons learned building a scalable self-serve, real-time, multitenant monitoring service at Yahoo - Sumeet Singh (Yahoo), Mridul Jain (Yahoo)

Data applications and infrastructure at Coursera - Roshan Sumbaly (Coursera Inc), Pierre Barthelemy (Coursera)

When one data center is not enough: Building large-scale stream infrastructure across multiple data centers with Apache Kafka - Guozhang Wang (Confluent)

Toppling the mainframe: Enterprise-grade streaming under 2 ms on Hadoop - Ilya Ganelin (Capital One Data Innovation Lab)

Architecting immediacy: The design of a high-performance, portable wrangling engine - Joe Hellerstein (UC Berkeley), Seshadri Mahalingam (Trifacta)

Building DistributedLog, a high-performance replicated log service - Sijie Guo (Twitter)

Architecting distributed systems for failure: How Druid guarantees data availability - Fangjin Yang (Imply)

Did you accidentally build a database? - Spencer Kimball (Cockroach Labs)

Secrets of natural language UIs: Translating English into computer actions - Joseph Turian (Workday), Alex Nisnevich (Bayes Impact)

Data Science & Advanced Analytics

Data wrangling and intro to pandas - Part 1 - T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS)

Data wrangling and intro to pandas - Part 2 - T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS)

Intro to data visualization with Bokeh - Part 1 - Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate)

Intro to data visualization with Bokeh - Part 2 - Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate)

Intro to machine learning with scikit-learn - Part 1 - Jake Vanderplas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics)

Intro to machine learning with scikit-learn - Part 2 - Jake Vanderplas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics)

R quickstart: Transform and visualize data - Garrett Grolemund (RStudio, Inc.)

Validating models in R - Part 1 - Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC)

Validating models in R - Part 2 - Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC)

Scaling R: Analytics for big data - Stephen Elston (Quantia Analytics, LLC)

Reproducible reports with big data - Garrett Grolemund (RStudio, Inc.)

A year of anomalies: Building shared infrastructure for anomaly detection - Chris Sanden (Netflix), Christopher Colburn (Netflix)

Augmenting machine learning with human computation for better personalization - Eric Colson (Stitch Fix)

Real-time fraud detection using process mining with Spark Streaming - Hylke Hendriksen (ING)

Building a marketplace: Eventbrite’s approach to search and recommendation - John Berryman (Eventbrite)

Docker for data scientists - Michelangelo D’Agostino (Civis Analytics)

How to make analytic operations look more like DevOps: Lessons learned moving machine-learning algorithms to production environments - Robert Grossman (University of Chicago)

Analyzing time series data with Spark - Sandy Ryza (Cloudera)

Faster conclusions using in-memory columnar SQL and machine learning - Wes McKinney (Cloudera), Jacques Nadeau (Dremio)

Putting the “science” into data science: The importance of reproducibility and peer review for quantitative research - Erik Andrejko (The Climate Corporation)

Can deep neural networks save your neural network? Artificial intelligence, sensors, and strokes - Brandon Ballinger (Cardiogram), Johnson Hsieh (Cardiogram)

Deep learning and recurrent neural networks applied to electronic health records - Josh Patterson (Patterson Consulting), David Kale (University of Southern California), Zachary Lipton (University of California, San Diego)

Data science teams: Hold out for the unicorn or build bands of steeds? - Michael Dauber (Amplify), Yael Garten (LinkedIn), Monica Rogati (Data Natives), Daniel Tunkelang (Various)

How LinkedIn built a text analytics platform at scale - Chi-Yi Kuan (LinkedIn), Weidong Zhang (LinkedIn), Yongzheng Zhang (LinkedIn)

Python scalability: A convenient truth - Travis Oliphant (Continuum Analytics)

Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera)

Atom smashing using machine learning at CERN - Siddha Ganju (Carnegie Mellon University)

Large-scale product classification via text and image-based signals using a fusion of discriminative and deep learning-based classifiers - Sreeni Iyer (Quad Analytix), Anurag Bhardwaj (Quad Analytix)

Vowpal Wabbit: The essence of speed in machine learning - Jeroen Janssens (Tilburg University)

The polyglot Beaker notebook - Scott Draves (Two Sigma Open Source)

Data-driven Business

What’s gone horribly wrong... and how you can protect yourself - Farrah Bostic (The Difference Engine), Paul Soldera (Equation Research)

The rise of the data selfie - Trina Chiasson (Tableau Software)

The future of data and culture - Leah Hunter (Tech Journalist), Amber Case (Esri), Todd Harple (Intel), Claire Michell (Temboo)

Big data sustainability: An environmental management systems analogy - Jonathan King (Ericsson)

Kosher collection: Best practices in data handling - Charles Givre (Booz | Allen | Hamilton)

Three rules every mobile product needs to follow to be successful - Sophie-Charlotte Moatti (Products That Count)

Mapping the matrix: Open cartography with scientific and spatial data - Aurelia Moser (Mozilla Science)

US EPA: A data-driven decision-making agency - Robin Thottungal (US Environmental Protection Agency)

My AlgorithmicMe: Our representation in data - Joerg Blumtritt (Datarella), Majken Sander (BusinessAnalyst.dk)

Stream science: Measuring the new currency of the music industry - Jonathan Gosier (AuDigent)

Making on-demand grocery delivery profitable with data science - Jeremy Stanley (Instacart)

Virtual reality for immersive data visualization - Bob Levy (Virtual Cove)

You have more data than you think. Time to put it to work - Jana Eggers (Nara Logics)

The power of personalization in the travel industry using big data - Sara Ahmadian (Seamless Planet)

How cognitive computing is changing data science for the better - Michael Ludden (IBM Watson)

Afraid of the future? You should be. Deep learning is eating your lunch—and mine. - Arno Candel (H2O.ai)

From drop to deluge: The upcoming wave of enterprise drone data - Keith Bigelow (3D Robotics)

Machine vision is making sense of the explosion of data from space - James Crawford (Orbital Insight)

Opportunities for hardware acceleration in data analytics - Kanu Gulati (Zetta Venture Partners)

Deploying deep learning at scale - Naveen Rao (Nervana)

Virtual reality in 2016 and in the future - Timoni West (Unity Labs)

Network intelligence at LinkedIn - Michael Conover (LinkedIn)

Data science 3.0: Empowering common end users with integrated solutions in a world of tools for engineers and scientists - Faisal Farooq (IBM Watson Health), Balaji Krishnapuram (IBM Watson Health)

Big science problems, big data solutions - Mr Prabhat (Berkeley Lab)

Of market makers and middlemen: How technology is transforming global trade - Renee DiResta (Haven)

Enabling smart consumer health decisions using prediction and personalization - Matt Butner (Stride Health)

Engineering industrial biology with data - Joshua Hoffman (Zymergen)

The business case for Spark, Kafka, and friends - Edd Dumbill (Silicon Valley Data Science)

Distributed systems in one lesson - Tim Berglund (DataStax)

How to use your data science team: Becoming a data-driven organization - Yael Garten (LinkedIn)

Cloud computing and big data - Ben Sharma (Zaloni)

Data visualizations decoded - Julie Rodriguez (Sapient Global Markets)

Developing a modern enterprise data strategy - Part 1 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)

Developing a modern enterprise data strategy - Part 2 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)

Developing a modern enterprise data strategy - Part 3 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)

Developing a modern enterprise data strategy - Part 4 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)

Empowering business users to lead with data - Denise McInerney (Intuit)

Why a data career is a great choice, now more than ever - Jin Zhang (CA Technologies), Jerry Overton (CSC), Michele Chambers (Continuum Analytics)

Automating decision making with big data: How to make it work - Andreas Schmidt (Blue Yonder)

Best practices for achieving customer 360 - Steven Totman (Cloudera), Nick Curcuru (MasterCard Advisors), Robert Bagley (ClickFox), Lori Bieda (Bank of Montreal)

Working on the blockchain gang: Crunching and visualizing bitcoin data - Benedikt Koehler (DataLion)

Adopting analytics: The Autodesk journey - Adam Sugano (Autodesk)

Inside Cigna’s big data journey - Jeffrey Shmain (Cloudera), Mohammad Quraishi (Cigna)

Data scientists, you can help save lives - Jeremy Howard (Enlitic)

How big data is helping to save babies around the world - Linus Liang (Embrace), Brad Allen (Silicon Valley Data Science)

Publicly broadcasting data exhaust at a public broadcaster - Christopher Berry (Canadian Broadcasting Corporation)

Transforming Telefónica - John Belchamber (Telefónica), Arturo Canales (Telefónica)

Enterprise Adoption

Apache Hadoop operations for production systems - Part 1 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Apache Hadoop operations for production systems - Part 2 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Apache Hadoop operations for production systems - Part 3 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Apache Hadoop operations for production systems - Part 4 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Apache Hadoop operations for production systems: Troubleshooting - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Apache Hadoop operations for production systems: Enterprise Considerations Part 1 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Apache Hadoop operations for production systems: Enterprise Considerations Part 2 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.)

Developing a big data business strategy - Bill Schmarzo (EMC)

How to build a successful data lake - Alex Gorelik (Waterline Data)

Bringing the Apache Hadoop ecosystem to the Google Cloud Platform - Jennifer Wu (Cloudera), James Malone (Google)

eBay analysts and governed self-service analysis: Delivering “turn-by-turn” smart suggestions - Debora Seys (eBay)

An introduction to Transamerica’s product recommendation platform - Vishal Bamba (Transamerica), Nitin Prabhu (Transamerica)

Not your father’s database: How to use Apache Spark properly in your big data architecture - Vida Ha (Databricks)

Amazon for information: Building a modern data catalog - Aaron Kalb (Alation)

10 concepts the enterprise decision maker needs to understand about Hadoop - Donald Miner (Miner Kasch)

Old industries, sexy data: How machine learning is reshaping the world’s backbone industries - David Beyer (Amplify Partners)

Best practices for enterprise adoption of big data in the cloud - Prat Moghe (Cazena)

Self-service, interactive analytics at multipetabyte scale in capital markets regulation on the cloud - Scott Donaldson (FINRA), Matt Cardillo (FINRA)

Netflix’s big leap from Oracle to Cassandra - Roopa Tangirala (Netflix)

Strategies for agile instrumentation, ingestion, and analytics across many platforms and products - Yann Landrin (Autodesk), Charlie Crocker (Autodesk)

BI on Hadoop: What are your options? - Jacques Nadeau (Dremio)

Analyzing drivers of Net Promoter Score and their impact on customer engagement in the OTA industry - Krishnan Venkata (LatentView Analytics), Jose Abelenda (Hotwire)

Building a scalable, secure data platform: If I knew then what I know now - Bill Loconzolo (Intuit)

Hadoop Internals & Development

Hadoop application architectures: Fraud detection - Part 1 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent)

Hadoop application architectures: Fraud detection - Part 2 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent)

Hadoop application architectures: Fraud detection - Part 3 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent)

Hadoop application architectures: Fraud detection - Part 4 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent)

The next 10 years of Apache Hadoop - Ben Lorica (O’Reilly Media), Doug Cutting (Cloudera), Mike Cafarella (University of Michigan)

Hadoop’s storage gap: Resolving transactional-access and analytic-performance tradeoffs with Apache Kudu (incubating) - Todd Lipcon (Cloudera, Inc.)

Format wars: From VHS and Beta to Avro and Parquet - Silvia Oliveros (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science)

Hadoop Use Cases

Hadoop without borders: Building on-prem, cloud, and hybrid data flows - Hiren Shah (Microsoft), Anand Subbaraj (Microsoft)

Uber, your Hadoop has arrived: Powering intelligence for Uber’s real-time marketplace - Vinoth Chandar (Uber)

Hadoop in the cloud: Good fit or round peg in a square hole?

Successful enterprise data hub design patterns at BT - Phillip Radley (BT)

Subject-matter experts and access to rich data: A case study in protecting a network from the Brobot distributed denial of service attacks. - John Omernik (Secureworks)

Architecting HBase in the field - Jean-Marc Spaggiari (Cloudera), Kevin O’Dell (Rocana)

In search of database nirvana: The challenges of delivering HTAP - Rohit Jain (Esgyn)

Big data for telcos: A trio of use cases - Amy O’Connor (Cloudera)

Scalable schema management for Hadoop and Spark applications - Kelvin Chu (Uber), Evan Richards (Uber)

How the oil and gas industry is igniting a spark with information fusion and metadata analytics - Brian Clark (Objectivity), Marco Ippolito (CGG GeoSoftware)

High-performance clickstream analytics with Apache Phoenix and HBase - Arun Thangamani (CDK)

Hardcore Data Science

Lessons learned from building real-life machine-learning systems - Xavier Amatriain (Quora)

The how and why of feature engineering - Alice Zheng (Dato)

A scalable implementation of deep learning on Spark - Alexander Ulanov (Hewlett-Packard Labs)

BIDMach on Spark: Machine learning at the outer limits - John Canny (UC Berkeley)

Dynamic memory networks for visual and textual question answering - Stephen Merity (MetaMind)

Scalable ensemble learning with H2O - Erin Ledell (H2O.ai)

Detecting and scoring anomalies with calibrated probabilistic models - Alexander Gray (Skytree, Inc.)

Scalable collective reasoning in graphs - Lise Getoor (University of California, Santa Cruz)

Phase retrieval algorithms for gigapixel microscopy - Laura Waller (UC Berkeley)

TensorFlow: Machine learning for everyone - Rajat Monga (Google)

A deep dive into DeepDive - Mike Cafarella (University of Michigan)

IoT & Real-time

An introduction to time series with Team Apache - Part 1 - Patrick McFadin (DataStax)

An introduction to time series with Team Apache - Part 2 - Patrick McFadin (DataStax)

An introduction to time series with Team Apache - Part 3 - Patrick McFadin (DataStax)

An introduction to time series with Team Apache - Part 4 - Patrick McFadin (DataStax)

Distributed stream processing with Apache Kafka - Jay Kreps (Confluent)

Real-time Hadoop: What an ideal messaging system should bring to Hadoop - Ted Dunning (MapR Technologies)

How to turn your house into a robot: An adaptive-learning algorithm for the Internet of Things - Brandon Rohrer (Microsoft)

IoT in the enterprise: A look at Intel (IoT) Inside - Moty Fania (Intel)

Fast data made easy with Apache Kafka and Apache Kudu (incubating) - Ted Malaska (Cloudera), Jeff Holoman (Cloudera)

Embeddable data transformation for real-time streams - Joey Echeverria (Rocana)

Twitter Heron at scale - Karthik Ramasamy (Twitter)

Transforming industrial enterprises with data science: From deterministic machines to probabilistic systems - Sean Murphy (PingThings)

Apache Flink: Streaming done right - Kostas Tzoumas (data Artisans)

Scaling your business with a messaging platform on the Zeta Architecture - Jim Scott (MapR Technologies, Inc.)

Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid - Tony Ng (eBay, Inc.)

Overcoming the top 5 hurdles to real-time analytics - Pat McGarry (Ryft)

Law, Ethics, Governance

We enhance privilege with supervised machine learning - Michael Williams (Fast Forward Labs)

Data ethics (not what you think) - Louis Suarez-Potts (Age of Peers, Inc.)

Big data ethics and a future for privacy - Jonathan King (Ericsson)

It’s a brave new world: Avoiding legal privacy and security snafus with big data and the IoT - Alysa Z. Hutnik (Kelley Drye & Warren LLP), Kristi Wolff (Kelley Drye & Warren LLP)

Security

A practitioner’s guide to securing your Hadoop cluster - Mubashir Kazia (Cloudera), Ben Spivey (Cloudera), Sravya Tirukkovalur (Cloudera), Michael Yoder (Cloudera)

A practitioner’s guide to securing your Hadoop cluster: Authorization - Sravya Tirukkovalur (Cloudera)

A practitioner’s guide to securing your Hadoop cluster: Encryption of Data in Transit - Michael Yoder (Cloudera)

A practitioner’s guide to securing your Hadoop cluster: Data Governance - Ben Spivey (Cloudera)

A practitioner’s guide to securing your Hadoop cluster: HDFS Encryption at Rest - Mubashir Kazia (Cloudera)

Attack graphs: Visually exploring 300M alerts per day - Leo Meyerovich (Graphistry), Joshua Patterson (Accenture Technology Labs), Michael Wendt (Accenture Technology Labs)

Securing Apache Kafka - Jun Rao (Confluent)

Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks - Chao Sun (Cloudera), Alex Leblang (Cloudera)

Leveraging Spark to analyze billions of user actions to reveal hidden fraudsters - Yinglian Xie (DataVisor, Inc.)

Protecting enterprise data in Apache Hadoop - Don Bosco Durai (Hortonworks, Inc.)

Governance for custom Hadoop applications via the enterprise (meta)data hub - Chang She (Cloudera)

Spark & Beyond

Guest talk: Choosing an optimal storage backend for your Spark use case - Sameer Farooqui and Vida Ha (Databricks)

Architecting a data platform - Part 1 - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)

Architecting a data platform - Part 2 - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)

Architecting a data platform - Part 3 - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)

Architecting a data platform - Part 4 - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)

Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 1 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera)

Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 2 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera)

Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 3 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera)

Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 4 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera)

The state of Spark and where it is going in 2016 - Reynold Xin (Databricks)

SparkNet: Training deep networks in Spark - Robert Nishihara (UC Berkeley)

Fast big data analytics and machine learning using Alluxio and Spark in Baidu - Bin Fan (Alluxio), Haojun Wang (Baidu)

Scala and the JVM as a big data platform: Lessons from Apache Spark - Dean Wampler (Lightbend)

Designing a scalable real-time data platform using Akka, Spark Streaming, and Kafka - Alex Silva (Pluralsight)

Testing and validating Spark programs - Holden Karau (IBM)

Apache Spark and real-time analytics: From interactive queries to streaming - Michael Armbrust (Databricks)

Taking Spark Streaming to the next level with DataFrames - Tathagata Das (Databricks)

Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera)

Cancer genomics analysis in the cloud with Spark and ADAM - Timothy Danford (Tamr, Inc.)