Strata + Hadoop World 2017 - San Jose, California

Video description

Strata + Hadoop World San Jose 2017 gathered 325 of the globe's leading minds in technology and business to demonstrate how big data, machine learning, and analytics are changing not only business, but society itself. This video compilation provides a complete recording of each of the conference's 167 technical sessions, 23 long-form tutorials, and 17 keynotes. Some of the featured speakers you'll hear from include: Confluent CEO Jay Kreps on stream processing and its impact on how businesses deal with real-time data; Amazon ad platform leader Alice Zheng on the best feature engineering methods for machine learning pipelines; Eric Colson of Stitch Fix on how to build a great data science team; DataVisor CEO Yinglian Xie on how Spark's in-memory big data security analytics can identify nefarious sleeper cells; and Pinterest Chief Scientist Jure Leskovec on Pixie, the graph-based system that makes personalized recommendations to 100+ million users in real time.

Get this compilation and you'll enjoy unfettered access to the Strata Business Summit, a set of 29 carefully curated sessions tailored specifically for C-level business executives. Taught by top data strategists and thinkers at Silicon Valley Data Science, MapR Technologies, LinkedIn, Unisys, UC Berkeley, and Deloitte Touche Consulting, and by VCs at Kleiner Perkins Caufield & Byers, the Summit is like an MBA in data-driven business. You'll receive a hand-picked lineup of executive briefings on key issues such as predictive analytics and machine learning; cloud strategy; governance, security, and privacy; IoT; and artificial intelligence.

The 23 tutorials included in the compilation cover big data topics such as a review of Apache Spark 2.0 core concepts; an exploration of stream processing from the basics through Apache Beam; a practical look at how to do scalable, end-to-end data science in R on single machines and on Spark clusters; overviews of how to get started with TensorFlow, architect a data platform, learn just enough Scala for Spark, build data applications in AWS, build a data pipeline with Kafka, and secure your Hadoop clusters; and a guide to visualizing large, complex datasets with R, Hadoop, and Spark. All 17 keynote sessions are included, as well as all 167 specialized sessions, covering topics such as PyTorch, a flexible and intuitive framework for deep learning; Docker on YARN; Spark Structured Streaming; the Netflix data platform; RubiX, a caching framework for big data engines in the cloud; Stanford University's Weld, an optimizing runtime for high-performance data analytics; and much, much more.

Learn from 325 of the world's top thinker-doers in big data, machine learning, and analytics
Enjoy a center-row view of each of the conference's 167 sessions, 23 tutorials, and 17 keynotes
Gain total access to the Strata Business Summit – 29 sessions tailored for the business strategist
Hear how top companies like Comcast, American Express, and ING built their data strategies
See Data 101 – a comprehensive tutorial covering the core principles of data architecture
Watch the keynote conversation between Pokémon GO creator Phil Keslin and neuroscientist Beau Cronin
Learn from Cloudera's Hadoop experts on data governance, Spark, Kudu, and the cloud
See how IBM implements deep learning to predict breast cancer proliferation scores
Get intensives on Hadoop cluster security, D3 visualization, and using R for scalable data analytics
Hear Google explain machine learning, TensorFlow, and Apache Beam stream processing
Enjoy a comprehensive recording with 200+ hours of material to explore at your own pace

Keynotes

The machine-learning renaissance - Mike Olson (Cloudera)

Applying data and machine learning to scale education - Daphne Koller (Calico Labs | Coursera)

Turning the internet upside down: Driving big data right to the edge (sponsored by MapR) - Ted Dunning (MapR Technologies)

Launching Pokémon GO - Phil Keslin (Niantic, Inc.), Beau Cronin (Embedding.js)

Machines and the magic of fast learning (sponsored by MemSQL) - Eric Frenkiel (MemSQL)

Becoming smarter about credible news - Tom Reilly (Cloudera), Khalid Al-Kofahi (Thomson Reuters)

Making good robots - Andra Keay (Silicon Valley Robotics)

Big data, AI, the genome, and everything (sponsored by Microsoft) - Vijay Narayanan (Microsoft)

Ray: A distributed execution framework for emerging AI applications - Michael Jordan (UC Berkeley)

Driving enterprise open source adoption, from data lake to AI (sponsored by Teradata) - Ron Bodkin (Think Big Analytics)

Data in disasters: Saving lives and innovating in real time - Desiree Matel-Anderson (The Field Innovation Team)

Machine learning is about your data and deployment, not just model development (sponsored by IBM) - Dinesh Nirmal (IBM)

Machine learning at Google (sponsored by Google) - Rob Craft (Google)

Data 101

The business case for deep learning, Spark, and friends - Edd Wilder-James (Silicon Valley Data Science)

Why stream? The advantages of working with streaming data - Ellen Friedman (Independent)

Cloudy with a chance of on-prem - Jim Scott (MapR Technologies, Inc.)

Stats: What you need to know - Gabriela de Queiroz (R-Ladies)

What is AI? - Melanie Warrick (Skymind)

Visualization without guesswork - Aneesh Karve (Quilt Data, Inc.)

Big data & the cloud

Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 1

Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 2

Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 3

Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 4

Moving big data as a service to a multicloud world - Sriram Ganesan (Qubole), Prakhar Jain (Qubole)

BI and SQL analytics with Hadoop in the cloud - Henry Robinson (Cloudera), Alex Gutow (Cloudera)

Running a Cloudera cluster in production on Azure - Paige Liu (Microsoft), John Zhuge (Cloudera)

RubiX: A caching framework for big data engines in the cloud - Shubham Tagra (Qubole)

The enterprise geospatial platform: A perfect fusion of cloud and open source technologies - Naghman Waheed (Monsanto), Martin Mendez-Costabel (Monsanto)

Practical considerations for running Spark workloads in the cloud - Anand Iyer (Cloudera), Eugene Fratkin (Cloudera)

Alluxio (formerly Tachyon): The journey thus far and the road ahead - Haoyuan Li (Alluxio), Calvin Jia (Alluxio)

Data science & advanced analytics

Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 1

Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 2

Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 3

Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 4

Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 1

Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 2

Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 3

Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 1

Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 2

Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 3

Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 4

Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 1

Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 2

Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 3

Uber’s data science workbench - Peng Du (Uber Inc.) and Randy Wei (Uber Inc.)

How Microsoft predicts churn of cloud customers using deep learning and explains those predictions in an interpretable way - Feng Zhu (Microsoft), Valentine Fontama (Microsoft)

Intelligent pattern profiling on semistructured data with machine learning - Sean Kandel (Trifacta), Karthik Sethuraman (Trifacta)

Squeezing deep learning onto mobile phones - Anirudh Koul (Microsoft)

Recommending 1+ billion items to 100+ million users in real time: Harnessing the structure of the user-to-object graph to extract ranking signals at scale - Jure Leskovec (Pinterest)

Semantic natural language understanding at scale using Spark, machine-learned annotators, and deep-learned ontologies - David Talby (Atigeo), Claudiu Branzan (G2 Web Services)

Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML - Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM Spark Technology Center)

PyTorch: A flexible and intuitive framework for deep learning - James Bradbury (Salesforce Research)

The dangers of statistical significance when studying weak effects in big data: From natural experiments to p-hacking - Robert Grossman (University of Chicago)

Tensor abuse in the workplace - Ted Dunning (MapR Technologies)

The frontiers of attention and memory in neural networks - Stephen Merity (Salesforce Research)

Automatic speaker segmentation: Using machine learning to identify who is speaking when - Matar Haller (Winton Capital)

Feature engineering for diverse data types - Alice Zheng (Amazon)

When is data science a house of cards? Replicating data science conclusions - June Andrews (Pinterest), Frances Haugen (Pinterest)

Distributed deep learning on AWS using MXNet - Anima Anandkumar (UC Irvine)

The state of TensorFlow today and where it is headed in 2017 - Rajat Monga (Google)

Clustering user sessions with NLP methods in complex internet applications - Dorna Bandari (Pinterest Inc.)

Weld: An optimizing runtime for high-performance data analytics - Shoumik Palkar (Stanford University)

Learning from incomplete, imperfect data with probabilistic programming - Michael Lee Williams (Fast Forward Labs)

The power of persuasion modeling - Michelangelo D’Agostino (Civis Analytics), Bill Lattner (Civis Analytics)

Making self-service data science a reality - Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera)

The app trap: Why every mobile app needs anomaly detection - Ira Cohen (Anodot)

Predicting customer lifetime value for a subscription-based business - Chao Zhong (Microsoft)

Building a recommender from a big behavior graph over Cassandra - Gleicon Moraes (luc.id), Arthur Grava (Luizalabs)

Seven steps to high-velocity data analytics with DataOps - Christopher Bergh (DataKitchen), Gil Benghiat (DataKitchen)

Machine learning to automate localization with Apache Spark and other open source tools - Michelle Casbon (Qordoba)

Compressed linear algebra in Apache SystemML - Frederick Reiss (IBM Spark Technology Center), Arvind Surve (IBM)

Leveraging open source automated data science tools - Eduardo Arino de la Rubia (Domino Data Lab)

Law, ethics, governance

Executive Briefing: Doing data right—Legal best practices for making your data work - Alysa Z. Hutnik (Kelley Drye Warren LLP), Crystal Skelton (Kelley Drye Warren LLP)

Big data governance for the hybrid cloud: Best practices and how-to - Mark Donsky (Cloudera), Sudhanshu Arora (Cloudera)

Data at risk: Backing up the world’s research data - Max Ogden (Independent)

Spark & beyond

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 1

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 2

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 3

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 4

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 5

Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science) - Part 1

Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science) - Part 2

Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science) - Part 3

Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O’Sullivan (Silicon Valley Data Science) - Part 4

Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 1

Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 2

Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 3

Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 4

Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1

Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2

Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3

Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4

Zillow: Transforming real estate through big data and machine learning - Jasjeet Thind (Zillow)

Spark Structured Streaming for machine learning - Holden Karau (IBM), Seth Hendrickson (IBM)

Sparklyr: An R interface for Apache Spark - Edgar Ruiz (RStudio)

Spark at scale in Bing: Use cases and lessons learned - Kaarthik Sivashanmugam (Microsoft)

Hoodie: Incremental processing on Hadoop at Uber - Vinoth Chandar (Uber), Prasanna Rajaperumal (Uber)

How Spark can fail or be confusing and what you can do about it - Yin Huai (Databricks)

Debugging Apache Spark - Holden Karau (IBM), Joey Echeverria (Rocana)

Effective Spark with Alluxio - Calvin Jia (Alluxio)

Visualization & user experience

Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 1

Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 2

Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 3

Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 4

Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 1

Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 2

Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 3

Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 4

Data science and design; or, on the unpredictability of the iterative design process - Rumman Chowdhury (Accenture)

Beyond polarization: Data UX for a diversity of workers - Joe Hellerstein (UC Berkeley), Giorgio Caviglia (Trifacta), Alon Bartur (Trifacta)

Bringing data into design: How to craft personalized user experiences - Ricky Hennessy (frog), Charlie Burgoyne (frog)

Why the next wave of data lineage is driven by automation, visualization, and interaction - Sean Kandel (Trifacta)

Building interactive data products for risk measurement and monitoring - Warren Reed (US Treasury’s Office of Financial Research)

Platform security & cybersecurity

A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 1

A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 2

A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 3

A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 4

Paint the landscape and secure your data center with Apache Spot - Cesar Berho (Intel), Alan Ross (Intel)

Cloudy with a chance of fraud: A look at cloud-hosted attack trends - Ting-Fang Yen (DataVisor)

Pluggable security in Hadoop - Yuliya Feldman (Dremio Corporation)

Don’t sleep on sleeper cells: Using big data to drive detection - Yinglian Xie (DataVisor)

Malicious site detection with large-scale belief propagation - Alexander Ulanov (Hewlett Packard Labs), Manish Marwah (Hewlett Packard Labs)

Data engineering and architecture

Big data for operational insights - Felix Gorodishter (GoDaddy)

Shifting left for continuous quality in an Agile data world - Avinash Padmanabhan (Intuit)

Mistakes were made, but not by us: Lessons from a year of supporting Apache Kafka - Ryan Pridgeon (Confluent), Dustin Cote (Confluent)

Achieving real-time ingestion and analysis of security events through Kafka and Metron - Kevin Mao (Capital One)

The Netflix data platform: Now and in the future - Kurt Brown (Netflix)

Making architecture choices for small and big data problems - Nischal HP (Unnati Data Labs), Raghotham Sripadraj (Unnati Data Labs)

Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at LinkedIn - Shirshanka Das (LinkedIn), Yael Garten (LinkedIn)

The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio), Jacques Nadeau (Dremio)

DevOps for models: How to manage millions of models in production - Teresa Tung (Accenture Labs), Jurgen Weichenberger (Accenture Analytics), Ishmeet Grewal (Accenture Technology Labs)

One cluster does not fit all: Architecture patterns for multicluster Apache Kafka deployments - Gwen Shapira (Confluent)

Deep learning for IT operations intelligence using open source tools - Shivnath Babu (Duke University | Unravel Data Systems)