• All Days

    April 8  |   April 9  |   April 10  |   April 11

    Day One

    Sunday, April 8

    8:00AM

    9:00AM

    Breakfast/Registration

    4th Floor Lobby

    9:00AM

    12:30PM

    Tutorial 1: Introduction to Machine Learning with scikit-learn

    Ian Stokes-Rees

    Tutorial 2: Packaging in the conda Ecosystem

    Michael Sarahan

    12:30PM

    1:30PM

    Lunch

    1:30PM

    5:00PM

    Tutorial 3: Up & Running with Anaconda Enterprise

    Kris Overholt, Daniel Rodriguez

    Tutorial 4: Practical Data Science and ML with GPUs

    Stan Seibert

    5:00PM

    6:00PM

    Break

    6:00PM

    8:00PM

    Opening Reception

    4th Floor Lobby

    Day Two

    Monday, April 9

    9:00AM

    10:00AM

    Breakfast/Registration

    4th Floor Lobby

    10:00AM

    10:50AM

    Opening Keynote

    Scott Collison & Peter Wang

    11:00AM

    11:50AM

    • Quick and Easy TensorFlow on AE5

      Anaconda Michael Grant

      Description coming soon!

    • Learning in Cycles: Implementing Sustainable Machine Learning Models in Production

      Real World Andrew Therriault

      Machine learning textbooks tend to focus too narrowly on specific algorithms or code without looking at the bigger picture. One key real-world application that's rarely covered: predictive models that are regularly updated with new data stemming from earlier predictions. Done poorly, repeatedly retrained models can amplify the errors and biases of their initial versions. Done right, they can learn from those mistakes over time, using the results of previous versions as new training data to keep the model fresh and productive over months or years of applied use. With examples from Andrew’s own work in the political, nonprofit, and civic data science fields, this talk will introduce a framework for designing machine learning models that get better over time.

    • Deep Learning with Just a Little Bit of Data

      Open Source Michael Bernico

      There’s no question that deep learning is changing the field of machine learning at an extremely rapid pace. Given enough data, deep learning can solve problems we couldn’t imagine just a few years ago. But what do we do when there isn’t enough data? Can we still apply deep learning when we only have hundreds or thousands of data points?
      In this talk we will discuss doing deep learning with very little data. We will discuss the topic of transfer learning, which we find to be immensely useful for the business applications of deep learning. Finally, we will present some original research that shows just how far we can go with transfer learning on very small volumes of data.

    11:50AM

    1:00PM

    Lunch and Sponsor Showcase

    1:00PM

    1:50PM

    • Deploying Python and R to Spark and Hadoop

      Anaconda Daniel Rodriguez

      As Python, R, and Spark position themselves as the industry-standard tools for data analysis, questions often arise on how to use these tools better together. Anaconda Enterprise provides an easy yet powerful architecture that allows people to connect from interactive sessions and deployments to running Spark clusters from Python and R.

      We will take a look at the Anaconda Enterprise 5 architecture for connecting to Hadoop/Spark clusters that is powered by Sparkmagic and Apache Livy (incubating) while taking a look at the benefits of this architecture and how it allows users to securely and easily connect to remote Hadoop/Spark clusters. We will also look at how Anaconda Enterprise enables users to do runtime distribution of custom Anaconda installers using Cloudera Parcels and Ambari Management Packs, allowing data scientists to ship Anaconda environments and leverage libraries from Anaconda.

      Finally, we will look at examples of connecting to remote clusters with Python and R in two use cases, interactive development and production deployment, and at how Anaconda Enterprise adapts so that you can use the tools you know and like to do your work.
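      Sparkmagic connects to a remote cluster through Apache Livy's REST API: it creates an interactive session, then submits code to it as statements. A minimal stdlib sketch that builds (but does not send) those two requests, assuming a hypothetical Livy host; 8998 is Livy's default port. In Anaconda Enterprise, Sparkmagic issues these calls for you, so this only illustrates what happens underneath.

```python
import json
import urllib.request

# Hypothetical Livy endpoint (host is an assumption; 8998 is Livy's default port).
LIVY_URL = "http://livy.example.com:8998"


def session_request(kind: str = "pyspark") -> urllib.request.Request:
    """Build a POST /sessions request that asks Livy to start an interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{LIVY_URL}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def statement_request(session_id: int, code: str) -> urllib.request.Request:
    """Build a POST /sessions/{id}/statements request that runs code in an existing session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = statement_request(0, "spark.range(10).count()")
print(req.get_method(), req.full_url)
# Actually sending it would be urllib.request.urlopen(req), which needs a reachable Livy server.
```

      An R session would use the `sparkr` kind instead of `pyspark`; the statement call is the same.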

    • Achoo: Using Machine Learning to Fight My Son’s Asthma

      Real World Tim Dobbins

      Achoo uses a Raspberry Pi to predict if Tim’s son will need his inhaler on any given day using weather, pollen, and air quality data. If the prediction for a given day is above a specified threshold, the Pi will email both Tim and the school nurse, notifying her that he may need preemptive treatment. The system is designed to be language-agnostic with regard to the predictive models used. The backend is built with Python/Flask.
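      The alerting rule described above boils down to comparing a daily model score with a threshold. A minimal sketch, assuming a hypothetical `DailyForecast` record and hypothetical recipient addresses; Achoo's real models and email transport are not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyForecast:
    """Hypothetical record holding one day's model output."""
    date: str
    inhaler_probability: float  # predicted chance the inhaler is needed, in [0, 1]


# Hypothetical recipients; the abstract says both Tim and the school nurse are emailed.
RECIPIENTS = ["parent@example.com", "school-nurse@example.com"]


def recipients_to_alert(forecast: DailyForecast, threshold: float) -> List[str]:
    """Return everyone to email if the score is above the threshold, otherwise nobody."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if forecast.inhaler_probability > threshold:
        return list(RECIPIENTS)
    return []


# Example: a high-risk day triggers emails to both recipients.
print(recipients_to_alert(DailyForecast("2018-04-09", 0.72), threshold=0.6))
```

      Keeping the decision as a pure function that only sees a score is one way to stay agnostic about which model, or which language, produced that score.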

    • Making Convolutional Neural Networks Uncool

      Open Source Sanyam Bhutani

      Sanyam was recently admitted into the fast.ai v2 international fellowship. This talk is inspired by fast.ai’s “Making Neural Nets Uncool Again” teaching philosophy.
      Sanyam will introduce convolutional neural networks, the most important architecture in computer vision, in a top-down manner. He will review the best CNN architectures and demonstrate how to use them in deep learning applications. Attendees need only basic Python knowledge and high school math to develop state-of-the-art models and a deep learning image application in under one hour. This talk aims to introduce a broader audience to CNNs by focusing on code and applications rather than math.

    2:00PM

    2:50PM

    • Architecting AE5 Deployments

      Anaconda Kris Overholt

      Description coming soon!

    • How to Use Data Science for Social Justice Work

      Real World Eric Schles

      Eric will show participants how to create change in the world with technology and data science. He'll use his own work combating human trafficking as an example and draw general themes out of it. Technical demos throughout will showcase the solutions Eric has used to tackle this social problem, with the aim of showing how these solutions generalize so that participants can apply them to social issues of their own.

    • Getting Started with Anaconda Distribution

      Open Source Ian Stokes-Rees

      Description coming soon!

    2:50PM

    3:10PM

    Afternoon Break and Sponsor Showcase

    3:10PM

    4:00PM

    • Enterprise Package Governance

      Anaconda Duane Lawrence

      Open source data science tools have ushered in an unprecedented wave of innovation. Rather than spend days, weeks, or months writing an algorithm from scratch, data scientists can find an open source package and implement the algorithm in seconds.

      While the benefits of open source are clear to data scientists, enterprise IT administrators have some concerns. Just who exactly authored these open source packages? How can we be sure that they are secure? When data science teams create their own packages, how can these internal tools be shared and governed securely? The questions abound and for many the answers appear frightening.

      Rest assured. This talk will walk through best practices for enterprise package governance. IT admins will learn how to securely manage open source packages, strategies for whitelisting and blacklisting, and how to easily share internal packages securely.
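      Whitelisting and blacklisting can be pictured as a simple triage of requested packages. A minimal sketch, assuming hypothetical in-memory lists and package names; real governance tooling expresses such policies in its own configuration, which is not shown here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical policy lists for illustration only.
BLACKLIST: Set[str] = {"untrusted-pkg"}  # never allowed
WHITELIST: Set[str] = {"numpy", "pandas", "scikit-learn", "internal-utils"}  # pre-approved


def review_packages(requested: Iterable[str]) -> Dict[str, List[str]]:
    """Sort requested package names into approved, blocked, and needs-review buckets."""
    result: Dict[str, List[str]] = {"approved": [], "blocked": [], "needs_review": []}
    for name in requested:
        key = name.lower()  # compare names case-insensitively
        if key in BLACKLIST:  # the blacklist takes precedence over the whitelist
            result["blocked"].append(name)
        elif key in WHITELIST:
            result["approved"].append(name)
        else:  # unknown packages go to a human (or automated) security review
            result["needs_review"].append(name)
    return result


print(review_packages(["NumPy", "untrusted-pkg", "requests"]))
```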

    • Coming Soon!

      Real World

      Coming soon!

    • Production-grade Packaging with Anaconda

      Open Source Mahmoud Hashemi

      Anaconda has long been a powerful platform for data analysts and scientists across the Python world. The same reasons that make it work for those groups also apply to engineers building and shipping scalable services: easy access to prebuilt packages, including system packages not managed by pip and other packages that the operating system does not conveniently provide.
      This talk will cover using conda and conda envs in real-world industrial settings, what makes conda special for software engineers, and the challenges and goals of packaging. Mahmoud will provide real-world examples using Anaconda to build an OS package (RPM) and Docker images.

    4:10PM

    5:00PM

    • Using Machine Learning to Drive Sales: An Introduction to Anaconda Enterprise

      Anaconda Gus Cavanaugh

      Description coming soon!

    • Data Engineering for Data Scientists

      Real World Max Humber

      When models and data applications are pushed to production, they become brittle black boxes that can and will break. In this talk you’ll learn how to one-up your data science workflow with a little engineering! More specifically, you’ll learn how to improve the reliability and quality of your data applications... all so that your models won’t break (or at least won’t break as often)!
      Examples for this session will be in Python 3.6+ and will rely on logging to debug and diagnose things while they’re running, Click to develop “beautiful” command line interfaces with minimal boilerplate, and pytest to write short, elegant, and maintainable tests.
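      A small stdlib-only sketch of two of the ideas named above: logging inside a data step so problems can be diagnosed while it runs, and a short pytest-style test. The function and column names are hypothetical, and the Click command-line layer is left out to keep the example dependency-free.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger("pipeline")


def clean_rows(rows: List[Dict[str, Optional[float]]], column: str) -> List[Dict[str, Optional[float]]]:
    """Drop rows whose `column` value is missing, logging how many were dropped."""
    kept = [row for row in rows if row.get(column) is not None]
    dropped = len(rows) - len(kept)
    if dropped:
        logger.warning("dropped %d of %d rows with missing %r", dropped, len(rows), column)
    return kept


# pytest collects functions named test_*, but this one also runs as plain Python.
def test_clean_rows_drops_missing_values():
    rows = [{"price": 1.0}, {"price": None}, {}]
    assert clean_rows(rows, "price") == [{"price": 1.0}]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    test_clean_rows_drops_missing_values()
```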

    • Convolutional Neural Networks (CNNs): a game changer for Computer Vision

      Open Source Tassos Sarbanes

      Many call visual data the “dark matter” of the internet. Computer vision draws on many disciplines, including biology, physics, psychology, mathematics, and computer science. This talk will present a brief history of the evolution and revolution of computer vision and image processing. We’ll review important studies including Larry Roberts’s Block World, the Summer Vision Project by MIT, Vision by David Marr, Explaining Visual Science by David Lowe, Normalized Cut by Shi & Malik, and Face Detection by Viola & Jones.
      The focus of the talk will be the introduction of convolutional neural networks (CNNs) and their huge impact on the computer vision space. Code examples in Jupyter Notebooks will be presented using the PASCAL Visual Object Classes (VOC) and ImageNet (WordNet) datasets. We’ll also cover general machine learning and deep learning topics related to visual recognition, such as object detection, action classification, and image captioning.

    Day Three

    Tuesday, April 10

    8:00AM

    9:00AM

    Breakfast/Registration

    4th Floor Lobby

    9:00AM

    9:50AM

    Keynote Address

    10:00AM

    10:50AM

    • conda: Tips & Tricks

      Anaconda Kale Franz

      Description coming soon!

    • Building Better Badass Cars

      Real World Peter Buschbacher

      Cars are incredibly difficult to manufacture. In the past, the divide between IT and the business sent a lot of analytical development down the drain. However, with current capabilities for data extraction, analysis, and computation, vehicle production is being continuously improved. This talk will focus on how analysis, questioning, and foundational data science have helped plant managers across the globe solve difficult problems in the manufacturing sphere.

    • Building a Data Science Team using Open Source Data Science

      Open Source Katrina Riehl

      Open source data science technologies have changed the face of building and operating a data science organization. In this talk, Katrina will explore how and why open source technologies are necessary for the success of businesses hoping to use data science and machine learning to power innovation. She will discuss how HomeAway.com is using tools like Anaconda, conda, and other Python-powered open source libraries to change how they look at their market and stay competitive. She will also discuss her journey in making Python a first-class citizen in a traditionally Java-based organization while growing a data science team from the ground up.

    11:00AM

    11:50AM

    • AE5 Deployment and Integration Deep-Dive

      Anaconda Daniel Rodriguez

      We will explore various data sources, formats, and tools in the data science ecosystem and how Anaconda Enterprise makes it very easy to integrate with various data sources and distributed compute engines available to data scientists. Discover how to bridge the gap between your IT governance and security concerns related to remote data and compute resources and empower your data-hungry analytics team.

    • Machine Learning Crash Course

      Real World Samuel Taylor

      Machine learning is surrounded by so much hype it can seem like magic. Learn the math behind the magic in this whirlwind tour of machine learning. After spending a few minutes learning about machine learning theory, we'll jump right into practice with three different use cases:

      • Teaching a computer sign language (supervised learning)
      • Predicting hourly energy load in the state of Texas (time series/forecasting)
      • Using machine learning to find your next job (recommender systems—content-based filtering)

      With each use case, we'll discover new techniques applicable in real-world machine learning problems.
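      The third use case, content-based filtering, recommends items whose content resembles what a user already has or likes. A minimal sketch, assuming both job postings and the candidate's profile are plain text compared with bag-of-words cosine similarity; the postings are made up, and the talk's actual features and data are not described in the abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a real system would likely use TF-IDF and better tokenization."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(profile: str, jobs: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k job titles whose descriptions are most similar to the profile."""
    p = vectorize(profile)
    scored = [(title, cosine(p, vectorize(desc))) for title, desc in jobs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


jobs = {  # hypothetical postings
    "Data Scientist": "python machine learning statistics",
    "Backend Engineer": "java services databases",
    "ML Engineer": "python machine learning deployment",
}
print(recommend("python machine learning", jobs))
```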

    • GPU-Accelerating UDFs in PySpark with Numba and PyGDF

      Open Source Joshua Patterson

      With advances in computer hardware such as 10-gigabit network cards, InfiniBand, and solid state drives all becoming commodity offerings, the new bottleneck in big data technologies is very commonly the processing power of the CPU. To meet users’ computational demands, enterprises have had to resort to extreme scale-out approaches just to get the processing power they need. For Apache Spark, one of the best-known technologies in this space, numerous enterprises have spoken publicly about the challenges of running multiple 1000+ node clusters to give their users the processing power they need. This talk is based on work completed by NVIDIA’s Applied Solutions Engineering team. Attendees will learn how the team GPU-accelerated UDFs in PySpark using open source technologies such as Numba and PyGDF, the lessons they learned in the process, and how they accelerated workloads with a fraction of the hardware footprint.

    11:50AM

    1:00PM

    Lunch and Sponsor Showcase

    1:00PM

    1:50PM

    • Coming soon!

      Anaconda

      Coming soon!

    • Coming soon!

      Real World

      Coming soon!

    • Real-Time Processing with Dask

      Open Source Matt Rocklin

      Description coming soon!

    2:00PM

    2:50PM

    • Model Management with Anaconda Enterprise

      Anaconda Michael Grant

      Description coming soon!

    • What You Gonna Do With All That Malware: Malware Analysis and Machine Learning
      When You Can’t Fit It All On One Server

      Real World Austin West & Drew Bonasera

      MultiScanner is an open source malware analysis framework that helps the user evaluate a set of files by automatically running a suite of tools and aggregating the output. The true power of this system is that it stores the output of all of an analyst’s malware analysis tools in one highly performant, searchable, and scalable data store.
      This talk will focus on one such analytic, known as Exe-MANA. Exe-MANA is a deep neural network written entirely in Python that detects whether a Portable Executable file is malicious or benign using only static analysis. Exe-MANA is a great example of how easy it is to prototype data science techniques in Python with little to no data science experience. Austin and Drew will go over the basic process for building Exe-MANA and how they leverage MultiScanner to speed up this process and continue training as they get new data.
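      Static analysis means features are read from a file's bytes without ever executing it. A minimal sketch of that first step, parsing a few fields from the Portable Executable headers with the standard library; the feature set is hypothetical and is not Exe-MANA's documented input, and a real pipeline would likely use a dedicated parser such as `pefile`.

```python
import struct
from typing import Dict


def pe_header_features(data: bytes) -> Dict[str, int]:
    """Extract a few header fields from a PE file's bytes (hypothetical feature set)."""
    if len(data) < 0x40 or data[:2] != b"MZ":  # a DOS header must start with "MZ"
        raise ValueError("not a PE file: missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew: offset of the PE signature
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF file header follows the signature: Machine, NumberOfSections, TimeDateStamp.
    machine, num_sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {"file_size": len(data), "machine": machine,
            "num_sections": num_sections, "timestamp": timestamp}


# Build a tiny synthetic header (not a real executable) to exercise the parser.
dos_header = bytearray(0x40)
dos_header[0:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x40)  # PE signature starts right after the DOS header
sample = bytes(dos_header) + b"PE\x00\x00" + struct.pack("<HHI", 0x14C, 3, 0)  # 0x14C = i386
print(pe_header_features(sample))
```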

    • Coming soon!

      Open Source

      Coming soon!

    2:50PM

    3:10PM

    Afternoon Break and Sponsor Showcase

    3:10PM

    4:00PM

    • Coming soon!

      Anaconda

      Coming soon!

    • How to Make Your Data Scientists Happy - a Use-case Backed Approach for Enabling Data Science in the Enterprise

      Real World Hussain Sultan & Tim Horan

      The potential of data science and rapid analytics in the enterprise is broadly accepted. Done right, processes that took weeks can take minutes and the impossible becomes possible. Done incorrectly, however, the promise of data science will stagnate. Bringing new models and insights to market will be hampered, if not entirely blocked, by siloed legacy infrastructure and processes. Furthermore, over time you will struggle to retain top data science talent and maintain organizational support and investment.
      Enabling data scientists within an enterprise requires a well-thought-out approach from an organizational, technological, and business results perspective. In this talk, Tim and Hussain will share common pitfalls in enabling data science in the enterprise and recommend ways to avoid them. Drawing on an actionable example use case from the financial services industry, they will focus on how Anaconda plays a pivotal role in setting up big data infrastructure, integrating data science experimentation and production environments, and deploying insights to production. Along the way, they will highlight opportunities for leveraging open source and unleashing data science teams while meeting regulatory and compliance challenges.

    • Jumpstart Writing Continuous Applications with Structured Streaming Python APIs in Apache Spark

      Open Source Jules Damji

      We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has given rise to the notion of a streaming application that reacts to and interacts with data in real time. We call this a continuous application.
      In this talk we will explore the concepts and motivations behind continuous applications and how the Structured Streaming Python APIs in Apache Spark 2.x enable writing them. We will also examine the programming model behind Structured Streaming and the APIs that support it. Through a short demo and code examples, Jules will demonstrate how to write an end-to-end Structured Streaming application that reacts to and interacts with both real-time and historical data to perform advanced analytics using the Spark SQL, DataFrames, and Datasets APIs.
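      Structured Streaming's programming model treats a stream as an unbounded input table that keeps receiving new rows, with a query over it updated incrementally. The toy class below mirrors that idea in plain Python so the model can be seen without a cluster; it is a conceptual illustration only, not the PySpark API. In PySpark the same query would be built from `spark.readStream` and emitted with `writeStream.outputMode("complete")`.

```python
from collections import Counter
from typing import Dict, Iterable, List


class RunningWordCount:
    """Toy streaming aggregation: each micro-batch updates persistent state, and the
    full result table is emitted after every batch (like Spark's "complete" output mode)."""

    def __init__(self) -> None:
        self.state: Counter = Counter()  # aggregate over every row seen so far

    def process_batch(self, lines: Iterable[str]) -> Dict[str, int]:
        for line in lines:
            self.state.update(line.split())  # incremental update, no recomputation of old batches
        return dict(self.state)


query = RunningWordCount()
batches: List[List[str]] = [["spark streams data"], ["spark reacts"]]
for batch in batches:
    result = query.process_batch(batch)
    print(result)
```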

    4:10PM

    5:00PM

    • Coming soon!

      Anaconda

      Coming soon!

    • Coming soon!

      Real World

      Coming soon!

    • Coming soon!

      Open Source

      Coming soon!

    5:00PM

    6:00PM

    Break

    6:00PM

    7:00PM

    Shuttles Running between JW Marriott and Fair Market

    7:00PM

    10:00PM

    AnacondaCON Carne Offsite Party @ Fair Market

    Day Four

    Wednesday, April 11

    9:00AM

    10:00AM

    Breakfast/Registration

    4th Floor Lobby

    10:00AM

    10:50AM

    • Anaconda Distribution Roadmap

      Anaconda Crystal Soja

      We will cover the six parts of the Anaconda Distribution and how they are connected: the Anaconda and Miniconda installers, repo.anaconda.com, anaconda.org, Anaconda Navigator, conda, and conda-build. We will then delve into the upcoming plans and release cadences for each part. We will also cover cross-cutting themes, such as signed packages and an improved conda/pip/wheels user experience, that will affect multiple parts of the Anaconda Distribution.

    • IoT Predictive Maintenance using Recurrent Neural Networks

      Real World Justin Brandenburg

      The idea behind predictive maintenance is that the failure patterns of various types of equipment are predictable. If an organization can accurately predict when a piece of hardware will fail and replace that component before it does, it can achieve much higher levels of operational efficiency. With many devices now including sensors and other components that send diagnostic reports, predictive maintenance using big data is increasingly accurate and effective. So how can we enhance our data monitoring to predict the next event?
      This talk will present an actual use case in the Industry 4.0 IoT space. Justin will present an entire workflow of data ingestion, bulk ETL, data exploration, model training, testing, and deployment in a real-time streaming architecture that can scale. He will demonstrate how he used Anaconda Python 3.5 and PySpark 2.1.0 to wrangle data and train a recurrent neural network to predict whether the next event in a real-time stream indicated that maintenance was required.
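      Before a recurrent neural network can predict whether the next event indicates maintenance, the raw sensor history has to be cut into fixed-length sequences, each paired with a label. A minimal stdlib sketch of that windowing step, assuming a single numeric sensor channel and a per-event maintenance flag; the names and data are hypothetical, and the talk's actual features and PySpark pipeline are not shown.

```python
from typing import List, Sequence, Tuple


def make_sequences(readings: Sequence[float], maintenance_flags: Sequence[int],
                   window: int) -> List[Tuple[List[float], int]]:
    """Pair each run of `window` consecutive readings with the flag of the event that follows it."""
    if len(readings) != len(maintenance_flags):
        raise ValueError("readings and flags must be the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    examples = []
    for end in range(window, len(readings)):
        examples.append((list(readings[end - window:end]), maintenance_flags[end]))
    return examples


readings = [0.1, 0.2, 0.2, 0.9, 1.4]  # hypothetical vibration readings
flags = [0, 0, 0, 0, 1]  # 1 = maintenance required at that event
for x, y in make_sequences(readings, flags, window=3):
    print(x, y)
```

      Each resulting `(sequence, label)` pair is one training example; an RNN would consume the sequence step by step and output the probability that the label is 1.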

    • Accelerating Deep Learning with GPUs

      Open Source Stan Seibert

      Coming soon!

    11:00AM

    11:50AM

    • conda Deep Dive

      Anaconda Kale Franz

      Coming soon!

    • Setting Big Data on Fire: the FireCARES and NFORS projects

      Real World Craig Weinschenk

      Local government decision-makers often alter fire department resources faster than fire service leaders can evaluate the potential impact. These decisions can leave a community without sufficient resources to respond to emergency calls safely, efficiently, and effectively. The Fire Community Assessment/Response Evaluation System (FireCARES) gives fire departments a technical basis for what has historically been an anecdotal discussion of community hazards and risks and of the impact of changes to fire department resource levels. To accomplish this, FireCARES provides three scores for each community based on the available data: the Community Risk Score, the Fire Department Performance Score, and the Safe Grade. These scores are generated from an expansive, multi-layered data set combining fire incidents, outcomes, and community risk characteristics. Fire incident data is not without flaws, as it relies primarily on firefighters for data entry, and at the national level there is a two-year data lag. To overcome these obstacles, we have built the National Fire Operations Reporting System (NFORS), a real-time data analysis tool that leverages modern data practices while removing firefighters from data entry.

    • Parallel scikit-learn with Dask

      Open Source Tom Augspurger

      Coming soon!

    11:50AM

    1:00PM

    Lunch

    1:00PM

    1:50PM

    • Coming soon!

      Anaconda

      Coming soon!

    • Causal Inference in Tech

      Real World Jenny Lin

      This session covers how to approach causal inference conscientiously in the large, messy data sets common in tech, in the absence of an experiment (or when the experimental setup was not ideal). In the real world, correlation is sometimes not enough basis for a million-dollar business decision. That's where causal inference comes in. Causal inference establishes a causal link between a change X and an outcome Y and is often necessary for making critical and expensive business choices. Many pitfalls can render "simple" causal analyses entirely misleading and potentially costly. Here, Jenny will discuss some of the approaches taken at Yelp to determine causality when faced with a question common across tech firms: how do we know that our implementation of feature X caused an effect on metric Y, and what was the size of the effect? Issues to address when inferring causality include selection bias into the comparison groups, time trends in the outcome feature, time period mismatches across observations, multicollinearity, the need to cluster standard errors, and more! Jenny will walk through a stylized example of a causal inference problem she ran into at Yelp and show how easily one can arrive at a very misleading conclusion without correcting for these issues.
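      Selection bias, the first issue listed above, shows up clearly in a tiny constructed example. The sketch assumes a single known confounder (how busy a business is) and corrects for it by simple stratification, one standard adjustment technique; it is not the specific method used at Yelp, and the data are invented so that the true effect of feature X is exactly 1.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical rows: (stratum, adopted feature X?, outcome). Busy businesses adopt X more
# often AND have higher outcomes regardless of X, so the stratum confounds the comparison.
Row = Tuple[str, bool, float]


def naive_effect(rows: List[Row]) -> float:
    """Difference in mean outcome between treated and untreated rows, ignoring strata."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_effect(rows: List[Row]) -> float:
    """Size-weighted average of within-stratum differences (each stratum needs both groups)."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    total = len(rows)
    return sum(len(g) / total * naive_effect(g) for g in groups.values())


rows = (
    [("busy", True, 11.0)] * 80 + [("busy", False, 10.0)] * 20
    + [("quiet", True, 3.0)] * 20 + [("quiet", False, 2.0)] * 80
)
print(naive_effect(rows), stratified_effect(rows))
```

      The naive comparison reports an effect of about 5.8, while the stratified estimate recovers 1.0: most of the apparent lift comes from busy businesses being both more likely to adopt X and higher performers to begin with.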

    • Conda, Docker & Kubernetes: The Cloud-Native Future of Data Science

      Open Source Mathew Lodge

      Coming soon!

    2:00PM

    2:50PM

    • Fraud Prevention in Financial Services: Advanced Anaconda Enterprise

      Anaconda Gus Cavanaugh

      Coming soon!

    • Coming soon!

      Real World

      Coming soon!

    • Coming soon!

      Open Source

      Coming soon!

    3:00PM

    3:50PM

    Closing Keynote

    David Yeager

  • All Tracks

    Anaconda  |   Real World  |   Open Source

    Day Two

    Monday, April 9

    11:00AM

    11:50AM

    • Quick and Easy TensorFlow on AE5

      Anaconda Michael Grant

      Coming soon!

    1:00PM

    1:50PM

    • Deploying Python and R to Spark and Hadoop

      Anaconda Daniel Rodriguez

      Coming soon!

    2:00PM

    2:50PM

    • Architecting AE5 Deployments

      Anaconda Kris Overholt

      Coming soon!

    3:10PM

    4:00PM

    • Enterprise Package Governance

      Anaconda Duane Lawrence

    4:10PM

    5:00PM

    • Using Machine Learning to Drive Sales: An Introduction to Anaconda Enterprise

      Anaconda Gus Cavanaugh

      Description coming soon!

    Day Three

    Tuesday, April 10

    10:00AM

    10:50AM

    • conda: Tips & Tricks

      Anaconda Kale Franz

      Description coming soon!

    11:00AM

    11:50AM

    • AE5 Deployment and Integration Deep-Dive

      Anaconda Daniel Rodriguez

      Description coming soon!

    1:00PM

    1:50PM

    • Coming soon!

      Anaconda

      Description coming soon!

    2:00PM

    2:50PM

    • Model Management with Anaconda Enterprise

      Anaconda Michael Grant

      Description coming soon!

    3:10PM

    4:00PM

    • Coming soon!

      Anaconda

      Description coming soon!

    4:10PM

    5:00PM

    • Coming soon!

      Anaconda

      Description coming soon!

    Day Four

    Wednesday, April 11

    10:00AM

    10:50AM

    • Anaconda Distribution Roadmap

      Anaconda Crystal Soja

      Description coming soon!

    11:00AM

    11:50AM

    • conda Deep Dive

      Anaconda Kale Franz

      Description coming soon!

    1:00PM

    1:50PM

    • Coming soon!

      Anaconda

      Description coming soon!

    2:00PM

    2:50PM

    • Fraud Prevention in Financial Services: Advanced Anaconda Enterprise

      Anaconda Gus Cavanaugh

      Description coming soon!

    Day Two

    Monday, April 9

    11:00AM

    11:50AM

    • Learning in Cycles: Implementing Sustainable Machine Learning Models in Production

      Real World Andrew Therriault

      Machine learning textbooks tend to focus too narrowly on specific algorithms or code without looking at the bigger picture. One key real-world application that's rarely covered: predictive models that are regularly updated with new data stemming from earlier predictions. Done poorly, repeatedly retrained models can amplify the errors and biases of their initial versions. Done right, they can learn from those mistakes over time, using the results of previous versions as new training data to keep the model fresh and productive over months or years of applied use. With examples from Andrew’s own work in the political, nonprofit, and civic data science fields, this talk will introduce a framework for designing machine learning models that get better over time.

    1:00PM

    1:50PM

    • Achoo: Using Machine Learning to Fight My Son’s Asthma

      Real World Tim Dobbins

      Achoo uses a Raspberry Pi to predict if Tim’s son will need his inhaler on any given day using weather, pollen, and air quality data. If the prediction for a given day is above a specified threshold, the Pi will email both Tim and the school nurse, notifying her that he may need preemptive treatment. The system is designed to be language-agnostic with regard to the predictive models used. The backend is built with Python/Flask.

    2:00PM

    2:50PM

    • How to Use Data Science for Social Justice Work

      Real World Eric Schles

      Eric will show participants how to create change in the world with technology and data science. He'll use his own work combating human trafficking as an example and draw general themes out of it. Technical demos throughout will showcase the solutions Eric has used to tackle this social problem, with the aim of showing how these solutions generalize so that participants can apply them to social issues of their own.

    3:10PM

    4:00PM

    • Coming soon!

      Real World

      Description coming soon!

    4:10PM

    5:00PM

    • Data Engineering for Data Scientists

      Real World Max Humber

      When models and data applications are pushed to production, they become brittle black boxes that can and will break. In this talk you’ll learn how to one-up your data science workflow with a little engineering! More specifically, you’ll learn how to improve the reliability and quality of your data applications... all so that your models won’t break (or at least won’t break as often)!
      Examples for this session will be in Python 3.6+ and will rely on logging to debug and diagnose things while they’re running, Click to develop “beautiful” command line interfaces with minimal boilerplate, and pytest to write short, elegant, and maintainable tests.

    Day Three

    Tuesday, April 10

    10:00AM

    10:50AM

    • Building Better Badass Cars

      Real World Peter Buschbacher

      Cars are incredibly difficult to manufacture. In the past, the divide between IT and the business sent a lot of analytical development down the drain. However, with current capabilities for data extraction, analysis, and computation, vehicle production is being continuously improved. This talk will focus on how analysis, questioning, and foundational data science have helped plant managers across the globe solve difficult problems in the manufacturing sphere.

    11:00AM

    11:50AM

    • Machine Learning Crash Course

      Real World Samuel Taylor

      Machine learning is surrounded by so much hype it can seem like magic. Learn the math behind the magic in this whirlwind tour of machine learning. After spending a few minutes learning about machine learning theory, we'll jump right into practice with three different use cases:

      • Teaching a computer sign language (supervised learning)
      • Predicting hourly energy load in the state of Texas (time series/forecasting)
      • Using machine learning to find your next job (recommender systems—content-based filtering)

      With each use case, we'll discover new techniques applicable in real-world machine learning problems.

    1:00PM

    1:50PM

    • Coming soon!

      Real World

      Description coming soon!

    2:00PM

    2:50PM

    • What You Gonna Do With All That Malware: Malware Analysis and Machine Learning
      When You Can’t Fit It All On One Server

      Real World Austin West & Drew Bonasera

      MultiScanner is an open source malware analysis framework that helps the user evaluate a set of files by automatically running a suite of tools and aggregating the output. The true power of this system is that it stores the output of all of an analyst’s malware analysis tools in one highly performant, searchable, and scalable data store.
      This talk will focus on one such analytic, known as Exe-MANA. Exe-MANA is a deep neural network written entirely in Python that detects whether a Portable Executable file is malicious or benign using only static analysis. Exe-MANA is a great example of how easy it is to prototype data science techniques in Python with little to no data science experience. Austin and Drew will go over the basic process for building Exe-MANA and how they leverage MultiScanner to speed up this process and continue training as they get new data.

    3:10PM

    4:00PM

    • How to Make Your Data Scientists Happy - a Use-case Backed Approach for Enabling Data Science in the Enterprise

      Real World Hussain Sultan & Tim Horan

      The potential of data science and rapid analytics in the enterprise is broadly accepted. Done right, processes that took weeks can take minutes and the impossible becomes possible. Done incorrectly, however, the promise of data science will stagnate. Bringing new models and insights to market will be hampered, if not entirely blocked, by siloed legacy infrastructure and processes. Furthermore, over time you will struggle to retain top data science talent and maintain organizational support and investment.
      Enabling data scientists within an enterprise requires a well-thought-out approach from an organizational, technological, and business results perspective. In this talk, Tim and Hussain will share common pitfalls in enabling data science in the enterprise and recommend ways to avoid them. Drawing on an actionable example use case from the financial services industry, they will focus on how Anaconda plays a pivotal role in setting up big data infrastructure, integrating data science experimentation and production environments, and deploying insights to production. Along the way, they will highlight opportunities for leveraging open source and unleashing data science teams while meeting regulatory and compliance challenges.

    4:10PM

    5:00PM

    • Coming soon!

      Real World

      Description coming soon!

    Day Four

    Wednesday, April 11

    10:00AM

    10:50AM

    • IoT Predictive Maintenance using Recurrent Neural Networks

      Real World Justin Brandenburg

      The idea behind predictive maintenance is that the failure patterns of various types of equipment are predictable. If an organization can accurately predict when a piece of hardware will fail, and replace that component before it does, it can achieve much higher levels of operational efficiency. With many devices now including sensors and other components that send diagnostic reports, predictive maintenance using big data is increasingly accurate and effective. So how can we enhance our data monitoring to predict the next event?
      This talk will present a real use case from the IoT/Industry 4.0 space. Justin will walk through an entire workflow of data ingestion, bulk ETL, data exploration, model training, testing, and deployment in a scalable real-time streaming architecture. He will demonstrate how he used Anaconda Python 3.5 and PySpark 2.1.0 to wrangle data and train a recurrent neural network to predict whether the next event in a real-time stream indicated that maintenance was required.

    11:00AM

    11:50AM

    • Setting Big Data on Fire: the FireCARES and NFORS projects

      Real World Craig Weinschenk

      Local government decision-makers often alter fire department resources faster than fire service leaders can evaluate the potential impact. These decisions can leave a community without sufficient resources to respond to emergency calls safely, efficiently, and effectively. The Fire Community Assessment/Response Evaluation System (FireCARES) gives fire departments a technical basis for what has historically been an anecdotal discussion about community hazards and risks, and about the impact of changes to fire department resource levels. To accomplish this, FireCARES provides three scores for each community based on the available data: the Community Risk Score, the Fire Department Performance Score, and the Safe Grade. These scores are generated by exploiting an expansive, multi-layered data set combining fire incidents, outcomes, and community risk characteristics. Fire incident data is not without flaws, however, as it relies primarily on firefighters for data entry. Additionally, at the national level there is a two-year data lag. To overcome these obstacles, we have built the National Fire Operations Reporting System (NFORS), a real-time data analysis tool that leverages modern data practices while removing firefighters from data entry.

    1:00PM

    1:50PM

    • Causal Inference in Tech

      Real World Jenny Lin

      This session covers how to approach causal inference conscientiously in the large, messy data sets common in tech, in the absence of an experiment (or when the experimental setup was not ideal). In the real world, correlation is sometimes not a sufficient basis for a million-dollar business decision. That's where causal inference comes in. Causal inference establishes a causal link between a cause X and an outcome Y, and it is often necessary for making critical and expensive business choices. Many pitfalls exist that render "simple" causal analyses entirely misleading and potentially costly. Here, Jenny will discuss some of the approaches taken at Yelp to determine causality when faced with a question common across tech firms: how do we know that our implementation of feature X caused an effect on metric Y, and what was the size of that effect? Factors to correct for when inferring causality include selection bias into the comparison groups, time trends in the outcome feature, time period mismatches across observations, multicollinearity, clustered standard errors, and more. Jenny will walk through a stylized example of a causal inference problem she ran into at Yelp and show how easily one can arrive at a very misleading conclusion when these issues go uncorrected.

    2:00PM

    2:50PM

    • Coming soon!

      Real World

      Description coming soon!

    Day Two

    Monday, April 9

    11:00AM

    11:50AM

    • Deep Learning with Just a Little Bit of Data

      Open Source Michael Bernico

      There’s no question that deep learning is changing the field of machine learning at an extremely rapid pace. Given enough data, deep learning can solve problems we couldn’t imagine just a few years ago. But what do we do when there isn’t enough data? Can we still apply deep learning when we only have hundreds or thousands of data points?
      In this talk, we will discuss doing deep learning with very little data, focusing on transfer learning, which we find immensely useful for business applications of deep learning. Finally, we will present some original research that shows just how far transfer learning can go on very small volumes of data.

    1:00PM

    1:50PM

    • Making Convolutional Neural Networks Uncool

      Open Source Sanyam Bhutani

      Sanyam was recently admitted into the fast.ai v2 international fellowship. This talk is inspired by fast.ai’s “Making Neural Nets Uncool Again” teaching philosophy.
      Sanyam will introduce the most important architecture in computer vision in a top-down manner. He will review the best CNN architectures and demonstrate how to use them in deep learning applications. Attendees need only basic Python knowledge and high-school math to develop state-of-the-art models and a deep learning image application in under one hour. This talk aims to introduce a broader audience to CNNs by focusing on code and applications rather than math.

    2:00PM

    2:50PM

    • Getting Started with Anaconda Distribution

      Open Source Ian Stokes-Reese

      Description coming soon!

    3:10PM

    4:00PM

    • Production-grade Packaging with Anaconda

      Open Source Mahmoud Hashemi

      Anaconda has always been a powerful platform for data analysts and scientists across the Python world. The same reasons it works for those groups also apply to engineers building and shipping scalable services: easy access to prebuilt packages, including system packages not managed by pip and other packages not conveniently provided by the operating system.
      This talk will cover using conda and conda envs in real-world industrial settings, what makes conda special for software engineers, and the challenges and goals of packaging. Mahmoud will provide real-world examples using Anaconda to build an OS package (RPM) and Docker images.

    4:10PM

    5:00PM

    • Convolutional Neural Networks (CNNs): a game changer for Computer Vision

      Open Source Tassos Sarbanes

      Many call visual data the “dark matter” of the internet. Computer vision draws on many disciplines, including biology, physics, psychology, mathematics, and computer science. This talk will present a brief history of the evolution and revolution of computer vision and image processing. We’ll review important studies including Larry Roberts’s Block World, MIT’s Summer Vision Project, Vision by David Marr, Explaining Visual Science by David Lowe, Normalized Cut by Shi & Malik, and Face Detection by Viola & Jones.
      The focus of the talk will be convolutional neural networks (CNNs) and their huge impact on computer vision. Code examples in Jupyter Notebooks will be presented using the PASCAL Visual Object Classes (VOC) and ImageNet (WordNet) datasets. We’ll also cover general machine learning and deep learning topics related to visual recognition, such as object detection, action classification, and image captioning.

    Day Three

    Tuesday, April 10

    10:00AM

    10:50AM

    • Building a Data Science Team using Open Source Data Science

      Open Source Katrina Riehl

      Open source data science technologies have changed the face of building and operating a data science organization. In this talk, Katrina will explore how and why open source technologies are necessary for the success of businesses hoping to use data science and machine learning to power innovation. She will discuss how HomeAway.com is using tools like Anaconda, conda, and other Python-powered open source libraries to change how they look at their market and stay competitive. She will also discuss her journey in making Python a first-class citizen in a traditionally Java-based organization while growing a data science team from the ground up.

    11:00AM

    11:50AM

    • GPU-Accelerating UDFs in PySpark with Numba and PyGDF

      Open Source Joshua Patterson

      With advances in computer hardware such as 10-gigabit network cards, InfiniBand, and solid-state drives all becoming commodity offerings, the new bottleneck in big data technologies is very commonly the processing power of the CPU. To meet their users' computational demands, enterprises have had to resort to extreme scale-out approaches just to get the processing power they need. For one of the best-known technologies in this space, Apache Spark, numerous enterprises have spoken publicly about the challenges of running multiple 1000+ node clusters to give their users the processing power they need. This talk is based on work completed by NVIDIA’s Applied Solutions Engineering team. Attendees will learn how the team GPU-accelerated UDFs in PySpark using open source technologies such as Numba and PyGDF, the lessons they learned in the process, and how they accelerated workloads with a fraction of the hardware footprint.

    1:00PM

    1:50PM

    • Real-Time Processing with Dask

      Open Source Matt Rocklin

      Description coming soon!

    2:00PM

    2:50PM

    • Coming soon!

      Open Source

      Description coming soon!

    3:10PM

    4:00PM

    • Jumpstart Writing Continuous Applications with Structured Streaming Python APIs in Apache Spark

      Open Source Jules Damji

      We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has given rise to streaming applications that react to and interact with data in real time. We call these continuous applications.
      In this talk, we will explore the concepts and motivations behind continuous applications and how the Structured Streaming Python APIs in Apache Spark 2.x enable writing them. We will also examine the programming model behind Structured Streaming and the APIs that support it. Through a short demo and code examples, Jules will demonstrate how to write an end-to-end Structured Streaming application that reacts to and interacts with both real-time and historical data to perform advanced analytics using the Spark SQL, DataFrames, and Datasets APIs.

    4:10PM

    5:00PM

    • Coming soon!

      Open Source

      Description coming soon!

    Day Four

    Wednesday, April 11

    10:00AM

    10:50AM

    • Accelerating Deep Learning with GPUs

      Open Source Stan Seibert

      Description coming soon!

    11:00AM

    11:50AM

    • Parallel scikit-learn with Dask

      Open Source Tom Augspurger

      Description coming soon!

    1:00PM

    1:50PM

    • Conda, Docker & Kubernetes: The Cloud-Native Future of Data Science

      Open Source Mathew Lodge

      Description coming soon!

    2:00PM

    2:50PM

    • Coming soon!

      Open Source

      Description coming soon!