Available episodes
5 of 32
Diving Deep into Synthetic Data with Alex Watson of Gretel.ai
Alex Watson is the co-founder and CEO of Gretel.ai, a startup that offers APIs for creating anonymized and synthetic datasets. Previously, he was the founder of Harvest.ai, whose product Macie, an analytics platform that protects against data breaches, was acquired by AWS.
Learn more about Alex and Gretel AI: http://gretel.ai
Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
Subscribe to ML Engineered: https://mlengineered.com/listen
Comments? Questions? Submit them here: http://bit.ly/mle-survey
Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/
Timestamps:
02:15 Introducing Alex Watson
03:45 How Alex was first exposed to programming
05:00 Alex's experience starting Harvest AI, getting acquired by AWS, and integrating their product at massive scale
21:20 How Alex first saw the opportunity for Gretel.ai
24:20 The most exciting use cases for synthetic data
28:55 Theoretical guarantees of anonymized data with differential privacy
36:40 Combining pre-training with synthetic data
38:40 When to anonymize data and when to synthesize it
41:25 How Gretel's synthetic data engine works
44:50 Requirements of a dataset to create a synthetic version
49:25 Augmenting datasets with synthetic examples to address representation bias
52:45 How Alex recommends teams get started with Gretel.ai
59:00 Expected accuracy loss from training models on synthetic data
01:03:15 Biggest surprises from building Gretel.ai
01:05:25 Organizational patterns for protecting sensitive data
01:07:40 Alex's vision for Gretel's data catalog
01:11:15 Rapid fire questions
Links:
Gretel.ai Blog
Netflix Cancels Recommendation Contest After Privacy Lawsuit
Greylock - The GitHub of Data
Improving massively imbalanced datasets in machine learning with synthetic data
Deep dive on generating synthetic data for Healthcare
Gretel’s New Synthetic Performance Report
The...
20-4-2021
1:19:11
A Practical Approach to Learning Machine Learning with Radek Osmulski (Earth Species Project)
Radek Osmulski is a fully self-taught machine learning engineer. After getting tired of his corporate job, he taught himself programming and started a new career as a Ruby on Rails developer. He then set out to learn machine learning. Since then, he has been a Fast AI International Fellow, become a Kaggle Master, and is now an AI Data Engineer at the Earth Species Project.
Learn more about Radek:
https://www.radekosmulski.com
https://twitter.com/radekosmulski
Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: http://cyou.ai/newsletter
Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
Subscribe to ML Engineered: https://mlengineered.com/listen
Comments? Questions? Submit them here: http://bit.ly/mle-survey
Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/
Timestamps:
02:15 How Radek got interested in programming and computer science
09:00 How Radek taught himself machine learning
26:40 The skills Radek learned from Fast AI
39:20 Radek's recommendations for people learning ML now
51:30 Why Radek is writing a book
01:01:20 Radek's work at the Earth Species Project
01:10:15 How the ESP collects animal language data
01:21:05 Rapid fire questions
Links:
Radek's Book "Meta-Learning"
Andrew Ng ML Coursera
Fast AI
Universal Language Model Fine-tuning for Text Classification
How to do Machine Learning Efficiently
NPR - Two Heartbeats a Minute
Earth Species Project
A Guide to the Good Life
The Origin of Wealth
Make Time
You Are Here
30-3-2021
1:38:02
From Data Science Leader to ML Researcher with Rodrigo Rivera (Skoltech ADASE, Samsung NEXT)
Rodrigo Rivera is a machine learning researcher at the Advanced Data Analytics in Science and Engineering Group at Skoltech and technical director of Samsung Next. He has previously held data science and research leadership roles at companies around the world, including Rocket Internet and Philip Morris.
Learn more about Rodrigo:
https://rodrigo-rivera.com/
https://twitter.com/rodrigorivr
Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
Subscribe to ML Engineered: https://mlengineered.com/listen
Comments? Questions? Submit them here: http://bit.ly/mle-survey
Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/
Timestamps:
03:00 How Rodrigo got started in computer science and founded his first company
10:40 Rodrigo's experiences leading data science teams at Rocket Internet and PMI
26:15 Leaving industry to get a PhD in machine learning
28:55 Data science collaboration between business and academia
32:45 Rodrigo's research interest in time series data
39:25 Topological data analysis
45:35 Framing effective research as a startup
48:15 Neural Prophet
01:04:10 The potential future of Julia for numerical computing
01:08:20 Most exciting opportunities for ML in industry
01:15:05 Rodrigo's advice for listeners
01:17:00 Rapid fire questions
Links:
Rodrigo's Google Scholar
Advanced Data Analytics in Science and Engineering Group
Neural Prophet
M-Competitions
Machine Learning Refined
Foundations of Machine Learning
A First Course in Machine Learning
23-3-2021
1:23:54
The Future of ML and AI Infrastructure and Ethics with Dan Jeffries (Pachyderm, AI Infrastructure Alliance)
Dan Jeffries is the chief technical evangelist at Pachyderm, a leading data science platform. He is a prominent writer and speaker on all things related to the future. He has been in software for over two decades, many of those years at Red Hat, and is the founder of the AI Infrastructure Alliance and the Practical AI Ethics Alliance.
Learn more about Dan:
https://twitter.com/Dan_Jeffries1
https://medium.com/@dan.jeffries
Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: http://cyou.ai/newsletter
Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
Subscribe to ML Engineered: https://mlengineered.com/listen
Comments? Questions? Submit them here: http://bit.ly/mle-survey
Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/
Timestamps:
02:15 How Dan got started in computer science
06:50 What Dan is most excited about in AI
14:45 Where we are in the adoption curve of ML
20:40 The "Canonical Stack" of ML
32:00 Dan's goal for the AI Infrastructure Alliance
40:55 "Problems that ML startups don't know they're going to have"
49:00 Closed vs. open source tools in the Canonical Stack
01:00:05 Building out the "boring" part of the infrastructure to enable exciting applications
01:08:40 Dan's practical approach to AI Ethics
01:23:50 Rapid fire questions
Links:
Pachyderm
AI Infrastructure Alliance
Practical AI Ethics Alliance
Rise of the Canonical Stack in Machine Learning
Rise of AI - The Age of AI in 2030
Google Magenta
AlphaGo Documentary
Thinking in Bets
A History of the World in 6 Glasses
Super-Thinking
16-3-2021
1:36:50
Developing Feast, the Leading Open Source Feature Store, with Willem Pienaar (Gojek, Tecton)
Willem Pienaar is the co-creator of Feast, the leading open source feature store, whose development he leads as a tech lead at Tecton. Previously, he led the ML platform team at Gojek, a super-app in Southeast Asia.
Learn more:
https://twitter.com/willpienaar
https://feast.dev/
Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
Subscribe to ML Engineered: https://mlengineered.com/listen
Comments? Questions? Submit them here: http://bit.ly/mle-survey
Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/
Timestamps:
02:15 How Willem got started in computer science
03:40 Paying for college by starting an ISP
05:25 Willem's experience creating Gojek's ML platform
21:45 Issues faced that led to the creation of Feast
26:45 Lessons learned building Feast
33:45 Integrating Feast with data quality monitoring tools
40:10 What it looks like for a team to adopt Feast
44:20 Feast's current integrations and future roadmap
46:05 How a data scientist would use Feast when creating a model
49:40 How the feature store pattern handles DAGs of models
52:00 Priorities for a startup's data infrastructure
55:00 Integrating with Amundsen, Lyft's data catalog
57:15 The evolution of data and MLOps tool standards for interoperability
01:01:35 Other tools in the modern data stack
01:04:30 The interplay between open and closed source offerings
Links:
Feast's GitHub
Gojek Data Science Blog
Data Build Tool (dbt)
TensorFlow Data Validation (TFDV)
A State of Feast
Google BigQuery
Lyft Amundsen
Cortex
Kubeflow
MLflow
This podcast helps Machine Learning Engineers become the best at what they do. Join host Charlie You every week as he talks to the brightest minds in data science, artificial intelligence, and software engineering to discover how they bring cutting-edge research out of the lab and into products that people love. You'll learn the skills, tools, and best practices you can use to build better ML systems and accelerate your career in this flourishing new field.