
Vanishing Gradients

Hugo Bowne-Anderson

Available episodes

5 of 61
  • Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production
    Most AI teams find their multi-agent systems devolving into chaos; ML engineer Alex Strick van Linschoten argues that's because they ignore production reality. In this episode, he draws on insights from the LLMOps Database (750+ real-world deployments at the time of recording; now nearly 1,000!) to systematically measure and engineer constraints, turning unreliable prototypes into robust, enterprise-ready AI. Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He lays out the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.
    We talk through:
    - Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos
    - The essential MLOps hygiene (tracing and continuous evals) that most teams skip
    - The optimal (and very low) limit on the number of tools an agent can reliably use
    - How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains
    - The principle of using simple Python/regex checks before resorting to costly LLM judges (a minimal sketch appears after the episode list)
    LINKS:
    - The LLMOps Database: 925 entries as of today... submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)
    - Upcoming events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    - Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
    🎓 Learn more: this was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20). Next cohort starts November 3: come build with us!
    --------  
    28:04
  • Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
    Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems. Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust. We talk through: The 10(+1) critical mistakes that cause teams to waste time on evals Why "hallucination scores" are a waste of time (and what to measure instead) The manual review process that finds major issues in hours, not weeks A step-by-step method for building LLM judges you can actually trust How to use domain experts without getting stuck in endless review committees Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap. LINKS Hamel's website and blog (https://hamel.dev/) Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51) Hamel Husain on Lenny's pocast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill) The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share) Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME 🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338
    --------  
    1:13:15
  • Episode 59: Patterns and Anti-Patterns For Building with AI
    John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the "seven deadly sins" of LLM development and the practical fixes that keep projects from stalling. From context management to retrieval debugging, John explains the patterns he's seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an "AI intern" rather than an all-knowing oracle.
    We talk through:
    - Why chasing perfect accuracy is a dead end
    - How to use agents without losing control
    - Context engineering: fitting the right information in the window
    - Starting simple instead of over-orchestrating
    - Separating retrieval from generation in RAG (see the RAG sketch after the episode list)
    - Splitting complex extractions into smaller checks
    - Knowing when frameworks help, and when they slow you down
    A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production.
    LINKS:
    - Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents)
    - The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems)
    - Roaming RAG: RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/)
    - Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/)
    - Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf)
    - Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk)
    - Arcturus Labs (https://arcturus-labs.com/)
    - Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M)
    - Upcoming events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    🎓 Learn more: Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
    --------  
    47:37
  • Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)
    While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy. Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.
    We talk through:
    - Using LLMs as "synthetic consumers" to simulate surveys and test product ideas (see the survey sketch after the episode list)
    - How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making
    - Building closed-loop systems where AI generates and critiques ideas
    - Guardrails for multi-agent workflows in marketing mix modeling
    - Where generative AI breaks (and how to detect failure modes)
    - The balance between useful models and "correct" models
    If you've ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes.
    LINKS:
    - The AI MMM Agent, an AI-powered shortcut to Bayesian marketing mix insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent)
    - AI-Powered Decision Making Under Uncertainty workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU)
    - The podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share)
    - Upcoming events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
    --------  
    1:00:45
  • Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
    While many people talk about "agents," Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply. Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.
    We talk through:
    - Treating LLM workflows as ETL pipelines for unstructured text
    - Error analysis: why you need humans reviewing the first 50–100 traces
    - Guardrails like retries, validators, and "gleaning" (see the guardrail sketch after the episode list)
    - How LLM judges work: rubrics, pairwise comparisons, and cost trade-offs
    - Cheap vs. expensive models: when to swap for savings
    - Where agents fit in (and where they don't)
    If you've ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents without breaking the bank.
    LINKS:
    - Shreya's website (https://www.sh-reya.com/)
    - DocETL, a system for LLM-powered data processing (https://www.docetl.org/)
    - Upcoming events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    - Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk)
    - Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)
    🎓 Learn more: Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
    --------  
    41:27
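
Code sketches from the episode notes

The episode notes above name several concrete techniques; the short Python sketches below illustrate them. They are minimal sketches under stated assumptions, not implementations endorsed by the guests.

First, the "simple Python/regex checks before costly LLM judges" principle from Episode 61. The rules below are hypothetical examples; the point is that deterministic checks reject cheap failures before any paid judge call is made.

    import re

    # Hypothetical deterministic rules, checked before any LLM judge runs.
    RULES = [
        ("non_empty", lambda text: bool(text.strip())),
        ("has_citation", lambda text: re.search(r"\[\d+\]", text) is not None),
        ("under_length_cap", lambda text: len(text) <= 4000),
    ]

    def cheap_checks(output: str) -> list[str]:
        """Names of the rules the output fails; an empty list means the
        output may proceed to the (costly) LLM judge."""
        return [name for name, ok in RULES if not ok(output)]

    if __name__ == "__main__":
        for output in ["", "The answer is 42 [1].", "No sources cited here."]:
            print(repr(output), "->", cheap_checks(output) or "pass: send to judge")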
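
Next, the judge sketch: a schematic of the kind of binary pass/fail LLM judge discussed in Episode 60. The rubric is invented and call_model stands in for whatever LLM client you use; Hamel's full method, including aligning the judge with a domain expert's labels, is covered in the episode.

    # Schematic binary judge: a fixed rubric and a PASS/FAIL verdict that can
    # later be checked against a domain expert's labels on the same traces.
    # call_model(prompt) -> str is a placeholder for your LLM client.

    JUDGE_PROMPT = """You are grading an assistant's reply to a customer email.

    Rubric (FAIL if ANY of these hold):
    - The reply invents order details not present in the thread.
    - The reply does not address the customer's actual question.

    Thread:
    {thread}

    Reply:
    {reply}

    Write a one-sentence critique, then PASS or FAIL on the final line."""

    def judge(thread: str, reply: str, call_model) -> bool:
        raw = call_model(JUDGE_PROMPT.format(thread=thread, reply=reply))
        return raw.strip().splitlines()[-1].strip().upper() == "PASS"

Putting the critique before the verdict gives you something to read during error analysis, and the binary label is easy to compare against human judgments.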
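
The RAG sketch, for Episode 59's "separate retrieval from generation" point. The toy keyword retriever and the function names are illustrative; the design point is the explicit seam between the two stages.

    # Keep retrieval and generation behind separate functions so each stage
    # can be inspected and evaluated on its own. Names are illustrative.

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Toy keyword retriever over an in-memory corpus; in a real system
        this would be a search index or vector store."""
        corpus = [
            "Refunds are processed within 5 business days.",
            "Premium plans include priority support.",
            "Password resets happen at example.com/reset.",
        ]
        words = query.lower().split()
        return sorted(corpus, key=lambda doc: sum(w in doc.lower() for w in words),
                      reverse=True)[:k]

    def generate(query: str, context: list[str], call_model) -> str:
        """Answer using ONLY the retrieved context; call_model is a stub."""
        prompt = ("Answer from the context only.\n\nContext:\n"
                  + "\n".join(context) + f"\n\nQuestion: {query}")
        return call_model(prompt)

    # Debugging a bad answer starts with inspecting retrieve(query) on its
    # own, before blaming the generation step.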
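
The survey sketch, for Episode 58's "synthetic consumers" idea, under heavy assumptions: the personas, the question, and call_model are all invented for illustration, and as discussed in the episode, such scores need calibrating against real survey data before they inform decisions.

    # Ask an LLM to answer a survey question in persona, then aggregate as
    # you would a real survey. Everything here is an illustrative placeholder.

    PERSONAS = [
        "a 34-year-old parent who buys whatever toothpaste is on sale",
        "a 62-year-old retiree loyal to one brand for decades",
        "a 25-year-old who picks products by sustainability claims",
    ]

    QUESTION = "On a scale of 1-5, how likely are you to try a charcoal toothpaste?"

    def run_survey(call_model) -> list[int]:
        scores = []
        for persona in PERSONAS:
            prompt = (f"You are {persona}. Answer with a single integer from "
                      f"1 to 5 and nothing else.\n\n{QUESTION}")
            scores.append(int(call_model(prompt).strip()))
        return scores

    # The per-persona scores would then feed a downstream (e.g. Bayesian)
    # model rather than being read at face value.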
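
Finally, the guardrail sketch, for Episode 57's retries, validators, and "gleaning". The JSON-shape validator is a made-up example and call_model is again a placeholder; the pattern is to feed the validator's complaint back into the next attempt instead of retrying blind. (Assumes Python 3.10+ for the union annotation.)

    import json

    def validate(raw: str) -> str | None:
        """Return an error message, or None if the output is acceptable."""
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as err:
            return f"not valid JSON: {err}"
        if not isinstance(record, dict):
            return "expected a JSON object"
        if "incident_date" not in record:
            return "missing required field 'incident_date'"
        return None

    def extract(document: str, call_model, max_retries: int = 2) -> dict:
        """Retry with the validator's feedback appended ('gleaning'-style)."""
        prompt = f"Extract an incident record as JSON from:\n{document}"
        for _ in range(max_retries + 1):
            raw = call_model(prompt)
            error = validate(raw)
            if error is None:
                return json.loads(raw)
            # Feed the complaint back rather than retrying blind.
            prompt += f"\n\nYour previous output was rejected: {error}. Fix it."
        raise ValueError("exhausted retries; route this document to human review")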


About Vanishing Gradients

A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson. It's time for more critical conversations about the challenges in our industry in order to build better compasses for the solution space! To this end, this podcast will consist of long-format conversations between Hugo and other people who work broadly in the data science, machine learning, and AI spaces. We'll dive deep into all the moving parts of the data world, so if you're new to the space, you'll have an opportunity to learn from the experts. And if you've been around for a while, you'll find out what's happening in many other parts of the data world.
Podcast website
