
Latent Space: The AI Engineer Podcast

swyx + Alessio
Latest episode

Available episodes

5 of 142
  • The Future of Notebooks - with Akshay Agrawal of Marimo
    Akshay Agrawal joins us to talk about Marimo and their vision for the future of Python notebooks, and how it’s the perfect canvas for AI-driven data analysis.
    Chapters:
    0:00 Introduction
    0:46 Overview of Marimo and Its Features
    2:33 Origin Story and Motivation Behind Marimo
    4:26 Demo: Classical Machine Learning with MNIST in Marimo
    6:52 Notebook Compatibility and Conversion from Jupyter
    7:42 Demo: Interactive Notebook with Custom UI and Layout
    10:08 AI-Native Utilities and Code Generation with Language Models
    11:36 Dependency Management and Integration with UV Package Manager
    13:00 Demo: Data Annotation Workflow Using a PS5 Controller
    15:51 Starting from Scratch: Blank Canvas AI Use Cases
    18:27 Context Formatting for AI Code Generation
    19:54 Chat Interface and Local/Remote Model Support
    21:01 WebAssembly Support and MoLab Cloud-Hosted Notebooks
    23:21 Future Plans and Breaking Out of Old Notebook Habits
    25:40 Running Marimo Notebooks as Scripts or Data Apps
    26:44 Exploring AI Agents and Community Contributions
    26:56 Call to Action: How to Get Started and Contribute
    --------  
  • Cline: the open source coding agent that doesn't cut costs
    Saoud Rizwan and Pash from Cline joined us to talk about why fast apply models got bitter lesson'd, how they pioneered the plan + act paradigm for coding, and why non-technical people use IDEs to do marketing and generate slides.
    Full writeup: https://www.latent.space/p/cline
    X: https://x.com/latentspacepod
    Chapters:
    00:00 - Introductions
    01:35 - Plan and Act Paradigm
    05:37 - Model Evaluation and Early Development of Cline
    08:14 - Use Cases of Cline Beyond Coding
    09:09 - Why Cline is a VS Code Extension and Not a Fork
    12:07 - Economic Value of Programming Agents
    16:07 - Early Adoption for MCPs
    19:35 - Local vs Remote MCP Servers
    22:10 - Anthropic's Role in MCP Registry
    22:49 - Most Popular MCPs and Their Use Cases
    25:26 - Challenges and Future of MCP Monetization
    27:32 - Security and Trust Issues with MCPs
    28:56 - Alternative History Without MCP
    29:43 - Market Positioning of Coding Agents and IDE Integration Matrix
    32:57 - Visibility and Autonomy in Coding Agents
    35:21 - Evolving Definition of Complexity in Programming Tasks
    38:16 - Forks of Cline and Open Source Regrets
    40:07 - Simplicity vs Complexity in Agent Design
    46:33 - How Fast Apply Got Bitter Lesson'd
    49:12 - Cline's Business Model and Bring-Your-Own-API-Key Approach
    54:18 - Integration with OpenRouter and Enterprise Infrastructure
    55:32 - Impact of Declining Model Costs
    57:48 - Background Agents and Multi-Agent Systems
    1:00:42 - Vision and Multi-Modalities
    1:01:07 - State of Context Engineering
    1:07:37 - Memory Systems in Coding Agents
    1:10:14 - Standardizing Rules Files Across Agent Tools
    1:11:16 - Cline's Personality and Anthropomorphization
    1:12:55 - Hiring at Cline and Team Culture
    --------  
    1:15:43
  • Personalized AI Language Education — with Andrew Hsu, Speak
    Speak (https://speak.com) may not be very well known to native English speakers, but they have come from a slow start in 2016 to emerge as one of the favorite partners of OpenAI, whose Startup Fund led and joined Speak's Series B and C as one of the new AI-native unicorns, noting that “Speak has the potential to revolutionize not just language learning, but education broadly”. Today we speak with Speak’s CTO, Andrew Hsu, on the journey of building the “3rd generation” of language learning software (with Rosetta Stone being Gen 1, and Duolingo being Gen 2). Speak’s premise is that speech and language models can now do what was previously only possible with human tutors—provide fluent, responsive, and adaptive instruction—and this belief has shaped its product and company strategy since its early days.
    https://www.linkedin.com/in/adhsu/
    https://speak.com
    One of the most interesting strategic decisions discussed in the episode is Speak’s early focus on South Korea. While counterintuitive for a San Francisco-based startup, the decision was influenced by a combination of market opportunity and founder proximity via a Korean first employee. South Korea’s intense demand for English fluency and a highly competitive education market made it a proving ground for a deeply AI-native product. By succeeding in a market saturated with human-based education solutions, Speak validated its model and built strong product-market fit before expanding to other Asian markets and, eventually, globally.
    The arrival of Whisper and GPT-based LLMs in 2022 marked a turning point for Speak. Suddenly, capabilities that were once theoretical—real-time feedback, semantic understanding, conversational memory—became technically feasible. Speak didn’t pivot, but rather evolved into its second phase: from a supplemental practice tool to a full-featured language tutor. This transition required significant engineering work, including building custom ASR models, managing latency, and integrating real-time APIs for interactive lessons. It also unlocked the possibility of developing voice-first, immersive roleplay experiences and a roadmap to real-time conversational fluency.
    To scale globally and support many languages, Speak is investing heavily in AI-generated curriculum and content. Instead of manually scripting all lessons, they are building agents and pipelines that can scaffold curriculum, generate lesson content, and adapt pedagogically to the learner. This ties into one of Speak’s most ambitious goals: creating a knowledge graph that captures what a learner knows and can do in a target language, and then adapting the course path accordingly. This level-adjusting tutor model aims to personalize learning at scale and could eventually be applied beyond language learning to any educational domain.
    Finally, the conversation touches on the broader implications of AI-powered education and the slow real-world adoption of transformative AI technologies. Despite the capabilities of GPT-4 and others, most people’s daily lives haven’t changed dramatically. Speak sees itself as part of the generation of startups that will translate AI’s raw power into tangible consumer value. The company is also a testament to long-term conviction—founded in 2016, it weathered years of slow growth before AI caught up to its vision. Now, with over $50M ARR, a growing B2B arm, and plans to expand across languages and learning domains, Speak represents what AI-native education could look like in the next decade.
    Chapters:
    00:00:00 Introductions & Thiel Fellowship Origins
    00:02:13 Genesis of Speak: Early Vision & Market Focus
    00:03:44 Building the Product: Iterations and Lessons Learned
    00:10:59 AI’s Role in Language Learning
    00:13:49 Scaling Globally & B2B Expansion
    00:16:30 Why Korea? Localizing for Success
    00:19:08 Content Creation, The Speak Method, and Engineering Culture
    00:23:31 The Impact of Whisper and LLM Advances
    00:29:08 AI-Generated Content & Measuring Fluency
    00:35:30 Personalization, Dialects, and Pronunciation
    00:39:38 Immersive Learning, Multimodality, and Real-Time Voice
    00:50:02 Engineering Challenges & Company Culture
    00:53:20 Beyond Languages: B2B, Knowledge Graphs, and Broader Learning
    00:57:32 Fun Stories, Lessons, and Reflections
    01:02:03 Final Thoughts: The Future of AI Learning & Slow Takeoff
    --------  
    1:04:09
  • AI Video Is Eating The World — Olivia and Justine Moore, a16z
    When the first video diffusion models started emerging, they were little more than “moving pictures” - still frames extended a few seconds in either direction in time. There was a ton of excitement about OpenAI’s Sora on release through 2024, but so far only Sora-lite has been widely released. Meanwhile, other good videogen models like Genmo Mochi, Pika, MiniMax T2V, Tencent Hunyuan Video, and Kuaishou’s Kling have emerged, but the reigning king this year seems to be Google’s Veo 3, which for the first time has added native audio generation into its model capabilities, eliminating the need for a whole class of lip-syncing tooling and SFX editing.
    The rise of Veo 3 unlocks a whole new category of AI video creators that many of our audience may not have been exposed to, but which is undeniably effective and important, particularly in the “kids” and “brainrot” segments of global consumer internet platforms like TikTok, YouTube and Instagram. By far the best documentarians of these trends for laypeople are Olivia and Justine Moore, both partners at a16z, who not only collate the best examples from all over the web, but dabble in video creation themselves to put theory into practice. We’ve been thinking of dabbling in AI brainrot on a secondary channel for Latent Space, so we wanted to get the braindump from the Moore twins on how to make a Latent Space Brainrot channel. Jump on in!
    Chapters:
    00:00:00 Introductions & Guest Welcome
    00:00:49 The Rise of Generative Media
    00:02:24 AI Video Trends: Italian Brain Rot & Viral Characters
    00:05:00 Following Trends & Creating AI Content
    00:07:17 Hands-On with AI Video Creation
    00:18:36 Monetization & Business of AI Content
    00:23:34 Platforms, Models, and the Creator Stack
    00:37:22 Native Content vs. Clipping & Going Viral
    00:41:52 Prompt Theory & Meta-Trends in AI Creativity
    00:47:42 Professional, Commercial, and Platform-Specific AI Video
    00:48:57 Wrap-Up & Final Thoughts
    --------  
    49:27
  • Information Theory for Language Models: Jack Morris
    Our last AI PhD grad student feature was Shunyu Yao, who happened to focus on Language Agents for his thesis and immediately went to work on them for OpenAI. Our pick this year is Jack Morris, who bucks the “hot” trends by -not- working on agents, benchmarks, or VS Code forks, but is rather known for his work on the information theoretic understanding of LLMs, starting from embedding models and latent space representations (always close to our heart). Jack is an unusual combination: he does underrated research but is somehow still able to explain it well to a mass audience, so we felt this was a good opportunity to do a different kind of episode going through the greatest hits of a high profile AI PhD, and relate them to questions from AI Engineering.
    Papers and References:
    AI grad school: https://x.com/jxmnop/status/1933884519557353716
    A new type of information theory: https://x.com/jxmnop/status/1904238408899101014
    Embeddings:
    Text Embeddings Reveal (Almost) As Much As Text: https://arxiv.org/abs/2310.06816
    Contextual Document Embeddings: https://arxiv.org/abs/2410.02525
    Harnessing the Universal Geometry of Embeddings: https://arxiv.org/abs/2505.12540
    Language models:
    GPT-style language models memorize 3.6 bits per param: https://x.com/jxmnop/status/1929903028372459909
    LLM Inversion — Approximating Language Model Training Data from Weights: https://arxiv.org/abs/2506.15553 and https://x.com/jxmnop/status/1936044666371146076
    "There Are No New Ideas In AI.... Only New Datasets": https://x.com/jxmnop/status/1910087098570338756 and https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only
    Misc reference: https://junyanz.github.io/CycleGAN/
    For others hiring AI PhDs, Jack also wanted to shout out Zach Nussbaum, his coauthor on Nomic Embed: Training a Reproducible Long Context Text Embedder.
    --------  
    1:18:13


About Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space
