The Architect of Scale: Ion Stoica on Open Source, AI, and the Future of Data
Ion Stoica is a professor of computer science at UC Berkeley, Co-Founder and Executive Chairman of Databricks, and a key architect of the Apache Spark project. Most recently, he’s the Co-Founder of Anyscale, which leverages the open source Ray framework developed in-lab to enable scalable AI workloads, much as Spark revolutionized large-scale data processing. In this episode of The Data T, we chat with Stoica about his illustrious career and how his obsession with solving hard technical problems led him from networking research to peer-to-peer video, Apache Spark, and ultimately Databricks. He recounts turning Spark’s open-source momentum into a successful enterprise business, crediting speed of execution and targeted hiring for the company’s rise and urging founders to move fast and recruit experienced operators early. Stoica warns that tomorrow’s workloads will demand vertically integrated, multi-accelerator systems. Optimistic yet realistic about AI, he sees reliability and “human-in-the-loop” workflows as today’s gating factors and advises data professionals to embrace continuous learning as the industry accelerates. Hosted by Armon Petrossian and Satish Jayanthi, co-founders of Coalesce.
Key topics: The origins of Apache Spark and Databricks; Commercializing open source projects; Scaling AI infrastructure complexity; Advice for data practitioners
Resources: About Coalesce: https://coalesce.io/about/ Coalesce podcast archive (The Data T): https://coalesce.io/podcast/
--------
35:34
Model as You Go: A New Take on BI
In this episode of The Data T podcast, Armon Petrossian and Satish Jayanthi sit down with Colin Zima, co-founder and CEO of Omni, to talk about what it takes to reinvent business intelligence. Colin shares his journey from Looker’s early days to building Omni, the lessons learned along the way, and his philosophy of ruthless pragmatism when it comes to data modeling, product development, and AI. The conversation dives deep into the evolving BI landscape, the importance of semantics in AI, and how building trust, whether with customers or stakeholders, can be the real engine of innovation. If you're into the future of data platforms, building in public, or just want to hear what happens when two modern data stack founders compare notes, this one's for you.
Main topics: Early startup challenges and reflections; The evolving BI landscape; The evolution of the Modern Data Stack; Perspectives on data modeling; The role of semantic layers in AI; Hot takes in data analytics; Collaboration as a key to success; Building trust in data teams
Resources: About Omni: https://omni.co/ About Coalesce: https://coalesce.io/about/ Coalesce podcast archive (The Data T): https://coalesce.io/podcast/
--------
41:04
Data Therapy in the Age of Relentless Innovation and AI
In this episode of The Data T podcast, Armon Petrossian and Satish Jayanthi talk with Nicho Mann, founder and CEO of Stratos Consulting, about the emotional and operational challenges facing data professionals today. Nicho introduces the idea of “data therapy,” a way to talk about the mounting pressure, backlog, and burnout that teams are experiencing as AI hype accelerates. With firsthand stories from the biotech and pharmaceutical space, he explains how many companies are overwhelmed, under-resourced, and unsure how to move forward. The conversation digs into how to focus on what matters, where AI is genuinely helping, and why a more agile, empathetic approach to data work is urgently needed.
Main topics: “Data therapy” concept; AI: hype and expectations vs. reality; Overwhelmed data teams; Executive pressure to innovate; Automating manual, repetitive tasks; Optimizing data frameworks for AI; The importance of subject matter expertise; How to stay grounded in fast change
Resources: About Stratos Consulting: https://stratosconsulting.com/ About Coalesce: https://coalesce.io/about/ The Data T archive: https://coalesce.io/podcast/
--------
33:04
AI-Driven Data Catalogs
Fresh off the announcement of Coalesce acquiring AI-driven data catalog company CastorDoc, now Coalesce Catalog, CastorDoc co-founders Tristan Mayer and Xavier de Boisredon join The Data T podcast to talk about how it all came together, why this marks a major shift in the industry, how the Modern Data Stack is evolving, and what’s next. The conversation highlights both companies’ shared vision of making data governance simpler, more intuitive, and embedded early in the data lifecycle. The founders break down three categories of AI applications in data (AI-assisted governance, metadata-driven analytics, and enterprise AI use cases) and stress that strong data foundations are essential to enable all of them. Watch the full episode to learn more about the challenges of making data governance accessible and actionable to a broad range of data users, how organizations can solve those challenges by shifting governance left, and more.
Key topics: CastorDoc’s founding story; Shifting data governance left; Exploring AI use cases in data governance and data catalogs; Building trust through data strategy; The democratization of data skills and insights
Resources: ➡️ About Coalesce: https://coalesce.io/ ➡️ Practical Data Modeling Substack: https://practicaldatamodeling.substac... ➡️ Coalesce podcast archive: https://coalesce.io/podcast/
--------
45:19
Data Hot Takes with Joe Reis
In this episode of The Data T, Joe Reis returns to discuss his journey since his last appearance on our podcast. We chat about the lasting impact of his book, Fundamentals of Data Engineering, and his latest focus on data modeling, the subject of his upcoming book, Practical Data Modeling, which he is writing publicly on his Substack. We also explore broader industry trends, including the hype around AI, the resurgence of data governance, and shifting perspectives on data modeling methodologies. Tune in to hear Joe’s candid takes on the impact of AI-generated code, the new “vibe coding” trend, and the importance of strong communities, both online and in-person, in reshaping professional networking in the data industry.
Key topics: The resurgence of data modeling; AI hype vs. reality; Data governance comeback; Community-driven learning; The shift toward practical data frameworks
Resources: ➡️ About Coalesce: https://coalesce.io/ ➡️ Practical Data Modeling Substack: https://practicaldatamodeling.substac... ➡️ Coalesce podcast archive: https://coalesce.io/podcast/
Previously known as Coffee with Coalesce, The Data T is a monthly podcast hosted by Armon Petrossian and Satish Jayanthi, co-founders of Coalesce, the data transformation company. Each month, we invite industry experts, entrepreneurs, and executives to spill the tea on the data industry's hottest topics, from data modeling and data ops to AI and LLMs, data engineering trends and predictions, datapreneurship, and more.