Datadog's Diamond Bishop on Building Production AI Agents That Handle Critical Incidents
What happens when you build AI agents trusted enough to handle production incidents while engineers sleep? At Datadog, it has sparked a fundamental rethink of how enterprise AI systems earn developer trust in critical infrastructure environments.
Diamond Bishop, Director of Eng/AI, outlines for Ravin how their Bits AI initiative evolved from basic log analysis to sophisticated incident response agents. By focusing first on root cause identification rather than full automation, they're delivering immediate value while building the confidence needed for deeper integration.
But that's just one part of Datadog's systematic approach. From adopting Anthropic's MCP standard for tool interoperability to implementing multi-model foundation model strategies, they're creating AI systems that can evolve with rapidly improving underlying technologies while maintaining enterprise reliability standards.
Topics discussed:
Defining AI agents as systems with control flow autonomy rather than simple workflow automation or chatbot interfaces.
Building enterprise trust in AI agents through precision-focused evaluation systems that measure performance across specific incident scenarios.
Implementing root cause identification agents that diagnose production issues during critical outages before engineers even wake up.
Adopting Anthropic's MCP standard for tool interoperability to enable seamless integration across different agent platforms and environments.
Using LLM-as-judge evaluation methods combined with human alignment scoring to continuously improve agent reliability and performance.
Managing multi-model foundation model strategies that allow switching between OpenAI, Anthropic, and open-source models based on the task.
Balancing decentralized experimentation with centralized procurement standards and security compliance oversight in organizational AI adoption.
Developing LLM observability products that cluster errors and provide visibility into token usage and model performance.
Navigating the bitter lesson principle by building evaluation frameworks that can quickly test new foundation models.
Predicting timeline and bottlenecks for AGI development based on current reasoning limitations and architectural research needs.
--------
49:50
UserTesting's Michael Domanic on Driving Enterprise-Wide AI Adoption through Creativity
How do you achieve 75% weekly active AI tool usage across an enterprise? At UserTesting, it started with a deliberate strategy to foster experimental adoption while maintaining governance guardrails. Michael J. Domanic, Head of Generative AI Business Strategy, tells Ravin about their approach to transforming internal workflows through AI on this episode of The AI Adoption Playbook.
Rather than treating AI as purely a technical concern, UserTesting has built an impressive culture of AI adoption by empowering cross-functional teams, establishing clear usage guidelines, and meticulously tracking tangible business value. Their 800-person team has created 600 custom GPTs, many of which have transformed workflows across departments.
Michael also explores how UserTesting balanced centralized governance with democratized experimentation, their methods for measuring ROI, and why creativity — not just technical expertise — might be the critical ingredient for successful enterprise AI transformation.
Topics discussed:
Building an enterprise AI strategy that achieved 75% weekly active users through balanced governance and democratized experimentation.
Creating custom GPTs grounded in organizational knowledge to streamline processes like OKR development across all company levels.
Establishing cross-functional "AI ambassadors" and regular office hours to drive adoption and showcase successful use cases.
Implementing clear AI usage guidelines that protect sensitive customer data while encouraging internal experimentation and innovation.
Measuring AI ROI by focusing on business outcomes rather than time savings, with metrics tied to specific operational improvements.
Balancing centralized AI governance through a dedicated council with decentralized, department-level experimentation and implementation.
Exploring the differences between AI transformation and traditional digital transformation, including the need for executive-level buy-in from day one.
Discussing the challenges incumbent software companies face when integrating AI versus new AI-native applications built from scratch.
Developing strategies for scaling AI applications from experimentation to production through appropriate security and governance protocols.
Examining the philosophical aspects of AGI development and the importance of creativity rather than technical skills in leading AI transformation.
--------
48:34
incident.io's Lawrence Jones on Building AI That Automatically Investigates Technical Outages
When technical systems fail at companies like Netflix or Etsy, every minute of downtime can cost millions. That's why incident.io is building AI systems that can automatically investigate and diagnose technical problems faster than human engineers.
In this episode of The AI Adoption Playbook, Lawrence Jones, Product Engineer at incident.io, tells Ravin how they're creating an automated incident investigator that can analyze logs, traces, and metrics to determine what went wrong during an outage. He shares their methodical approach to AI development, focusing on measurable progress through evaluation metrics and scorecards rather than intuitive "vibe-based" changes.
Lawrence also discusses the evolution of their AI teams and roles, including their newly launched AI Engineer position designed specifically for the unique challenges of AI development, and how they use LLMs themselves to evaluate AI system performance.
Topics discussed:
Building an AI incident investigator that can automatically analyze logs, traces, and metrics to determine the root cause of technical outages.
Creating comprehensive evaluation frameworks with scorecards and metrics to measure AI performance against historical incident data.
Using LLMs as evaluators to determine if AI responses were helpful by analyzing post-incident conversations and user feedback.
Developing internal tooling that enables teams to rapidly test and improve AI systems while maintaining quality standards.
Evolving from individual "vibe-based" AI development to team-based systematic improvement with clear metrics for success.
Structuring AI engineering roles and teams to balance product engineering skills with specialized AI development knowledge.
Implementing product-focused AI features like chatbots that can help automate routine tasks during incident response.
Leveraging parallel human and AI processes to collect validation data and improve AI system performance over time.
Building versus buying AI evaluation tools and the advantages of custom solutions integrated with existing product data.
Exploring the future of AI in technical operations and whether AI will enhance or replace human roles in incident management.
--------
40:33
Shopify's Spencer Lawrence on Bridging AI Capability and Organizational Impact
What happens when AI capabilities outpace organizational readiness? At Shopify, this tension has pushed them to develop a practical implementation approach that balances rapid experimentation with sustainable value creation.
Spencer Lawrence, Director of Data Science & Engineering, shares how they've evolved from simple text expansion experiments to sophisticated AI assistants like Help Center and Sidekick that are transforming both customer support and merchant operations.
At the heart of their strategy is a barbell approach that enables self-service for small AI use cases while reserving targeted investment for transformative projects. Spencer also explains how their one-week sprint cycles, sophisticated evaluation frameworks, and cross-functional collaboration have helped them overcome the common challenges that prevent organizations from realizing AI's full potential.
Successful AI implementation requires more than just technical solutions — it demands new organizational structures, evaluation methods, and a willingness to constantly reevaluate what knowledge work means in an AI-augmented world.
Topics discussed:
Shopify's evolution from early text expansion experiments to production-level AI assistants that support both customers and merchants.
Creating sophisticated evaluation frameworks that combine human annotators with LLM judges to ensure quality and consistency of AI outputs.
Implementing a barbell strategy that balances small self-service AI use cases with strategic investments in high-impact projects.
Running one-week sprints across all AI work to maximize iteration cycles and maintain velocity even at enterprise scale.
Addressing the gap between AI capabilities and real-world impact through both technological solutions and organizational change.
Building feedback loops between technical teams and legal/compliance departments to create AI solutions that meet governance requirements.
Fostering a culture that values experimentation while developing clear policies that give employees confidence to innovate responsibly.
Exploring how AI will raise productivity expectations rather than simply reducing workloads across all roles and functions.
Using AI as a strategic thought partner to generate novel ideas and help evaluate different perspectives on complex problems.
Developing a forward-looking perspective on knowledge work that embraces AI augmentation while maintaining human judgment and oversight.
--------
45:35
MongoDB's David Vainchenker on Shipping Fast and Learning from AI Usage Patterns
Forget theoretical planning — MongoDB dove headfirst into AI adoption and let real-world usage guide their strategy. David Vainchenker, Sr. Director of Enterprise Initiatives & Tools at MongoDB, joins Ravin on this episode of The AI Adoption Playbook to share this practical approach and unpack their evolution from simple chatbots to sophisticated agent-based systems.
David shares their practical challenges with measuring AI's business impact, explaining why time savings metrics alone weren't convincing to leadership without translating to actual dollar savings or increased capacity. He also offers candid insights about security concerns, copyright issues with AI-generated code, and the delicate balance between innovation and governance.
Topics discussed:
Why shipping AI tools quickly and learning from actual usage patterns proved more effective than predicting theoretical use cases.
The challenge of translating AI time savings into measurable business impact that resonates with leadership and affects the P&L.
Security and compliance considerations when implementing AI at enterprise scale, including permission-aware retrieval requirements.
Managing build vs. buy decisions in the fast-evolving AI landscape while ensuring business continuity.
The reality of AI-assisted coding adoption rates varying significantly between junior and senior engineers in large organizations, and the copyright implications of non-human-generated code.
How MongoDB approaches vertical (specialized) vs. horizontal (platform) AI solutions for different use cases across the enterprise.
The budgeting challenges created when every existing software vendor offers AI capabilities as premium add-ons.
The importance of maintaining cross-system AI capabilities that match human workflows spanning multiple applications.
Welcome to The AI Adoption Playbook—where we explore real-world AI implementations at leading enterprises. Join host Ravin Thambapillai, CEO of Credal.ai, as he unpacks the technical challenges, architectural decisions, and deployment strategies shaping successful AI adoption. Each episode dives deep into concrete use cases with the engineers and ML platform teams making enterprise AI work at scale. Whether you’re building internal AI tools or leading GenAI initiatives, you’ll find actionable insights for moving from proof-of-concept to production.