How AI Is Built

Nicolay Gerold
Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience.

Available episodes

5 of 50
  • Search in 5 lines of code. Building a search database from first principles | S2 E29
    Modern search is broken. There are too many pieces glued together:
    - Vector databases for semantic search
    - Text engines for keywords
    - Rerankers to fix the results
    - LLMs to understand queries
    - Metadata filters for precision
    Each piece works well alone. Together, they often become a mess. When you glue these systems together, you create:
    - Data consistency gaps: your vector store knows about documents your text engine doesn't. Which is right?
    - Timing mismatches: new content appears in one system before another, so users see different results depending on which path their query takes.
    - Complexity explosion: integration points grow with every new component. Three components mean three connections; five mean ten.
    - Performance bottlenecks: each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components.
    - Brittle chains: when one system fails, your entire search breaks. More pieces mean more breaking points.
    I recently built a system with query-specific post-filters but a requirement to deliver a fixed number of results to the user. The query often had to be run multiple times to fill that quota. The result: unpredictable latency, high load on the backend (some queries hammered the database 10+ times), and a relevance cliff where results 1-6 looked great but the later ones were poor matches.
    Today on How AI Is Built, we talk to Marek Galovic from TopK about how they built a new search database from modern components, asking: "How would search work if we built it today?" Cloud storage is cheap, compute is fast, memory is plentiful. So: one system that handles vectors, text, and filters together, not three systems duct-taped into one. One pass handles everything: vector search + text search + filters → a single sorted result. Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency. The goal is to do search in 5 lines of code. (A rough sketch of the single-pass idea follows below the episode entry.)
    Marek Galovic: LinkedIn | Website | TopK Website | TopK Docs
    Nicolay Gerold: LinkedIn | X (Twitter)
    Chapters: 00:00 Introduction to TopK and Snowflake Comparison · 00:35 Architectural Patterns and Custom Formats · 01:30 Query Execution Engine Explained · 02:56 Distributed Systems and Rust · 04:12 Query Execution Process · 06:56 Custom File Formats for Search · 11:45 Handling Distributed Queries · 16:28 Consistency Models and Use Cases · 26:47 Exploring Database Versioning and Snapshots · 27:27 Performance Benchmarks: Rust vs. C/C++ · 29:02 Scaling and Latency in Large Datasets · 29:39 GPU Acceleration and Use Cases · 31:04 Optimizing Search Relevance and Hybrid Search · 34:39 Advanced Search Features and Custom Scoring · 38:43 Future Directions and Research in AI · 47:11 Takeaways for Building AI Applications
    --------  
    53:29
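    The pitch is that filtering, keyword scoring, and vector scoring happen inside one query plan rather than across three glued-together services. Below is a minimal, purely illustrative Python sketch of that single-pass shape; the documents, scoring weights, and function names are invented and have no relation to TopK's actual engine or SDK.

      # Toy, in-memory illustration of "one pass handles everything":
      # metadata filtering, keyword scoring, and vector similarity are
      # applied in a single loop, producing one sorted result list.
      import math

      docs = [
          {"id": "a", "text": "rust columnar storage engine", "vec": [0.9, 0.1], "year": 2024},
          {"id": "b", "text": "python web framework tutorial", "vec": [0.2, 0.8], "year": 2022},
          {"id": "c", "text": "columnar formats for search", "vec": [0.8, 0.3], "year": 2023},
      ]

      def cosine(a, b):
          dot = sum(x * y for x, y in zip(a, b))
          return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

      def search(query_text, query_vec, filters, top_k=2):
          """Single pass: filter, score text + vector together, sort once."""
          terms = set(query_text.split())
          hits = []
          for d in docs:
              if any(d.get(k) != v for k, v in filters.items()):
                  continue  # metadata filter applied in the same pass
              text_score = len(terms & set(d["text"].split())) / len(terms)
              vec_score = cosine(query_vec, d["vec"])
              hits.append((0.5 * text_score + 0.5 * vec_score, d["id"]))
          return sorted(hits, reverse=True)[:top_k]

      print(search("columnar storage", [0.85, 0.2], {"year": 2023}))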
  • RAG is two things. Prompt Engineering and Search. Keep it Separate | S2 E28
    John Berryman moved from aerospace engineering to search, then to ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. He was drawn toward more math and ML throughout his career.
    RAG explained: "RAG is not a thing. RAG is two things." It breaks into:
    - Search: finding relevant information
    - Prompt engineering: presenting that information to the model
    These should be treated as separate problems to optimize.
    The Little Red Riding Hood principle: when prompting LLMs, stay on the path of what models have seen in training. Use formats, structures, and patterns they recognize from their training data:
    - For code, use docstrings and proper formatting
    - For financial data, use SEC report structures
    - Use Markdown for better formatting
    Models respond better to familiar structures.
    Testing prompts:
    - Start with "vibe testing": human evaluation of outputs
    - Develop systematic tests based on observed failure patterns
    - Use token probabilities to measure model confidence
    - For few-shot prompts, watch for diminishing returns as examples increase
    Managing token limits: when designing prompts, divide content into static elements (boilerplate, instructions) and dynamic elements (user inputs, context). Prioritize content as must-have, nice-to-have, and optional if space allows. Even with larger context windows, efficiency remains important for cost and latency. (A rough sketch of this prompt budgeting follows below the episode entry.)
    Completion vs. chat models: chat models are winning despite initial concerns about their constraints. Completion models allow more flexibility in document format, but chat models are more reliable and aligned with common use cases, so most applications now use chat models, even for completion-like tasks.
    Applications, assistants vs. workflows: two main LLM application patterns:
    - Assistants: human-in-the-loop interactions where users guide and correct
    - Workflows: decomposed tasks where LLMs handle well-defined steps with safeguards
    Breaking down complex problems, two approaches:
    - Horizontal: split into sequential steps with clear inputs/outputs
    - Vertical: divide by case type, with specialized handling for each scenario
    Example: for SOX compliance, break horizontally (understand control, find evidence, extract data, compile report) and vertically (different audit types).
    On agents: agents exist on a spectrum from assistants to workflows, characterized by having some autonomy to make decisions, using tools to interact with the environment, and usually requiring human oversight.
    Best practices for building with LLMs:
    - Start simple: API key + Jupyter notebook
    - Build prototypes and iterate quickly
    - Add evaluation as you scale
    - Keep users in the loop until models prove reliability
    John Berryman: LinkedIn | X (Twitter) | Arcturus Labs | Prompt Engineering for LLMs (Book)
    Nicolay Gerold: LinkedIn | X (Twitter)
    Chapters: 00:00 Introduction to RAG: Retrieval and Generation · 00:19 Optimizing Retrieval Systems · 01:11 Introducing John Berryman · 02:31 John's Journey from Search to Prompt Engineering · 04:05 Understanding RAG: Search and Prompt Engineering · 05:39 The Little Red Riding Hood Principle in Prompt Engineering · 14:14 Balancing Static and Dynamic Elements in Prompts · 25:52 Assistants vs. Workflows: Choosing the Right Approach · 30:15 Defining Agency in AI · 30:35 Spectrum of Assistance and Workflows · 34:35 Breaking Down Problems Horizontally and Vertically · 37:57 SOX Compliance Case Study · 40:56 Integrating LLMs into Existing Applications · 44:37 Favorite Tools and Missing Features · 46:37 Exploring Niche Technologies in AI · 52:52 Key Takeaways and Future Directions
    --------  
    1:02:44
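    The static/dynamic split and the priority tiers can be sketched in a few lines. The following is an illustrative Python toy, not John Berryman's implementation: the snippet texts, priorities, and budget are invented, and token counting is approximated by word count where a real system would use the model's tokenizer.

      # Prompt budgeting sketch: static boilerplate plus dynamic snippets,
      # packed in priority order until a token budget runs out.
      SYSTEM = "You are a support assistant. Answer only from the provided context."  # static part

      def rough_tokens(text: str) -> int:
          return len(text.split())  # crude stand-in for a real tokenizer

      def build_prompt(question, snippets, budget=200):
          """snippets: (priority, text) pairs; lower number = more important."""
          used = rough_tokens(SYSTEM) + rough_tokens(question)
          kept = []
          for _, snippet in sorted(snippets):        # must-haves first
              cost = rough_tokens(snippet)
              if used + cost > budget:
                  continue                           # drop what doesn't fit
              kept.append(snippet)
              used += cost
          context = "\n\n".join(kept)
          return f"{SYSTEM}\n\nContext:\n{context}\n\nQuestion: {question}"

      print(build_prompt(
          "How do I reset my password?",
          snippets=[
              (0, "Password resets are done from Settings > Security."),  # must-have
              (1, "Resets require a verified email address."),            # nice-to-have
              (2, "Our company was founded in 2009. " * 40),              # optional, likely dropped
          ],
      ))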
  • Graphs aren't just for specialists anymore. They are one import away | S2 E27
    Kuzu is an embedded graph database that implements Cypher as a library. It can be easily integrated into various environments, from scripts and Android apps to serverless platforms. Its design supports both ephemeral, in-memory graphs (ideal for temporary computations) and large-scale persistent graphs where traditional systems struggle with performance and scalability. (A usage sketch follows below the episode entry.)
    Key architectural decisions:
    - Columnar storage: Kuzu stores node and relationship properties in separate, contiguous columns. This design reduces I/O by allowing queries to scan only the needed columns, unlike row-based systems (e.g., Neo4j) that read full records even when only a subset of properties is required.
    - Efficient join indexing with CSR: the join index is maintained in a Compressed Sparse Row (CSR) format. By sorting and compressing relationship data, Kuzu ensures that adjacent node relationships are stored contiguously, minimizing random I/O and speeding up traversals.
    - Vectorized query processing: instead of processing one tuple at a time, Kuzu processes blocks (vectors) of tuples. This block-based approach reduces function-call overhead and improves cache locality, boosting performance for analytic queries.
    - Factorization and ASP join: for many-to-many queries that can generate enormous intermediate results, Kuzu uses factorization to represent data compactly. Its ASP join algorithm integrates factorization, sequential scanning, and sideways information passing to avoid unnecessary full scans and materializations.
    Kuzu is optimized for read-heavy, analytic workloads. While batched writes are efficient, the system is less tuned for high-frequency, small transactions. Upcoming features include a WebAssembly (Wasm) version for running in browsers, enhanced vector and full-text search indices, and built-in graph data science algorithms for tasks like PageRank and centrality analysis.
    Kuzu can be a powerful backend for AI applications in several ways:
    - Knowledge graphs: store and query complex relationships between entities to support natural language understanding, semantic search, and reasoning tasks.
    - Graph data science: run built-in graph algorithms (like PageRank, centrality, or community detection) to uncover patterns and insights for recommendation systems, fraud detection, and other AI-driven analyses.
    - Retrieval-augmented generation (RAG): integrate with large language models by efficiently retrieving relevant, structured graph data. Kuzu's vector search capabilities and fast query processing make it well suited for augmenting AI responses with contextual information.
    - Graph embeddings & ML pipelines: serve as the foundation for generating graph embeddings, which are used in downstream machine learning tasks such as clustering, classification, or link prediction.
    Semih Salihoğlu: LinkedIn | Kuzu GitHub | Kuzu Docs
    Nicolay Gerold: LinkedIn | X (Twitter)
    Chapters: 00:00 Introduction to Graph Databases · 00:18 Introducing Kuzu: A Modern Graph Database · 01:48 Use Cases and Applications of Kuzu · 03:03 Kuzu's Research Origins and Scalability · 06:18 Columnar Storage vs. Row-Oriented Storage · 10:27 Query Processing Techniques in Kuzu · 22:22 Compressed Sparse Row (CSR) Storage · 27:25 Vectorization in Graph Databases · 31:24 Optimizing Query Processors with Vectorization · 33:25 Common Wisdom in Graph Databases · 35:13 Introducing ASP Join in Kuzu · 35:55 Factorization and Efficient Query Processing · 39:49 Challenges and Solutions in Graph Databases · 45:26 Write Path Optimization in Kuzu · 54:10 Future Developments in Kuzu · 57:51 Key Takeaways and Final Thoughts
    --------  
    1:03:35
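    To make "one import away" concrete, here is a minimal sketch of creating and querying an embedded Kuzu graph from Python. It assumes the kuzu package's documented Database/Connection/execute interface; the Person/Follows schema and the database path are invented for illustration, and the exact syntax should be checked against the Kuzu docs.

      # Minimal embedded-graph sketch with the kuzu Python package.
      import kuzu

      db = kuzu.Database("./demo_graph")   # persistent, on-disk graph
      conn = kuzu.Connection(db)

      # Define a tiny schema, then build the graph with plain Cypher.
      conn.execute("CREATE NODE TABLE Person(name STRING, PRIMARY KEY(name))")
      conn.execute("CREATE REL TABLE Follows(FROM Person TO Person)")
      conn.execute("CREATE (:Person {name: 'Alice'})")
      conn.execute("CREATE (:Person {name: 'Bob'})")
      conn.execute(
          "MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) "
          "CREATE (a)-[:Follows]->(b)"
      )

      # Traverse the graph: who follows whom?
      result = conn.execute("MATCH (a:Person)-[:Follows]->(b:Person) RETURN a.name, b.name")
      while result.has_next():
          print(result.get_next())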
  • Knowledge Graphs Won't Fix Bad Data | S2 E26
    Metadata is the foundation of any enterprise knowledge graph. By organizing both technical and business metadata, organizations create a "brain" that supports advanced applications like AI-driven data assistants. The goal is to achieve economies of scale, making data reusable, traceable, and ultimately more valuable.
    Juan Sequeda is a leading expert in enterprise knowledge graphs and metadata management. He has spent years solving the challenges of integrating diverse data sources into coherent, accessible knowledge graphs. As Principal Scientist at data.world, Juan provides concrete strategies for improving data quality, streamlining feature extraction, and enhancing model explainability. If you want to build AI systems on a solid data foundation, one that cuts through the noise and delivers reliable, high-performance insights, you need to listen to Juan's proven methods and real-world examples.
    Terms like ontologies, taxonomies, and knowledge graphs aren't new inventions. Ontologies and taxonomies have been studied for decades, arguably since ancient Greece. Google popularized "knowledge graphs" in 2012 by building on decades of semantic web research. Despite current buzz, these concepts build on established work.
    Traditionally, data lives in siloed applications, each with its own relational databases, ETL processes, and dashboards. When cross-application queries and consistent definitions become painful, organizations face metadata management challenges. The first step is to integrate technical metadata (table names, columns, code lineage) into a unified knowledge graph. Then, add business metadata by mapping business glossaries and definitions onto that technical layer. (A rough sketch of this mapping follows below the episode entry.)
    A modern data catalog should:
    - Integrate multiple sources: automatically ingest metadata from databases, ETL tools (e.g., dbt, Fivetran), and BI tools.
    - Bridge technical and business views: link technical definitions (e.g., table "CUST_123") with business concepts (e.g., "Customer").
    - Enable reuse and governance: support data discovery, impact analysis, and proper governance while facilitating reuse across teams.
    Practical approaches & use cases:
    - Start with a clear problem: whether it's reducing churn, improving operational efficiency, or meeting compliance needs, begin by solving a specific pain point.
    - Iron thread method: follow one query end-to-end, from identifying a business need to tracing it back to source systems, to gradually build and refine the graph.
    - Automation vs. manual oversight: technical metadata extraction is largely automated. For business definitions or text-based entity extraction (e.g., via LLMs), human oversight is key to ensuring accuracy and consistency.
    Technical considerations:
    - Entity vs. property: if you need to attach additional details or reuse an element across contexts, model it as an entity (with a unique identifier). Otherwise, keep it as a simple property.
    - Storage options: the market offers various graph databases, e.g., Neo4j, Amazon Neptune, Cosmos DB, TigerGraph, and Apache Jena (for RDF). Future trends point toward multi-model systems that allow querying in SQL, Cypher, or SPARQL over the same underlying data.
    Juan Sequeda: LinkedIn | data.world | Semantic Web for the Working Ontologist | Designing and Building Enterprise Knowledge Graphs (before you buy, send Juan a message, he is happy to send you a copy) | Catalog & Cocktails (Juan's podcast)
    Nicolay Gerold: LinkedIn | X (Twitter)
    Chapters: 00:00 Introduction to Knowledge Graphs · 00:45 The Role of Metadata in AI · 01:06 Building Knowledge Graphs: First Steps · 01:42 Interview with Juan Sequeda · 02:04 Understanding Buzzwords: Ontologies, Taxonomies, and More · 05:05 Challenges and Solutions in Data Management · 08:04 Practical Applications of Knowledge Graphs · 15:38 Governance and Data Engineering · 34:42 Setting the Stage for Data-Driven Problem Solving · 34:58 Understanding Consumer Needs and Data Challenges · 35:33 Foundations and Advanced Capabilities in Data Management · 36:01 The Role of AI and Metadata in Data Maturity · 37:56 The Iron Thread Approach to Problem Solving · 40:12 Constructing and Utilizing Knowledge Graphs · 54:38 Trends and Future Directions in Knowledge Graphs · 59:17 Practical Advice for Building Knowledge Graphs
    --------  
    1:10:59
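    The "technical metadata first, business metadata mapped on top" idea can be illustrated with a few triples. The table, job, and concept names below are invented; a real catalog would sit on a graph database and ingest this automatically from dbt, Fivetran, BI tools, and so on.

      # Toy metadata graph as (subject, predicate, object) triples.
      triples = [
          # technical metadata
          ("table:CUST_123", "hasColumn", "column:CUST_123.email"),
          ("table:CUST_123", "loadedBy", "job:dbt.stg_customers"),
          # business metadata mapped onto the technical layer
          ("concept:Customer", "definedAs", "A person or org with at least one purchase"),
          ("concept:Customer", "representedBy", "table:CUST_123"),
      ]

      def neighbors(node, predicate=None):
          """Follow edges out of a node, optionally restricted to one predicate."""
          return [(p, o) for s, p, o in triples if s == node and (predicate is None or p == predicate)]

      # Which physical table backs the business concept "Customer", and where
      # does that table get its data? A miniature iron-thread style trace.
      for _, table in neighbors("concept:Customer", "representedBy"):
          print("Customer is stored in", table)
          for pred, obj in neighbors(table):
              print("  ", pred, "->", obj)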
  • Temporal RAG: Embracing Time for Smarter, Reliable Knowledge Graphs | S2 E25
    Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems, from aerospace to cybersecurity. Now he is working on "Temporal RAG" in TrustGraph.
    Time is a critical, but often ignored, dimension in data. Whether it's threat intelligence, legal contracts, or API documentation, every data point has a temporal context that affects its reliability and usefulness. To manage this, systems must track when data is created, updated, or deleted, and ideally preserve versions over time.
    Three types of data:
    - Observations: measurable, verifiable recordings (e.g., "the hat reads 'Sunday Running Club'"). They require supporting evidence and may be updated as new data becomes available.
    - Assertions: subjective interpretations (e.g., "the hat is greenish"). They involve human judgment, come with confidence levels, and may change over time.
    - Facts: immutable, verified information that remains constant. Facts are rare in dynamic environments because most data evolves; they serve as the "bedrock" of trust.
    By clearly categorizing data into these buckets, systems can monitor freshness, detect staleness, and better manage dependencies between components (like code and its documentation).
    Integrating temporal data into knowledge graphs:
    - Challenge: traditional knowledge graphs and schemas (e.g., schema.org) rarely integrate time beyond basic metadata. Long documents may only provide a single timestamp, leaving the context of internal details untracked.
    - Solution: attach detailed temporal metadata (such as creation, update, and deletion timestamps) during data ingestion, and use versioning to maintain historical context. This allows systems to assess whether data is current or stale, detect conflicts when updates occur, and employ Bayesian methods to adjust trust metrics as more information accumulates. (A rough sketch of such a Bayesian trust update follows below the episode entry.)
    Key takeaways:
    - Focus on specialization: build tools that do one thing well. For example, design a simple yet extensible knowledge graph rather than relying on overly complex ontologies.
    - Integrate temporal metadata: always timestamp data operations and version records. This is key to understanding data freshness and evolution.
    - Adopt robust infrastructure: use scalable, proven technologies to connect specialized modules via APIs. This reduces maintenance overhead compared to systems overloaded with connectors and extra features.
    - Leverage Bayesian updates: start with initial trust metrics based on observed data and refine them as new evidence arrives.
    - Mind the big picture: avoid working in isolated silos. Emphasize a holistic system design that maintains in situ context and promotes collaboration across teams.
    Daniel Davis: Cognitive Core | TrustGraph | YouTube | LinkedIn | Discord
    Nicolay Gerold: LinkedIn | X (Twitter)
    Chapters: 00:00 Introduction to Temporal Dimensions in Data · 00:53 Timestamping and Versioning Data · 01:35 Introducing Daniel Davis and Temporal RAG · 01:58 Three Buckets of Data: Observations, Assertions, and Facts · 03:22 Dynamic Data and Data Freshness · 05:14 Challenges in Integrating Time in Knowledge Graphs · 09:41 Defining Observations, Assertions, and Facts · 12:57 The Role of Time in Data Trustworthiness · 46:58 Chasing White Whales in AI · 47:58 The Problem with Feature Overload · 48:43 Connector Maintenance Challenges · 50:02 The Swiss Army Knife Analogy · 51:16 API Meshes and Glue Code · 54:14 The Importance of Software Infrastructure · 01:00:10 The Need for Specialized Tools · 01:13:25 Outro and Future Plans
    --------  
    1:33:44
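    The two mechanics above, temporal metadata on every record and a Bayesian trust metric that moves with new evidence, can be sketched as a toy Beta-Bernoulli update. This is an illustration only, not TrustGraph's actual implementation; the Record fields and the example statement are invented.

      # Toy temporal-trust record: timestamps plus a Beta(confirmations+1, conflicts+1) trust score.
      from dataclasses import dataclass, field
      from datetime import datetime, timezone
      from typing import Optional

      @dataclass
      class Record:
          statement: str
          kind: str                      # "observation" | "assertion" | "fact"
          created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
          updated_at: Optional[datetime] = None
          confirmations: int = 1
          conflicts: int = 0

          def trust(self) -> float:
              """Posterior mean of Beta(confirmations + 1, conflicts + 1)."""
              return (self.confirmations + 1) / (self.confirmations + self.conflicts + 2)

          def add_evidence(self, supports: bool) -> None:
              if supports:
                  self.confirmations += 1
              else:
                  self.conflicts += 1
              self.updated_at = datetime.now(timezone.utc)   # keep temporal context fresh

      obs = Record("the hat reads 'Sunday Running Club'", kind="observation")
      print(round(obs.trust(), 2))       # trust from a single observation
      obs.add_evidence(supports=True)
      obs.add_evidence(supports=False)   # a conflicting, timestamped update lowers trust
      print(round(obs.trust(), 2))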


About How AI Is Built

Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.
Podcast website
