
Tara Raafat: Human-Centered Knowledge Graph and Metadata Leadership – Episode 41
15-12-2025 | 30 Min.
Tara Raafat

At Bloomberg, Tara Raafat applies her extensive ontology, knowledge graph, and management expertise to create a solid semantic and technical foundation for the enterprise's mission-critical data, information, and knowledge. One of the keys to the success of her knowledge graph projects is her focus on people. She of course employs the best semantic practices and embraces the latest technology, but her knack for engaging the right stakeholders and building the right kinds of teams is arguably what distinguishes her work.

We talked about:
- her history as a knowledge practitioner and metadata strategist
- the serendipitous intersection of her knowledge work with the needs of new AI systems
- her view of a knowledge graph as the DNA of enterprise information, a blueprint for systems that manage the growth and evolution of your enterprise's knowledge
- the importance of human contributions to LLM-augmented ontology and knowledge graph building
- the people you need to engage to get a knowledge graph project off the ground: executive sponsors, skeptics, enthusiasts, and change-tolerant pioneers
- the five stars you need on your team to build a successful knowledge graph: ontologists, business people, subject matter experts, engineers, and a KG product owner
- the importance of balancing the desire for perfect solutions with the pragmatic and practical concerns that ensure business success
- a productive approach to integrating AI and other tech into your professional work
- the importance of viewing your knowledge graph as not just another database, but as the very foundation of your enterprise knowledge

Tara's bio

Dr. Tara Raafat is Head of Metadata and Knowledge Graph Strategy in Bloomberg’s CTO Office, where she leads the development of Bloomberg’s enterprise Knowledge Graph and semantic metadata strategy, aligning it with AI and data integration initiatives to advance next-generation financial intelligence.
With over 15 years of expertise in semantic technologies, she has designed knowledge-driven solutions across multiple domains, including but not limited to finance, healthcare, industrial symbiosis, and insurance. Before Bloomberg, Tara was Chief Ontologist at Mphasis and co-founded NextAngles™, an AI/semantic platform for regulatory compliance. Tara holds a PhD in Information System Engineering from the UK. She is a strong advocate for humanitarian tech and women in STEM and a frequent speaker at international conferences, where she delivers keynotes, workshops, and tutorials.

Connect with Tara online
- LinkedIn
- email: traafat at bloomberg dot net

Video

Here’s the video version of our conversation: https://youtu.be/yw4yWjeixZw

Podcast intro transcript

This is the Knowledge Graph Insights podcast, episode number 41. As groundbreaking new AI capabilities appear on an almost daily basis, it's tempting to focus on the technology. But advanced AI leaders like Tara Raafat focus as much, if not more, on the human side of the knowledge graph equation. As she guides metadata and knowledge graph strategy at Bloomberg, Tara continues her career-long focus on building the star-shaped teams of humans who design and construct a solid foundation for your enterprise knowledge.

Interview transcript

Larry: Hi everyone. Welcome to episode number 41 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Tara Raafat. She's the head of metadata and knowledge graph strategy at Bloomberg, and a very accomplished ontologist and knowledge graph practitioner. Welcome to the show, Tara. Tell the folks a little bit more about what you're doing these days.

Tara: Hi, thank you so much, Larry. I'm super-excited to be here and chatting with you. We always have amazing chats, so I'm looking forward to this one as well.
Well, as Larry mentioned, I'm currently working for Bloomberg, and I've been in the space of knowledge graphs and ontology creation for a pretty long time. So I've been in this community, I've seen a lot. And my interest has always been in the application of ontologies and knowledge graphs in industries, and I have worked in so many different industries, from banking and financial to insurance to medical. So I've touched upon a lot of different domains with the application of knowledge graphs. And currently at Bloomberg, I am also leading their metadata strategy and the knowledge graph strategy, so basically semantic metadata. And we're looking at how we are connecting all the different data sources and data silos that we have within Bloomberg to make our data ready for all the interesting, exciting AI stuff that we're doing. And making sure that we have a great representation of our data.

Larry: That's something that comes up all the time in my conversations lately: people have done this work for years for very good reasons, all those things you just talked about, the importance of this kind of work in finance and insurance and medical fields and things like that. But it turns out that it makes you AI-ready as well. So is that just a happy coincidence, or are you doing even more to make your metadata more AI-ready these days?

Tara: Yeah. In a sense, you could say happy coincidence, but I think from the very beginning of when you think about ontologies and knowledge graphs, the goal was always to make your data machine-understandable. So whenever people ask me, "You're an ontologist, what does that even mean?" my explanation was always: I take all the information in your head and put it in a way that is machine-understandable, so it's now encoded in that way. So now, when we're thinking about the AI era, we're basically thinking that if AI is operating on our information, on our data, it needs to have the right context and the right knowledge.
So it becomes a perfect fit here. If data is available and ready in your knowledge graph format, it means that it's machine-understandable. It has the right context. It has the extra information that an AI system, specifically in the era of LLMs and generative AI, needs in order to make sure that the answers it produces are more grounded and based in facts, or have better provenance. And it's more accurate in quality.

Larry: Yeah, that's right. You just reminded me, it's not so much serendipity or a happy coincidence. It's like, no, it's just what we do. Because we make things accessible. The whole beauty of this is the-

Tara: We knew what's coming, right? The word AI has changed so much. It's the same thing. It just keeps popping up in different contexts, but yeah.

Larry: So you're actually a visionary futurist, as all of us are in the product. Yeah. In your long experience, one of the things I love most, there's a lot of things I love about your work. I even wrote about it after KGC. I summarized one of your talks, and I think it's on your LinkedIn profile now. You have this great definition of a knowledge graph, and you liken it to a biological concept that I like. So can you talk a little bit about that?

Tara: Sure. I see the knowledge graph as the DNA of data, or the DNA of our information. And the reason I started thinking about it that way is that when you think about human DNA, you're literally thinking of the structure and relationships of the organism and how it operates and how it evolves. So there's a blueprint of its operation and how it will grow and evolve. And for me, that's very similar to when we start creating a knowledge graph representation of our data, because we're again capturing the structure and relationships between our data. And we're actually encoding the context and the rules that are needed to allow our data to grow and evolve as our business grows and evolves. So there's a real similarity for me there.
And it also brings that human touch to this whole concept of knowledge graphs, because when I think about knowledge graphs and talk about ontologies, it comes from a philosophical background. And it's a lot more social and human.

Tara: And at the end of the day, the foundation of it is how we as humans interpret the world and interpret information, and how then, by the use of technology, we encode it. But the interpretation is still very human. So that's why this link for me is actually very interesting. And one more thing I would add: I do this comparison to also emphasize the fact that knowledge graphs are not just another database or another data store. So I don't like companies to look at it from that perspective. They really should look at it as the foundation on which their data grows and evolves as their business grows.

Larry: Yeah. And that foundational role, it just keeps coming up, again, related to AI a lot, the LLM stuff. I've heard a lot of people talk about the factual foundation for your AI infrastructure and that kind of thing. And again, it's another one of those things: yeah, it just happens to be really good at that. And it was purpose-built for that from the start.

Larry: You mentioned the human element a lot in there. And that's what I was so enamored of in your talk at KGC and other talks you've done, and we've talked about this. One of the things, just a quick personal aside, one of the things that drives me nuts about the current AI hype cycle is this idea like, "Oh, we can just get rid of humans. It's great. We'll just have machines instead." I'm like, "Have you not heard..." I've done about 300 different interviews over the years, and every single one of them talks about how it's not technical, it's not procedural or management wisdom. It's always people stuff. It's change management and working with people.
Can you talk about how the people stuff manifests in your work in metadata strategy and knowledge graph construction? I know that's a lot.

Tara: Sure.
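Tara's point about making information machine-understandable, so an AI system can ground its answers in explicit facts with provenance, can be sketched with a toy example. The sketch below uses plain subject-predicate-object triples in Python; the entities and relations are invented for illustration, not drawn from Bloomberg's actual graph:

```python
# A toy knowledge graph as subject-predicate-object triples.
# All entities and relations here are invented for illustration.
triples = {
    ("AcmeCorp", "isA", "Company"),
    ("AcmeCorp", "headquarteredIn", "London"),
    ("AcmeBond2030", "issuedBy", "AcmeCorp"),
    ("AcmeBond2030", "isA", "CorporateBond"),
}

def objects(subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Grounding an answer in explicit facts:
# "Where is the issuer of AcmeBond2030 headquartered?"
issuers = objects("AcmeBond2030", "issuedBy")
cities = {c for i in issuers for c in objects(i, "headquarteredIn")}
print(cities)  # {'London'}
```

Because every step of the answer is a traversal over explicit triples, the system can point back to the exact facts it used, which is the grounding and provenance benefit Tara describes.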

Alexandre Bertails: The Netflix Unified Data Architecture – Episode 40
03-11-2025 | 31 Min.
Alexandre Bertails

At Netflix, Alexandre Bertails and his team have adopted the RDF standard to capture the meaning in their content in a consistent way and generate consistent representations of it for a variety of internal customers. The keys to their system are a Unified Data Architecture (UDA) and a domain modeling language, Upper, that let them quickly and efficiently share complex data projections in the formats that their internal engineering customers need.

We talked about:
- his work at Netflix on the content engineering team, the internal operation that keeps the rest of the business running
- how their search for "one schema to rule them all" and the need for semantic interoperability led to the creation of the Unified Data Architecture (UDA)
- the components of Netflix's knowledge graph
- Upper, their domain modeling language
- their focus on conceptual RDF, resulting in a system that works more like a virtual knowledge graph
- his team's decision to "buy RDF" and its standards
- the challenges of aligning multiple internal teams on ontology-writing standards and how they led to the creation of UDA
- their two main goals in creating their Upper domain modeling language: to keep it as compact as possible and to support federation
- the unique nature of Upper and its three essential characteristics: it has to be self-describing, self-referencing, and self-governing
- their use of SHACL and its role in Upper
- how his background in computer science and formal logic and his discovery of information science brought him to the RDF world and ultimately to his current role
- the importance of marketing your work internally and using accessible language to describe it to your stakeholders, for example describing your work as a "domain model" rather than an ontology
- UDA's ability to permit the automatic distribution of semantically precise data across their business with one click
- how reading the introduction to the original 1999 RDF specification can help prepare you for the LLM/gen AI era

Alexandre's bio

Alexandre Bertails is an engineer in Content Engineering at Netflix, where he leads the design of the Upper metamodel and the semantic foundations for UDA (Unified Data Architecture).

Connect with Alex online
- LinkedIn
- bertails.org

Resources mentioned in this interview
- Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix
- Resource Description Framework (RDF) Schema Specification (1999)

Video

Here’s the video version of our conversation: https://youtu.be/DCoEo3rt91M

Podcast intro transcript

This is the Knowledge Graph Insights podcast, episode number 40. When you're orchestrating data operations for an enormous enterprise like Netflix, you need all of the automation help you can get. Alex Bertails and his content engineering team have adopted the RDF standard to build a domain modeling and data distribution platform that lets them automatically share semantically precise data across their business, in the variety of formats that their internal engineering customers need, often with just one click.

Interview transcript

Larry: Hi, everyone. Welcome to episode number 40 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Alex Bertails. Alex is a software engineer at Netflix, where he's done some really interesting work. We'll talk more about that later today. But welcome, Alex. Tell the folks a little bit more about what you're up to these days.

Alex: Hi, everyone. I'm Alex. I'm part of the content engineering side of Netflix. Just to make it more concrete, most people will think about the streaming products; that's not us. We are more on the enterprise side, so essentially the people helping the business run, so more internal operations. I'm a software engineer. I've been part of the initiative called UDA for a few years now, and we published that blog post a few months ago, and that's what most people want to talk about.
Larry: Yeah, it's amazing, the excitement about that post and so many people talking about it. But one thing, I think I inferred it from the article, but I don't recall a real explicit statement of the problem you were trying to solve in it. Can you talk a little bit about the business prerogatives that drove you to create UDA?

Alex: Yeah, totally. When there was no UDA, there was no clear problem that we had to solve, and really, people won't realize that, but we've been thinking about that point for a very long time. Essentially, on the enterprise side, you have to think about lots of teams having to represent the same business concepts, think about movie, actor, region, but really hundreds of them, across different systems. It's not necessarily people not agreeing on what a movie is, although that happens, but it's really: what is the movie across a GraphQL service, a data mesh source, an Iceberg table, resulting in duplicated efforts and definitions that, in the end, don't align. A few years ago, we were in search of this "one schema" kind of concept that would actually rule them all, and that's how we got into domain modeling, and how can we do that kind of domain modeling across all representations?

Alex: So that was one part of it. The other part is we needed to enable what's called semantic interoperability. Once we have the ability to talk about concepts and domain models across all of the representations, then the next question is: how can we actually move, and help our users move, in between all of those data representations? There is one thing to remember from the article that's actually in the title, that concept of model once, represent everywhere. The core idea with all of that is to say that once we've been able to capture a domain model in one place, then we have the ability to project and generate consistent representations. In our case, we are focused on GraphQL, Avro, Java, and SQL.
That's what we have today, but we are looking into adding support for other representations.

Larry: Interesting. And I think every enterprise will have its own mix of data structures like that that they're mapping things to. I love the way you use the word "project." I think different people talk about what they do with the end results of such systems. You have two concepts you talk about here: the notion of mappings, which we were just talking about with the data stuff, but also that notion of projection. That's sort of like, once you've instantiated something out of this system, you project it out to the end user. Is that kind of how it works?

Alex: Yes, so we do use the term "projection" in the more mathematical sense, and more people would call that denotations. So essentially, once you have a domain model, and you can reason about it, and we actually have a formal representation of the domain models, maybe we'll talk about that a little bit later. But then you can actually define how it's supposed to look, the exact same thing with the same data semantics, but as an API, for example, in GraphQL, or as a data product in Iceberg, in the data warehouse, or as a log-compacted Kafka topic in our data mesh infrastructure as Avro. So for us, we have to make sure that it's quote, unquote, "the same thing," regardless of the data representation that the user is actually interested in.

Alex: To put everything together, you talked about the mappings. What's really interesting for us is that the mappings are just one of the three main components that we have in our knowledge graph, because at the end of the day, UDA at its core is really a knowledge graph, which is made out of the domain models. We've talked about that.
Then the mappings. The mappings are themselves objects in that knowledge graph, and they are there to connect the world of concepts from the domain models to the world of data containers, which in our case could represent things like an Iceberg table, so we would want to know the coordinates of the Iceberg table and we would want to know the schema. But that applies as well to the data mesh source abstraction and the Avro schema that goes with it.

Alex: That would apply as well, and that's a tricky part that very few people actually try to solve, but that would apply to the GraphQL APIs. We want to be able to say and know: oh, there is a type resolver for that GraphQL type that exists in that domain graph service, and it's located exactly over there. So that's the kind of granularity that we actually capture in the knowledge graph.

Larry: Very cool. And this is the Knowledge Graph Insights podcast, which is how we ended up talking about this. But that notion of the models, and then the mappings, and then the data containers that actually have everything, I'm just trying to get my head around the scale of this knowledge graph. You teased it out earlier: it doesn't have to do with the streaming services or the customer-facing part of the business, it's just about the kind of content and data media assets that you need to manage on the back end. Are you sort of an internal service? Is that how it's conceived?

Alex: That's a good question. So we are not so much into the binary data. That's not at all what UDA is about. Again, it's a knowledge graph podcast, for sure, but even more precisely, when we say knowledge graph, we really mean conceptual RDF, and we are very, very clear about that. That means quite a few things for us. The knowledge graph, in our case, needs to be able to capture the data wherever it lives. We do not want necessarily to be RDF all the way through, but at the very core of it, there is a lot of RDF.
I'm trying to remember how we talk about it. But yeah, think about a graph representation of connected data. And again, it has to work across all of the data representations.
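Alex's "model once, represent everywhere" idea can be sketched in miniature: a single domain model is defined once, then projected into two different representations, here a GraphQL type and a SQL table. The model, the type names, and the projection rules below are all invented for illustration; Netflix's actual Upper/UDA machinery is far richer and RDF-based:

```python
# A miniature "model once, represent everywhere" sketch.
# The domain model and type mappings are invented for illustration.
movie_model = {
    "name": "Movie",
    "fields": [("title", "String"), ("releaseYear", "Int")],
}

# How the model's abstract types render in a SQL projection.
SQL_TYPES = {"String": "TEXT", "Int": "INTEGER"}

def to_graphql(model):
    """Project the domain model as a GraphQL type definition."""
    body = "\n".join(f"  {name}: {typ}" for name, typ in model["fields"])
    return f"type {model['name']} {{\n{body}\n}}"

def to_sql(model):
    """Project the same model as a SQL table definition."""
    cols = ", ".join(f"{name} {SQL_TYPES[typ]}" for name, typ in model["fields"])
    return f"CREATE TABLE {model['name']} ({cols});"

print(to_graphql(movie_model))
print(to_sql(movie_model))
```

Both projections are generated from the same source of truth, so the GraphQL API and the SQL table can't drift apart in their definition of a movie, which is the semantic interoperability problem Alex describes.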

Torrey Podmajersky: Aligning Language and Meaning in Complex Systems – Episode 39
12-10-2025 | 32 Min.
Torrey Podmajersky

Torrey Podmajersky is uniquely well-prepared to help digital teams align on language and meaning. Her father's interest in philosophy led her to an early intellectual journey into semantics, and her work as a UX writer at companies like Google and Microsoft has attuned her to the need to discover and convey precise meaning in complex digital experiences. This helps her span the "semantic gaps" that emerge when diverse groups of stakeholders use different language to describe similar things.

We talked about:
- her work as president at her consultancy, Catbird Content, and as the author of two UX books
- how her father's interest in philosophy and semantics led her to believe that everyone routinely thinks about what things mean and how to represent meaning
- the role of community and collaboration in crafting the language that conveys meaning
- how the educational concept of "prelecting" facilitates crafting shared-meaning experiences
- the importance of understanding how to discern and account for implicit knowledge in experience design
- how she identifies "semantic gaps" in the language that various stakeholders use
- her discovery of, and immediate fascination with, the Cyc project and its impact on her semantic design work
- her take on the fundamental differences between how humans and LLMs create content

Torrey's bio

Torrey Podmajersky helps teams solve business and customer problems using UX and content at Google, OfferUp, Microsoft, and clients of Catbird Content. She wrote Strategic Writing for UX, is co-authoring UX Skills for Business Strategy, hosts the Button Conference, and teaches content, UX, and other topics at schools and conferences in North America and Europe.
Connect with Torrey online
- LinkedIn
- Catbird Content (newsletter sign-up)

Torrey's Books
- Strategic Writing for UX
- UX Skills for Business Strategy

Resources mentioned in this interview
- Cyc project
- Button Conference
- UX Methods.org

Video

Here’s the video version of our conversation: https://youtu.be/0GLpW9gAsG0

Podcast intro transcript

This is the Knowledge Graph Insights podcast, episode number 39. Finding the right language to describe how groups of people agree on the meaning of the things they're working with is hard. Torrey Podmajersky is uniquely well-prepared to meet this challenge. She was raised in a home where it was common to have philosophical discussions about semantics over dinner. More recently, she's worked as a designer at tech companies like Google, collaborating with diverse teams to find and share the meaning in complex systems.

Interview transcript

Larry: Hi everyone. Welcome to episode number 39 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Torrey Podmajersky. I've known Torrey for years from the content world, the UX design and content design and UX writing worlds. I used to live very close to her office in Seattle. Torrey's currently the president at Catbird Content, her consultancy, and she's guest faculty at the University of Washington iSchool. She does all kinds of interesting stuff and is a very accomplished author. So welcome, Torrey. Tell the folks a little bit more about what you're up to and where all the books are at these days.

Torrey: Thanks so much, Larry. I am up to my neck in finishing the books right now. One just came out: the second edition of Strategic Writing for UX, which has a brand new chapter on building LLMs into products and updates throughout, of course, since it came out six years ago. But I'm also working on the final manuscript with two co-authors for UX Skills for Business Strategy.
That'll be a wine pairing guide, a deep reference book that connects the business impact that you might want to make, whether you're a UX pro or a PM or a knowledge graph enthusiast working somewhere in product, to the UX skills you might want to use to make those impacts.

Larry: Excellent. I can't wait to read both of those. I love the first edition of the Strategic Writing for UX book. But hey, I want to talk today, this being the Knowledge Graph Insights podcast, about this great post you recently did, and we'll talk more about it in detail in a bit, on how you discovered the Cyc project, which is a real pioneering project in the semantic technology field and really foundational to a lot of the knowledge graph stuff that's happening today. But I want to start with one of the other things we talked about before we went on the air: your observation of the kind of common philosophical roots that we have in rhetoric, maybe not necessarily rhetoric, but the stuff that we do as word nerds, as meaning nerds, as all these different kinds of technology nerds that we are. Tell me a little bit about what you meant, because you just hinted at that and I was like, oh, good, philosophy. I love philosophy.

Torrey: Yeah, I love philosophy too, especially through my dad. My dad was a philosophy major at Haverford College, and it has deeply influenced his life and his work in semantic knowledge spaces. And I got to grow up in that context thinking that everybody thought deeply about what things meant and how we represent those meanings. I mean, Plato's Allegory of the Cave was my bedtime story, to the extent that it was, we all know Plato in the cave, geez, dad, fine, Plato in the cave. We don't really know anything. All we have is facsimiles and representations of meaning and representations of reality, and through that we construct meaning.
And I feel like that's all we're ever doing: using language to construct meaning based on our inability to fully perceive reality.

Larry: And just for folks who aren't familiar, I love Plato's Allegory of the Cave. It's these poor people chained to a wall, and behind them is a projector projecting stuff on the wall in front of them. So all they see is this projection of an imitation of reality, which is much like what we're doing with both UX writing and, I think, ontology design and semantic engineering. So that's the perfect analogy to come into this. But your job for the last, I don't know, because you made the transition from teaching to Xbox, what, 10, 12 years ago or something like that?

Torrey: In 2010, I joined Xbox, and before that I had a short stint in internal communications in a division at Microsoft, working for a VP there.

Larry: But you've been in the word biz and the meaning biz for a long time, because UX writing is, how did you say it? You have to convey meaning. That's the whole point of UX writing: to get past random words to, what are we actually talking about here?

Torrey: It's to make the words that people understand so quickly while they're in an experience, because they're just trying to use it. They're not there to read. So we want the words to disappear into ephemeral meaning in their head, so that they don't even remember them. They just knew what to do and which button to press and where to go next to get done what they wanted to get done.

Larry: And one of the things about that is that getting to the language that does that in an experience is a team sport. One of the other things that really struck me about that post you did was the role of community in language and meaning. Talk a little bit about that.

Torrey: Yeah, it is a team sport, because in general, even if the person doing the UX writing or the content design is also the product designer and also the interaction designer.
What they're trying to do is take a wide variety of people who might be using this product. That might be an incredibly diverse set of people, or it might be a very narrow set of people, let's say all IT pros. We want to sell this product to big corporations that have IT pros that want to manage their data centers. It's a pretty narrow slice of humans, but it's still hugely diverse, from what language they're speaking and what kind of resources they have inside this company, to the kind of background they have, to all of the different reasons they might need to manage their data centers right now.

Torrey: From, hey, something new came online, or there needs to be a new partition, or new admin management of access to it, or security patch updates, to things like, oh, there was an earthquake at a data center and I need to go secure and audit any damage that might've happened. So there's a huge number of reasons. Let me back up out of that deep analogy. There's a huge number of reasons: even for a tiny population relative to the scope of humanity, a small population doing a relatively well-defined job still has a huge number of reasons they might need to be in an interface doing a thing. And what we have to do when we are designing the content for that and designing the experience itself is anticipate those and try to make sure that, whatever reason they're coming there for, if it's a valid reason to use this piece of software, they see it reflected in the text and they understand what to do.

Torrey: That is a team sport because I can't, and no individual person can, anticipate all of those things simultaneously. We need to think them through sequentially. We need data to base it on.
We need to understand, we need to hear from people who will use it, or people who would use it, to hear about how they think about it and specifically what language they use, what's already in their head that we can use to reflect on that screen. So it's about understanding that space well enough, coming to understand that space well enough by communicating with other humans, to know what the right things to represent are, and in what hierarchy or embeddedness or relationalness, and then use some grammar and punctuation and other tricks up our language sleeves.

Larry: Yeah, no.

Casey Hart: The Philosophical Foundations of Ontology Practice – Episode 38
20-8-2025 | 39 Min.
Casey Hart

Ontology engineering has its roots in the idea of ontology as defined by classical philosophers. Casey Hart sees many other connections between professional ontology practice and the academic discipline of philosophy and shows how concepts like epistemology, metaphysics, and rhetoric are relevant to both knowledge graphs and AI technology in general.

We talked about:
- his work as a lead ontologist at Ford and as an ontology consultant
- his academic background in philosophy
- the variety of pathways into ontology practice
- the philosophical principles like metaphysics, epistemology, and logic that inform the practice of ontology
- his history with the Cyc project and employment at Cycorp
- how he re-uses classes like "category" and similar concepts from upper ontologies like gist
- his definition of "AI", including his assertion that we should use the term to talk about a practice, not a particular technology
- his reminder that ontologies are models and, like all models, can oversimplify reality

Casey's bio

Casey Hart is the lead ontologist for Ford, runs an ontology consultancy, and pilots a growing YouTube channel. He is enthusiastic about philosophy and ontology evangelism. After earning his PhD in philosophy from the University of Wisconsin-Madison (specializing in epistemology and the philosophy of science), he found himself in the private sector at Cycorp. Over his professional career, he has worked in several domains: healthcare, oil & gas, automotive, climate science, agriculture, and retail, among others. Casey believes strongly that ontology should be fun, accessible, resemble what is being modelled, and be just as complex as it needs to be. He lives in the Pacific Northwest with his wife and three daughters and a few farm animals.
Connect with Casey online
- LinkedIn
- ontologyexplained at gmail dot com
- Ontology Explained YouTube channel

Video

Here’s the video version of our conversation: https://youtu.be/siqwNncPPBw

Podcast intro transcript

This is the Knowledge Graph Insights podcast, episode number 38. When the subject of philosophy comes up in relation to ontology practice, it's typically cited as the origin of the term, and then the subject is dropped. Casey Hart sees many other connections between ontology practice and its philosophical roots. In addition to logic as the foundation of OWL, he shows how philosophy concepts like epistemology, metaphysics, and rhetoric are relevant to both knowledge graphs and AI technology in general.

Interview transcript

Larry: Hi, everyone. Welcome to episode number 38 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Casey Hart. Casey has a really cool YouTube channel on the philosophy behind ontology engineering and ontology practice. Casey is currently an ontologist at Ford, the motor car company. So welcome, Casey. Tell the folks a little bit more about what you're up to these days.

Casey: Hi. Thanks, Larry. I'm super excited to be here. I've listened to the podcast, and man, your intro sounds so smooth. I was like, "I wonder how many edits that takes." No, you just fire them off; that's beautiful.

Casey: Yeah, so like you said, these days I'm the ontologist at Ford, so I'm building out data models for sensor data and vehicle information, all those sorts of fun things. I am also working as a consultant. I've got a couple of different startup healthcare companies and some cybersecurity stuff, little things around the edges. I love evangelizing ontology, talking about it and thinking about it. And as you mentioned, the YouTube channel has been my creative outlet. My background is in philosophy. I got my PhD in philosophy and I was going to teach it.
You write lots of papers, those sorts of things, and I miss that to some extent getting out into industry, and that's been my way back in to, all right, come up with an idea, try and distill it, think about objections, put it together, and so I'm really enjoying that lately. Larry: And I'm enjoying the video- Casey: Glad to be on the show. Larry: Yeah, no, I really appreciate what you're doing there. One thing I wanted to, and I love that that's how you're getting back to both your philosophical roots, but also part of it is to evangelize ontology practice, which is that's what this podcast is all about, democratizing and sharing practice. But I think, and I just love that you have this explicit and strong philosophical foundation and bent to how you talk about things. I think a lot of times that conversation is like, "Yeah, ontology comes out of philosophy," and that's the end of the conversation. But you've mentioned the role of metaphysics, epistemology, logic, all of which, can you talk a little bit about how those, beyond just I think a lot of people think about logic and OWL and all that stuff, but can you talk a little bit more about the role of metaphysics and epistemology and these other philosophical ideas? Casey: Yeah, definitely. You mentioned this in the pre-notes, "Here's a topic we'd like to get to," and I got into a lot of imposter syndrome on this, right? I'm trying to talk myself out of this, but I think most ontologists have this feeling there's no solid easy pipeline into becoming an ontologist, right? It's a very eclectic group of us. My background's in philosophy, you run into a bunch of librarians, you've got computer scientists who do DB administration, you've got jazz musicians I've run into, it's a weird group. Casey: I say that just to be, sometimes when I get asked about, "Okay, how does ontological practice work?" I think, well, I didn't actually train to be an ontologist. 
I fell into it, so I'm ill-equipped to say things about what role philosophy plays in ontology. Casey: I just know I learned philosophy, and then I'm using some of those tools here, so there's two different answers. One is historically, how does philosophy inform and shape the nature of ontology practice? And the other part is just, okay, if you've got a philosophical toolkit of metaphysics and epistemology and logic, how does that apply and make you a better, I mean, the obvious connection is that ontology is a philosophical term. It comes from metaphysics. We look back to Aristotle, and it's the study of that which exists, so do we want to say there's fundamentally fire, air, earth, water or something like that? Or fundamentally, there are these atoms and those are the sorts of things that are part of the inventory of reality. It's not physics, it's metaphysics. For Aristotle, I think, it's just the book that sits next to his Physics in his library of everything. Casey: But when we move that forward to computer science and data modeling, then we're thinking, okay, maybe not for all of reality, although maybe it depends on how big you want your data model to be. But if I'm a retailer, what are the terms in an ontology that I care about, the things that I need to model, the constituents of reality that matter to me? That might be types, if you're Amazon, it's okay, medium-sized dry goods versus sporting equipment versus something else. If I'm doing a medical ontology, it's patients and payers and providers, et cetera. In philosophy, in ontology, there's a bunch of different tools and examples, but we think about, okay, what are some fundamental distinctions that we want to make? How can we carve nature at its joints in really sensible ways? That's a phrase that you'll hear a lot. We could say more about it if you want. 
Casey: But what I found is being a philosopher goes into an ontology space is that I have this inventory of examples from all of my grad seminars and various things that I'm looking through and going through whether I want to talk about gavagai and undetached rabbit parts, if that makes sense to anybody, or whether I want to talk about grue as a color, here are some examples, ways that we can chop up the world in unnatural ways versus chopping it up in natural ways and how do we make those distinctions? That applies straightforwardly when you get into building an ontology model for an oil and gas industry or something like that. There's a bunch of ways that we can divvy up all the things you care about, what's the right and sensible way to do it? Casey: I guess that's the metaphysics, ontology way. Logic you mentioned, right? We need to think about reasoning. I don't just want to assert a bunch of things about my data. A fundamental premise of an ontology is that we want to understand our data, we want to confer meaning on it, and that means that we have to be able to leverage the structure of the ontology to infer things smartly. Simple things like set containment are fine if all persons are animals, and then we say something about animals, they're creatures. Then when I say that persons are a subclass of that, then I get for free that persons are spatio-temporal things as well. But we get a lot more complicated inferences as we go. We have to think about statistical reasoning. Just in general, if logic is the study of what makes for good arguments, what follows from what, that's obviously got a lot of applications in ontology, AI. Casey: And then the third piece that we talked about is epistemology. Epistemology is the study of knowledge and belief, roughly about what it means to be justified. The classic example there is, if I know something, what exactly does that amount to? And then Plato says it's justified true beliefs. 
And then the history of epistemology is littered with examples of trying to cash out exactly what does it mean to be justified. And if you get new information, how can that undercut your justifications? How do you update your beliefs? Casey: More recent stuff, and this is what I did in my dissertation,
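The set-containment inference Casey describes above (all persons are animals, all animals are creatures, so whatever holds of creatures holds of persons for free) is the simplest kind of reasoning an ontology supports. Here is a minimal sketch in Python of that transitive subclass walk, assuming a toy hierarchy with made-up class names rather than any real ontology or reasoner:

```python
# Minimal sketch of set-containment (subclass) inference: if Person is a
# subclass of Animal, and Animal is a subclass of SpatioTemporalThing,
# then every Person is inferred to be a SpatioTemporalThing "for free".
# Class names and the hierarchy are illustrative, not from any real ontology.

def superclasses(cls, subclass_of):
    """Return all transitive superclasses of cls in the given hierarchy."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# A toy class hierarchy: child -> list of direct parents.
hierarchy = {
    "Person": ["Animal"],
    "Animal": ["SpatioTemporalThing"],
}

print(sorted(superclasses("Person", hierarchy)))
```

In OWL this inference falls out of the semantics of `rdfs:subClassOf`; the sketch just makes the transitive-closure mechanics visible.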

Chris Mungall: Collaborative Knowledge Graphs in the Life Sciences – Episode 37
04-8-2025 | 32 Min.
Chris Mungall Capturing knowledge in the life sciences is a huge undertaking. The scope of the field extends from the atomic level up to planetary-scale ecosystems, and a wide variety of disciplines collaborate on the research. Chris Mungall and his colleagues at the Berkeley Lab tackle this knowledge-management challenge with well-honed collaborative methods and AI-augmented computational tooling that streamlines the organization of these precious scientific discoveries. We talked about: his biosciences and genetics work at the Berkeley Lab how the complexity and the volume of biological data he works with led to his use of knowledge graphs his early background in AI his contributions to the gene ontology the unique role of bio-curators, non-semantic-tech biologists, in the biological ontology community the diverse range of collaborators involved in building knowledge graphs in the life sciences the variety of collaborative working styles that groups of bio-curators and ontologists have created some key lessons learned in his long history of working on large-scale, collaborative ontologies, key among them, meeting people where they are some of the facilitation methods used in his work, tools like GitHub, for example his group's decision early on to commit to version tracking, making change-tracking an entity in their technical infrastructure how he surfaces and manages the tacit assumptions that diverse collaborators bring to ontology projects how he's using AI and agentic technology in his ontology practice how their decision to adopt versioning early on has enabled them to more easily develop benchmarks and evaluations some of the successes he's had using AI in his knowledge graph work, for example, code refactoring, provenance tracking, and repairing broken links Chris's bio Chris Mungall is Department Head of Biosystems Data Science at Lawrence Berkeley National Laboratory. 
His research interests center around the capture, computational integration, and dissemination of biological research data, and the development of methods for using this data to elucidate biological mechanisms underpinning the health of humans and of the planet. He is particularly interested in developing and applying knowledge-based AI methods, particularly Knowledge Graphs (KGs) as an approach for integrating and reasoning over multiple types of data. Dr. Mungall and his team have led the creation of key biological ontologies for the integration of resources covering gene function, anatomy, phenotypes and the environment. He is a principal investigator on major projects such as the Gene Ontology (GO) Consortium, the Monarch Initiative, the NCATS Biomedical Data Translator, and the National Microbiome Data Collaborative project. Connect with Chris online LinkedIn Berkeley Lab Video Here’s the video version of our conversation: https://youtu.be/HMXKFQgjo5E Podcast intro transcript This is the Knowledge Graph Insights podcast, episode number 37. The span of the life sciences extends from the atomic level up to planetary ecosystems. Combine this scale and complexity with the variety of collaborators who manage information about the field, and you end up with a huge knowledge-management challenge. Chris Mungall and his colleagues have developed collaborative methods and computational tooling that enable the construction of ontologies and knowledge graphs that capture this crucial scientific knowledge. Interview transcript Larry: Hi everyone. Welcome to episode number 37 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Chris Mungall. Chris is a computational scientist working in the biosciences at the Lawrence Berkeley National Laboratory. Many people just call it the Berkeley Lab. He's the principal investigator in a group there, has his own lab working on a bunch of interesting stuff, which we're going to talk about today. 
So welcome, Chris, tell the folks a little bit more about what you're up to these days. Chris: Hi, Larry. It's great to be here. Yeah, so as you said, I'm here at Berkeley Lab. We're located in the Bay Area. We're just above UC Berkeley campus. We have a nice view of the San Francisco Bay looking into San Francisco, and so we're a national lab, so we're part of the Department of Energy National Lab system, and we have multiple different areas here in the lab looking at different aspects of science from physics, energy technologies, material science. I'm in the biosciences area, so we are really interested in how we can advance biological science in areas relevant to national scale challenges really in different areas like energy, the environment, health and bio-manufacturing. Chris: My own particular research is really focused on the role of genes and in particular the role of genes in complex systems. So this could be the genes that we have in our own cells, the genes in human beings, how they all work together to hopefully create a healthy human being. One part of my research also looks at the role of genes in the environment, and in particular the role of genes inside tiny old microbes that you'll find in the ocean water and in the soil. And how these genes all work together, both to help drive these microbial systems, help them work together and how they all work together really to drive ecosystems and biogeochemical cycles. Chris: So I think the overall aim is really just to get a picture of these genes and how they interact in these kind of complex systems and build up models of complex systems at scales right the way from atoms through to organisms and indeed all the way to earth-scale systems. So my work is all computational. I don't have a wet lab. 
So one thing that we realized early on is just when you are sequencing these genomes and trying to interpret the genes, you're generating a lot of information and you need to be able to organize that somehow. And so that's how we arrived at working on knowledge graphs, basically to assemble all of this information together and to be able to use it in algorithms to help us interpret biological data and help us figure out the role of genes in these organisms. Larry: Yeah, many of the people I've talked to on this podcast, they come out of the semantic technology world and apply it in some place or another. It sounds like you came to this world because of the need to work with all the data you've got. What was your learning curve? Was it just another thing in your computational toolkit? Chris: Yeah, in some ways. In fact, my background is, if you go back far enough, my original background is more on the computational side and my undergrad was in AI, but this is back when AI meant good old-fashioned AI and symbolic reasoning and developing Prolog rules to reason about the world and so on. And at that time, I wasn't so interested in that side of AI. I really wanted to push forward with some of the more nascent neural network type approaches. But in those days, we didn't really have the computational power and I thought, "Well, maybe I really need to actually learn something about biological systems before trying to simulate them." So that's how I got involved in genomics. This was around about the time of just before the sequencing of the human genome, and I just got really interested in this area, a position came up here at Lawrence Berkeley National Laboratory, and I just got really involved in analyzing some of these genomes. Chris: And in doing this, I came across this project called the Gene Ontology that was developed by some of my colleagues originally in Cambridge and at Lawrence Berkeley National Laboratory. 
And the goal here was really as we were sequencing these genomes and we were figuring out there's 20,000 genes in the human genome, we discovered we had no way to really categorize what the functions of these different genes were. And if you think about it, there's multiple different ways that you can describe the function of any kind of machine, whether it's a molecular machine inside one of your cells or your car or your iPhone or whatever. You can describe it in terms of what the intent of that machine is. You can describe it in terms of where that machine is localized and what it does, and how that machine works as part of a larger ensemble of machines to achieve some larger objective. Chris: So my colleagues came up with this thing called the gene ontology, and I looked at that and I said, "Hey, I've got this background in symbolic reasoning and good old-fashioned AI. Maybe I could play a role in helping organize all of this information and figuring out ways to connect it together as part of a larger graph." We didn't call them knowledge graphs at this time, but we're essentially building knowledge graphs at the time and make use of, in those days quite early semantic web technologies. This is even before the development of the Web Ontology Language, but there was still this notion that we could use rules in combination with graphs to make inferences about things. And I thought, "Well, this seems like an ideal opportunity to apply some of this technology." Larry: That's interesting. It's funny we didn't plan this, but the episode right before you in the queue was of my friend Emeka Okoye. He's a guy who was building knowledge graphs in the late '90s, early 2000s, mostly the early 2000s before the term had been coined, and I think maybe even before a lot of the RDF and OWL and all that stuff was there. So you mentioned Prolog earlier, and what was your toolkit then, and how has it evolved up to the present? That's a huge question. Yeah. 
Chris: I didn't mean to get into my whole early days with Prolog. Yeah, I've definitely had some interest in applying a lot of these logic programming technologies. As you're aware,


