Knowledge Graph Insights
Jim Hendler
As the World Wide Web emerged in the late 1990s, AI experts like Jim Hendler spotted an opportunity to imbue the new medium, in a scalable way, with knowledge about the information on the web, alongside its simple representation as content.
With his colleagues Tim Berners-Lee, the inventor of the web, and Ora Lassila, an early expert on AI agents, Jim set out their vision in the famous "Semantic Web" article for the May 2001 issue of Scientific American magazine.
Since then, semantic web implementations have blossomed: they are deployed in virtually every large enterprise on the planet, and they add meaning to the web by appearing in the majority of pages on the internet.
We talked about:
his academic and administrative history at the University of Maryland, Rensselaer Polytechnic Institute, and DARPA
the origins of his assertion that "a little semantics goes a long way"
his early thinking on the role of memory in AI and its connections to knowledge representation and to SHOE, the first semantic web language
his goal to scale up knowledge representation in his work as a grant administrator at DARPA
how different departments in the US Air Force used different language to describe airplanes
the origins and development of his relationship with Tim Berners-Lee and how his use of URLs in SHOE caused it to click
how he and Berners-Lee brought Ora Lassila into the semantic web article
how his and Berners-Lee's shared interest in scale contributed to the "a little semantics goes a long way" idea
why he lives in awe of Tim Berners-Lee
Berners-Lee's insight that a scalable web needed the 404 error code
how including an inverse-functionality property, as in a relational database, would have ruined the semantic web
how they came to open the Scientific American paper with an anecdote about agents
his early involvement in the AI agent community along with Ora Lassila
their shared conviction of the foundational importance of interoperability in their conception of the semantic web
how the lack of interoperability between big internet players now is part of the reason for the inability to fully execute on the agent vision they set out in the SciAm article
the impact of LLMs on the semantic web
early examples of semantic web linked data interoperability
Google's reclamation of the term "knowledge graph"
the reason that the shape of the semantic web was always, in their minds, a graph
how the growth of enterprise data led enterprises to adopt semantic web technology
how the answer to so many modern AI questions is, "knowledge"
Jim's bio
James Hendler is the Tetherless World Professor of Computer, Web and Cognitive Sciences at RPI, where he also serves as a special academic advisor to the Provost and as the Head of the Cognitive Science Department. He is also a board member, and former chair, of the UK’s charitable Web Science Trust. Hendler is a long-time researcher in the widespread use of experimental AI techniques including semantics on the Web, scientific data integration, and data policy in government. One of the originators of the Semantic Web, he has authored over 500 books, technical papers, and articles in the areas of Open Data, the Semantic Web, AI, and data policy and governance. He is the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. In 2010, Hendler was selected as an “Internet Web Expert” by the US government, helping in the development and launch of the US data.gov open data website, and from 2015 to 2024 he served as an advisor to DHS and DoE boards. From 2021 to 2024 he served as chair of the ACM’s global Technology Policy Council. Hendler is a Fellow of the AAAI, AAIA, AAAS, ACM, BCS, IEEE and the US National Academy of Public Administration. In 2025, Hendler was awarded the Feigenbaum Prize by the Association for the Advancement of Artificial Intelligence, recognizing a “sustained record of high-impact seminal contributions to experimental AI research.”
Connect with Jim online
RPI faculty page
People and resources mentioned in this interview
Tim Berners-Lee
Ora Lassila
Deb McGuinness
The Semantic Web, Scientific American, May 2001
Introducing the Knowledge Graph: things, not strings
Massively Parallel Artificial Intelligence paper
Attention Is All You Need paper
Vision conference
Is There An Agent in Your Future? article
"And then a miracle occurs" cartoon
Jim's SHOE (simple HTML ontology extensions) t-shirt
Video
Here’s the video version of our conversation:
https://youtu.be/DpQki6Y0zx0
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 43. Twenty-five years ago, as AI experts like Jim Hendler navigated the new World Wide Web, they saw an opportunity to imbue the medium, in a scalable way, with more knowledge than was included in the text on web pages. Jim combined forces with the web's inventor, Tim Berners-Lee, and their mutual friend Ora Lassila, an expert on AI agents, to set out their vision in the now-famous "Semantic Web" article for Scientific American magazine. The rest, as they say, is history.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 43 of the Knowledge Graph Insights Podcast. I am super extra delighted today to welcome to the show, Jim Hendler. Jim, I think it's fair to say he literally needs no introduction. He was one of the co-authors of the original Semantic Web article in Scientific American. He's been a longtime well-known professor at Rensselaer Polytechnic Institute. So welcome, Jim. Tell the folks a little bit more about what you're up to these days.
Jim:
Sure. Just to go back a little further in history, I've been doing AI a long time and my first paper was in about '77, but a lot of the work we're going to be talking about today happened when I was a professor at the University of Maryland, which was from '86 to 2007. And then from 2007 on, I've been at RPI, where I was really hired to create a lab that would be a visionary lab on semantic web and related technologies. I think the president of the university saw the data science revolution coming and saw that that was a key part of it.
Jim:
So who am I? What am I? Really, what happened was very early in the days of AI, I was working in a lot of different things. I started under Roger Schank at Yale, took a few years off to work professionally at Texas Instruments, which had the first industrial AI lab outside of the well-known ones at Xerox PARC and stuff. Then decided no, I really was an academic at heart. So I came back, went to grad school with Gene Charniak at Brown and went from there to the University of Maryland. So you know my job life history. I've bumped around during that time. Living in Maryland, you tend to bump into the Defense Department and things like that and funding and things like that. I was on a few committees and things like that. Eventually I was asked to come to DARPA for a few years, which is really where a lot of our conversation today probably starts.
Jim:
And then again, just because it was successful and we had a visionary president here at RPI, she asked me to come and said, "Not only do I want to hire you, but I want you to hire a couple other people you'll work with who'll help put us on the map and this stuff." And I hired Deb McGuinness and I'm sure that'll come up later. And then past 15 years have been a combination of research and administration. So I've done both, doing my own work, working with my students, and also trying to really set up some significant presence of AI on our campus, AI and beyond.
Larry:
Nice. Yeah, and we'll talk definitely more about your research work and everything. But hey, I want to set a little bit of context about how we met, because I know Dean Allemang from the Knowledge Graph Conference community, and we'll talk a little bit more about the book that you wrote with him later on. But one of the things that he famously says, and always attributes it to you, is that phrase "A little semantics goes a long way." I'd love to open up by talking a little bit about that.
Jim:
So early on in AI, it was becoming very, very clear to me, and now I'm talking the 70s, early 80s, so a long time before scaling meant what it does today, that a lot of the problem with AI was that it didn't scale. And meanwhile, I was seeing these other technologies coming along, the ones that really led to the web, that were looking at a much, much broader thing than the typical AI system. So one of the things I started asking is, how do we scale up AI? And we were looking at traditional knowledge representation languages. I actually have a paper from the 80s. I actually did a book with Hiroaki Kitano, who's now the... I believe he's still the vice president for research at Sony, if not something higher. And Kitano-san and I actually had a book called Massively Parallel Artificial Intelligence in the 80s, but it became clear to me that the machines were part of the story, but the lots and lots of people doing lots and lots of different things was the much more interesting part of the story.
Jim:
And then also, I've always been intrigued by human memory. You asked me a question and I not only answered that question, but what I'm doing right now is associating a million things in my mind. And what I'm really doing is winnowing rather than trying to come up with the precise answer. And so I started thinking about how does AI memory start to look more like human memory? In those days, a thousand and then 10,000 and then a million "axioms" were very, very large things, and that's what I wanted to do. And then the web was coming along and I saw that, well, if I'm going to get a million facts about something,
Brad Bolliger
Brad Bolliger entered the knowledge graph space via enterprise software system design and data analytics. That background informs their pragmatic and strategic approach to the use of semantic technology in systems that facilitate information exchange across government agencies.
We talked about:
their work at EY (Ernst & Young) on data and analytics strategy assessments and enterprise software design and as a co-chair of the NIEMOpen Technical Architecture Committee
how their work on EY's Unified Justice Platform introduced them to the knowledge graph world
a quick overview of entity resolution
the NIEM standard, its origin in the wake of 9/11, its scope, how it's built and managed, and how governments use it
their pragmatic approach to ontology and vocabulary management
the benefits of the extensibility of the RDF format and knowledge graph technology
how entity-centric data modeling accelerates and facilitates systems evolution
their take on "analytics enablement engineering"
their approach to crafting AI-ready data and building AI-aware enterprise solutions
some of the neuro-symbolic AI architectures they have seen and implemented
their call for more systems thinking and systems analysis to create more effective services that work together in a more ethical and effective way
Brad's bio
Bradley Bolliger (they/them) works in the AI & Data practice of Ernst & Young and serves as co-chair of the NIEMOpen Technical Architecture Committee, an OASIS open standards project for data interoperability.
Brad assists clients across various industries with optimizing data platform ecosystems, enhancing customer relationships, and leveraging advanced analytics tools and techniques in their digital transformation efforts. In addition to designing data platforms and AI/NLP systems, Brad has served in lead analyst roles for public sector information system modernization efforts, including major contact center data ecosystems and integrated criminal justice system environments, the latter of which would lead to the development of the Unified Justice Platform.
Connect with Brad online
LinkedIn
Unified Justice Platform
Video
Here’s the video version of our conversation:
https://youtu.be/8XCmF3qXv1E
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 42. When you have to account for the people and other entities involved in high-stakes situations, you need a system that delivers accurate, unambiguous information. Brad Bolliger does this in their work on EY's Unified Justice Platform. Brad is relatively new to the graph world and has adopted a pragmatic approach to semantic modeling and knowledge graphs, focusing on applying lessons learned in their extensive experience in enterprise systems design and data analytics.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 42 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Brad Bolliger. Brad works in the AI and data practice at EY, the big consultancy in Chicago, and also helps co-chair the NIEM Information Exchange, the Info Exchange Network and standard. Welcome, Brad. Tell the folks a little bit more about what you're up to these days.
Brad:
Thanks for having me, Larry. I'm thrilled to be talking to you today. Yeah, I'm non-binary. I use they/them pronouns, and I work in the AI and data practice at Ernst & Young, as you said, where I do data and analytics strategy assessments and enterprise software design, things like that. I'm also co-chair of the NIEMOpen Technical Architecture Committee, which is an OASIS open standard, primarily for sharing data in public services, and broadly a specification for developing information exchanges. And I'm working on semantics and software design more generally.
Larry:
Yeah. And you kind of not stumbled, but you had semantics thrust upon you in this new role, I understand, 'cause one of the projects you work on, I don't know if you're still working on it, was the Unified Justice Platform at EY. Can you talk a little bit about that and how it brought you into the semantics world?
Brad:
Yeah, that's right. It spun out of an assessment from a county government wanting to overhaul their integrated justice system, which was the collection of actors who collaborate or have this adversarial relationship to administer the process of justice in their jurisdiction. And because very often they're their own elected officials with their own budgets, they have their own software to fulfill their own functions. And that means that they are kind of inherently operating a distributed system, sending messages back and forth to say, "Hey, we booked this person into the jail. Hey, we've got this court date coming up. Hey, we're filing these charges." And they need to orchestrate complex operational processes across multiple software systems and multiple groups of people, again, kind of across jurisdictions or enclaves. And that was, of course, a really interesting systems analysis process that led to the development of a solution to this problem we were trying to assess, which we later called the Unified Justice Platform and is an event-driven architecture for building an entity-resolved knowledge graph as an operational data store programmatically as messages are exchanged between the stakeholders in the Enclave.
Larry:
Yeah. And you used a couple of words in there. I want to clarify for folks who might be new to them. The notion of entity resolution, the entity-resolved knowledge graph, I'll just point out that we met through our mutual friend, Paco Nathan, who works for Senzing, a company that just does entity resolution. And can you talk a little bit about entity resolution, how that fits into the needs of this distributed system and how you implement it in the platform?
Brad:
Yeah. Actually, I'll plug this: almost two years ago, we did a webinar with someone from Senzing and talked about the fundamental utility of entity resolution and its relevance, I suppose, as a problem more generally. Entity resolution, for me, is essentially about creating a high-quality master index of whatever kind of data it is that you're looking at. So in this case, we were talking about a master person index so that you have a more reliable picture of the same natural person, no matter which software system is representing the data that describes the person subject to judicial proceedings in particular. But thinking about entity-centric data modeling more generally, you've got a different type of entity, you still need to disambiguate which location you're talking about, which person you're talking about, which entity that really is. And if there are different representations, different records that relate to the same underlying entity, that process of entity resolution therefore has this really broad systemic benefit to data management and data engineering in particular, because ultimately it's about the master index at the end of the day.
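To make the master-index idea concrete, here's a minimal Python sketch of record matching. It is only an illustration of the concept Brad describes, not the matching logic used by Senzing or the Unified Justice Platform; the records, match rule, and threshold are all hypothetical.

```python
# Minimal illustration of entity resolution: assigning records from different
# source systems to a single "master" entity ID. A toy sketch of the idea only.
from difflib import SequenceMatcher

records = [  # hypothetical records from separate justice-system enclaves
    {"source": "jail_booking", "name": "Jonathan Q. Smith", "dob": "1980-04-12"},
    {"source": "court_filing", "name": "Smith, Jonathan",   "dob": "1980-04-12"},
    {"source": "prosecutor",   "name": "Jon Smith",         "dob": "1981-01-30"},
]

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and sort name tokens so
    'Smith, Jonathan' and 'Jonathan Q. Smith' compare more fairly."""
    tokens = name.replace(",", " ").replace(".", " ").lower().split()
    return " ".join(sorted(tokens))

def same_person(a: dict, b: dict, threshold: float = 0.6) -> bool:
    """Very naive match rule: exact date of birth plus fuzzy name similarity."""
    if a["dob"] != b["dob"]:
        return False
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return score >= threshold

# Greedy clustering into a master person index.
master_index: list[dict] = []  # one entry per resolved entity
for rec in records:
    for entity in master_index:
        if same_person(entity["records"][0], rec):
            entity["records"].append(rec)
            break
    else:
        master_index.append({"master_id": f"P{len(master_index) + 1}", "records": [rec]})

for entity in master_index:
    print(entity["master_id"], [r["source"] for r in entity["records"]])
# P1 ['jail_booking', 'court_filing']  <- resolved to the same natural person
# P2 ['prosecutor']                    <- different DOB, kept separate
```

Real entity-resolution engines use far richer features (aliases, addresses, identifiers) and probabilistic scoring, but the output is the same kind of master index Brad mentions.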
Larry:
Yeah. And as you talked about that, you mentioned that it's like this canonical record of entities. And how does NIEM fit into that? Because that's a vocabulary as I understand it.
Brad:
That's right.
Larry:
Yeah. Can you talk a little bit about NIEM and how that works with entity resolution?
Brad:
Yeah, very briefly on NIEM, NIEM spun out of the post September 11th realization that public services needed to share data to collaborate more effectively to actually solve emergencies, but just problems in general. And what they realized was that they need to have a common language to collaborate more effectively. Again, because systems, machines, software systems, have this really concrete definition of we use these particular terms and they mean something in our enclave, but you could have a person's full name and a person's first name and a person's last name in two different records, but actually they're the same real person. So NIEM came out of an attempt to at least address some of that disambiguity. And what is most interesting to me about NIEM, honestly, is that it is a collaboratively defined list of vocabulary. So we actually get domain participants involved and they decide we use these terms and they mean these things.
Brad:
And so it's an attempt to reduce the amount of complexity that you could use to describe a different person, but communicate the same meaning without losing the information that's entailed in some data record. But I'm digressing a little bit probably. What NIEM is, is a framework for building message specifications, APIs if you like, or other types of data structures in general, from a community-agreed-upon set of terms that have some kind of core relevance, person, entity, organization, or that have some domain-specific function, like subject or something in human services, and so on.
Larry:
Interesting. Yeah. And as you talk about that, that attempt to align people on vocabulary is such a notoriously difficult problem. And I don't know how many jurisdictions we're talking about here, but every little town in America has a police department and other social services that they do. What is the scope or the scale of that? And is it facilitated in any way by existing standards or vocabularies?
Brad:
Oh, very much so. In fact, the problem is even worse than you've described it very charitably, I think. Just in the United States alone, I'm told that there are over 18,000 law enforcement agencies, just law enforcement agencies. Nevermind how ... Anyway, so NIEM is a voluntary open standard. So it is something that is available, but is usually not mandated. There are some places where it is mandated for specific types of services. So the scale of the problem that we're talking about really depends on who's included in the conversation.
Tara Raafat
At Bloomberg, Tara Raafat applies her extensive ontology, knowledge graph, and management expertise to create a solid semantic and technical foundation for the enterprise's mission-critical data, information, and knowledge.
One of the keys to the success of her knowledge graph projects is her focus on people. She of course employs the best semantic practices and embraces the latest technology, but her knack for engaging the right stakeholders and building the right kinds of teams is arguably what distinguishes her work.
We talked about:
her history as a knowledge practitioner and metadata strategist
the serendipitous intersection of her knowledge work with the needs of new AI systems
her view of a knowledge graph as the DNA of enterprise information, a blueprint for systems that manage the growth and evolution of your enterprise's knowledge
the importance of human contributions to LLM-augmented ontology and knowledge graph building
the people you need to engage to get a knowledge graph project off the ground: executive sponsors, skeptics, enthusiasts, and change-tolerant pioneers
the five stars you need on your team to build a successful knowledge graph: ontologists, business people, subject matter experts, engineers, and a KG product owner
the importance of balancing the desire for perfect solutions with the pragmatic and practical concerns that ensure business success
a productive approach to integrating AI and other tech into your professional work
the importance of viewing your knowledge graph as not just another database, but as the very foundation of your enterprise knowledge
Tara's bio
Dr. Tara Raafat is Head of Metadata and Knowledge Graph Strategy in Bloomberg’s CTO Office, where she leads the development of Bloomberg’s enterprise Knowledge Graph and semantic metadata strategy, aligning it with AI and data integration initiatives to advance next-generation financial intelligence. With over 15 years of expertise in semantic technologies, she has designed knowledge-driven solutions across multiple domains including but not limited to finance, healthcare, industrial symbiosis, and insurance. Before Bloomberg, Tara was Chief Ontologist at Mphasis and co-founded NextAngles™, an AI/semantic platform for regulatory compliance. Tara holds a PhD in Information System Engineering from the UK. She is a strong advocate for humanitarian tech and women in STEM and a frequent speaker at international conferences, where she delivers keynotes, workshops, and tutorials.
Connect with Tara online
LinkedIn
email: traafat at bloomberg dot net
Video
Here’s the video version of our conversation:
https://youtu.be/yw4yWjeixZw
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 41. As groundbreaking new AI capabilities appear on an almost daily basis, it's tempting to focus on the technology. But advanced AI leaders like Tara Raafat focus as much, if not more, on the human side of the knowledge graph equation. As she guides metadata and knowledge graph strategy at Bloomberg, Tara continues her career-long focus on building the star-shaped teams of humans who design and construct a solid foundation for your enterprise knowledge.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 41 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Tara Raafat. She's the head of metadata and knowledge graph strategy at Bloomberg, and a very accomplished ontologist, knowledge graph practitioner. And welcome to the show, Tara. Tell the folks a little bit more about what you're doing these days.
Tara:
Hi, thank you so much, Larry. I'm super-excited to be here and chatting with you. We always have amazing chats, so I'm looking forward to this one as well. Well, as Larry mentioned, I'm currently working for Bloomberg and I've been in the space of knowledge graphs and ontology creation for a pretty long time. So I've been in this community, I've seen a lot. And my interest has always been in the application of ontologies and knowledge graphs in industries, and I have worked in so many different industries from banking and financial to insurance to medical. So I touched upon a lot of different domains with the application of knowledge graphs. And currently at Bloomberg, I am also leading their metadata strategy and the knowledge graph strategy, so basically semantic metadata. And we're looking over how we are basically connecting all the different data sources and data silos that we have within Bloomberg to make our data ready for all the interesting, exciting AI stuff that we're doing. And making sure that we have a great representation of our data.
Larry:
That's something that comes up all the time in my conversations lately is that people have done this work for years for very good reasons, all those things you just talked about, the importance of this kind of work in finance and insurance and medical fields and things like that. But it turns out that it makes you AI-ready as well. So is that just a happy coincidence or are you doing even more to make your metadata more AI-ready these days?
Tara:
Yeah. In a sense, you could say happy coincidence, but I think from the very beginning of when you think about ontologies and knowledge graphs, the goal was always to make your data machine-understandable. So whenever people ask me, "You're an ontologist, what does that even mean?" My explanation was always, I take all the information in your head and put it in a way that is machine understandable. So now encoded in that way. So now when we're thinking about the AI era, it's basically we're thinking if AI is operating on our information, on our data, it needs to have the right context and the right knowledge. So it becomes a perfect fit here. So if data is available and ready in your knowledge graph format, it means that it's machine understandable. It has the right context. It has the extra information that an AI system, specifically in the LLM and generative AI era, needs in order to make sure that the answers it produces are more grounded and based in facts, have better provenance, and are more accurate in quality.
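As a concrete, heavily simplified illustration of that grounding idea, here's a toy Python sketch that pulls facts about an entity from a small RDF graph and packs them into a prompt for a language model. The graph contents, namespace, and the ask_llm() call are hypothetical placeholders, it assumes the rdflib package, and it is not Bloomberg's actual setup.

```python
# Toy sketch of the "knowledge graph as grounding for an LLM" idea: retrieve
# machine-readable facts about an entity and hand them to the model as context.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef

EX = Namespace("http://example.org/finance#")  # hypothetical vocabulary
g = Graph()
g.bind("ex", EX)
g.add((EX.AcmeCorp, RDF.type, EX.Issuer))
g.add((EX.AcmeCorp, RDFS.label, Literal("Acme Corp")))
g.add((EX.Bond123, RDF.type, EX.CorporateBond))
g.add((EX.Bond123, EX.issuedBy, EX.AcmeCorp))
g.add((EX.Bond123, EX.maturityDate, Literal("2031-06-30")))

def facts_about(entity: URIRef) -> list[str]:
    """Render every triple whose subject is the entity as a readable line."""
    nm = g.namespace_manager
    return [f"{entity.n3(nm)} {p.n3(nm)} {o.n3(nm)}"
            for p, o in g.predicate_objects(subject=entity)]

context = "\n".join(facts_about(EX.Bond123))
prompt = (
    "Answer using only these facts:\n"
    f"{context}\n\n"
    "Question: Who issued Bond123 and when does it mature?"
)
print(prompt)
# In a real pipeline the grounded prompt would go to a model,
# e.g. answer = ask_llm(prompt)  # ask_llm is a hypothetical function
```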
Larry:
Yeah, that's right. You just reminded me, it's not so much serendipity or a happy coincidence. It's like, no, it's just what we do. Because we make things accessible. The whole beauty of this is the-
Tara:
We knew what's coming, right? The word AI has changed so much. It's the same thing. It just keeps popping up in different contexts, but yeah.
Larry:
So you're actually a visionary futurist as all of us are in the product. Yeah. In your long experience, one of the things I love most, there's a lot of things I love about your work. I even wrote about it after KGC. I summarized one of your talks, and I think it's on your LinkedIn profile now, you have this great definition of a knowledge graph. And you liken it to a biological concept that I like. So can you talk a little bit about that?
Tara:
Sure. I see knowledge graph as the DNA of data or DNA of our information. And the reason I started thinking about it that way is when you think about the human DNA, you're literally thinking of the structure and relationship of the organisms and how they operate and how they evolve. So there's a blueprint of their operation and how they would grow and evolve. And for me, that's very similar to when we start creating a knowledge graph representation of our data, because we're again, capturing the structure and relationships between our data. And we're actually encoding the context and the rules that are needed to allow our data to grow and evolve as our business grows and evolves. So there's a very strong similarity for me there. And it also brings that human touch to this whole concept of knowledge graphs because when I think about knowledge graphs and talking about ontologies, it comes from a philosophical background. And it's a lot more social and human.
Tara:
And at the end of the day, the foundation of it is how we as humans interpret the world and interpret information. And how then by the use of technology, we encode it, but the interpretation is still very human. So that's why this link for me is actually very interesting. And I think one more thing I would add, which is I do this comparison to also emphasize on the fact that knowledge graphs are not just another database or another data store. So I don't like companies to look at it from that perspective. They really should look at it as the foundation on which their data grows and evolves as their business grows.
Larry:
Yeah. And that foundational role, it just keeps coming up, again, related to AI a lot, the LLM stuff that I've heard a lot of people talk about the factual foundation for your AI infrastructure and that kind of thing. And again, another one of those things like, yeah, it just happens to be really good at that. And it was purpose built for that from the start.
Larry:
You mentioned a lot in there, the human element. And that's what I was so enamored of with your talk at KGC and other talks you've done and we've talked about this. And one of the things that, just a quick personal aside, one of the things that drives me nuts about the current AI hype cycle is this idea like, "Oh, we can just get rid of humans. It's great. We'll just have machines instead." I'm like, "Have you not heard..." Every conversation, I've done about 300 different interviews over the years. Every single one of them talks about how it's not technical, it's not procedural or management wisdom. It's always people stuff. It's like change management and working with people. Can you talk about how the people stuff manifests in your work in metadata strategy and knowledge graph construction? I know that's a lot.
Tara:
Sure.
Alexandre Bertails
At Netflix, Alexandre Bertails and his team have adopted the RDF standard to capture the meaning in their content in a consistent way and generate consistent representations of it for a variety of internal customers.
The keys to their system are a Unified Data Architecture (UDA) and a domain modeling language, Upper, that let them quickly and efficiently share complex data projections in the formats that their internal engineering customers need.
We talked about:
his work at Netflix on the content engineering team, the internal operation that keeps the rest of the business running
how their search for "one schema to rule them all" and the need for semantic interoperability led to the creation of the Unified Data Architecture (UDA)
the components of Netflix's knowledge graph
Upper, their domain modeling language
their focus on conceptual RDF, resulting in a system that works more like a virtual knowledge graph
his team's decision to "buy RDF" and its standards
the challenges of aligning multiple internal teams on ontology-writing standards and how they led to the creation of UDA
their two main goals in creating their Upper domain modeling language - to keep it as compact as possible and to support federation
the unique nature of Upper and its three essential characteristics - it has to be self-describing, self-referencing, and self-governing
their use of SHACL and its role in Upper (a generic SHACL sketch appears after this list)
how his background in computer science and formal logic and his discovery of information science brought him to the RDF world and ultimately to his current role
the importance of marketing your work internally and using accessible language to describe it to your stakeholders - for example describing your work as a "domain model" rather than an ontology
UDA's ability to permit the automatic distribution of semantically precise data across their business with one click
how reading the introduction to the original 1999 RDF specification can help prepare you for the LLM/gen AI era
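For readers new to SHACL, here's a minimal, generic illustration of what a shape does: constrain RDF data and report violations. This is not Netflix's Upper metamodel; the Movie/title vocabulary is invented, and the sketch assumes the rdflib and pyshacl Python packages are installed.

```python
# Generic illustration of SHACL validating RDF data (not Netflix's Upper).
from rdflib import Graph
from pyshacl import validate

shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/catalog#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:MovieShape a sh:NodeShape ;
    sh:targetClass ex:Movie ;
    sh:property [
        sh:path ex:title ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
    ] .
"""

data_ttl = """
@prefix ex: <http://example.org/catalog#> .

ex:SomeMovie a ex:Movie .   # missing the required ex:title
"""

shapes = Graph().parse(data=shapes_ttl, format="turtle")
data = Graph().parse(data=data_ttl, format="turtle")

conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)     # False: the Movie instance violates the minCount constraint
print(report_text)  # human-readable explanation of the violation
```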
Alexandre's bio
Alexandre Bertails is an engineer in Content Engineering at Netflix, where he leads the design of the Upper metamodel and the semantic foundations for UDA (Unified Data Architecture).
Connect with Alex online
LinkedIn
bertails.org
Resources mentioned in this interview
Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix
Resource Description Framework (RDF) Schema Specification (1999)
Video
Here’s the video version of our conversation:
https://youtu.be/DCoEo3rt91M
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 40. When you're orchestrating data operations for an enormous enterprise like Netflix, you need all of the automation help you can get. Alex Bertails and his content engineering team have adopted the RDF standard to build a domain modeling and data distribution platform that lets them automatically share semantically precise data across their business, in the variety of formats that their internal engineering customers need, often with just one click.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 40 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show, Alex Bertails. Alex is a software engineer at Netflix, where he's done some really interesting work. We'll talk more about that later today. But welcome, Alex, tell the folks a little bit more about what you're up to these days.
Alex:
Hi, everyone. I'm Alex. I'm part of the content engineering side of Netflix. Just to make it more concrete, most people will think about the streaming products; that's not us. We are more on the enterprise side, so essentially the people helping the business run, so more internal operations. I'm a software engineer. I've been part of the initiative called UDA for a few years now, and we published that blog post a few months ago, and that's what most people want to talk about.
Larry:
Yeah, it's amazing that the excitement about that post and so many people talking about it. But one thing, I think I inferred it from the article, but I don't recall a real explicit statement of the problem you were trying to solve in that. Can you talk a little bit about the business prerogatives that drove you to create UDA?
Alex:
Yeah, totally. Before there was UDA, there was no one clear problem that we had to solve, and really, people won't realize that, but we've been thinking about that point for a very long time. Essentially, on the enterprise side, you have to think about lots of teams having to represent the same business concepts, think about movie, actor, region, but really hundreds of them, across different systems. It's not necessarily people not agreeing on what a movie is, although it happens, but it's really: what is the movie across a GraphQL service, a data mesh source, an Iceberg table? That results in duplicated efforts and definitions that don't align in the end. A few years ago, we were in search of this one-schema kind of concept that would actually rule them all, and that's how we got into domain modeling, and how can we do that kind of domain modeling across all representations?
Alex:
So there was one part of it. The other part is we needed to enable what's called semantic interoperability. Once we have the ability to talk about concepts and domain models across all of the representations, then the next question is how can we actually move and help our users move in between all of those data representations? There is one thing to remember from the article that's actually in the title, that's that concept of model once, represent everywhere. The core idea with all of that is to say once we've been able to capture a domain model in one place, then we have the ability to project and generate consistent representations. In our case, we are focused on GraphQL, Avro, Java, and SQL. That's what we have today, but we are looking into adding more support for other representations.
Larry:
Interesting. And I think every enterprise will have its own mix of data structures like that that they're mapping things to. I love the way you use the word "project." I think different people talk about what they do with the end results of such systems. You have two concepts you talk about as you talk about this, the notion of mappings, which we were just talking about with the data stuff, but also that notion of projection. That's sort of like once you've instantiated something out of this system, you project it out to the end user. Is that kind of how it works?
Alex:
Yes, so we do use the term "projection" in the more mathematical sense, and some people would call that denotations. So essentially, once you have a domain model, and you can reason about it, and we actually have a formal representation of the domain models, maybe we'll talk about that a little bit later. But then you can actually define what it's supposed to look like, the exact same thing with the same data semantics, but as an API, for example, in GraphQL, or as a data product in Iceberg, in the data warehouse, or as a log-compacted Kafka topic in our data mesh infrastructure as Avro. So for us, we have to make sure that it's quote, unquote, "the same thing," regardless of the data representation that the user is actually interested in.
Alex:
To put everything together, you talked about the mappings, what's really interesting for us is that the mappings are just one of the three main components that we have in our knowledge graph, because at the end of the day, UDA at its core is really a knowledge graph which is made out of the domain models. We've talked about that. Then the mappings, the mappings are themselves objects in that knowledge graph, and they are here actually to connect the world of concepts from the domain models to the world of data containers, which in our case could represent things like an Iceberg table, so we would want to know the coordinates of the Iceberg table and we would want to know the schema. But that applies as well to the data mesh source abstraction and the Avro schema that goes with it.
Alex:
That would apply as well, and that's a tricky part that very few people actually try to solve, but that would apply to the GraphQL APIs. We want to be able to say and know, oh, there is a type resolver for that GraphQL type that exists in that domain graph service and it's located exactly over there. So that's the kind of granularity that we actually capture in the knowledge graph.
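To make the "model once, represent everywhere" pattern a bit more tangible, here's a toy Python/rdflib sketch in which the domain model, a data container, and the mapping between them all live in one graph, and a representation is projected from the model. The namespaces, properties, and projection logic are invented for illustration; this is not UDA or Upper.

```python
# Toy illustration of the pattern Alex describes: domain model, mapping, and
# data container are all objects in one RDF graph, and representations are
# projected from the model. All names here are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

DM  = Namespace("http://example.org/domain#")     # domain-model concepts
MAP = Namespace("http://example.org/mapping#")    # mappings as graph objects
DC  = Namespace("http://example.org/container#")  # data containers

g = Graph()

# 1. Domain model: a Movie concept with one attribute.
g.add((DM.Movie, RDF.type, RDFS.Class))
g.add((DM.title, RDFS.domain, DM.Movie))
g.add((DM.title, RDFS.range, RDFS.Literal))

# 2. Data container: an Iceberg-style table, with its coordinates in the graph.
g.add((DC.movie_table, RDF.type, DC.IcebergTable))
g.add((DC.movie_table, DC.coordinates, Literal("warehouse.content.movie")))

# 3. Mapping: a first-class object connecting the concept to the container.
g.add((MAP.movie_mapping, RDF.type, MAP.Mapping))
g.add((MAP.movie_mapping, MAP.concept, DM.Movie))
g.add((MAP.movie_mapping, MAP.container, DC.movie_table))

def project_graphql(concept) -> str:
    """Project one GraphQL type from the domain model (toy generator)."""
    fields = [str(p).split("#")[-1] for p in g.subjects(RDFS.domain, concept)]
    name = str(concept).split("#")[-1]
    body = "\n".join(f"  {f}: String" for f in fields)
    return f"type {name} {{\n{body}\n}}"

print(project_graphql(DM.Movie))
# type Movie {
#   title: String
# }
```

In UDA the projections target GraphQL, Avro, Java, and SQL and carry much richer semantics; the point here is only that the mapping itself is a first-class object in the graph.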
Larry:
Very cool. And this is the Knowledge Graph Insights podcast, which is how we ended up talking about this. But that notion of the models, and then the mappings, and then the data containers that actually have everything, I'm just trying to get my head around the scale of this knowledge graph. You said this is not just, but you tease it out, it doesn't have to do with the streaming services or the customer facing part of the business, it's just about your kind of content and data media assets that you need to manage on the back end. Are you sort of an internal service? Is that how it's conceived or?
Alex:
That's a good question. So we are not so much into the binary data. That's not at all what UDA is about. Again, it's knowledge graph podcast, for sure, but even more precisely, when we say knowledge graph, we really mean conceptual RDF and we are very, very clear about that. That means for us, quite a few things. The knowledge graph, in our case, needs to be able to capture the data wherever it lives. We do not want necessarily to be RDF all the way through, but at the very core of it, there is a lot of RDF. I'm trying to remember how we talk about it. But yeah, so think about a graph representation of connected data. And again, it has to work across all of the data representations,
Torrey Podmajersky
Torrey Podmajersky is uniquely well-prepared to help digital teams align on language and meaning.
Her father's interest in philosophy led her to an early intellectual journey into semantics, and her work as a UX writer at companies like Google and Microsoft has attuned her to the need to discover and convey precise meaning in complex digital experiences.
This helps her span the "semantic gaps" that emerge when diverse groups of stakeholders use different language to describe similar things.
We talked about:
her work as president at her consultancy, Catbird Content, and as the author of two UX books
how her father's interest in philosophy and semantics led her to believe that everyone routinely thinks about what things mean and how to represent meaning
the role of community and collaboration in crafting the language that conveys meaning
how the educational concept of "prelecting" facilitates crafting shared-meaning experiences
the importance of understanding how to discern and account for implicit knowledge in experience design
how she identifies "semantic gaps" in the language that various stakeholders use
her discovery of, and immediate fascination with, the Cyc project and its impact on her semantic design work
her take on the fundamental differences between how humans and LLMs create content
Torrey's bio
Torrey Podmajersky helps teams solve business and customer problems using UX and content at Google, OfferUp, Microsoft, and clients of Catbird Content. She wrote Strategic Writing for UX, is co-authoring UX Skills for Business Strategy, hosts the Button Conference, and teaches content, UX, and other topics at schools and conferences in North America and Europe.
Connect with Torrey online
LinkedIn
Catbird Content (newsletter sign-up)
Torrey's Books
Strategic Writing for UX
UX Skills for Business Strategy
Resources mentioned in this interview
Cyc project
Button Conference
UX Methods.org
Video
Here’s the video version of our conversation:
https://youtu.be/0GLpW9gAsG0
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 39. Finding the right language to describe how groups of people agree on the meaning of the things they're working with is hard. Torrey Podmajersky is uniquely well-prepared to meet this challenge. She was raised in a home where it was common to have philosophical discussions about semantics over dinner. More recently, she's worked as a designer at tech companies like Google, collaborating with diverse teams to find and share the meaning in complex systems.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 39 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Torrey Podmajersky. I've known Torrey for years from the content world, the UX design and content design and UX writing and all those worlds. I used to live very close to her office in Seattle, but Torrey's currently the president at Catbird Content, her consultancy, and she's guest faculty at the University of Washington iSchool. She does all kinds of interesting stuff, very accomplished author. So welcome Torrey. Tell the folks a little bit more about what you're up to and where all the books are at these days.
Torrey:
Thanks so much, Larry. I am up to my neck in finishing the books right now. So one just came out, the second edition of Strategic Writing for UX, that has a brand new chapter on building LLMs into products and updates throughout, of course, since it came out six years ago. But I'm also working on the final manuscript with two co-authors for UX Skills for Business Strategy. That'll be a wine pairing guide, a deep reference book that connects the business impact that you might want to make, whether you're a UX pro or a PM or a knowledge graph enthusiast working somewhere in product, and connecting it to the UX skills you might want to use to make those impacts.
Larry:
Excellent. I can't wait to read both of those. I love the first edition of the Strategic Writing for UX book, but... Hey, I want to talk today though about, this is the Knowledge Graph Insights podcast, and you recently did this great post and we'll talk more about it in detail in a bit about how you had discovered the Cyc project, which is a real pioneering project in the semantic technology field and really foundational to a lot of the knowledge graph stuff that's happening today. But I want to start with one of the other things we talked about before we went on the air was your observation of the kind of common philosophical roots that we have in rhetoric, maybe not necessarily rhetoric, but the stuff that we do as word nerds, as meaning nerds, as all these different kinds of technology nerds that we are. Tell me a little bit about what you meant because you just hinted that and I was like, oh, good philosophy. I love philosophy.
Torrey:
Yeah, I love philosophy too, especially through my dad. My dad was a philosophy major at Haverford College and it has deeply influenced his life and his work in semantic knowledge spaces. And I got to grow up in that context thinking that everybody thought deeply about what things meant and how we represent those meanings. I mean, Plato's Allegory of the Cave was my bedtime story, to the extent that we all knew Plato in the cave, geez, dad, just fine. Plato in the cave. We don't really know anything. All we have is facsimiles and representations of meaning and representations of reality, and through that we construct meaning. And I feel like that's all we're ever doing is using language to construct meaning based on our inability to fully perceive reality.
Larry:
And just for folks who aren't familiar, I love Plato's Allegory of the Cave. It's these poor people chained to a wall and behind them is a projector projecting stuff on the wall in front of them. So all they see is this projection of an imitation of reality, which is much like what we're doing with either both UX writing and I think ontology design and semantic engineering. So that's the perfect analogy to come into this. But your job for the last, I don't know, because you made the transition from teaching to Xbox, what? 10, 12 years ago or something like that?
Torrey:
In 2010, I joined Xbox and before that I had a short stint in internal communications in a division at Microsoft working for a VP there.
Larry:
But you've been in the word biz and the meaning biz for a long time because UX writing is, how did you say it? You have to convey meaning. That's the whole point of UX writing is to just get past random words to actually, what are we talking about here?
Torrey:
It's to make the words that people understand so quickly while they're in an experience, they're just trying to use it. They're not there to read. So we want the words to disappear into ephemeral meaning in their head that they don't even remember. They just knew what to do and which button to press and where to go next to get done what they wanted to get done.
Larry:
And one of the things about that is getting to that language to do that in an experience, that's a team sport. One of the other things that really struck me about that post you did was the role of community in language and meaning. Talk a little bit about that.
Torrey:
Yeah, it is a team sport because in general, even if it's the person doing the UX writing or that content design is also the product designer is also the interaction designer. What they're trying to do is take a wide variety of people who might be using this product that might be an incredibly diverse set of people, or it might be a very narrow set of people, let's say all IT pros. We want to sell this product to big corporations that have IT pros that want to manage their data centers. It's a pretty narrow slice of humans, but it's still hugely diverse in terms of from what language they're speaking and what kind of resources they have inside this company to the kind of background they have, to all of the different reasons they might need to manage their data centers right now.
Torrey:
From, hey, something new came online or there needs to be a new partition or new admin management of access to it or security patch updates, to things like, oh, there was an earthquake at a data center and I need to go secure and audit any damage that might've happened. So there's a huge number of reasons. Let me back up out of that deep analogy. There's a huge number of reasons even for a tiny population relative to the scope of humanity, a small population doing a relatively well-defined job still has a huge number of reasons they might need to be in an interface doing a thing. And what we have to do when we are designing the content for that and designing the experience itself is anticipate those and try and make sure that we've indicated that whatever reason they're coming there for, if it's a valid reason to use this piece of software, whatever reason they're coming there for, they see it reflected in the text and they understand what to do.
Torrey:
That is a team sport because I can't, and no individual person can anticipate all of those things simultaneously. We need to think them through sequentially. We need data to base it on. We need to understand, we need to hear from people who will use it or people who would use it to hear about how they think about it and specifically what language do they use, what's already in their head that we can use to reflect on that screen. So it's about understanding that space well enough, coming to understand that space well enough by communicating with other humans to know what are the right things to represent and in what hierarchy or embeddedness or relationalness, and then use some grammar and punctuation and other tricks up our language sleeves.
Larry:
Yeah, no.
Casey Hart
Ontology engineering has its roots in the idea of ontology as defined by classical philosophers.
Casey Hart sees many other connections between professional ontology practice and the academic discipline of philosophy and shows how concepts like epistemology, metaphysics, and rhetoric are relevant to both knowledge graphs and AI technology in general.
We talked about:
his work as a lead ontologist at Ford and as an ontology consultant
his academic background in philosophy
the variety of pathways into ontology practice
the philosophical principles like metaphysics, epistemology, and logic that inform the practice of ontology
his history with the Cyc project and employment at Cycorp
how he re-uses classes like "category" and similar concepts from upper ontologies like gist
his definition of "AI" - including his assertion that we should use the term to talk about a practice, not a particular technology
his reminder that ontologies are models and like all models can oversimplify reality
Casey's bio
Casey Hart is the lead ontologist for Ford, runs an ontology consultancy, and pilots a growing YouTube channel. He is enthusiastic about philosophy and ontology evangelism. After earning his PhD in philosophy from the University of Wisconsin-Madison (specializing in epistemology and the philosophy of science), he found himself in the private sector at Cycorp. Along his professional career, he has worked in several domains: healthcare, oil & gas, automotive, climate science, agriculture, and retail, among others. Casey believes strongly that ontology should be fun, accessible, resemble what is being modelled, and just as complex as it needs to be.
He lives in the Pacific Northwest with his wife and three daughters and a few farm animals.
Connect with Casey online
LinkedIn
ontologyexplained at gmail dot com
Ontology Explained YouTube channel
Video
Here’s the video version of our conversation:
https://youtu.be/siqwNncPPBw
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 38. When the subject of philosophy comes up in relation to ontology practice, it's typically cited as the origin of the term, and then the subject is dropped. Casey Hart sees many other connections between ontology practice and its philosophical roots. In addition to logic as the foundation of OWL, he shows how philosophy concepts like epistemology, metaphysics, and rhetoric are relevant to both knowledge graphs and AI technology in general.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 38 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Casey Hart. Casey has a really cool YouTube channel on the philosophy behind ontology engineering and ontology practice. Casey is currently an ontologist at Ford, the motor car company. So welcome Casey, tell the folks a little bit more about what you're up to these days.
Casey:
Hi. Thanks, Larry. I'm super excited to be here. I've listened to the podcast, and man, your intro sounds so smooth. I was like, "I wonder how many edits that takes." No, you just fire them off, that's beautiful.
Casey:
Yeah, so like you said, these days I'm the ontologist at Ford, so building out data models for sensor data and vehicle information, all those sorts of fun things. I am also working as a consultant. I've got a couple of different startup healthcare companies and some cybersecurity stuff, little things around the edge. I love evangelizing ontology, talking about it and thinking about it. And as you mentioned for the YouTube channel, that's been my creative outlet. My background is in philosophy and I was interested in, I got my PhD in philosophy, I was going to teach it. You write lots of papers, those sorts of things, and I miss that to some extent getting out into industry, and that's been my way back in to, all right, come up with an idea, try and distill it, think about objections, put it together, and so I'm really enjoying that lately.
Larry:
And I'm enjoying the video-
Casey:
Glad to be on the show.
Larry:
Yeah, no, I really appreciate what you're doing there. One thing I wanted to, and I love that that's how you're getting back to both your philosophical roots, but also part of it is to evangelize ontology practice, which is that's what this podcast is all about, democratizing and sharing practice. But I think, and I just love that you have this explicit and strong philosophical foundation and bent to how you talk about things. I think a lot of times that conversation is like, "Yeah, ontology comes out of philosophy," and that's the end of the conversation. But you've mentioned the role of metaphysics, epistemology, logic, all of which, can you talk a little bit about how those, beyond just I think a lot of people think about logic and OWL and all that stuff, but can you talk a little bit more about the role of metaphysics and epistemology and these other philosophical ideas?
Casey:
Yeah, definitely. You mentioned this in the pre-notes, "Here's a topic we'd like to get to," and I got into a lot of imposter syndrome on this, right? I'm trying to talk myself out of this, but I think most ontologists have this feeling there's no solid easy pipeline into becoming an ontologist, right? It's a very eclectic group of us. My background's in philosophy, you run into a bunch of librarians, you've got computer scientists who do DB administration, you've got jazz musicians I've run into, it's a weird group.
Casey:
I say that just to be, sometimes when I get asked about, "Okay, how does ontological practice work?" I think, well, I didn't actually train to be an ontologist. I fell into it, so I'm ill-equipped to say things about what role ontology or philosophy plays in ontology.
Casey:
I just know I learned philosophy, and then I'm using some of those tools here, so there's two different answers. One is historically, how does philosophy inform and shape the nature of ontology practice? And the other part is just, okay, if you've got a philosophical toolkit of metaphysics and epistemology and logic, how does that apply and make you a better, I mean, the obvious connection is that ontology is a philosophical term. It comes from metaphysics. We look back to Aristotle, and it's the study of that which exists, so do we want to say there's fundamentally fire, air, earth, water or something like that? Or fundamentally, there are these atoms and those are the sorts of things that are part of the inventory of reality. It's not physics, it's metaphysics. It's the thing that, I think for Aristotle, is just the book that sits next to his physics in all of his categories, in his library of everything.
Casey:
But when we move that forward to computer science and data modeling, then we're thinking, okay, maybe not for all of reality, although maybe it depends on how big you want your data model to be. But if I'm a retailer, what are the terms and ontology, what are the terms that I care about, the things that I need to model the constituents of reality that matter to me? That might be types, if you're Amazon, it's okay, medium-sized dry goods versus sporting equipment versus something else. If I'm doing a medical ontology, it's patients and payers and providers, et cetera. In philosophy, in ontology, there's a bunch of different tools and examples, but we think about, okay, what are some fundamental distinctions that we want to make? How can we carve nature at its joints in really sensible ways? That's a phrase that you'll hear a lot. We could say more about it if you want.
Casey:
But what I found, as a philosopher going into an ontology space, is that I have this inventory of examples from all of my grad seminars and various things that I'm looking through and going through, whether I want to talk about gavagai and undetached rabbit parts, if that makes sense to anybody, or whether I want to talk about grue as a color, here are some examples, ways that we can chop up the world in unnatural ways versus chopping it up in natural ways and how do we make those distinctions? That applies straightforwardly when you get into building an ontology model for an oil and gas industry or something like that. There's a bunch of ways that we can divvy up all the things you care about, what's the right and sensible way to do it?
Casey:
I guess that's the metaphysics, ontology side. Then there's logic, which you mentioned. We need to think about reasoning. I don't just want to assert a bunch of things about my data. A fundamental premise of an ontology is that we want to understand our data, we want to confer meaning on it, and that means we have to be able to leverage the structure of the ontology to infer things smartly. Simple things like set containment: if all persons are animals, and we say something about animals, say that they're spatio-temporal creatures, then when I say that persons are a subclass of animals, I get for free that persons are spatio-temporal things as well. And we get a lot more complicated inferences as we go; we have to think about statistical reasoning too. Just in general, if logic is the study of what makes for good arguments, what follows from what, that obviously has a lot of applications in ontology and AI.
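To make that subclass point concrete, here's a minimal Python sketch of the "for free" inference Casey describes. The class names are illustrative only, not drawn from any particular ontology.

```python
# A minimal sketch of subclass ("set containment") inference: assert a couple
# of subclass links, then walk the chain to see what follows for free.

subclass_of = {
    "Person": "Animal",
    "Animal": "SpatioTemporalThing",
}

def inferred_superclasses(cls: str) -> list[str]:
    """Collect every superclass reachable by following subclass links upward."""
    found = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        found.append(cls)
    return found

# Having asserted only Person -> Animal and Animal -> SpatioTemporalThing,
# we infer that a Person is also a SpatioTemporalThing.
print(inferred_superclasses("Person"))  # ['Animal', 'SpatioTemporalThing']
```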
Casey:
And then the third piece that we talked about is epistemology. Epistemology is the study of knowledge and belief, roughly, of what it means to be justified. The classic question there is, if I know something, what exactly does that amount to? Plato says knowledge is justified true belief. And then the history of epistemology is littered with attempts to cash out exactly what it means to be justified. And if you get new information, how can that undercut your justifications? How do you update your beliefs?
Casey:
More recent stuff, and this is what I did in my dissertation,
Chris Mungall
Capturing knowledge in the life sciences is a huge undertaking. The scope of the field extends from the atomic level up to planetary-scale ecosystems, and a wide variety of disciplines collaborate on the research.
Chris Mungall and his colleagues at the Berkeley Lab tackle this knowledge-management challenge with well-honed collaborative methods and AI-augmented computational tooling that streamlines the organization of these precious scientific discoveries.
We talked about:
his biosciences and genetics work at the Berkeley Lab
how the complexity and the volume of biological data he works with led to his use of knowledge graphs
his early background in AI
his contributions to the gene ontology
the unique role of bio-curators, non-semantic-tech biologists, in the biological ontology community
the diverse range of collaborators involved in building knowledge graphs in the life sciences
the variety of collaborative working styles that groups of bio-curators and ontologists have created
some key lessons learned in his long history of working on large-scale, collaborative ontologies, key among them, meeting people where they are
some of the facilitation methods used in his work, tools like GitHub, for example
his group's decision early on to commit to version tracking, making change-tracking an entity in their technical infrastructure
how he surfaces and manages the tacit assumptions that diverse collaborators bring to ontology projects
how he's using AI and agentic technology in his ontology practice
how their decision to adopt versioning early on has enabled them to more easily develop benchmarks and evaluations
some of the successes he's had using AI in his knowledge graph work, for example, code refactoring, provenance tracking, and repairing broken links
Chris's bio
Chris Mungall is Department Head of Biosystems Data Science at Lawrence Berkeley National Laboratory. His research interests center around the capture, computational integration, and dissemination of biological research data, and the development of methods for using this data to elucidate biological mechanisms underpinning the health of humans and of the planet. He is particularly interested in developing and applying knowledge-based AI methods, particularly Knowledge Graphs (KGs) as an approach for integrating and reasoning over multiple types of data. Dr. Mungall and his team have led the creation of key biological ontologies for the integration of resources covering gene function, anatomy, phenotypes and the environment. He is a principal investigator on major projects such as the Gene Ontology (GO) Consortium, the Monarch Initiative, the NCATS Biomedical Data Translator, and the National Microbiome Data Collaborative project.
Connect with Chris online
LinkedIn
Berkeley Lab
Video
Here’s the video version of our conversation:
https://youtu.be/HMXKFQgjo5E
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 37. The span of the life sciences extends from the atomic level up to planetary ecosystems. Combine this scale and complexity with the variety of collaborators who manage information about the field, and you end up with a huge knowledge-management challenge. Chris Mungall and his colleagues have developed collaborative methods and computational tooling that enable the construction of ontologies and knowledge graphs that capture this crucial scientific knowledge.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 37 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Chris Mungall. Chris is a computational scientist working in the biosciences at the Lawrence Berkeley National Laboratory. Many people just call it the Berkeley Lab. He's the principal investigator in a group there, has his own lab working on a bunch of interesting stuff, which we're going to talk about today. So welcome, Chris, tell the folks a little bit more about what you're up to these days.
Chris:
Hi, Larry. It's great to be here. Yeah, so as you said, I'm here at Berkeley Lab. We're located in the Bay Area, just above the UC Berkeley campus, with a nice view of the San Francisco Bay looking into San Francisco. We're a national lab, part of the Department of Energy national lab system, and we have multiple different areas here looking at different aspects of science: physics, energy technologies, materials science. I'm in the biosciences area, so we are really interested in how we can advance biological science on national-scale challenges like energy, the environment, health, and bio-manufacturing.
Chris:
My own particular research is really focused on the role of genes, and in particular the role of genes in complex systems. That could be the genes that we have in our own cells, the genes in human beings, and how they all work together to hopefully create a healthy human being. One part of my research also looks at the role of genes in the environment, in particular the genes inside the tiny microbes that you'll find in ocean water and in soil, and how those genes work together to drive these microbial systems and, ultimately, to drive ecosystems and biogeochemical cycles.
Chris:
So I think the overall aim is really to get a picture of these genes and how they interact in these kinds of complex systems, and to build up models of those systems at scales right the way from atoms through to organisms and indeed all the way to earth-scale systems. My work is all computational; I don't have a wet lab. One thing that we realized early on is that when you are sequencing these genomes and trying to interpret the genes, you're generating a lot of information, and you need to be able to organize that somehow. And so that's how we arrived at working on knowledge graphs, basically to assemble all of this information together and to be able to use it in algorithms that help us interpret biological data and figure out the role of genes in these organisms.
Larry:
Yeah, many of the people I've talked to on this podcast, they come out of the semantic technology world and apply it in some place or another. It sounds like you came to this world because of the need to work with all the data you've got. What was your learning curve? Was it just another thing in your computational toolkit?
Chris:
Yeah, in some ways. In fact, if you go back far enough, my original background is more on the computational side. My undergrad was in AI, but this was back when AI meant good old-fashioned AI and symbolic reasoning and developing Prolog rules to reason about the world and so on. At that time, I wasn't so interested in that side of AI. I really wanted to push forward with some of the more nascent neural network approaches, but in those days we didn't really have the computational power, and I thought, "Well, maybe I really need to actually learn something about biological systems before trying to simulate them." So that's how I got involved in genomics. This was just before the sequencing of the human genome, and I got really interested in the area, a position came up here at Lawrence Berkeley National Laboratory, and I got really involved in analyzing some of these genomes.
Chris:
And in doing this, I came across this project called the Gene Ontology that was developed by some of my colleagues originally in Cambridge and at Lawrence Berkeley National Laboratory. And the goal here was really as we were sequencing these genomes and we were figuring out there's 20,000 genes in the human genome, we discovered we had no way to really categorize what the functions of these different genes were. And if you think about it, there's multiple different ways that you can describe the function of any kind of machine, whether it's a molecular machine inside one of your cells or your car or your iPhone or whatever. You can describe it in terms of what the intent of that machine is. You can describe it in terms of where that machine is localized and what it does, and how that machine works as part of a larger ensemble of machines to achieve some larger objective.
Chris:
So my colleagues came up with this thing called the Gene Ontology, and I looked at that and I said, "Hey, I've got this background in symbolic reasoning and good old-fashioned AI. Maybe I could play a role in helping organize all of this information and figuring out ways to connect it together as part of a larger graph." We didn't call them knowledge graphs at the time, but we were essentially building knowledge graphs, making use of what were then quite early semantic web technologies. This was even before the development of the Web Ontology Language, but there was still this notion that we could use rules in combination with graphs to make inferences about things. And I thought, "Well, this seems like an ideal opportunity to apply some of this technology."
Larry:
That's interesting. It's funny, we didn't plan this, but the episode right before yours in the queue was with my friend Emeka Okoye. He's a guy who was building knowledge graphs in the late '90s and early 2000s, mostly the early 2000s, before the term had been coined, and I think maybe even before a lot of the RDF and OWL and all that stuff was there. So you mentioned Prolog earlier. What was your toolkit then, and how has it evolved up to the present? That's a huge question. Yeah.
Chris:
I didn't mean to get into my whole early days with Prolog. Yeah, I've definitely had some interest in applying a lot of these logic programming technologies. As you're aware,
Emeka Okoye
Semantic technologies permit powerful connections across a variety of linked data resources across the web. Until recently, developers had to learn the RDF language to discover and use these resources.
Leveraging the new Model Context Protocol (MCP) and LLM-powered natural-language interfaces, Emeka Okoye has created the RDF Explorer, an MCP service that lets any developer surf the semantic web without having to learn its specialized language.
We talked about:
his long history in knowledge engineering and AI agents
his deep involvement in the business and technology communities in Nigeria, including founding the country's first internet startup
how he was building knowledge graphs before Google coined the term
an overview of MCP, the Model Context Protocol, and its benefits
the RDF Explorer MCP server he has developed
how the MCP protocol helps ease some of the challenges that semantic web developers have traditionally faced
the capabilities of his RDF Explorer:
facilitating communication between AI applications, language models, and RDF data
enabling graph exploration and graph data analysis via SPARQL queries
browsing, accessing, and evaluating linked-open-data RDF resources
the origins of RDF Explorer in his attempt to improve ontology engineering tooling
his objections to "vibe ontology" creation
the ability of RDF Explorer to let non-RDF developers access knowledge graph data
how accessing knowledge graph data addresses the problem of the static nature of the data in language models
the natural connections he sees between neural network AI and symbolic AI like knowledge graphs, and the tech tribalism he sees in the broader AI world that prevents others from seeing them
how the ability of LLMs to predict likely language isn't true intelligence or actual knowledge
some of the lessons he learned by building the RDF Explorer, e.g., how the MCP protocol removes a lot of the complexity in building hybrid AI solutions
how MCP helps him validate the ontologies he creates
Emeka's bio
Emeka is a Knowledge Engineer, Semantic Architect, and Generative AI Engineer who leverages his over two decades of expertise in ontology and knowledge engineering and software development to architect, develop, and deploy innovative, data-centric AI products and intelligent cognitive systems to enable organizations in their Digital Transformation journey to enhance their data infrastructure, harness their data assets for high-level cognitive tasks and decision-making processes, and drive innovation and efficiency enroute to achieving their organizational goals.
Emeka’s experience has embraced a breadth of technologies, his primary focus being solution design, engineering, and product development, while working with a cross section of professionals across various cultures in Africa and Europe to solve problems at a complex level. Emeka can understand and explain technologies at every level, from deep diving under the hood to the value proposition.
Connect with Emeka online
LinkedIn
Making Knowledge Graphs Accessible: My Journey with MCP and RDF Explorer
RDF Explorer (GitHub)
Video
Here’s the video version of our conversation:
https://youtu.be/GK4cqtgYRfA
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 36. The widespread adoption of semantic technologies has created a variety of linked data resources on the web. Until recently, you had to learn semantic tools to access that data. The arrival of LLMs, with their conversational interfaces and ability to translate natural language into knowledge graph queries, combined with the new Model Context Protocol, has empowered semantic web experts like Emeka Okoye to build tools that let any developer surf the semantic web.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 36 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show my good friend, Emeka Okoye. Emeka is a really interesting ontology practitioner and knowledge engineer, and he's operating now at the intersection of knowledge engineering and generative AI, which I think is a really interesting intersection and that's what we're going to talk about today. So welcome, Emeka. Tell the folks a little bit more about what you're up to these days.
Emeka:
Oh, well, thank you for bringing me onto this awesome podcast. I'm proud to be here. I have been involved in knowledge engineering, or more broadly AI. We need to understand that knowledge engineering is important for AI because it creates the knowledge layer. That's where we have knowledge graphs. There's been a lot of tribalism in AI, the neural nets on one side and the symbolic AI on the other side. So I am in for the convergence. I've always believed in the convergence.
Emeka:
Funny enough, I've been teaching and mentoring young ones on both sides of the divide since 2016 in the Nigerian data science space. So no surprise that when generative AI boomed, I needed to find ways to integrate both sides, because that's what AI is all about: the best of both worlds, the best of neural nets and the best of symbolic AI. That's the future. I mean, there's no doubt about it. So that foundation needed to be there, and that's why I've been working on both sides, from knowledge graphs to AI agents.
Larry:
That's so funny, we didn't talk about this before I hit record, but right before we started this interview, I posted a thing to LinkedIn about exactly that. It was specifically about the need for executive education around hybrid AI architectures, 'cause all they have is Silicon Valley hype. That's all the information they have. But more to the point, yours is a hybrid practice. Well, first of all, I've known you for years now, and it just occurred to me, I don't really know your academic background, but it sounds like you're equally grounded in machine learning and knowledge representation. Have you always pursued both?
Emeka:
I'm a geologist. That's the only formal qualification I have. But I immediately fell in love with personal computers, so once the PC era boomed, I went straight into programming. Nigeria was, at one point in time, one of the biggest software countries in the world. Our software houses were building financial and banking systems that the whole of North America, and some parts of Europe, were using. We were that big. So when the internet came, we embraced it early. I was already building internet protocols using Visual Basic, and not long after, I co-founded the first internet startup in Nigeria. After that I worked with probably one of the earliest Semantic Web brands in the world, OpenLink Software, where I became the Chief Technical Officer for the whole of Africa.
Emeka:
So I was with OpenLink Software when Tim Berners-Lee came up with the Semantic Web thing and Ora and co came up with agents. I started early on, thanks to my mentor and boss at the time, Kingsley Idehen, who mentored me throughout and made me understand that the future was the Semantic Web. So I dove right into it. And can you believe this, we were already creating knowledge graphs before Google called them knowledge graphs. I had created one for a client, Music In Africa, by 2011, 2012.
Larry:
That's right before they introduced the term knowledge graph with their... That's so interesting because... And the RDF and the OWL and all the Semantic Web tech goes back 10 years before that. So that gap between the dawn of the Semantic Web and the coining of the term knowledge graph, you were just in there doing it.
Emeka:
Yes. Yeah, we were already doing it. And remember, I came from a company that is on top of this technology. You know who Kingsley Idehen is. He's my former boss, and still a mentor today. Even after I left OpenLink Software, he was there to guide me. So most of what I know in semantic technology comes from Kingsley. We were already doing this, so my understanding of the technology is very sound. Academia-wise, I didn't do much in that regard on the technology, but I'm hoping I'll do research in the future, because as I'm trying to come into Europe, I've noticed that there are a lot of research-based jobs, and AI is something I would love to devote research time to.
Larry:
Yeah, and I know a lot of those people, and there's not a specific track yet around the hybrid AI stuff. I hope you get a chance to do that. But hey, that's what I really want to focus on today. Given your background, your RDF Explorer project makes even more sense to me now. I just want to say something real quick about that. Emeka and I meet once or twice a week at our Dataworthy Collective, which we co-organize with some other folks, and I was just embarrassed that I had totally missed this awesome piece you wrote for LinkedIn about RDF Explorer. Then you just happened to mention it in one of our meetings, and I went and read it and I was like, "Whoa, that's amazing. We've got to talk about this."
Larry:
So here we are. Finally, I get to share the RDF Explorer with folks. So tell me, I think one thing I've been a little bit surprised by is that not everybody in the knowledge graph and semantic tech space is familiar with MCP. They maybe know the acronym and what it stands for, but can you talk just a little bit about the Model Context Protocol?
Emeka:
All right, so the Model Context Protocol, which was created by Anthropic sometime in November 2024, is a standardized protocol that allows AI agents to connect and interact with external tools and different data sources in a simplified manner. It's that simplicity that is the attraction. It removes a lot of the stress that comes with connecting different data sources. Now, just to give you an idea of what we are talking about: before MCP, we had all these agentic RAG solutions.
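As a rough illustration of what Emeka is describing, here's a minimal sketch of an MCP server that exposes a single SPARQL tool. It assumes the official Python MCP SDK's FastMCP helper and uses the public Wikidata endpoint; it is not the actual RDF Explorer implementation, just the general shape of an MCP tool.

```python
# A hedged sketch of an MCP server with one tool that runs SPARQL queries.
# Assumes the Python MCP SDK (pip install mcp) and the requests library;
# the server name and endpoint are illustrative, not RDF Explorer's own.

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sparql-explorer")  # hypothetical server name

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

@mcp.tool()
def run_sparql(query: str) -> dict:
    """Run a SPARQL query against a public endpoint and return the JSON results."""
    response = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # An MCP client (for example, an LLM-powered agent) can now discover and
    # call run_sparql without the user writing any RDF-specific tooling.
    mcp.run()
```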
Tom Plasterer
Shortly after the semantic web was introduced, the demand for discoverable and shareable data arose in both research and industry.
Tom Plasterer was instrumental in the early conception and creation of the FAIR data principle, the idea that data should be findable, accessible, interoperable, and reusable.
From its origins in the semantic web community, scientific research, and the pharmaceutical industry, the FAIR data idea has spread across academia, research, industry, and enterprises of all kinds.
We talked about:
his recent move from a big pharma company to Exponential Data where he leads the knowledge graph and FAIR data practices
the direct line from the original semantic web concept to FAIR data principles
the scope of the FAIR acronym, not just four concepts, but actually 15
how the accessibility requirement in FAIR distinguishes the standard from open data
the role of knowledge graphs in the implementation of a FAIR data program
the intentional omission of prescribed implementations in the development of FAIR and the ensuing variety of implementation patterns
how the desire for consensus in the biology community smoothed the development of the FAIR standard
the role of knowledge graphs in providing a structure for sharing terminology and other information in a scientific community
how his interest in omics led him to computer science and then to the people skills crucial to knowledge graph work
the origins of the impetus for FAIR in European scientific research and the pharmaceutical industry
the growing adoption of FAIR as enterprises mature their web thinking and vendors offer products to help with implementations
how both open science and the accessibility needs of industry contributed to the development of FAIR
the interesting new space at the intersection of generative AI and FAIR and knowledge graph
the crucial foundational role of FAIR in AI systems
Tom's bio
Dr. Tom Plasterer is a leading expert in data strategy and bioinformatics, specializing in the application of knowledge graphs and FAIR data principles within life sciences and healthcare. With over two decades of experience in both industry and academia, he has significantly contributed to bioinformatics, systems biology, biomarker discovery, and data stewardship. His entrepreneurial ventures include co-founding PanGenX, a Personalized Medicine/Pharmacogenetics Knowledge Base start-up, and directing Project Planning and Data Interpretation at BG Medicine. During his extensive tenure at AstraZeneca, he was instrumental in championing Data Centricity, FAIR Data, and Knowledge Graph initiatives across various IT and scientific business units.
Currently, Dr. Plasterer serves as the Managing Director of Knowledge Graph and FAIR Data Capability at XponentL Data, where he defines strategy and implements advanced applications of FAIR data, knowledge graphs, and generative AI for the life science and healthcare industries. He is also a prominent figure in the community, having co-founded the Pistoia Alliance FAIR Data Implementation group and serving on its FAIR data advisory board. Additionally, he co-organizes the Health Care and Life Sciences symposium at the Knowledge Graph Conference and is a member of Elsevier’s Corporate Advisory Board.
Connect with Tom online
LinkedIn
Video
Here’s the video version of our conversation:
https://youtu.be/Lt9Dc0Jvr4c
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 35. With the introduction of semantic web technologies in the early 2000s, the World Wide Web began to look something like a giant database. And with great data, comes great responsibility. In response to the needs of data stewards and consumers across science, industry, and technology, the FAIR data principle - F A I R - was introduced. Tom Plasterer was instrumental in the early efforts to make web data findable, accessible, interoperable, and reusable.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 35 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Tom Plasterer. Tom is the managing director who leads the knowledge graph and FAIR data practices at Exponential Data, which is a company in the Boston area, or at least he's in the Boston area. So welcome, Tom, tell the folks a little bit more about what you're up to these days.
Tom:
Thanks, Larry. It's a great pleasure to be with you and the audience. So just last week I hit a year at Exponential Data, after 12 and a half years at big pharma. I came over to Exponential Data to lead the knowledge graph and FAIR data practices, and also to unite those with our expertise around artificial intelligence. One of the things I started to get really excited about at the Knowledge Graph Conference over the last few years was the convergence of these two communities, and really how AI, knowledge graphs, and especially FAIR data, as a way of having curated, trusted data for these applications, could be completely synergistic. That was really what brought me there. When I joined, we were around 40 people. As I was leading this practice, we grew to about 240, and we were recently acquired by Genpact.
Tom:
And so now we're part of a much bigger organization, bringing our strengths in artificial intelligence, generative AI, knowledge graphs, and FAIR data to this larger organization. That's really been my journey over the last year: wanting to bring these two technologies together. And one of the things that we've really found is how important FAIR data is to both sides of the equation. This is where trusted data, clean data, data that follows standards, data that's self-describing, all of the things that you want to do for FAIR data, are really important foundationally for what you want to do with knowledge graphs and for how you want to give this trusted data to large language models and generative AI to get the most out of those technologies. So in a nutshell, that's been my journey over the last year.
Larry:
Yeah. And we didn't talk explicitly about it as we were preparing for this, but AI is the logical and obvious place where all this is going now. I think everybody's concerned about delivering trustworthy, clean, FAIR data wherever you are. But do you feel like you've been uniquely well-prepared for that, with both your company and... And I know your background, that's what we want to talk about today, the origins of the FAIR data standard, and you've been around it right from the get-go, right?
Tom:
Right from the beginning. And the community leans a lot on earlier trends around the Semantic Web and Semantic Web technology. I think a lot of the founders are very web-centric in their thinking. There's a direct tie between what Tim Berners-Lee, Ora Lassila, and Jim Hendler wanted to accomplish with the Semantic Web, how the standards evolved there and then grew up and became available within graph databases, and eventually knowledge graphs, as a vehicle to prove that FAIR data worked. And from there, it's a direct thread to wanting to have knowledge injection for generative AI and the value there. The whole thing flows really, really well.
Larry:
Yeah, interesting. And one thing, as you said, there's that direct descent from Tim Berners-Lee's and Ora's and Jim's paper in Scientific American. One of the things that arose, I don't know, five or ten years after that was Tim Berners-Lee's notion of five-star data, that kind of 1, 2, 3, 4, 5 star rating. And then, what, seven years later, FAIR came along. Can you talk a little bit about how these perceptions of good data and good data practices got codified?
Tom:
Sure. So if we think about five-star linked data and what Tim was trying to accomplish there, get your data on the web, have it in an accessible format, follow standards, have it linked together, that's really, really close to the FAIR data principles themselves. I think a lot of the things within the FAIR data principles were learned directly from that. And I guess first I should take a step back and explain. People have probably come across the FAIR data principles, and they've heard Findable, Accessible, Interoperable, Reusable, and they think there are four of them. There are actually 15 of them. So this is where it gets a little more complicated. FAIR as an acronym was just a very nice way of marketing and putting these things together, but a lot of the ways they become really useful is at the level of the individual sub-principles. So I'm just going to talk about them and describe them real briefly without being too technical. People can learn more about it in the 2016 Scientific Data paper.
Tom:
So findable is really about URIs. It's really about whether I can identify both an instance of data and a concept, a class, with a URI, later an IRI, and sometimes we call them persistent identifiers or GUPRIs, Global, Unique, Persistent Resource Identifiers, all the same thing. So can you use that to identify a piece of data, and if so, when you resolve it, will it provide useful metadata for both humans and machines? That's really the most important piece that you need to get started. Let's put an identifier on our data and on our metadata, so that we can resolve it, find it, put it in an index, and get something useful out of it. That covers about four of the F principles there.
Tom:
Accessible is really about interoperability, about following common protocols. So HTTP, HTTPS, we're not reinventing protocols, we're following standards. And then authentication on top of that in some sort of certified manner. Usually it ends up being LDAP with single sign-on or something like that, some way of authenticating access to your data.
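To ground the findable and accessible pieces Tom walks through, here's a small Python sketch that dereferences a persistent identifier over HTTPS and asks for machine-readable metadata via content negotiation. The Wikidata URI is just an example identifier, not one from the conversation.

```python
# A minimal sketch of F and A in practice: resolve a persistent identifier
# over a standard protocol (HTTPS) and negotiate for machine-readable RDF
# instead of an HTML page. The identifier below is only an example.

import requests

identifier = "http://www.wikidata.org/entity/Q42"  # example persistent identifier

response = requests.get(
    identifier,
    headers={"Accept": "text/turtle"},  # ask for RDF (Turtle) via content negotiation
    timeout=30,
)
response.raise_for_status()

# The same identifier serves humans (HTML) and machines (Turtle, JSON-LD, ...).
print(response.headers.get("Content-Type"))
print(response.text[:300])
```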
Mara Inglezakis Owens
Mara Inglezakis Owens brings a human-centered focus to her work as an enterprise architect at a major US airline.
Drawing on her background in the humanities and her pragmatic approach to business, she has developed a practice that embodies both "digital anthropology" and product thinking.
The result is a knowledge architecture that works for its users and consistently demonstrates its value to key stakeholders.
We talked about:
her role as an enterprise architect at a major US airline
how her background as a humanities scholar, and especially as a rhetoric teacher, prepared her for her current work as a trusted business advisor
some important mentoring she received early in her career
how "digital anthropology" and product thinking fit into her enterprise architecture practice
how she demonstrates the financial value of her work to executives and other stakeholders
her thoughtful approach to the digitalization process and systems design
the importance of documentation in knowledge engineering work
how to sort out and document stakeholders' self-reports versus their actual behavior
the scope of her knowledge modeling work, not just physical objects in the world, but also processes and procedures
two important lessons she's learned over her career: don't be afraid to justify financial investment in your work, and "don't be so attached to an ideal outcome that you miss the best possible"
Mara's bio
Mara Inglezakis Owens is an enterprise architect who specializes in digitalization and knowledge management. She has deep experience in end-to-end supply chain as well as in planning, product, and program management.
Mara’s background is in epistemology (history and philosophy of science, information science, and literature), which gives a unique, humanistic flavor to her practice. When she is not working, Mara enjoys aviation, creative writing, gardening, and raising her children. She lives in Minneapolis.
Connect with Mara online
LinkedIn
email: mara dot inglezakis dot owens at gmail dot com
Video
Here’s the video version of our conversation:
https://youtu.be/d8JUkq8bMIc
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 34. When you think about architecting knowledge systems for a giant business like a global airline, you might picture huge databases and complex spaghetti diagrams of enterprise architectures. These do in fact exist, but the thing that actually makes these systems work is an understanding of the needs of the people who use, manage, and finance them. That's the important, human-focused work that Mara Inglezakis Owens does as an enterprise architect at a major US airline.
Interview transcript
Larry:
Hi, everyone. Welcome to episode 34 of the Knowledge Graph Insights Podcast. I am really delighted today to welcome to the show, Mara, I'm going to get this right, Inglezakis Owens. She's an enterprise architect at a major US airline. So, welcome, Mara. Tell the folks a little bit more about what you're up to these days.
Mara:
Hi, everybody. My name's Mara. And these days I am achieving my childhood dream of working in aviation, not as a pilot, but that'll happen, but as an enterprise architect. I've been doing EA, also data and information architecture, across the whole scope of supply chain for about 10 years, everything from commodity sourcing to SaaS, software as a service, to now logistics. And a lot of my days, I spend interviewing subject matter experts, convincing business leaders they should do stuff, and on my best days, I get to crawl around on my hands and knees in an airplane hangar.
Larry:
Oh, fun. That is ... Yeah. I didn't know ... I knew that there's that great picture of you sitting in the jet engine, but I didn't realize this was the fulfillment of a childhood dream. That's awesome. But everything you've just said ties in so well to the tagline on your LinkedIn profile. You're like, "I'm a people-loving architect, and data leader." And one of the things I love about that, we talked a fair amount at the knowledge graph conference about your background in the humanities-
Mara:
We did.
Larry:
... and your transition into your current role. I would love to hear ... And what you just said, like, the end of what you were just saying about so much of your job is about interacting with people, and convincing business leaders to fund you, and stuff.
Can you talk a little bit about that? Like, what drew you into the humanities in the first place, your transition out of it, and here we are today.
Mara:
100%. Before I talk about being in the humanities, I love to read, I was an epistemologist, and a 19th Century scholar. But before that, when I was a little girl, I was writing my own websites in HTML, XML, and some of the technologies that eventually got to be used in the semantic web, which is how I entered the knowledge graph space way later as an adult. So, that got put on hold.
Mara:
I love to read. So, I became a humanities scholar, and for about five years I was the lowest of the low adjuncts at an R1. What prepared me was my teaching experience, not my scholarship, although I did a lot of thinking about how people interact with written media, how they enter into internal argumentation with those media, and come to know the world differently. That's what most of my work was about. It was interdisciplinary, with literature, history, and philosophy of science, which is why I say epistemology.
Mara:
But my best teacher for coming to where I am today was being a teacher. So, lowest of the low, first year, although, I spent most of the time teaching applied rhetoric, I was teaching freshman comp. So, this is a super diverse group of students who are showing up for a required class. To be successful, I needed to do two things. One, I had to listen carefully to what these students cared about to actually get them to get something out of the between $5000 and $8000 they were paying for this course. And then I had to generalize what I wanted them to learn about enough to make it accessible to them. Okay? So, my goal throughout my teaching career, similar to my goal now, is to inculcate effective communication through fit for purpose argumentation.
Mara:
So, while a lot of my colleagues were being like, "Here's an essay. Write something about it. Make it sound smart," what I did, because I needed my students to hook in, to be engaged, because the vast ... I maybe, I don't know, taught, like, four English majors over my career ... No HSPS [humanities, social, and political science] people. I told my students, "Okay, guys. Get into groups. So, you're set up to do some argumentation amongst yourselves, pick a little part of this essay," this was the first year, "And something you react strongly to. What about the sentences are doing this? Grammar, syntax, semantics. What's the whole universe of your group reactions? How are they related, or not related?"
Mara:
This evolved into a directed research curriculum in my applied rhetoric courses. So, I said, "Okay. Okay, guys. Go find something out in the world that needs to change. We need a pedestrian bridge over the street. We need better accessibility for people with disabilities in our gym. We need better gym hours. Figure out how it's working, frame up a case for someone who can make a change, do your argumentation, go present it." Some of my students actually argued well enough that they got a stoplight installed on a really busy street corner. So, it worked.
Mara:
So, fast-forward through lots of life drama that brought me out of the humanities into what is a much better place for me, in corporate. I'm in a trusted advisor role, not so dissimilar to being a teacher. As a trusted advisor, I have to be attuned to what the business says that they want, and then to what they demonstrate that they want through their behaviors and through their artifacts, oftentimes their processes and information systems. And then I have to think about why and how those things align, or don't align.
Mara:
I'm full-time employed, in this role and all of my corporate roles, but I'm effectively providing a boutique service. It's not enough for me to come up with something that sounds smart or cool. I have to come up with a solution that accommodates process, data, technology, and, most importantly, people, and that actually fulfills a business need. And I used to think about the connection from my academic career to my corporate career as like, "Oh, I became a good EA because I taught my students to do this," but with about a decade of reflection, I'm realizing that the teaching was really mutual.
Mara:
Like, I asked my students to show me what they were thinking. I evaluated what they were doing. I was very critical, but I was generous. And I was with them as their efforts bore fruit, or didn't. But how I demonstrated, elicited, and critiqued evolved with constant, and often very, very vulnerable, feedback. Like I do with my clients now, I constantly asked, "How am I doing? Am I giving you what you need? Do you need something else?"
Mara:
For a student, it's really hard to say that to someone who's got the power of a grade over you. It's not as perhaps scary when we're all adults in corporate, but I still think many adults ... I was always the stupidest person in the room as a scholar. So, I don't have this problem, but a lot of us are worried about appearing, "I'm not smart enough. I am not creative enough." So, I still have to flex that good, compassionate, people-loving, listening muscle all the time in corporate just like my wonderful undergrads at Indiana University taught me how to do.
Larry:
That's so awesome. I've thought a lot about rhetoric. In fact, I don't know if we talked about this in New York, but my first career was in college textbook publishing,
Frank van Harmelen
Much of the conversation around AI architectures lately is about neuro-symbolic systems that combine neural-network learning tech like LLMs and symbolic AI like knowledge graphs.
Frank van Harmelen's research has followed this path, but he puts all of his AI research in the larger context of how these technical systems can best support people.
While some in the AI world seek to replace humans with machines, Frank focuses on AI systems that collaborate effectively with people.
We talked about:
his role as a professor of AI at the Vrije Universiteit in Amsterdam
how rapid change in the AI world has affected the 10-year, €20-million Hybrid Intelligence Centre research he oversees
the focus of his research on the hybrid combination of human and machine intelligence
how the introduction of conversational interfaces has advanced AI-human collaboration
a few of the benefits of hybrid human-AI collaboration
the importance of a shared worldview in any collaborative effort
the role of the psychological concept of "theory of mind" in hybrid human-AI systems
the emergence of neuro-symbolic solutions
how he helps his students see the differences between systems 1 and 2 thinking and its relevance in AI systems
his role in establishing the foundations of the semantic web
the challenges of running a program that spans seven universities and employs dozens of faculty and PhD students
some examples of use cases for hybrid AI-human systems
his take on agentic AI, and the importance of humans in agent systems
some classic research on multi-agent computer systems
the four research challenges - collaboration, adaptation, responsibility, and explainability - they are tackling in their hybrid intelligence research
his take on the different approaches to AI in Europe, the US, and China
the matrix structure he uses to allocate people and resources to three key research areas: problems, solutions, and evaluation
his belief that "AI is there to collaborate with people and not to replace us"
Frank's bio
Since 2000 Frank van Harmelen has played a leading role in the development of the Semantic Web. He is a co-designer of the Web Ontology Language OWL, which has become a worldwide standard. He co-authored the first academic textbook of the field, and was one of the architects of Sesame, an RDF storage and retrieval engine, which is in wide academic and industrial use. This work received the 10-year impact award at the International Semantic Web Conference. Linked Open Data and Knowledge Graphs are important spin-offs from this work.
Since 2020, Frank has been scientific director of the Hybrid Intelligence Centre, where 50 PhD students and as many faculty members from 7 Dutch universities investigate AI systems that collaborate with people instead of replacing them.
The large scale of modern knowledge graphs that contain hundreds of millions of entities and relationships (made possible partly by the work of Van Harmelen and his team) opened the door to combine these symbolic knowledge representations with machine learning. Since 2018, Frank has pivoted his research group from purely symbolic Knowledge Representation to Neuro-Symbolic forms of AI.
Connect with Frank online
Hybrid Intelligence Centre
Video
Here’s the video version of our conversation:
https://youtu.be/ox20_l67R7I
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 33. As the AI landscape has evolved over the past few years, hybrid architectures that combine LLMs, knowledge graphs, and other AI technology have become the norm. Frank van Harmelen argues that the ultimate hybrid system must also include humans. He's running a 10-year, €20 million research program in the Netherlands to explore exactly this. His Hybrid Intelligence Centre investigates AI systems that collaborate with people instead of replacing them.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 33 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Frank van Harmelen. Frank is a professor of AI at the Vrije Universiteit in Amsterdam, that's the Free University in Amsterdam. He's also the PI of this big program called the Hybrid Intelligence Center, which spans seven Dutch universities, multimillion euro grant over 10 years. Welcome, Frank. Tell the folks a little bit more about what you're up to these days?
Frank:
All right. This Hybrid Intelligence Center occupies me most of the time, and that's been a very exciting ride over the past five years. We're just at the midpoint and we have five more years to go.
Larry:
Nice. How is it going? Are you satisfied? Are the expectations of the grantors being met and are you happy with the progress you're making?
Frank:
Yes. It's obvious to say that the world of AI is super dynamic now. All kinds of things have happened in AI in the past few years that nobody had predicted when we started: the rise of large language models and conversational AI. That has also really affected the notion of hybrid intelligence. It's been an even more exciting ride than we had expected.
Larry:
Yeah. That's right. I think excitement is the word of the day. Hey, one thing I have to observe: earlier today, before we recorded this, I was doing a presentation with some information architects, and I was talking about hybrid AI architectures and neuro-symbolic loops and all this stuff. One of the people in the presentation asked, "Hey, what about human AI? Shouldn't that be the architecture?" And I said, "You're going to love my next podcast guest," because that's the whole point of this hybrid intelligence idea, right?
Frank:
Yeah. The core idea of hybrid intelligence, hybrid standing for the combination of human and machine intelligence, is to think of hybrid teams, where a hybrid team is made up of a bunch of people and a bunch of AIs who collaborate to get a task done. The tagline of the Hybrid Intelligence Centre, if you want, is that we're working on AI that collaborates with people instead of replacing them. If you work on AI systems that collaborate with people, then you need to solve all kinds of different problems and answer all kinds of different questions than when you are thinking about AI in replacement mode.
Larry:
Yeah. That seems to be, like in a lot of circles, there's this assumption that AI is just here to replace people, but you've been... Long before that was a meme and people talking about it, you were working on this hybrid concept. Has that heightened the urgency around your work, the current state of AI expectations?
Frank:
It has heightened the urgency, and it has also opened all kinds of doors. One of the big hurdles in AI-human collaboration, say five years ago, was really the conversational interface. It was hard to talk to AI systems, and they certainly wouldn't talk back to you in a coherent way. Well, we all know that's now a solved problem. But what happens in the middle is the real challenge. We don't think that large language models are going to solve all of the collaboration between humans and AI systems. We want our AI systems to do things that the language models are not very good at, so we're using that technique in a kind of sandwich model. The language model does the conversation on the front end and on the back end, and we're working on the AI agents, the smarts in the middle, to create these hybrid teams.
Larry:
As you say that, I'm thinking that's just one aspect of the hybridization here, one way that humans... When you think about hybrid architectures, LLMs can help build knowledge graphs, and knowledge graphs can also fill in knowledge gaps in LLM architectures. What other obvious complementary things are there? What do humans need help with, and what do machines need help with?
Frank:
Right. There are some obvious things like the perfect memory that machines have and the imperfect memory that we have. Okay? That's a nice example of where members in the team can really compensate for each other's strengths and weaknesses. Humans suffer from a whole host of these cognitive biases. For example, we suffer from the recency effect. We believe information more if we've heard it recently rather than when we've heard it in the past. We believe information more when we've heard it more frequently rather than... There's no reason to believe something more if you hear it more often.
Frank:
That doesn't make it more true, but it's how our brain works. Not always such a good idea. Computers can help us to compensate for all of these cognitive limitations. Conversely, we are very much aware of the context in which we operate. We are aware why we are doing something. We are aware of the implicit norms and values that govern the task that we're doing, that we're expected to obey in a particular group to perform a particular task. Computers don't have any sense of why they are doing something, the context in which they're doing it, the social and ethical norms under which they should operate. That's something where the human component can compensate for the machine limitations. These are just a few examples of that complementarity.
Larry:
Yeah. That's one of the things I think about a lot is that what we call in my world stakeholder alignment or stakeholder discovery or working with subject matter experts to make explicit their tacit knowledge in their head and things like that. It seems like that's probably always or mostly going to be a human capability. Is that... You probably have research that backs this up, right?
Frank:
Well, and if you want to collaborate with a computer, then you better make sure that there is some alignment between you and the computer.
Denny Vrandečić
As the founder of Wikidata, Denny Vrandečić has thought a lot about how to better connect the world's knowledge.
His current project is Abstract Wikipedia, an initiative that aims to let anyone anywhere on the planet contribute to, and benefit from, the world's collective knowledge, in their native language.
It's an ambitious goal, but - inspired by the success of other contributor-driven Wikimedia Foundation projects - Denny is confident that the community can make it happen.
We talked about:
his work as Head of Special Projects at the Wikimedia Foundation and his current projects: Wikifunctions and Abstract Wikipedia
the origin story of his first project at Wikimedia - Wikidata
a precursor project that informed Wikidata - Semantic MediaWiki
the resounding success of the Wikidata project, the most edited wiki in the world, with half a million contributors
how the need for more expressivity than Wikidata offers led to the idea for Abstract Wikipedia
an overview of the Abstract Wikipedia project
the abstract language-independent notation that underlies Abstract Wikipedia
how Abstract Wikipedia will permit almost instant updating of Wikipedia pages with the facts it provides
the capability of Abstract Wikipedia to permit both editing and use of knowledge in an author's native language
their exploration of using LLMs to use natural language to create structured representations of knowledge
how the design of Abstract Wikipedia encourages and facilitates contributions to the project
the Wikifunctions project, a necessary precondition to Abstract Wikipedia
the role of Wikidata as the Rosetta Stone of the web
some background on the Wikifunctions project
the community outreach work that Wikimedia Foundation does and the role of the community in the development of Abstract Wikipedia and Wikifunctions
the technical foundations for his
how to contribute to Wikimedia Foundation projects
his goal to remove language barriers to allow all people to work together in a shared knowledge space
a reminder that Tim Berners-Lee's original web browser included an editing function
Denny's bio
Denny Vrandečić is Head of Special Projects at the Wikimedia Foundation, leading the development of Wikifunctions and Abstract Wikipedia. He is the founder of Wikidata, co-creator of Semantic MediaWiki, and former elected member of the Wikimedia Foundation Board of Trustees. He worked for Google on the Google Knowledge Graph. He has a PhD in Semantic Web and Knowledge Representation from the Karlsruhe Institute of Technology.
Connect with Denny online
user Denny at Wikimedia
Wikidata profile
Mastodon
LinkedIn
email: denny at wikimedia dot org
Resources mentioned in this interview
Wikimedia Foundation
Wikidata
Semantic MediaWiki
Wikidata: The Making Of
Wikifunctions
Abstract Wikipedia
Meta-Wiki
Video
Here’s the video version of our conversation:
https://youtu.be/iB6luu0w_Jk
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 32. The original plan for the World Wide Web was that it would be a two-way street, with opportunities to both discover and share knowledge. That promise was lost early on - and then restored a few years later when Wikipedia added an "edit" button to the internet. Denny Vrandečić is working to make that edit function even more powerful with Abstract Wikipedia, an innovative platform that lets web citizens both create and consume the world's knowledge, in their own language.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 32 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Denny Vrandecic. Denny is best known as the founder of Wikidata, which we'll talk about more in just a minute. He's currently the Head of Special Projects at the Wikimedia Foundation. He's also a visiting professor at King's College London. So welcome, Denny. Tell the folks a little bit more about what you're up to these days.
Denny:
Thank you so much for having me, Larry. It's really a pleasure and honor. I enjoy listening to your podcast a lot, and I'm very happy to be here too. So these days I'm with the Wikimedia Foundation and, as you said, I'm called Head of Special Projects. There we are working on two new projects, one called Wikifunctions and the other Abstract Wikipedia, which are really very much tied together, and we'll get to both of those in a moment, I think.
Larry:
Yeah, I'm really excited about both projects. I can't wait to get to them, but let's talk a little bit about Wikidata first because you started that in 2012, is that correct?
Denny:
That's right, yes.
Larry:
What was the impetus for that? What motivated you to start that project?
Denny:
Well, this goes actually back to 2005. Markus Krötzsch and I were PhD students in Karlsruhe and Wikimania was coming to Frankfurt, which is really close to Karlsruhe. It was the very first Wikimania at all. We were both Wikipedians and we wanted to go there and we thought, "What could we do?" And so we connected our research topic, which was the Semantic Web, with Wikipedia and made a proposal there. We didn't really think it would go anywhere. We were just like, "This would be really cool if this happened."
Denny:
But over the next few years, there was so much interest that people actually started implementing our ideas. We picked up on that, and Semantic MediaWiki came out of it. Eventually, when I was finishing my PhD, I was asked by Mark Reeves, who was working for Paul Allen's Vulcan back then, whether I would like to make this happen for real. And so we approached the Wikimedia Foundation, we approached the Wikimedia movement, and there was great excitement about it. We got the funding aligned and then we started working on Wikidata. This was really a dream come true for us, who had been working on this idea of bringing structured data and Wikipedia together for more than seven years at that point.
Larry:
That's so interesting because my interest, I mean obviously it goes back aways, but my history of this kind of picks up with Wikidata, so that prehistory of it, connecting Wikipedia to the Semantic Web, which is obviously you're going to end up with something like Wikidata. And you were backed by the Vulcan Foundation or by Paul Allen's foundation?
Denny:
Yes.
Larry:
I did not know that. I lived in Seattle for a long time, so I walked by their building a lot. Well, that's really fascinating. So from 2005 to 2012, was it like simmering or were you doing things like precursors to the launch of Wikidata?
Denny:
We were doing precursor work. We were developing an extension called Semantic MediaWiki, which is still quite widely used. There are two conferences per year for Semantic MediaWiki users. NASA is using it, for example, on the ISS. Microsoft and many others were also using it, or are still using it. It integrates structured data into a MediaWiki installation and allows everyone to build small knowledge graphs, query them, and so on.
Denny:
For Wikidata, we took a lot of those lessons. We knew that we needed a somewhat different data model. We actually started a different software project, not built on Semantic MediaWiki but on something even more structured. Semantic MediaWiki is really good if you want to interleave the text with the structured data, with the annotations. Whereas Wikidata really builds a pure knowledge graph: items connected to each other and given values, and so on. But originally we were thinking, "Oh, we'll just switch on Semantic MediaWiki on the Wikipedias." I'm very glad we didn't do that.
Denny:
Actually, the 10-year anniversary of Wikidata came up not long ago; it's already three years ago now. Markus and Lydia Pintscher, who's the product lead for Wikidata at Wikimedia Deutschland, and I wrote a paper about the history of Wikidata. We actually went into detail about these topics and how Wikidata came about.
Larry:
Oh, I'll have to link to that paper in the show notes. I'd love to read it. Well, then that's interesting. And then so Wikidata, that was sort of the original... not original, but it was one of the first realizations of the promise of the Semantic Web, and it continues to be in the sense of the unique identifiers and entity resolution and things like that. I assume you consider it a success. It seems like it's such an important part of the knowledge part of the internet.
Denny:
If you ask me, yes, definitely, Wikidata is a resounding success, obviously. It's certainly a much bigger success than we expected. More than half a million people have contributed to Wikidata. If you had asked me in 2010 or 2011 how many people would contribute to such a project, I would have been off by more than 10X. I would never have assumed that half a million people would actually contribute to such a project. So I'm really happy. Wikidata is now the most edited wiki in the world by far, even beating English Wikipedia. It is also just very large, very comprehensive. I'm more than excited about how it has developed, and I'm very happy to see how the work has continued after I left Wikidata.
Larry:
Nice. For all of its success though, you see more that could be done in this area, right? Is that where your current projects come from?
Denny:
Yes, absolutely. So Wikidata is a classical knowledge graph. Actually, we already went beyond the classical data model in Wikidata, right? It's not just triples, not just subject-predicate-object; we also introduced the ability to have qualifiers on each of those statements, and the ability to have references for every statement, and so on. So there were a number of things that we added.
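To make that data model concrete, here is a minimal sketch (not something from the conversation itself) that queries the public Wikidata SPARQL endpoint for one statement together with a qualifier and a reference. The particular identifiers, Q64 for Berlin, P1082 for population, P585 for the point-in-time qualifier, and P854 for the reference URL, are just illustrative choices; the only assumed dependency is the `requests` library.

```python
# Minimal sketch: fetch one Wikidata statement with its qualifier and reference.
# Uses only the public Wikidata Query Service and the `requests` library.
import requests

QUERY = """
SELECT ?population ?pointInTime ?refUrl WHERE {
  wd:Q64 p:P1082 ?stmt .                                    # Berlin -> population statement node
  ?stmt ps:P1082 ?population .                               # the statement's main value
  OPTIONAL { ?stmt pq:P585 ?pointInTime . }                  # qualifier: point in time
  OPTIONAL { ?stmt prov:wasDerivedFrom/pr:P854 ?refUrl . }   # reference: source URL
}
LIMIT 3
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-insights-example/0.1"},  # the endpoint asks for a descriptive agent
    timeout=30,
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print(
        row["population"]["value"],
        row.get("pointInTime", {}).get("value", "(no qualifier)"),
        row.get("refUrl", {}).get("value", "(no reference)"),
    )
```

The shape of the result is the point: the value hangs off a statement node, and qualifiers and references attach to that node rather than directly to the item, which is exactly the "more than plain triples" structure Denny describes.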
Charles Ivie
Since the semantic web was introduced almost 25 years ago, many have dismissed it as a failure.
Charles Ivie shows that the RDF standard and the knowledge-representation technology built on it have actually been quite successful.
More than half of the world's web pages now carry semantic annotations, and the widespread adoption of knowledge graphs in enterprises and media companies is only growing as enterprise AI architectures mature.
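As a rough illustration of what those in-page annotations look like (our example, not Charles's), the sketch below builds a small schema.org description and wraps it in the JSON-LD script block that publishers commonly embed in their pages; the metadata values are invented.

```python
# Minimal sketch: build a schema.org JSON-LD annotation and wrap it in the
# <script> tag a web page would embed. All metadata values are invented.
import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2025-01-01",
}

snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(article, indent=2)
    + "\n</script>"
)
print(snippet)  # paste-able into a page's <head>; crawlers read it as RDF-compatible data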
We talked about:
his long work history in the knowledge graph world
his observation that the semantic web is "the most catastrophically successful thing which people have called a failure"
some of the measures of the success of the semantic web: ubiquitous RDF annotations in web pages, numerous knowledge graph deployments in big enterprises and media companies, etc.
the long history of knowledge representation
the role of RDF as a Rosetta Stone between human knowledge and computing capabilities
how the abstraction that RDF permits helps connect different views of knowledge within a domain
the need to scope any ontology in a specific domain
the role of upper ontologies
his transition from computer science and software engineering to semantic web technologies
the fundamental role of knowledge representation tech - to help humans communicate information, to innovate, and to solve problems
how semantic modeling's focus on humans working things out leads to better solutions than tech-driven approaches
his desire to start a conversation around the fundamental upper principles of ontology design and semantic modeling, and his hypothesis that it might look something like a network of taxonomies
Charles' bio
Charles Ivie is a Senior Graph Architect with the Amazon Neptune team at Amazon Web Services (AWS). With over 15 years of experience in the knowledge graph community, he has been instrumental in designing, leading, and implementing graph solutions across various industries.
Connect with Charles online
LinkedIn
Video
Here’s the video version of our conversation:
https://youtu.be/1ANaFs-4hE4
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 31. Since the concept of the semantic web was introduced almost 25 years ago, many have dismissed it as a failure. Charles Ivie points out that it's actually been a rousing success. From the ubiquitous presence of RDF annotations in web pages to the mass adoption of knowledge graphs in enterprises and media companies, the semantic web has been here all along and only continues to grow as more companies discover the benefits of knowledge-representation technology.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 31 of the Knowledge Graph Insights Podcast. I am really happy today to welcome to the show Charles Ivie. Charles is currently a senior graph architect at Amazon's Neptune department. He's been in the graph community for years, worked at the BBC, ran his own consultancies, worked at places like The Telegraph and The Financial Times and places you've heard of. So welcome Charles. Tell the folks a little bit more about what you're up to these days.
Charles:
Sure. Thanks. Thanks, Larry. Very grateful to be invited on, so thank you for that. And what have I been up to? Yeah, I've been around the graph industry for about 14 years or something like that now. And these days I am working with the Amazon Neptune team doing everything I can to help people become more successful with their graph implementations and with their projects. And I like to talk at conferences and join things like this and write as much as I can. And occasionally they let me loose on some code too. So that's kind of what I'm up to these days.
Larry:
Nice. Because you have a background as a software engineer and we will talk more about that later because I think that's really relevant to a lot of what we'll talk about.
Andrea Gioia
In recent years, data products have emerged as a solution to the enterprise problem of siloed data and knowledge.
Andrea Gioia helps his clients build composable, reusable data products so they can capitalize on the value in their data assets.
Built around collaboratively developed ontologies, these data products evolve into something that might also be called a knowledge product.
We talked about:
his work as CTO at Quantyca, a data and metadata management consultancy
his description of data products and their lifecycle
how the lack of reusability in most data products inspired his current approach to modular, composable data products - and brought him into the world of ontology
how focusing on specific data assets facilitates the creation of reusable data products
his take on the role of data as a valuable enterprise asset
how he accounts for technical metadata and conceptual metadata in his modeling work
his preference for a federated model in the development of enterprise ontologies
the evolution of his data architecture thinking from a central-governance model to a federated model
the importance of including the right variety of business stakeholders in the design of the ontology for a knowledge product
his observation that semantic modeling is mostly about people, and about working with them to come to agreements about how they each see their domain
Andrea's bio
Andrea Gioia is a Partner and CTO at Quantyca, a consulting company specializing in data management. He is also a co-founder of blindata.io, a SaaS platform focused on data governance and compliance. With over two decades of experience in the field, Andrea has led cross-functional teams in the successful execution of complex data projects across diverse market sectors, ranging from banking and utilities to retail and industry. In his current role as CTO at Quantyca, Andrea primarily focuses on advisory, helping clients define and execute their data strategy with a strong emphasis on organizational and change management issues.
Actively involved in the data community, Andrea is a regular speaker, writer, and author of 'Managing Data as a Product'. Currently, he is the main organizer of the Data Engineering Italian Meetup and leads the Open Data Mesh Initiative. Within this initiative, Andrea has published the data product descriptor open specification and is guiding the development of the open-source ODM Platform to support the automation of the data product lifecycle.
Andrea is an active member of DAMA and, since 2023, has been part of the scientific committee of the DAMA Italian Chapter.
Connect with Andrea online
LinkedIn (#TheDataJoy)
Github
Video
Here’s the video version of our conversation:
https://www.youtube.com/watch?v=g34K_kJGZMc
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 30. In the world of enterprise architectures, data products are emerging as a solution to the problem of siloed data and knowledge. As a data and metadata management consultant, Andrea Gioia helps his clients realize the value in their data assets by assembling them into composable, reusable data products. Built around collaboratively developed ontologies, these data products evolve into something that might also be called a knowledge product.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 30 of the Knowledge Graph Insights podcast. I'm really happy today to welcome to the show Andrea Gioia. Andrea does a lot of stuff. He's a busy guy. He's a partner and the chief technical officer at Quantyca, a consulting firm that works on data and metadata management. He's the founder of Blindata, a SaaS product that goes with his consultancy. I'll let him talk a little bit more about that. He's the author of the book Managing Data as a Product, and he comes out of the data heritage, but he's now one of these knowledge people like us.
Dave McComb
During the course of his 25-year consulting career, Dave McComb has discovered both a foundational problem in enterprise architectures and the solution to it.
The problem lies in application-focused software engineering that results in an inefficient explosion of redundant solutions that draw on overlapping data sources.
The solution that Dave has introduced is a data-centric architecture approach that treats data like the precious business asset that it is.
We talked about:
his work as the CEO of Semantic Arts, a prominent semantic technology and knowledge graph consultancy based in the US
the application-centric quagmire that most modern enterprises find themselves trapped in
data centricity, the antidote to application centricity
his early work in semantic modeling
how the discovery of the "core model" in an enterprise facilitates modeling and building data-centric enterprise systems
the importance of "baby step" approaches and working with actual customer data in enterprise data projects
how building to "enduring business themes" rather than to the needs of individual applications creates a more solid foundation for enterprise architectures
his current interest in developing a semantic model for the accounting field, drawing on his history in the field and on Semantic Arts' gist upper ontology
the importance of the concept of a "commitment" in an accounting model
how his approach to financial modeling permits near-real-time reporting
his Data-Centric Architecture Forum, a practitioner-focused event held each June in Ft. Collins, Colorado
Dave's bio
Dave McComb is the CEO of Semantic Arts. In 2000 he co-founded Semantic Arts with the aim of bringing semantic technology to Enterprises. From 2000 to 2010 Semantic Arts focused on ways to improve enterprise architecture through ontology modeling and design. Around 2010 Semantic Arts began helping clients more directly with implementation, which led to the use of Knowledge Graphs in Enterprises. Semantic Arts has conducted over 100 successful projects with a number of well-known firms including Morgan Stanley, Electronic Arts, Amgen, Standard & Poor's, Schneider-Electric, MD Anderson, the International Monetary Fund, Procter & Gamble, and Goldman Sachs, as well as a number of government agencies. Dave is the author of Semantics in Business Systems (2003), which made the case for using Semantics to improve the design of information systems; Software Wasteland (2018), which points out how application-centric thinking has led to the deplorable state of enterprise systems; and The Data-Centric Revolution (2019), which outlines an alternative to the application-centric quagmire.
Prior to founding Semantic Arts he was VP of Engineering for Velocity Healthcare, a dot com startup that pioneered the model driven approach to software development. He was granted three patents on the architecture developed at Velocity. Prior to that he was with a small consulting firm: First Principles Consulting. Prior to that he was part of the problem.
Connect with Dave online
LinkedIn
email: mccomb at semanticarts dot com
Semantic Arts
Resources mentioned in this interview
Dave's books:
The Data-Centric Revolution: Restoring Sanity to Enterprise Information Systems
Software Wasteland: How the Application-Centric Quagmire is Hobbling Our Enterprises
Semantics in Business Systems: The Savvy Manager's Guide
gist ontology
Data-Centric Architecture Forum
Video
Here’s the video version of our conversation:
https://youtu.be/X_hZG7cFOCE
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 29. Every modern enterprise wrestles with its data, trying to get the most out of it. The smartest businesses have figured out that it isn't just "the new oil" - data is the very bedrock of their enterprise architecture. For the past 25 years, Dave McComb has helped companies understand the...
Ole Olesen-Bagneux
In every enterprise, says Ole Olesen-Bagneux, the information you need to understand your organization's metadata is already there. It just needs to be discovered and documented.
Ole's Meta Grid can be as simple as a shared, curated collection of documents, diagrams, and data but might also be expressed as a knowledge graph.
Ole appreciates "North Star" architectures like microservices and the Data Mesh but presents the Meta Grid as a simpler way to manage enterprise metadata.
We talked about:
his work as Chief Evangelist at Actian
his forthcoming book, "Fundamentals of Metadata Management"
how he defines his Meta Grid: an integration architecture that connects metadata across metadata repositories
his definition of metadata and its key characteristic, that it's always in two places at once
how the Meta Grid compares with microservices architectures and organizing concepts like Data Mesh
the nature of the Meta Grid as a small, simple, and slow architecture which is not technically difficult to achieve
his assertion that you can't build a Meta Grid because it already exists in every organization
the elements of the Meta Grid: documents, diagrams or pictures, and examples of data
how knowledge graphs fit into the Meta Grid
his appreciation for "North Star" architectures like Data Mesh but also how he sees the Meta Grid as a more pragmatic approach to enterprise metadata management
the evolution of his new book from a knowledge graph book to
his elaboration on the "slow" nature of the Meta Grid, in particular how its metadata focus contrasts with faster real-time systems like ERPs
the shape of the team topology that makes Meta Grid work
Ole's bio
Ole Olesen-Bagneux is a globally recognized thought leader in metadata management and enterprise data architecture. As VP, Chief Evangelist at Actian, he drives industry awareness and adoption of modern approaches to data intelligence, drawing on his extensive expertise in data management, metadata, data catalogs, and decentralized architectures. An accomplished author, Ole has written The Enterprise Data Catalog (O’Reilly, 2023). He is currently working on Fundamentals of Metadata Management (O’Reilly, 2025), introducing a novel metadata architecture known as the Meta Grid. With a PhD in Library and Information Science from the University of Copenhagen, his unique perspective bridges traditional information science with modern data management.
Before joining Actian, Ole served as Chief Evangelist at Zeenea, where he played a key role in shaping and communicating the company’s technology vision. His industry experience includes leadership roles in enterprise architecture and data strategy at major pharmaceutical companies like Novo Nordisk. Ole is passionate about scalable metadata architectures, knowledge graphs, and enabling organizations to make data truly discoverable and usable.
Connect with Ole online
LinkedIn
Substack
Medium
Resources mentioned in this interview
Fundamentals of Metadata Management, Ole's forthcoming book
Data Management at Scale by Piethein Strengholt
Fundamentals of Data Engineering by Joe Reis and Matt Housley
Meta Grid as a Team Topology, Substack article
Stewart Brand's Pace Layers
Video
Here’s the video version of our conversation:
https://youtu.be/t01IZoegKRI
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 28. Every modern enterprise wrestles with the scale, the complexity, and the urgency of understanding their data and metadata. So, by necessity, comprehensive architectural approaches like microservices and the data mesh are complex, big, and fast. Ole Olesen-Bagneux proposes a simple, small, and slow way for enterprises to cultivate a shared understanding of their enterprise knowledge, a decentralized approach to metadata strategy that he calls the Meta Grid.
Interview transcript
Larry:
Hi,
Andrea Volpini
Your organization's brand is what people say about you after you've left the room. It's the memories you create that determine how people think about you later.
Andrea Volpini says that the same dynamic applies in marketing to AI systems. Modern brand managers, he argues, need to understand how both human and machine memory work and then use that knowledge to create digital memories that align with how AI systems understand the world.
We talked about:
his work as CEO at WordLift, a company that builds knowledge graphs to help companies automate SEO and other marketing activities
a recent experiment he did during a talk at an AI conference that illustrates the ability of applications like Grok and ChatGPT to build and share information in real time
the role of memory in marketing to current AI architectures
his discovery of how the agentic approach he was taking to automating marketing tasks was actually creating valuable context for AI systems
the mechanisms of memory in AI systems and an analogy to human short- and long-term memory
the similarities he sees in how the human neocortex forms memories and how the knowledge about memory is represented in AI systems
his practice of representing entities as both triples and vectors in his knowledge graph (see the sketch after this list)
how he leverages his understanding of the differences in AI models in his work
the different types of memory frameworks to account for in both the consumption and creation of AI systems: semantic, episodic, and procedural
his new way of thinking about marketing: as a memory-creation process
the shift in focus that he thinks marketers need to make, "creating good memories for AI in order to protect their brand values"
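Here is the sketch referenced in the list above: a minimal, hypothetical illustration of keeping one entity in both forms, a symbolic RDF triple and a vector. It assumes rdflib and NumPy, and the embedding function is a deterministic stand-in rather than a real model like the ones WordLift would use.

```python
# Minimal sketch: represent one entity both symbolically (RDF triples) and
# geometrically (a vector). The embedding function is a stand-in, not a real model.
import hashlib
import numpy as np
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("https://example.com/kg/")

def fake_embedding(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic stand-in for a sentence-embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)

# Symbolic side: triples about an invented brand.
g = Graph()
brand = EX["AcmeCoffee"]
g.add((brand, RDFS.label, Literal("Acme Coffee")))
g.add((brand, EX.sellsProduct, EX["ColdBrew"]))

# Geometric side: the same entity as a vector, keyed by its IRI.
vectors = {str(brand): fake_embedding("Acme Coffee, a specialty cold brew brand")}

print(g.serialize(format="turtle"))
print(str(brand), "->", np.round(vectors[str(brand)], 2))
```

In a real pipeline the vector would come from an embedding model, and retrieval would combine a graph query over the triples with nearest-neighbor search over the vectors.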
Andrea's bio
Andrea Volpini is the CEO of WordLift and co-founder of Insideout10. With 25 years of experience in semantic web technologies, SEO, and artificial intelligence, he specializes in marketing strategies. He is a regular speaker at international conferences, including SXSW, TNW Conference, BrightonSEO, The Knowledge Graph Conference, G50, Connected Data and AI Festival.
Andrea has contributed to industry publications, including the Web Almanac by HTTP Archive. In 2013, he co-founded RedLink GmbH, a commercial spin-off focused on semantic content enrichment, natural language processing, and information extraction.
Connect with Andrea online
LinkedIn
X
Bluesky
WordLift
Video
Here’s the video version of our conversation:
https://youtu.be/do-Y7w47CZc
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 27. Some experts describe the marketing concept of branding as, What people say about you after you’ve left the room. It's the memories they form of your company that define your brand. Andrea Volpini sees this same dynamic unfolding as companies turn their attention to AI. To build a memorable brand online, modern marketers need to understand how both human and machine memory work and then focus on creating memories that align with how AI systems understand the world.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 27 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Andrea Volpini. Andrea is the CEO and the founder at WordLift, a company based in Rome. Tell the folks a little bit more about WordLift and what you're up to these days, Andrea.
Andrea:
Yep. So we build knowledge graphs to help brands automate their SEO and marketing efforts using large language models and AI in general.
Larry:
Nice. Yeah, and you're pretty good at this. You've been doing this a while, and you had a recent success story that I think really highlights some of your current interests and your current work. Tell me about your talk in Milan and the little demonstration you did with that.
Andrea:
Yeah, yeah, so it was last week at AI Festival,
Jacobus Geluk
The arrival of AI agents creates urgency around the need to guide and govern them.
Drawing on his 15-year history in building reliable AI solutions for banks and other enterprises, Jacobus Geluk sees a standards-based data-product marketplace as the key to creating the thriving data economy that will enable AI agents to succeed at scale.
Jacobus launched the effort to create the DPROD data-product description specification, creating the supply side of the data market. He's now forming a working group to document the demand side, a "use-case tree" specification to articulate the business needs that data products address.
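For a feel of what a machine-readable data-product description can look like, here is a minimal sketch using plain DCAT terms, the W3C vocabulary listed in the resources below, which DPROD builds on. The product name, IRIs, and property choices are illustrative and are not drawn from the DPROD specification itself; rdflib is the only assumed dependency.

```python
# Minimal sketch: describe one data product's output dataset with plain DCAT terms.
# The IRIs and values are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

EX = Namespace("https://example.com/data-products/")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = URIRef(EX["customer-360/output"])
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Customer 360 output dataset")))
g.add((dataset, DCTERMS.description,
       Literal("Curated customer profiles exposed by the Customer 360 data product.")))

dist = URIRef(EX["customer-360/output/parquet"])
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.mediaType, Literal("application/vnd.apache.parquet")))
g.add((dataset, DCAT.distribution, dist))

print(g.serialize(format="turtle"))
```

A marketplace could harvest descriptions like this to populate the supply side Jacobus describes, while his use-case trees would capture the demand side.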
We talked about:
his work as CEO at Agnos.ai, an enterprise knowledge graph and AI consultancy
the working group he founded in 2023 which resulted in the DPROD specification to describe data products
an overview of the data-product marketplace and the data economy
the need to account for the demand side of the data marketplace
the intent of his current work to address the disconnect between tech activities and business use cases
how the capabilities of LLMs and knowledge graphs complement each other
the origins of his "use-case tree" model in a huge banking enterprise knowledge graph he built ten years ago
how use case trees improve LLM-driven multi-agent architectures
some examples of the persona-driven, tech-agnostic solutions in agent architectures that use-case trees support
the importance of constraining LLM action with a control layer that governs agent activities, accounting for security, data sourcing, and issues like data lineage and provenance
the new Use Case Tree Work Group he is forming
the paradox in the semantic technology industry now of a lack of standards in a field with its roots in W3C standards
Jacobus' bio
Jacobus Geluk is a Dutch Semantic Technology Architect and CEO of agnos.ai, a UK-based consulting firm with a global team of experts specializing in GraphAI — the combination of Enterprise Knowledge Graphs (EKG) with Generative AI (GenAI). Jacobus has over 20 years of experience in data management and semantic technologies, previously serving as a Senior Data Architect at Bloomberg and Fellow Architect at BNY Mellon, where he led the first large-scale production EKG in the financial industry.
As a founding member and current co-chair of the Enterprise Knowledge Graph Forum (EKGF), Jacobus initiated the Data Product Workgroup, which developed the Data Product Ontology (DPROD) — a proposed OMG standard for consistent data product management across platforms. Jacobus can claim to have coined the term "Enterprise Knowledge Graph (EKG)" more than 10 years ago, and his work has been instrumental in advancing semantic technologies in financial services and other information-intensive industries.
Connect with Jacobus online
LinkedIn
Agnos.ai
Resources mentioned in this podcast
DPROD specification
Enterprise Knowledge Graph Forum
Object Management Group
Use Case Tree Method for Business Capabilities
DCAT Data Catalog Vocabulary
Video
Here’s the video version of our conversation:
https://youtu.be/J0JXkvizxGo
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 26. In an AI landscape that will soon include huge groups of independent software agents acting on behalf of humans, we'll need solid mechanisms to guide the actions of those agents. Jacobus Geluk looks at this situation from the perspective of the data economy, specifically the data-products marketplace. He helped develop the DPROD specification that describes data products and is now focused on developing use-case trees that describe the business needs that they address.
Interview transcript
Larry:
Okay. Hi everyone. Welcome to episode number 26 of the Knowledge Graph Insights podcast. I am really happy today to welcome to the show, Jacobus Geluk. Sorry, I try to speak Dutch, do my best.
Rebecca Schneider
Skills that Rebecca Schneider learned in library science school - taxonomy, ontology, and semantic modeling - have only become more valuable with the arrival of AI technologies like LLMs and the growing interest in knowledge graphs.
Two things have stayed constant across her library and enterprise content strategy work: organizational rigor and the need to always focus on people and their needs.
We talked about:
her work as Co-Founder and Executive Director at AvenueCX, an enterprise content strategy consultancy
her background as a "recovering librarian" and her focus on taxonomies, metadata, and structured content
the importance of structured content in LLMs and other AI applications
how she balances the capabilities of AI architectures and the needs of the humans that contribute to them
the need to disambiguate the terms that describe the span of the semantic spectrum
the crucial role of organization in her work and how you don't have to have formally studied library science to do it
the role of a service mentality in knowledge graph work
how she measures the efficiency and other benefits of well-organized information
how domain modeling and content modeling work together in her work
her tech-agnostic approach to consulting
the role of metadata strategy in her work
how new AI tools permit easier content tagging and better governance
the importance of "knowing your collection," not becoming a true subject matter expert but at least getting familiar with the content you are working with
the need to clean up your content and data to build successful AI applications
Rebecca's bio
Rebecca is co-founder of AvenueCX, an enterprise content strategy consultancy. Her areas of expertise include content strategy, taxonomy development, and structured content. She has guided content strategy in a variety of industries: automotive, semiconductors, telecommunications, retail, and financial services.
Connect with Rebecca online
LinkedIn
email: rschneider at avenuecx dot com
Video
Here’s the video version of our conversation:
https://youtu.be/ex8Z7aXmR0o
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 25. If you've ever visited the reference desk at your local library, you've seen the service mentality that librarians bring to their work. Rebecca Schneider brings that same sensibility to her content and knowledge graph consulting. Like all digital practitioners, her projects now include a lot more AI, but her work remains grounded in the fundamentals she learned studying library science: organizational rigor and a focus on people and their needs.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 25 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Rebecca Schneider. Rebecca is the co-founder and the executive director at AvenueCX, a consultancy in the Boston area. Welcome, Rebecca. Tell the folks a little bit more about what you're up to these days.
Rebecca:
Hi, Larry. Thanks for having me on your show. Hello, everyone. My name is Rebecca Schneider. I am a recovering librarian. I was a trained librarian, worked in a library with actual books, but for most of my career, I have been focusing on enterprise content strategy. Furthermore, I typically focus on taxonomies, metadata, structured content, and all of that wonderful world that we live in.
Larry:
Yeah, and we both come out of that content background and have sort of converged on the knowledge graph background together, kind of over the same time period. And it's really interesting, those skills that you mentioned, the library science skills of taxonomy, metadata, structure, and then the application of that in structured content in the content world. As you've gotten more and more into knowledge graph stuff, how has that background, I guess...
Ashleigh Faith
With her 15-year history in the knowledge graph industry and her popular YouTube channel, Ashleigh Faith has informed and inspired a generation of graph practitioners and enthusiasts.
She's an expert on semantic modeling, knowledge graph construction, and AI architectures and talks about those concepts in ways that resonate both with her colleagues and with newcomers to the field.
We talked about:
her popular IsA DataThing YouTube channel
the crucial role of accurately modeling actual facts in semantic practice and AI architectures
her appreciation of the role of knowledge graphs in aligning people in large organizations around concepts and the various words that describe them
the importance of staying focused on the business case for knowledge graph work, which has become both more important with the arrival of LLMs and generative AI
the emergence of more intuitive "talk to your graph" interfaces
some of her checklist items for onboarding aspiring knowledge graph engineers
how to decide whether to use a property graph or a knowledge graph, or both
her hope that more RDF graph vendors will offer a free tier so that people can more easily experiment with them
approaches to AI architecture orchestration
the enduring importance of understanding how information retrieval works
Ashleigh's bio
Ashleigh Faith has her PhD in Advanced Semantics and over 15 years of experience working on graph solutions across the STEM, government, and finance industries. Outside of her day-job, she is the Founder and host of the IsA DataThing YouTube channel and podcast where she tries to demystify the graph space.
Connect with Ashleigh online
LinkedIn
IsA DataThing YouTube channel
Video
Here’s the video version of our conversation:
https://youtu.be/eMqLydDu6oY
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 24. One way to understand the entity resolution capabilities of knowledge graphs is to picture an old-fashioned telephone operator moving plugs around a switchboard to make the right connections. Early in her career, that's one way that Ashleigh Faith saw the power of knowledge graphs. She has since developed sophisticated approaches to knowledge graph construction, semantic modeling, and AI architectures and shares her deeply informed insights on her popular YouTube channel.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 24 of the Knowledge Graph Insights Podcast. I am super extra delighted today to welcome to the show Ashleigh Faith. Ashleigh is the host of the awesome YouTube channel IsA DataThing, which has thousands of subscribers, thousands of monthly views. I think it's many people's entry point into the knowledge graph world. Welcome, Ashleigh. Great to have you here. Tell the folks a little bit more about what you're up to these days.
Ashleigh:
Thanks, Larry. I've known you for quite some time. I'm really excited to be here today.
What about me? I do a lot of semantic and AI stuff for my day job. But yeah, I think my main passion is also helping others get involved, understand some of the concepts a little bit better for the semantic space and now the neuro-symbolic AI. That's AI and knowledge graphs coming together. That is quite a hot topic right now, so lots and lots of untapped potential in what we can talk about. I do most of that on my channel.
Larry:
Yeah. I will refer people to your channel because we've got only a half-hour today. It's ridiculous.
Ashleigh:
Yeah.
Larry:
We just talked for an hour before we went on the air. It's ridiculous. What I'd really like to focus on today is the first stage in any of this, the first step in any of these knowledge graph implementations or any of this stuff is modeling. I think about it from a designerly perspective. I do a lot of mental model discernment, user research kind of stuff, and then conceptual modeling to agree on things.



