NeurIPS 2025: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Description
This paper introduces KGGen, a text-to-knowledge-graph generator that addresses the scarcity and poor quality of automatically extracted knowledge graphs (KGs). KGGen uses language models for the initial extraction of subject-predicate-object triples, then applies an iterative clustering and de-duplication process that merges duplicate entities and relations, reducing sparsity in the final graph. To assess KG extraction performance, the authors release a new two-part benchmark, Measure of Information in Nodes and Edges (MINE), which evaluates both how much information a KG retains from short texts and how well it supports knowledge retrieval in RAG systems. On this benchmark, KGGen outperforms competitors such as OpenIE and Microsoft's GraphRAG on key metrics, including information capture and scaling efficiency across large corpora. The study concludes that KGGen generates KGs with more concise, generalizable entities and relations, which matters for downstream applications such as embeddings and information retrieval.
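To make the de-duplication idea concrete, here is a minimal Python sketch of merging duplicate entities and relations in a list of triples. It is an illustration only, not KGGen's implementation: the function names (`canonical_key`, `build_clusters`, `deduplicate`) are hypothetical, and a crude string-normalization heuristic stands in for whatever clustering the paper actually performs.

```python
from collections import defaultdict


def canonical_key(label):
    """Hypothetical stand-in for clustering: lowercase, split hyphens,
    and strip a trailing 's' from words longer than three characters."""
    words = label.lower().replace("-", " ").split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def build_clusters(labels):
    """Group labels sharing a canonical key; map each label to the
    first-seen surface form in its group."""
    clusters = defaultdict(list)
    for label in labels:
        clusters[canonical_key(label)].append(label)
    return {label: members[0] for members in clusters.values() for label in members}


def deduplicate(triples):
    """Rewrite each triple with its cluster representatives and drop
    triples that become identical, preserving first-seen order."""
    entity_map = build_clusters([x for s, _, o in triples for x in (s, o)])
    relation_map = build_clusters([p for _, p, _ in triples])
    seen, result = set(), []
    for s, p, o in triples:
        merged = (entity_map[s], relation_map[p], entity_map[o])
        if merged not in seen:
            seen.add(merged)
            result.append(merged)
    return result


# Toy input: the first two triples differ only in surface form.
raw_triples = [
    ("Language Models", "extract", "triples"),
    ("language model", "extracts", "Triples"),
    ("KGGen", "uses", "language models"),
]
deduped = deduplicate(raw_triples)
```

In this toy input the three raw triples collapse to two, and the three spellings of "language models" become a single node. That reduction in near-duplicate nodes and edges is the effect the description attributes to KGGen's clustering stage.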
Source:
https://openreview.net/pdf?id=YyhRJXxbpi




