Yeh Feng-Yu Leo, Asato Matthew, Zheng Jie, He Yongqun Oliver
University of Michigan, Ann Arbor, MI, USA.
bioRxiv. 2025 Jul 18:2025.07.15.664450. doi: 10.1101/2025.07.15.664450.
Vaccine research faces challenges in integrating diverse biomedical datasets. While the Vaccine Investigation and Online Information Network (VIOLIN) provides comprehensive vaccine data, its implementation in a traditional relational model limits complex analysis. Similarly, the Vaccine Ontology (VO) offers a standardized semantic framework but lacks comprehensive empirical data. This study addresses these limitations by developing the Vaccine Knowledge Graph (VaxKG), which integrates VIOLIN's dataset with VO's standardized terminology. Using Neo4j, we transformed 12 core VIOLIN tables into a graph structure enriched with VO concepts. The resulting knowledge graph comprises 28,123 VIOLIN data nodes and 101,282 VO resource nodes, connected by 412,865 relationships. Our comparative analysis of Brucella and influenza vaccines demonstrates VaxKG's ability to support complex semantic queries and to reveal insights unavailable from either resource alone. We further demonstrate VaxKG's utility through VaxChat, a large language model application that uses VaxKG as the knowledge source for Retrieval-Augmented Generation (RAG), providing intuitive access to vaccine information.
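The abstract does not describe how the relational rows were loaded into Neo4j. The sketch below shows one common pattern for this kind of table-to-graph transformation: each row becomes a node, and each row that carries an ontology identifier becomes a relationship to an ontology node. It emits parameterized Cypher `MERGE` statements. Everything here is hypothetical, not the authors' pipeline: the column names (`c_vaccine_id`, `c_vaccine_name`, `c_vo_id`), the `Vaccine` label, the `MAPPED_TO` relationship type, and the placeholder VO identifier. The `Resource` label with a `uri` property mirrors the convention of Neo4j's neosemantics (n10s) RDF importer, which is also an assumption about how the VO nodes were created.

```python
from typing import Iterable, Mapping

# Hypothetical schema: the real VIOLIN column names and graph labels may differ.
OBO_PREFIX = "http://purl.obolibrary.org/obo/"

# MERGE makes the load idempotent: re-running it does not duplicate nodes or edges.
NODE_QUERY = (
    "MERGE (v:Vaccine {violin_id: $violin_id}) "
    "SET v.name = $name"
)
LINK_QUERY = (
    "MATCH (v:Vaccine {violin_id: $violin_id}) "
    "MERGE (o:Resource {uri: $vo_uri}) "
    "MERGE (v)-[:MAPPED_TO]->(o)"
)


def rows_to_cypher(rows: Iterable[Mapping[str, object]]) -> list[tuple[str, dict]]:
    """Turn relational rows into (query, parameters) pairs for Neo4j.

    Every row yields one node statement. Rows that carry a VO identifier
    also yield a statement linking the VIOLIN node to its VO resource node.
    """
    statements: list[tuple[str, dict]] = []
    for row in rows:
        violin_id = row["c_vaccine_id"]
        statements.append(
            (NODE_QUERY, {"violin_id": violin_id, "name": row["c_vaccine_name"]})
        )
        vo_id = row.get("c_vo_id")
        if vo_id:
            # OBO identifiers such as "VO_..." resolve under the OBO PURL prefix.
            statements.append(
                (LINK_QUERY, {"violin_id": violin_id, "vo_uri": OBO_PREFIX + str(vo_id)})
            )
    return statements


# Illustrative rows with placeholder values (not real VIOLIN records).
rows = [
    {"c_vaccine_id": 1, "c_vaccine_name": "Example vaccine A", "c_vo_id": "VO_0000000"},
    {"c_vaccine_id": 2, "c_vaccine_name": "Example vaccine B", "c_vo_id": None},
]
statements = rows_to_cypher(rows)
for query, params in statements:
    print(params)
```

Each `(query, params)` pair could then be passed to `session.run(query, params)` in the official Neo4j Python driver. Keeping the statements parameterized rather than built by string formatting avoids Cypher injection and lets Neo4j cache the query plans.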