Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA.
AstraZeneca PLC, Gaithersburg, Maryland, USA.
BMC Med Inform Decis Mak. 2024 Jan 4;24(1):10. doi: 10.1186/s12911-023-02409-8.
Knowledge graphs are well suited to modeling complex, unstructured, and multi-source data and to facilitating their analysis. During the COVID-19 pandemic, adverse event data were integrated into a knowledge graph to support vaccine safety surveillance and to respond nimbly to urgent health authority questions. Here, we provide details of this post-marketing safety system using public data sources. In addition to the challenges posed by varied data representations, adverse event reporting on the COVID-19 vaccines generated an unprecedented volume of data, an order of magnitude larger than the adverse event data for all previous vaccines. The Patient Safety Knowledge Graph (PSKG) is a robust data store that accommodates this volume of adverse event data and harmonizes the primary surveillance data sources.
We designed a semantic model to represent key safety concepts. We built an extract-transform-load (ETL) data pipeline that parses and imports primary public data sources; aligns key elements such as vaccine names; integrates the Medical Dictionary for Regulatory Activities (MedDRA); and applies quality metrics. PSKG is deployed in a Neo4J graph database and made available via a web interface and application programming interfaces (APIs).
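The alignment step of such a pipeline can be illustrated with a minimal Python sketch. It is not the PSKG implementation: the alias table, the record fields, and the function names (`normalize`, `align_vaccine_name`, `transform`) are all hypothetical, chosen only to show how differently spelled vaccine names from separate sources might be mapped to one canonical label before loading, with unmatched records set aside for quality review.

```python
import re
from typing import Optional

# Hypothetical alias table: the actual mappings used by the pipeline
# are not given in the source.
VACCINE_ALIASES = {
    "vaccine x": "Vaccine-X",
    "vax x": "Vaccine-X",
    "vaccine y": "Vaccine-Y",
}


def normalize(raw: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into one space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()


def align_vaccine_name(raw: str) -> Optional[str]:
    """Return the canonical vaccine name, or None if the alias is unknown."""
    return VACCINE_ALIASES.get(normalize(raw))


def transform(records):
    """Split records into aligned ones and ones rejected for review."""
    aligned, rejected = [], []
    for rec in records:
        name = align_vaccine_name(rec["vaccine"])
        if name is None:
            rejected.append(rec)  # candidate for a quality metric or manual check
        else:
            aligned.append({**rec, "vaccine": name})
    return aligned, rejected


if __name__ == "__main__":
    sample = [
        {"report_id": 1, "vaccine": "VACCINE-X "},
        {"report_id": 2, "vaccine": "Vax_X"},
        {"report_id": 3, "vaccine": "unknown product"},
    ]
    ok, bad = transform(sample)
    print(len(ok), "aligned;", len(bad), "rejected")
```

Doing this once in the ETL layer, rather than in each analysis, is the kind of centralization the abstract describes.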
We import and align adverse event data and vaccine exposure data from 250 countries on a weekly basis, producing a graph with 4,340,980 nodes and 30,544,475 edges as of July 1, 2022. PSKG is used for ad-hoc analyses and periodic reporting for several widely available COVID-19 vaccines. Analysis code using the knowledge graph is 80% shorter than an equivalent implementation written entirely in Python, and runs over 200 times faster.
Organizing safety data into a concise model of nodes, properties, and edge relationships has greatly simplified analysis code by removing complex parsing and transformation algorithms from individual analyses and instead managing these centrally. The adoption of the knowledge graph transformed how the team answers key scientific and medical questions. Whereas previously an analysis would involve aggregating and transforming primary datasets from scratch to answer a specific question, the team can now iterate easily and respond as quickly as requests evolve (e.g., "Produce vaccine-X safety profile for adverse event-Y by country instead of age-range").
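The quoted request can be made concrete with a small Python sketch. It assumes the reports have already been harmonized; the records, field names, and the `event_counts` function are hypothetical and stand in for a query against the graph, not for the PSKG API. A real safety profile would involve more than raw counts; the point is only that changing the grouping becomes a one-argument change once parsing is handled centrally.

```python
from collections import Counter

# Hypothetical, already-harmonized report records (not real data).
REPORTS = [
    {"vaccine": "Vaccine-X", "event": "Event-Y", "country": "US", "age_range": "18-49"},
    {"vaccine": "Vaccine-X", "event": "Event-Y", "country": "US", "age_range": "50-64"},
    {"vaccine": "Vaccine-X", "event": "Event-Y", "country": "DE", "age_range": "18-49"},
    {"vaccine": "Vaccine-X", "event": "Event-Z", "country": "DE", "age_range": "18-49"},
    {"vaccine": "Vaccine-W", "event": "Event-Y", "country": "US", "age_range": "65+"},
]


def event_counts(reports, vaccine, event, group_by):
    """Count reports of one event for one vaccine, grouped by a chosen field."""
    return Counter(
        r[group_by]
        for r in reports
        if r["vaccine"] == vaccine and r["event"] == event
    )


if __name__ == "__main__":
    # Same question, two groupings: only the last argument changes.
    print(event_counts(REPORTS, "Vaccine-X", "Event-Y", "age_range"))
    print(event_counts(REPORTS, "Vaccine-X", "Event-Y", "country"))
```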