Department of Biomedical Informatics, Columbia University, New York, NY, USA
Pac Symp Biocomput. 2023;28:371-382.
Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been performed extensively to identify differentially expressed genes (DEGs). Decisions about which DEGs to investigate experimentally are biased by many factors, so many DEGs remain uninvestigated. The set of DEGs that are associated with a disease experimentally but have no known association with the disease in the literature is known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools that facilitate knowledge-based analyses by combining disparate data from many sources to suggest underlying mechanisms of action may be a valuable resource for supporting discovery and improving our understanding of this disease. In this work, we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open-source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined list in the meta-analysis-derived list was identified as the set of uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
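The relative-complement step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `ignorome`, the case-insensitive normalization, and the gene identifiers are all assumptions introduced for illustration, and the identifiers are placeholders rather than data from the study.

```python
def ignorome(meta_analysis_degs, text_mined_genes):
    """Return DEGs from the meta-analysis that are absent from the text-mined list.

    Hypothetical helper: symbols are upper-cased and stripped before comparison,
    which is an assumption, not a step stated in the abstract.
    """
    def normalize(genes):
        return {g.strip().upper() for g in genes}

    # Set difference: elements of the first set that are not in the second.
    return normalize(meta_analysis_degs) - normalize(text_mined_genes)


# Placeholder identifiers, not genes reported in the study.
meta_analysis_degs = ["GENE_A", "GENE_B", "GENE_C", "gene_d"]
text_mined_genes = ["GENE_B", "GENE_D"]

uninvestigated = sorted(ignorome(meta_analysis_degs, text_mined_genes))
print(uninvestigated)  # ['GENE_A', 'GENE_C']
```

Applied to the actual meta-analysis and text-mined lists, the same set difference would presumably yield the 445 uninvestigated DEGs reported above, although the real identifier matching is likely more involved than simple case normalization.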