Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation.

Affiliations

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States.

Center for Innovation to Implementation, VA Palo Alto Health Care System, Sacramento, CA, United States.

Publication information

J Med Internet Res. 2022 Jul 6;24(7):e38584. doi: 10.2196/38584.

DOI: 10.2196/38584
PMID: 35658098
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9301549/
Abstract

BACKGROUND

Knowledge graphs covering multiple types of biomedical associations, including COVID-19-related ones, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions, because co-occurrence in the literature does not always mean that there is a true biomedical association between two entities.
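To make the source of noise concrete, here is a minimal Python sketch of how a raw co-occurrence graph might be assembled. The entity names, the `docs` list, and the `build_cooccurrence_graph` helper are hypothetical placeholders for illustration; they are not the authors' pipeline or real LitCovid/Pubtator output.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Count how many documents mention each unordered pair of entities together."""
    edge_counts = Counter()
    for entities in docs:
        # Deduplicate and sort so each pair is counted once per document,
        # always in the same (alphabetical) order.
        for a, b in combinations(sorted(set(entities)), 2):
            edge_counts[(a, b)] += 1
    return edge_counts


# Hypothetical annotations: one list of entity mentions per abstract.
docs = [
    ["DISEASE_1", "GENE_A", "DRUG_X"],
    ["DISEASE_1", "GENE_A"],
    ["DISEASE_1", "DRUG_X", "GENE_B"],
]

graph = build_cooccurrence_graph(docs)
for pair, count in sorted(graph.items()):
    print(pair, count)
```

Every pair that shares a document becomes an edge, so `("DRUG_X", "GENE_B")` enters the graph after a single joint mention even if the text never asserts a relationship between them. Edges of that kind are the false positives the abstract describes.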

OBJECTIVE

Data quality plays an important role in training deep neural network models; however, most of the current work in this area has been focused on improving a model's performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information.

METHODS

The proposed framework used generative-based deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator.
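As a rough illustration of edge classification that uses only the unlabeled structure of the raw graph, the sketch below scores each edge by the Jaccard overlap of its endpoints' neighborhoods. This is a deliberately simple stand-in, not NetGAN or CELL, and the toy edge list, the `jaccard_edge_scores` helper, and the 0.25 threshold are assumptions made for the example.

```python
def jaccard_edge_scores(edges):
    """Score each edge by the Jaccard overlap of its endpoints' neighborhoods."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in edges:
        # Exclude the endpoints themselves so an edge cannot support itself.
        nu = neighbors[u] - {v}
        nv = neighbors[v] - {u}
        union = nu | nv
        scores[(u, v)] = len(nu & nv) / len(union) if union else 0.0
    return scores


# Hypothetical raw graph: a triangle (a, b, c) plus one weakly supported edge (c, d).
raw_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
scores = jaccard_edge_scores(raw_edges)

# Assumed threshold for the example: edges below it are flagged as possible noise.
suspected_noise = [edge for edge, s in scores.items() if s < 0.25]
print(scores)
print(suspected_noise)
```

The design idea carries over to the generative setting only loosely: a model fitted to the whole raw graph assigns each candidate edge a score, and low-scoring edges are treated as likely noise. A learned generative model would replace the hand-written Jaccard score here.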

RESULTS

Link prediction performance showed that the proposed method still achieved favorable results, even in the extreme case of a 1:9 ratio of training data to test data (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available.
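For readers unfamiliar with the evaluation setup, the following Python sketch shows one way a 1:9 train/test split and the area under the receiver operating characteristic curve could be computed. The `split_edges` and `auroc` helpers, the toy edges, and the example scores are hypothetical; they do not reproduce the paper's data or numbers.

```python
import random


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def split_edges(edges, train_fraction=0.1, seed=0):
    """Shuffle edges reproducibly and split them; train_fraction=0.1 gives a 1:9 split."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    cut = max(1, round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical edge list with 20 edges: 2 go to training, 18 to testing.
edges = [(i, i + 1) for i in range(20)]
train, test = split_edges(edges)
print(len(train), len(test))

# Hypothetical model scores for true edges (positives) and noise edges (negatives).
example_auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(example_auc)
```

The pairwise definition used here is equivalent to the area under the ROC curve and keeps the sketch free of third-party dependencies. It runs in quadratic time, so a rank-based implementation would be preferable for large graphs.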

CONCLUSIONS

Our preliminary findings showed the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
