Constructing co-occurrence network embeddings to assist association extraction for COVID-19 and other coronavirus infectious diseases.

Affiliations

Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, USA.

Division of Digital Health Sciences, Mayo Clinic, Rochester, Minnesota, USA.

Publication information

J Am Med Inform Assoc. 2020 Aug 1;27(8):1259-1267. doi: 10.1093/jamia/ocaa117.

DOI:10.1093/jamia/ocaa117

PMID:32458963

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7314034/

Abstract

OBJECTIVE

As coronavirus disease 2019 (COVID-19) emerged rapidly and developed into an unprecedented pandemic, the need for a knowledge repository for the disease became crucial. To address this need, a new machine-readable COVID-19 dataset, the COVID-19 Open Research Dataset (CORD-19), has been released. Building on this dataset, our objective was to construct computable co-occurrence network embeddings to assist association detection among COVID-19-related biomedical entities.

MATERIALS AND METHODS

Leveraging a Linked Data version of CORD-19 (ie, CORD-19-on-FHIR), we first used SPARQL to extract co-occurrences among chemicals, diseases, genes, and mutations and to build a co-occurrence network. We then trained a representation of the derived co-occurrence network using node2vec with 4 edge embedding operations (L1, L2, Average, and Hadamard). Six algorithms (decision tree, logistic regression, support vector machine, random forest, naïve Bayes, and multilayer perceptron) were applied to evaluate performance on link prediction. An unsupervised learning strategy incorporating the t-SNE (t-distributed stochastic neighbor embedding) and DBSCAN (density-based spatial clustering of applications with noise) algorithms was also developed for case studies.
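The 4 edge embedding operations named above can be illustrated with a minimal sketch. It assumes the element-wise binary operators defined in the node2vec paper (Average, Hadamard, Weighted-L1, Weighted-L2); the abstract does not spell out the formulas, so this reading is an assumption. The node names, vector values, and helper names (`edge_embedding`, `EDGE_OPERATORS`) are hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, List

Vector = List[float]


def average(u: Vector, v: Vector) -> Vector:
    """Element-wise mean of the two node vectors."""
    return [(a + b) / 2.0 for a, b in zip(u, v)]


def hadamard(u: Vector, v: Vector) -> Vector:
    """Element-wise product of the two node vectors."""
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u: Vector, v: Vector) -> Vector:
    """Element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u: Vector, v: Vector) -> Vector:
    """Element-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


# Operator labels follow the names used in the abstract.
EDGE_OPERATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "L1": weighted_l1,
    "L2": weighted_l2,
    "Average": average,
    "Hadamard": hadamard,
}


def edge_embedding(node_vectors: Dict[str, Vector], source: str,
                   target: str, operator: str = "Average") -> Vector:
    """Combine two node embeddings into one edge feature vector."""
    return EDGE_OPERATORS[operator](node_vectors[source], node_vectors[target])


# Hypothetical 3-dimensional node vectors; real node2vec vectors are
# learned from random walks and typically have far more dimensions.
toy_vectors = {
    "node_a": [0.5, -1.0, 2.0],
    "node_b": [1.5, 1.0, 0.0],
}

for name in EDGE_OPERATORS:
    print(name, edge_embedding(toy_vectors, "node_a", "node_b", name))
```

In a link prediction setup of this kind, each edge vector would serve as the feature row for a classifier, with existing co-occurrence pairs as positive examples and sampled non-co-occurring pairs as negatives.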

RESULTS

The random forest classifier showed the best link prediction performance across the different network embeddings. For edge embeddings generated using the Average operation, random forest achieved the best average precision of 0.97 along with an F1 score of 0.90. For unsupervised learning, 63 clusters were formed with a silhouette score of 0.128. Significant associations were detected for 5 coronavirus infectious diseases in their corresponding subgroups.
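For readers unfamiliar with the two link prediction metrics reported above, the sketch below shows one common way to compute them for a binary task. It assumes the non-interpolated definition of average precision (mean of the precision at the rank of each true positive) and does not treat tied scores specially; the abstract does not state which implementation the authors used. The labels and scores are made-up toy values, not results from the study.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = link)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k taken at the rank k of every true positive."""
    # Rank candidate edges from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example: 5 candidate edges, 3 of which are true links.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # 0.5 threshold is an assumption

print("average precision:", average_precision(toy_labels, toy_scores))
print("F1:", f1_score(toy_labels, toy_preds))
```

Average precision summarizes ranking quality over all thresholds, while F1 depends on the single threshold chosen to turn scores into predicted links, which is why the two numbers can differ for the same classifier.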

CONCLUSIONS

In this study, we constructed COVID-19-centered co-occurrence network embeddings. Results indicated that the generated embeddings were able to extract significant associations for COVID-19 and coronavirus infectious diseases.
