CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning.

Affiliation

Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad036. Epub 2023 May 23.

DOI:10.1093/gigascience/giad036

PMID:37222749

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10205000/

Abstract

BACKGROUND

Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, which makes it hard to integrate in practice with related datasets (e.g., the millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms), each labeled with a higher/lower level in relation to the nonmutated virus.

RESULTS

The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples.
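The structured output described above (a mutation/variant, its effect, and a higher/lower level) can be sketched as a small record type plus a parser. This is a minimal illustration only: the `entity | effect | level` serialization, the `EffectAnnotation` class, and the `parse_annotations` helper are assumptions made for this sketch and are not taken from the CoVEffect implementation. The example string is an illustrative input, not a result reported by the authors.

```python
from dataclasses import dataclass
from typing import List

# Assumed label set: the abstract only mentions "higher/lower" levels.
LEVELS = {"higher", "lower"}


@dataclass(frozen=True)
class EffectAnnotation:
    """One extracted fact: a mutation/variant, an effect, and its level."""
    entity: str  # mutation or variant name, e.g. "D614G"
    effect: str  # e.g. "infectivity"
    level: str   # "higher" or "lower", relative to the nonmutated virus


def parse_annotations(generated: str) -> List[EffectAnnotation]:
    """Parse 'entity | effect | level' triples separated by ';'.

    The serialization format is hypothetical; malformed triples are skipped.
    """
    records = []
    for chunk in generated.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        if len(parts) != 3 or not all(parts):
            continue  # not a complete triple
        entity, effect, level = parts
        if level.lower() not in LEVELS:
            continue  # unknown level label
        records.append(EffectAnnotation(entity, effect, level.lower()))
    return records


# Illustrative model output (made up for this sketch).
example = (
    "D614G | infectivity | higher; "
    "B.1.351 | sensitivity to neutralizing antibodies | lower; "
    "broken"
)
annotations = parse_annotations(example)
```

In an assisted-annotation setting, records like these are what an expert would inspect and correct before they are added back to the training data.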

CONCLUSIONS

The CoVEffect interface supports the assisted annotation of abstracts and allows curated datasets to be downloaded for further use in data integration or analysis pipelines. The overall framework can be adapted to similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
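As a rough illustration of how curated annotations might be handed to a downstream integration or analysis pipeline, the sketch below serializes them as CSV using only the Python standard library. The column names, the `annotations_to_csv` helper, and the sample rows are hypothetical; the actual download format of the CoVEffect web application is not specified in this abstract.

```python
import csv
import io


def annotations_to_csv(rows):
    """Serialize (abstract_id, entity, effect, level) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical column names chosen for this sketch.
    writer.writerow(["abstract_id", "entity", "effect", "level"])
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative curated rows (made up for this sketch).
curated = [
    ("abstract-001", "D614G", "infectivity", "higher"),
    ("abstract-002", "B.1.351", "sensitivity to neutralizing antibodies", "lower"),
]
csv_text = annotations_to_csv(curated)
```

A flat table keyed by mutation or variant name is one simple shape that can be joined against sequence-level datasets; other export formats would work equally well.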

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/75bbe4228d62/giad036fig1.jpg
