CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning.

Affiliation

Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy.

Publication information

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad036. Epub 2023 May 23.


DOI:10.1093/gigascience/giad036
PMID:37222749
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10205000/
Abstract

BACKGROUND: Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus.

RESULTS: The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples.

CONCLUSIONS: The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
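The unstructured-to-structured step described in the abstract can be pictured with a small sketch. The abstract does not specify the model's output format or any API, so everything here is an assumption made for illustration: the `EffectAnnotation` record, the pipe-separated `entity | effect | level` line format, and the `parse_prediction` and `apply_corrections` helpers are all hypothetical names. Only the three fields (mutation/variant, effect, higher/lower level) and the idea of user corrections feeding a curated dataset come from the text.

```python
from dataclasses import dataclass

# The abstract names two levels relative to the nonmutated virus.
LEVELS = {"higher", "lower"}


@dataclass(frozen=True)
class EffectAnnotation:
    """One structured triple (hypothetical record layout)."""
    entity: str  # mutation or variant name as written in the abstract
    effect: str  # epidemiological, immunological, clinical, or kinetic effect
    level: str   # "higher" or "lower"


def parse_prediction(text: str) -> list[EffectAnnotation]:
    """Parse assumed model output: one 'entity | effect | level' triple per line.

    Lines that do not have exactly three non-empty fields, or whose level
    is not in LEVELS, are skipped rather than guessed at.
    """
    annotations = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or not all(parts):
            continue
        entity, effect, level = parts
        level = level.lower()
        if level not in LEVELS:
            continue
        annotations.append(EffectAnnotation(entity, effect, level))
    return annotations


def apply_corrections(predicted, removed, added):
    """Build a curated list: drop rejected triples, append user-added ones.

    Order is preserved and duplicates are not appended twice.
    """
    curated = [a for a in predicted if a not in removed]
    for a in added:
        if a not in curated:
            curated.append(a)
    return curated


# Placeholder strings for illustration only; not results from the paper.
raw = "MUT-A | infectivity | higher\nnot a triple\nVAR-B | transmission | LOWER"
predicted = parse_prediction(raw)
curated = apply_corrections(
    predicted,
    removed=[predicted[0]],
    added=[EffectAnnotation("MUT-A", "infectivity", "lower")],
)
```

In this sketch the curated list is what a user would download or feed back as additional training samples; how CoVEffect actually stores or serializes annotations is not stated in the abstract.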


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/75bbe4228d62/giad036fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/ee2c020812f3/giad036fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/9b7ae7891fe6/giad036fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/aefeadebf847/giad036fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/3abc727a0337/giad036fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/a8cd705e3d48/giad036fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/e3336c526862/giad036fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/80dd1d69bc39/giad036fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f15/10205000/3222e7d8cb9d/giad036fig9.jpg
