Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?

Affiliations

Tata Consultancy Services Ltd, Pune 411013, India; CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi 110025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India. Electronic address: https://twitter.com/NagpalSun.

Tata Consultancy Services Ltd, Pune 411013, India. Electronic address: https://twitter.com/nishal_pinna.

Publication information

J Mol Biol. 2022 Aug 15;434(15):167684. doi: 10.1016/j.jmb.2022.167684. Epub 2022 Jun 11.

DOI: 10.1016/j.jmb.2022.167684
PMID: 35700770
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9188262/
Abstract

MOTIVATION

The continuous emergence of new variants through the appearance, accumulation, and disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants in particular have exerted tremendous pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenge of tracing mutations/variants and their relationship (if any) to infection severity.

RESULTS

We explored the suitability of virus-genotype guided machine learning for infection prognosis and for identifying features/mutations-of-interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide mutations, were employed. Among these, after data cleaning, Low- and High-severity genomes were classified using an integrated model (employing virus genotype, epitopic influence, and patient age) with consistently high ROC-AUC (Asia: 0.97 ± 0.01, Europe: 0.94 ± 0.01, N. America: 0.92 ± 0.02, Africa: 0.94 ± 0.07, S. America: 0.93 ± 0.03). Although virus genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not consistent, and the models for a few geographies showed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus genotype (Wilcoxon p_BH < 0.05). Neither age, epitopic influence, nor clade information could outperform the integrated features. A sparse model (6 features), developed using patient age and the epitopic influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were used to infer the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed. Notably, we also describe the significance of a 'temporal-modeling approach' for benchmarking models of continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving severity, caution should be exercised in using genotypic signatures for predictive prognosis.
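The evaluation ideas named in the abstract (ROC-AUC, Benjamini-Hochberg adjusted p-values, and a temporal benchmark) can be illustrated with a small, dependency-free sketch. This is not the authors' pipeline: the record fields (`collected`, `severe`, `score`), the function names, and the toy data are all hypothetical, and the temporal split shown here is only one plausible reading of a 'temporal-modeling approach' (fit on earlier genomes, evaluate on later ones).

```python
from datetime import date

def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outranks a random negative.
    Ties between a positive and a negative score count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def temporal_split(records, cutoff):
    """Assumed temporal benchmark: train on records collected before `cutoff`,
    test on records collected on or after it."""
    train = [r for r in records if r["collected"] < cutoff]
    test = [r for r in records if r["collected"] >= cutoff]
    return train, test

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy records: collection date, severity label, model score.
records = [
    {"collected": date(2020, 5, 1), "severe": 0, "score": 0.10},
    {"collected": date(2020, 9, 1), "severe": 1, "score": 0.70},
    {"collected": date(2021, 2, 1), "severe": 0, "score": 0.40},
    {"collected": date(2021, 3, 1), "severe": 1, "score": 0.35},
    {"collected": date(2021, 4, 1), "severe": 0, "score": 0.10},
    {"collected": date(2021, 6, 1), "severe": 1, "score": 0.80},
]

train, test = temporal_split(records, cutoff=date(2021, 1, 1))
test_auc = roc_auc([r["severe"] for r in test], [r["score"] for r in test])
print(len(train), len(test), test_auc)
```

Scoring only the later records mimics the situation in which a model meets variants that emerged after it was built, which a random train/test split would not capture. In practice a library such as scikit-learn or statsmodels would normally replace these hand-written helpers.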


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/f42a993a2e2f/ga1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/f2bed47c99a0/gr1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/e4e0329e982a/gr2_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/3fe0e617f221/gr3_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/9a75bdf052aa/gr4_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/1a1a0483443c/gr5_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/98d68e9bc267/gr6_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/6b7a1f998857/gr7_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c8c/9188262/577081ac8c09/gr8_lrg.jpg
