Machine-learning of complex evolutionary signals improves classification of SNVs.

Authors

Labes Sapir, Stupp Doron, Wagner Naama, Bloch Idit, Lotem Michal, L Lahad Ephrat, Polak Paz, Pupko Tal, Tabach Yuval

Affiliations

Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel.

The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel.

Publication information

NAR Genom Bioinform. 2022 Apr 7;4(2):lqac025. doi: 10.1093/nargab/lqac025. eCollection 2022 Jun.

DOI:10.1093/nargab/lqac025

PMID:35402908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8988715/

Abstract

Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease-genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable for conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning based methods and most ensemble tools in every prediction-task, highlighting the strength of optimizing conservation analysis per-species and per-gene. Overall, we suggest a new and a more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
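
The idea of treating conservation against each species as a separate feature can be illustrated with a small sketch. The code below is not the EvoDiagnostics implementation: the species subset, the residues, the labels, and the binary "matches human" encoding are all assumptions made up for illustration. It builds one feature per species for each variant and then scores each species alone as a predictor, which mirrors the pairwise human-versus-species comparison described in the abstract.

```python
# Toy data: every residue and label below is invented for illustration only.
# Each variant records the human reference residue, the residue observed in
# each species at the aligned position, and a label (1 = pathogenic, 0 = benign).
SPECIES = ["mouse", "chicken", "zebrafish"]  # hypothetical subset of vertebrates

VARIANTS = [
    {"human": "R", "orthologs": {"mouse": "R", "chicken": "R", "zebrafish": "R"}, "label": 1},
    {"human": "A", "orthologs": {"mouse": "A", "chicken": "S", "zebrafish": "T"}, "label": 0},
    {"human": "G", "orthologs": {"mouse": "G", "chicken": "G", "zebrafish": "A"}, "label": 1},
    {"human": "L", "orthologs": {"mouse": "L", "chicken": "I", "zebrafish": "V"}, "label": 0},
]


def conservation_features(variant, species=SPECIES):
    """Return one 0/1 feature per species: 1 if that species matches human."""
    return [int(variant["orthologs"][s] == variant["human"]) for s in species]


def per_species_accuracy(variants, species=SPECIES):
    """Score each species alone, predicting 'pathogenic' when it is conserved."""
    rows = [conservation_features(v, species) for v in variants]
    labels = [v["label"] for v in variants]
    scores = {}
    for j, name in enumerate(species):
        hits = sum(row[j] == y for row, y in zip(rows, labels))
        scores[name] = hits / len(labels)
    return scores


# Feature matrix (one row per variant, one column per species) and labels.
X = [conservation_features(v) for v in VARIANTS]
y = [v["label"] for v in VARIANTS]

# Accuracy of each species used as a single-feature predictor.
ACCURACY = per_species_accuracy(VARIANTS)
print(ACCURACY)
```

In this toy data the three species give different accuracies, which is the kind of species-dependent signal the abstract describes. A feature matrix like `X` with labels `y` could then be passed to a random-forest classifier (for example, scikit-learn's `RandomForestClassifier`), trained per gene, so that the model learns which species are informative instead of relying on a single pooled conservation score.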

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68cd/8988715/0c70588d57d9/lqac025fig1.jpg
