Machine-learning of complex evolutionary signals improves classification of SNVs.

Author information

Labes Sapir, Stupp Doron, Wagner Naama, Bloch Idit, Lotem Michal, L Lahad Ephrat, Polak Paz, Pupko Tal, Tabach Yuval

Affiliations

Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel.

The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel.

Publication information

NAR Genom Bioinform. 2022 Apr 7;4(2):lqac025. doi: 10.1093/nargab/lqac025. eCollection 2022 Jun.

Abstract

Conservation is a strong predictor of the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution, since its accuracy depends heavily on the analyzed set of species and genes. For example, pairwise comparisons between human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable to conservation-based variant prediction, while others had specific species that optimized prediction. These insights led us to develop EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning-based methods and most ensemble tools in every prediction task, highlighting the strength of optimizing conservation analysis per species and per gene. Overall, we suggest a new and more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
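The idea of treating conservation against each species as a separate feature can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EvoDiagnostics implementation. The abstract does not say how per-species conservation is encoded, so the sketch assumes a binary feature (1 if the species carries the human reference residue at the aligned position, 0 otherwise). The species list, the variants and their labels are invented toy data, and the names `conservation_features` and `per_species_accuracy` are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy data, not taken from the paper. Each variant records the
# human reference residue, the residue each species carries at the aligned
# position, and a label (1 = pathogenic, 0 = benign).
SPECIES: List[str] = ["mouse", "chicken", "zebrafish"]

VARIANTS: List[dict] = [
    {"human": "R", "aligned": {"mouse": "R", "chicken": "R", "zebrafish": "R"}, "label": 1},
    {"human": "G", "aligned": {"mouse": "G", "chicken": "G", "zebrafish": "A"}, "label": 1},
    {"human": "L", "aligned": {"mouse": "L", "chicken": "V", "zebrafish": "V"}, "label": 0},
    {"human": "S", "aligned": {"mouse": "S", "chicken": "T", "zebrafish": "S"}, "label": 0},
    {"human": "K", "aligned": {"mouse": "K", "chicken": "K", "zebrafish": "K"}, "label": 1},
    {"human": "A", "aligned": {"mouse": "T", "chicken": "S", "zebrafish": "A"}, "label": 0},
]


def conservation_features(variant: dict, species: List[str] = SPECIES) -> List[int]:
    """One binary feature per species: 1 if it matches the human residue.

    Assumed encoding; the abstract does not specify the real one.
    """
    return [int(variant["aligned"][s] == variant["human"]) for s in species]


def per_species_accuracy(variants: List[dict], species: List[str] = SPECIES) -> Dict[str, float]:
    """Accuracy of the naive rule 'conserved in this species => pathogenic',
    evaluated separately for each pairwise human-versus-species comparison."""
    accuracy: Dict[str, float] = {}
    for idx, name in enumerate(species):
        correct = sum(
            conservation_features(v, species)[idx] == v["label"] for v in variants
        )
        accuracy[name] = correct / len(variants)
    return accuracy


if __name__ == "__main__":
    # Print how well each single species predicts the toy labels.
    for name, acc in per_species_accuracy(VARIANTS).items():
        print(f"{name}: {acc:.2f}")
```

On this toy set the three species give different accuracies, which mirrors the abstract's point that species differ in predictive ability. In a fuller pipeline, the per-variant feature vectors would be passed to a random-forest classifier (for example scikit-learn's `RandomForestClassifier`), trained per gene, so that the model can weight informative species. That step is omitted here.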

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68cd/8988715/0c70588d57d9/lqac025fig1.jpg
