Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, 79104 Freiburg im Breisgau, Germany.
Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg im Breisgau, Germany.
Genes (Basel). 2020 Sep 14;11(9):1076. doi: 10.3390/genes11091076.
A novel approach is developed for annotating the phenotypic effects of exome variants for which relevant empirical data are lacking or minimal. The predictive annotation method is implemented as a stacked ensemble of supervised base-learners, including distributed random forests and gradient boosting machines. Ensemble models were trained and cross-validated on evidence-based categorical variant effect annotations from the ClinVar database, and were applied to 84 million non-synonymous single nucleotide variants (SNVs). The consensus model combined 39 functional mutation impact scores, a cross-species conservation score, and a gene indispensability score. The indispensability score, which accounts for differences in variant pathogenicity, including those in essential and mutation-tolerant genes, considerably improved the predictions. The consensus combination is consistent with as many input scores as possible while minimizing false predictions. The input scores are ranked by their ability to predict effects. The score rankings and categorical phenotypic variant effect predictions are intended for direct use in clinical and biological applications to prioritize human exome variants and mutations.
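The stacked-ensemble idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps in scikit-learn's `StackingClassifier` with a random forest and a gradient boosting machine as base-learners, uses a logistic regression meta-learner (an assumption), and runs on synthetic data whose 41 columns merely stand in for the 39 functional impact scores plus the conservation and indispensability scores. The binary label is a simplified placeholder for the categorical ClinVar annotations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for variant annotations (assumption, not real data):
# 41 columns mimic 39 impact scores + conservation + indispensability.
X, y = make_classification(
    n_samples=600, n_features=41, n_informative=10, random_state=0
)

# Base-learners whose out-of-fold predictions feed the meta-learner.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner (hypothetical choice) combines base-learner outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # inner folds used to build the meta-learner's training set
)

# Outer cross-validation estimates how well the whole ensemble generalizes.
scores = cross_val_score(stack, X, y, cv=3, scoring="roc_auc")
mean_auc = scores.mean()
print(f"Cross-validated ROC AUC: {mean_auc:.3f}")
```

The inner `cv` argument keeps the meta-learner from being trained on predictions that the base-learners made for their own training samples, while the outer `cross_val_score` call evaluates the complete stack on held-out folds.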