School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America.
Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America.
PLoS Comput Biol. 2019 Jun 14;15(6):e1007112. doi: 10.1371/journal.pcbi.1007112. eCollection 2019 Jun.
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
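As a minimal illustration of what "non-frameshifting" means here, the sketch below tests whether an insertion or deletion changes the coding sequence length by a nonzero multiple of three nucleotides, so that the downstream reading frame is preserved. The function name and the VCF-style reference/alternate allele strings are assumptions made for this example; it is not part of MutPred-Indel and does not reflect its input format.

```python
def is_non_frameshifting(ref: str, alt: str) -> bool:
    """Return True if ref -> alt is an indel that preserves the reading frame.

    Hypothetical helper for illustration only: `ref` and `alt` are assumed to
    be coding-sequence nucleotide alleles. The variant is non-frameshifting
    when the net length change is a nonzero multiple of 3.
    """
    delta = len(alt) - len(ref)
    return delta != 0 and delta % 3 == 0


# A 3-nucleotide insertion and a 3-nucleotide deletion keep the frame;
# a 1-nucleotide insertion shifts it; a substitution is not an indel.
examples = [("A", "ATGC"), ("ATGC", "A"), ("A", "AT"), ("A", "G")]
results = [is_non_frameshifting(ref, alt) for ref, alt in examples]
```

Variants that pass this check add or remove whole amino acids rather than disrupting the entire downstream sequence, which is why their functional consequences are harder to judge than those of frameshifts.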