KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily.

Authors

Pons Tirso, Vazquez Miguel, Matey-Hernandez María Luisa, Brunak Søren, Valencia Alfonso, Izarzugaza Jose MG

Affiliations

Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain.

Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kemitorvet, Building 208, 2800 Kgs. Lyngby, Denmark.

Publication

BMC Genomics. 2016 Jun 23;17(Suppl 2):396. doi: 10.1186/s12864-016-2723-1.

Abstract

BACKGROUND

The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. However, understanding the link between sequence variants in the protein kinase superfamily and mechanistically complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease.

RESULTS

KinMutRF is a novel random-forest method that automatically identifies pathogenic variants in human kinases. Twenty-six decision trees, implemented as a random forest, weigh a battery of features that characterize the variants: a) at the gene level, membership in a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training or testing: Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.
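For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, F-score and the Matthews correlation coefficient (MCC) are derived from a binary confusion matrix. The function names and the counts in the example are hypothetical and are not taken from the paper; only the precision and recall values in the final check come from the abstract.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (pathogenic = positive class).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "mcc": mcc,
    }


def f_score_from(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (NOT the paper's confusion matrix).
example = classification_metrics(tp=75, fp=15, tn=200, fn=25)

# Consistency check on the values reported in the abstract:
# precision 0.82 and recall 0.75 give an F-score that rounds to 0.78.
reported_f = round(f_score_from(0.82, 0.75), 2)
```

The last line shows that the reported F-score is consistent with the reported precision and recall, assuming the F-score is the usual harmonic mean of the two.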

CONCLUSIONS

KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most informative in terms of information gain and are likely responsible for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.
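Ranking features by information gain can be illustrated with a small sketch. It assumes the standard entropy-based definition (entropy of the labels minus the weighted entropy after splitting on a feature); the abstract does not say exactly how the authors computed it. The feature names and toy labels below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, for illustration only: 4 pathogenic and 4 neutral variants.
labels = ["pathogenic"] * 4 + ["neutral"] * 4

# Hypothetical kinase-specific feature that separates the classes perfectly.
in_catalytic_domain = [True, True, True, True, False, False, False, False]

# Hypothetical generic feature that carries no information about the class.
generic_flag = [True, True, False, False, True, True, False, False]

gain_specific = information_gain(in_catalytic_domain, labels)
gain_generic = information_gain(generic_flag, labels)
```

In this toy setting the perfectly separating feature recovers the full bit of label entropy, while the uninformative one recovers none, which is the sense in which a feature "ranks higher" by information gain.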
