Baeissa Hanadi M, Pearl Frances M G
Bioinformatics Group, School of Life Sciences, University of Sussex, Brighton, United Kingdom.
J Comput Biol. 2020 May;27(5):786-795. doi: 10.1089/cmb.2018.0192. Epub 2019 Aug 28.
Inframe insertion and deletion mutations (indels) are commonly observed in cancer samples, accounting for over 1% of all reported mutations. Few somatic inframe indels have been clinically documented as pathogenic, and at present there are few tools to predict which indels drive cancer development. However, indels are a common feature of hereditary disease, and several tools have been developed to predict the impact of inframe indels on protein function. In this study, we test whether six popular prediction tools can be adapted to detect cancer driver mutations, and then develop a new algorithm (IndelRF) that discriminates between recurrent indels in known cancer genes and indels not associated with disease. IndelRF is intended to identify somatic inframe indels that act as cancer drivers. Using a random forest classifier with 11 features, IndelRF achieved accuracies of 0.995 and 0.968 for insertions and deletions, respectively. Finally, we use IndelRF to classify the inframe indel cancer mutations in the MOKCa database.
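For readers less familiar with the term, an insertion or deletion is "inframe" when its net length change is a non-zero multiple of three nucleotides, so the codons downstream of the event keep their reading frame. The Python sketch below illustrates only that definitional check. The `Indel` record and its VCF-style `ref`/`alt` allele representation are hypothetical illustrations and are not part of IndelRF or its feature set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """Hypothetical indel record: reference and alternate alleles at one locus (VCF-style)."""
    ref: str
    alt: str


def net_length_change(indel: Indel) -> int:
    """Net change in sequence length: positive for insertions, negative for deletions."""
    return len(indel.alt) - len(indel.ref)


def indel_type(indel: Indel) -> str:
    """Classify the event by the sign of its net length change."""
    delta = net_length_change(indel)
    if delta > 0:
        return "insertion"
    if delta < 0:
        return "deletion"
    return "substitution"  # no net length change, so not an indel


def is_inframe(indel: Indel) -> bool:
    """An indel is inframe when its net length change is a non-zero multiple of 3."""
    delta = net_length_change(indel)
    # Python's % returns a non-negative result here, so -3 % 3 == 0 works for deletions too.
    return delta != 0 and delta % 3 == 0


if __name__ == "__main__":
    for ex in [Indel("A", "AGCT"), Indel("ACTG", "A"), Indel("A", "AG"), Indel("ACT", "A")]:
        print(ex, indel_type(ex), is_inframe(ex))
```

Here `Indel("A", "AGCT")` (a 3-nt insertion) and `Indel("ACTG", "A")` (a 3-nt deletion) are inframe, while the 1-nt insertion and 2-nt deletion would shift the reading frame (frameshifts).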