School of Mathematical Sciences, Beijing Normal University, Laboratory of Mathematics and Complex Systems, Ministry of Education, Beijing 100875, P.R. China.
Bioinformatics. 2012 Nov 15;28(22):2948-55. doi: 10.1093/bioinformatics/bts558. Epub 2012 Oct 7.
It has become widely accepted that human cancer is a disease involving dynamic changes in the genome and that missense mutations constitute the bulk of human genetic variation. A multitude of computational algorithms, especially machine learning-based ones, have consequently been proposed to distinguish missense changes that contribute to cancer progression ('driver' mutations) from those that do not ('passenger' mutations). However, the existing methods have multifaceted shortcomings: they either adopt an incomplete feature space or depend on protein structural databases, which are usually far from integrated.
In this article, we investigated multiple aspects of a missense mutation and identified a novel feature space that distinguishes cancer-associated driver mutations from passenger mutations well. An index (the DX score) was proposed to evaluate the discriminating capability of each feature, and the top-ranked subset of these features was selected to build an SVM classifier. Cross-validation showed that the classifier trained on our selected features significantly outperforms existing classifiers in both precision and robustness. We applied our method to several datasets of missense mutations culled from published databases and the literature, and obtained more reasonable results than previous studies.
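The rank-then-select step described above can be illustrated with a minimal sketch. The abstract does not define the DX score, so the code below substitutes a simple stand-in (absolute difference of class means divided by the pooled standard deviation); it is not the authors' index. The feature names, toy values, and function names are all hypothetical, and the selected columns would then be passed to an SVM implementation of the reader's choice.

```python
from statistics import mean, pstdev


def separation_score(driver_vals, passenger_vals):
    """Stand-in discriminative score (NOT the paper's DX score):
    absolute difference of class means over the pooled spread."""
    spread = pstdev(list(driver_vals) + list(passenger_vals))
    if spread == 0:
        return 0.0  # a constant feature cannot separate the classes
    return abs(mean(driver_vals) - mean(passenger_vals)) / spread


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least discriminating."""
    scores = {}
    for j, name in enumerate(names):
        drivers = [r[j] for r, y in zip(rows, labels) if y == 1]
        passengers = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[name] = separation_score(drivers, passengers)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_top(rows, labels, names, k):
    """Keep only the k top-ranked feature columns."""
    keep = [name for name, _ in rank_features(rows, labels, names)[:k]]
    idx = [names.index(n) for n in keep]
    return keep, [[r[j] for j in idx] for r in rows]


# Hypothetical toy data: label 1 = driver, label 0 = passenger.
names = ["feat_a", "feat_b", "feat_c"]
rows = [
    [0.90, 0.5, 0.2],
    [0.80, 0.4, 0.9],
    [0.95, 0.6, 0.5],
    [0.10, 0.5, 0.3],
    [0.20, 0.6, 0.8],
    [0.15, 0.4, 0.6],
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels, names)
kept, reduced = select_top(rows, labels, names, k=2)
print(ranking)
print(kept)
```

Ranking each feature independently keeps the selection step cheap and easy to inspect, but it ignores interactions between features; that trade-off is a property of this sketch and is not a claim about the published method.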
The software is available online at http://www.methodisthealth.com/software and https://sites.google.com/site/drivermutationidentification/.
Supplementary data are available at Bioinformatics online.