National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
Bioinformatics. 2012 Nov 15;28(22):2940-7. doi: 10.1093/bioinformatics/bts564. Epub 2012 Sep 14.
The first step toward clinical diagnostics, prognostics and targeted therapeutics of cancer is to comprehensively understand its molecular mechanisms. Large-scale cancer genomics projects are providing a large volume of data about genomic, epigenomic and gene expression aberrations in multiple cancer types. One of the remaining challenges is to identify the driver mutations, driver genes and driver pathways that promote cancer proliferation, and to filter out the non-functional and passenger ones.
In this study, we propose two methods to solve the so-called maximum weight submatrix problem, which is designed to identify mutated driver pathways de novo from mutation data in cancer. The first is an exact method, which can be helpful for assessing other approximate and/or heuristic algorithms. The second is a stochastic and flexible method that can incorporate other types of information to improve on the first method. In particular, we propose an integrative model that combines mutation and expression data. We first apply our methods to simulated data to show their efficiency. We then apply them to several real biological datasets: the mutation profiles of 74 head and neck squamous cell carcinoma samples, 90 glioblastoma tumor samples and 313 ovarian carcinoma samples. Gene expression profiles were also considered for the latter two datasets. The results show that our integrative model can identify more biologically relevant gene sets. We have implemented all these methods in a package called mutated driver pathway finder (MDPFinder), which can be easily used by other researchers.
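The abstract does not state the objective behind the maximum weight submatrix problem, so the following is only a minimal sketch under stated assumptions. It assumes the formulation commonly used for this problem: given a binary samples-by-genes mutation matrix, choose k gene columns M that maximize W(M) = 2 × (number of samples mutated in at least one gene of M) minus (total number of mutations in M), which rewards high coverage and penalizes overlapping mutations. The names `weight`, `best_gene_set` and the toy matrix `mutations` are hypothetical, and the exhaustive search stands in for an exact solver purely for illustration; it is not the method implemented in MDPFinder.

```python
from itertools import combinations


def weight(matrix, genes):
    """Assumed objective: W(M) = 2 * covered samples - total mutations in M."""
    # Samples (rows) with at least one mutation among the chosen genes.
    covered = sum(1 for row in matrix if any(row[g] for g in genes))
    # Total mutation count over the chosen gene columns.
    total = sum(row[g] for row in matrix for g in genes)
    return 2 * covered - total


def best_gene_set(matrix, k):
    """Exhaustively score every k-gene subset and return the best one."""
    n_genes = len(matrix[0])
    return max(combinations(range(n_genes), k),
               key=lambda genes: weight(matrix, genes))


# Hypothetical toy data: rows are samples, columns are genes.
# Genes 0, 1 and 2 are mutually exclusive and together cover every sample;
# gene 3 overlaps with each of them.
mutations = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

best = best_gene_set(mutations, 3)
print(best, weight(mutations, best))
```

Exhaustive enumeration grows combinatorially with the number of genes, so it is practical only on small inputs; that cost is the usual reason for turning to an exact optimization formulation or a stochastic search on real datasets.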
A MATLAB package of MDPFinder is available at http://zhangroup.aporc.org/ShiHuaZhang.
Supplementary data are available at Bioinformatics online.