School of Public Health, Yale University, New Haven, CT 06520, USA.
BMC Genomics. 2009 Nov 17;10:535. doi: 10.1186/1471-2164-10-535.
Advances in gene profiling techniques make it possible to measure the expression of thousands of genes and to identify genes associated with the development and progression of cancer. The identified cancer-associated genes can be used for diagnosis, prognosis prediction, and treatment selection. Most existing cancer microarray studies have focused on identifying genes associated with a specific type of cancer. Recent biomedical studies suggest that different cancers may share common susceptibility genes. A comprehensive description of the associations between genes and cancers therefore requires identifying not only multiple genes associated with a specific type of cancer but also genes associated with multiple cancers.
In this article, we propose the Mc.TGD (Multi-cancer Threshold Gradient Descent), an integrative analysis approach capable of analyzing multiple microarray studies on different cancers. The Mc.TGD is the first regularized approach to conduct "two-dimensional" selection of genes with joint effects on cancer development. Simulation studies show that the Mc.TGD identifies genes associated with multiple cancers more accurately than meta-analysis based on "one-dimensional" methods. As a byproduct, the identification accuracy of genes associated with only one type of cancer may also be improved. We use the Mc.TGD to analyze seven microarray studies investigating the development of seven different types of cancer. We identify one gene associated with six types of cancer and four genes associated with five types of cancer. In addition, we identify 11, 9, 18, and 17 genes associated with four, three, two, and one type of cancer, respectively. We evaluate prediction performance using leave-one-out cross-validation and find that only 4 of 570 subjects cannot be properly predicted.
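To make the evaluation step concrete, the following is a minimal sketch of leave-one-out cross-validation, in which each subject is held out in turn, the model is fit on the remaining subjects, and the held-out subject is then predicted. It is an illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier is a placeholder standing in for the Mc.TGD fit, the function names are hypothetical, and the data are synthetic. For scale, 4 errors among 570 subjects corresponds to an error rate of about 0.7%.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, x_new):
    """Placeholder classifier (NOT Mc.TGD): pick the class whose mean is closest."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - x_new, axis=1)
    return classes[np.argmin(distances)]


def leave_one_out_errors(X, y, predict=nearest_centroid_predict):
    """Count subjects misclassified when each one is held out in turn."""
    n = X.shape[0]
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i          # every subject except subject i
        y_hat = predict(X[keep], y[keep], X[i])
        errors += int(y_hat != y[i])
    return errors


# Synthetic example (assumed data): 40 subjects, 5 "genes", two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5)) + 3.0 * y[:, None]

n_errors = leave_one_out_errors(X, y)
print(f"LOOCV errors: {n_errors} of {len(y)} ({n_errors / len(y):.2%})")
```

Any classifier with the same `predict(X_train, y_train, x_new)` signature can be passed in, so a regularized model could replace the placeholder without changing the cross-validation loop.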
The Mc.TGD can identify a short list of genes associated with one or more types of cancer. The identified genes differ considerably from those identified using meta-analysis or analysis of marginal effects.