Zhang Wei, Li Li, Li Xia, Jiang Wei, Huo Jianmin, Wang Yadong, Lin Meihua, Rao Shaoqi
The First Clinical College, Department of Bioinformatics, and the Bio-pharmaceutical Key Laboratory of Heilongjiang Province and State, Harbin Medical University, Harbin 150086, China.
BMC Genomics. 2007 Sep 22;8:332. doi: 10.1186/1471-2164-8-332.
It has become increasingly clear that our current taxonomy of clinical phenotypes is confounded by molecular heterogeneity. Defining the hidden, molecularly distinct diseases with modern large-scale genomic approaches is vital for refining clinical practice and improving intervention strategies. Microarray technology provides a powerful way to dissect the hidden genetic heterogeneity of complex diseases. The aim of this study was therefore to develop a bioinformatics approach that identifies the transcriptional features underlying hidden subtypes of a complex clinical phenotype. The basic strategy of the proposed method was to iteratively partition both the sample space and the feature space (two-way partitioning) with the super-paramagnetic clustering technique, and to seek hard and robust gene clusters that lead to a natural partition of disease samples and that have the highest functional conceptual consensus, as evaluated with Gene Ontology.
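The two-way partitioning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. As a loudly labeled assumption, super-paramagnetic clustering is replaced here by a much simpler stand-in (threshold-based single-linkage grouping), the pass is performed once rather than iteratively, and the Gene Ontology consensus step is omitted. All function names, thresholds and the toy matrix are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def correlation_distance(x, y):
    """Return 1 - Pearson correlation; constant vectors get distance 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / sqrt(sxx * syy)


def link_clusters(vectors, distance, threshold):
    """Stand-in clusterer (NOT super-paramagnetic clustering).

    Links every pair of vectors whose distance is <= threshold and
    returns the connected components as sorted lists of indices.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if distance(vectors[i], vectors[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def two_way_partition(expr, gene_thr=0.1, sample_thr=1.0, min_genes=2):
    """One pass of two-way partitioning on a genes x samples matrix.

    Step 1: group genes by correlation of their expression profiles.
    Step 2: for each gene cluster, partition the samples using only
            the genes in that cluster.
    Returns (gene_cluster, sample_partition) pairs for gene clusters
    that split the samples into more than one group.
    """
    n_samples = len(expr[0])
    results = []
    for genes in link_clusters(expr, correlation_distance, gene_thr):
        if len(genes) < min_genes:
            continue
        # Sample profiles restricted to the current gene cluster.
        profiles = [[expr[g][s] for g in genes] for s in range(n_samples)]
        parts = link_clusters(profiles, dist, sample_thr)
        if len(parts) > 1:
            results.append((genes, parts))
    return results


# Hypothetical toy matrix: 4 genes (rows) x 6 samples (columns).
# Genes 0 and 1 are co-expressed and separate samples 0-2 from 3-5.
toy_expr = [
    [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
    [4.8, 5.1, 5.0, 1.2, 0.8, 1.0],
    [2.0, 3.0, 2.0, 3.0, 2.0, 3.0],
    [3.0, 3.0, 1.0, 1.0, 2.0, 2.0],
]

toy_result = two_way_partition(toy_expr)
```

On the toy matrix, only the cluster of genes 0 and 1 is large enough to be used, and it divides the samples into two groups. In the approach described above, the resulting gene and sample clusters would be fed back into further rounds of partitioning, and candidate gene clusters would additionally be scored for functional consensus.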
We applied the proposed method to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype. In the first dataset (4026 genes, 42 DLBCL samples), a feature subset of 30 genes (38 probes) identified three categories of patients with very different five-year overall survival rates (70.59%, 44.44% and 14.29%, respectively; p = 0.0017). In the second dataset (7129 genes, 58 DLBCL samples), a feature subset of 13 genes (16 probes) not only replicated the findings for important DLBCL genes (e.g. JAW1 and BCL7A), but also identified three subtypes clinically similar to those found in the first dataset (five-year overall survival rates of 63.13%, 34.92% and 15.38%, respectively; p = 0.0009). Finally, we built a multivariate Cox proportional-hazards prediction model for each feature subset and identified JAW1 as one of the most significant predictors in both DLBCL cohorts (p = 0.005 and 0.014; hazard ratios = 0.02 and 0.03 for the two datasets, respectively).
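For readers less familiar with hazard ratios, the reported JAW1 values can be related to the coefficients of a Cox proportional-hazards model as follows. This is the standard textbook form of the model, not a reconstruction of the authors' fitted model. The symbols $h_0$, $x_j$ and $\beta_j$ are notation introduced here, and the abstract does not state how expression values were scaled, so "one unit" refers to whatever scale the authors used.

```latex
\[
\begin{aligned}
h(t \mid x) &= h_0(t)\,\exp\!\Big(\sum_{j} \beta_j x_j\Big),
\qquad \mathrm{HR}_j = \exp(\beta_j), \\
\mathrm{HR}_{\mathrm{JAW1}} = 0.02 &\;\Rightarrow\; \beta_{\mathrm{JAW1}} = \ln 0.02 \approx -3.91, \\
\mathrm{HR}_{\mathrm{JAW1}} = 0.03 &\;\Rightarrow\; \beta_{\mathrm{JAW1}} = \ln 0.03 \approx -3.51 .
\end{aligned}
\]
```

A hazard ratio below 1 means that, with the other covariates held fixed, a one-unit increase in JAW1 expression is associated with a lower hazard of death, and this holds in both cohorts.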
Our results show that the proposed algorithm is a promising computational strategy for uncovering hidden genetic heterogeneity through transcriptional profiling of disease samples, which may lead to improved diagnosis and treatment of cancers.