Matsui Shigeyuki, Yamanaka Takeharu, Barlogie Bart, Shaughnessy John D, Crowley John
Department of Pharmacoepidemiology, School of Public Health, Kyoto University, Yoshida Konoe-cho, Sakyo-ku, Kyoto, Japan.
Stat Med. 2008 Mar 30;27(7):1106-20. doi: 10.1002/sim.2997.
When a large number of genes are significant in correlating microarray gene expression data with patient prognosis, clustering the significant genes may be effective not only for further dimension reduction but also for identifying co-regulated genes that belong to the same molecular pathway related to disease biology and aggressiveness. Moreover, a reduced feature, such as the average expression across samples for a cluster of significant genes, can play an important role in reducing variance in prediction analysis. We propose a simple procedure for selecting gene clusters that have a strong marginal association with survival outcome from a large pool of candidate hierarchical clusters of significant genes. The selected gene clusters can have better predictive capability than the other gene clusters and singleton genes. We apply the procedure to a data set from a clinical study of patients with multiple myeloma and the associated microarrays.
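The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the distance, the linkage, or the association statistic, so everything below is an assumption made for illustration: average-linkage clustering on 1 minus the Pearson correlation, a cluster feature defined as the per-sample mean over the genes in the cluster, and the univariate Cox score statistic at beta = 0 (no tied event times) as the marginal association measure. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def candidate_clusters(expr):
    """Average-linkage agglomerative clustering on 1 - correlation (assumed).
    expr maps gene name -> list of expression values, one per sample.
    Returns every non-singleton node of the dendrogram as a tuple of genes."""
    genes = list(expr)
    dist = {frozenset((g, h)): 1 - pearson(expr[g], expr[h])
            for g, h in combinations(genes, 2)}

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((g, h))] for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    active = [(g,) for g in genes]
    nodes = []
    while len(active) > 1:
        c1, c2 = min(combinations(active, 2), key=lambda p: linkage(*p))
        active.remove(c1)
        active.remove(c2)
        merged = c1 + c2
        active.append(merged)
        nodes.append(merged)
    return nodes

def cluster_average(expr, cluster):
    """Per-sample mean expression over the genes in a cluster (assumed feature)."""
    n_samples = len(expr[cluster[0]])
    return [sum(expr[g][j] for g in cluster) / len(cluster)
            for j in range(n_samples)]

def cox_score_z(x, time, event):
    """Score-test z statistic at beta = 0 for a univariate Cox model.
    Assumes no tied event times; event[i] is 1 for death, 0 for censoring."""
    u = v = 0.0
    for i, (t, d) in enumerate(zip(time, event)):
        if not d:
            continue
        risk = [x[j] for j in range(len(x)) if time[j] >= t]
        m = sum(risk) / len(risk)
        u += x[i] - m
        v += sum((r - m) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(v) if v > 0 else 0.0

def rank_clusters(expr, time, event):
    """Rank candidate clusters by absolute marginal association with survival."""
    scored = [(abs(cox_score_z(cluster_average(expr, c), time, event)), c)
              for c in candidate_clusters(expr)]
    return sorted(scored, reverse=True)

# Toy data (hypothetical): g1 and g2 track survival time, g3 and g4 do not.
expr = {
    "g1": [6.0, 5.1, 4.2, 2.9, 2.0, 1.1],
    "g2": [5.8, 5.3, 3.9, 3.2, 1.8, 0.9],
    "g3": [1.0, 3.0, 2.0, 3.5, 1.2, 2.8],
    "g4": [1.2, 2.7, 2.2, 3.1, 1.0, 3.0],
}
time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 1, 0, 1]

ranked = rank_clusters(expr, time, event)
for score, cluster in ranked:
    print(f"{score:.2f}  {cluster}")
```

In this toy example the cluster containing g1 and g2 ranks first, ahead of the cluster that merges all four genes. A real analysis would also need to handle tied event times and to account for selection bias when assessing predictive capability, which this sketch does not attempt.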