Martoglio Ann-Marie, Miskin James W, Smith Stephen K, MacKay David J C
Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK.
Bioinformatics. 2002 Dec;18(12):1617-24. doi: 10.1093/bioinformatics/18.12.1617.
A number of algorithms and analytical models have been employed to reduce the multidimensional complexity of DNA array data and to extract a meaningful interpretation of the results. These include clustering, principal components analysis, self-organizing maps, and support vector machine analysis. Each method assumes an implicit model for the data, and many of these models separate genes into distinct clusters defined by similar expression profiles in the samples tested. A point of concern is that many genes may be involved in a number of distinct behaviours and should therefore be modelled as belonging to as many separate clusters as are detected in the multidimensional gene expression space. Analysing gene expression data with a decomposition model that is independent of the observer would be highly beneficial for standardized and reproducible classification of clinical and research samples.
We present a variational independent component analysis (ICA) method for reducing high-dimensional DNA array data to a smaller set of latent variables, each associated with a gene signature. We report the results of applying the method to data from an ovarian cancer study, which reveal a number of tissue type-specific and tissue type-independent gene signatures present in varying amounts among the samples surveyed. The observer-independent results of such molecular analysis of biological samples could help identify patients who would benefit from different treatment strategies. We further explore the application of the model to similar high-throughput studies.
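The idea that each sample contains gene signatures "in varying amounts" can be illustrated with a small forward model. The sketch below is only an assumption about the general shape of such a decomposition (expression = signatures × amounts). It does not implement the variational inference itself, and every gene, value, and function name in it is hypothetical. It shows how a single gene can load on more than one signature, which hard clustering cannot express.

```python
# Toy linear latent-variable model (illustrative values, not data from the study).
# signatures[g][k]: loading of gene g on latent signature k.
signatures = [
    [1.0, 0.0],  # gene_a: contributes to signature 0 only
    [0.0, 2.0],  # gene_b: contributes to signature 1 only
    [1.5, 0.5],  # gene_c: contributes to both signatures
    [0.0, 0.0],  # gene_d: contributes to neither
]

# amounts[k][s]: how strongly signature k is present in sample s.
amounts = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
]


def mix(signatures, amounts):
    """Return expression[g][s] = sum over k of signatures[g][k] * amounts[k][s]."""
    n_samples = len(amounts[0])
    return [
        [sum(load * amounts[k][s] for k, load in enumerate(row))
         for s in range(n_samples)]
        for row in signatures
    ]


def genes_in_multiple_signatures(signatures, threshold=0.0):
    """Return indices of genes whose loading exceeds the threshold on more than one signature."""
    return [
        g for g, row in enumerate(signatures)
        if sum(abs(v) > threshold for v in row) > 1
    ]


expression = mix(signatures, amounts)
shared_genes = genes_in_multiple_signatures(signatures)
```

In this toy setting the decomposition is known in advance. An ICA-style method works in the opposite direction, estimating both `signatures` and `amounts` from the observed `expression` matrix alone.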