Partition decoupling for multi-gene analysis of gene expression profiling data.

Affiliation

Department of Preventive Medicine and Robert H. Lurie Cancer Center, Northwestern University, Chicago, IL, USA.

Publication information

BMC Bioinformatics. 2011 Dec 30;12:497. doi: 10.1186/1471-2105-12-497.

Abstract

BACKGROUND

Multi-gene interactions likely play an important role in the development of complex phenotypes, and relationships between interacting genes pose a challenging statistical problem in microarray analysis, since the genes involved in these interactions may not exhibit marginal differential expression. As a result, it is necessary to develop tools that can identify sets of interacting genes that discriminate phenotypes without requiring that the classification boundary between phenotypes be convex.
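The point about marginal differential expression can be made concrete with a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes two hypothetical genes whose product (an XOR-like interaction) determines the phenotype, with a balanced design and a small amount of Gaussian noise. Each gene alone shows almost no mean difference between phenotypes, while the pair separates them clearly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced design: 50 samples in each of the four sign combinations.
signs = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
base = np.repeat(signs, 50, axis=0)                 # shape (200, 2)

# Assumed phenotype rule: 1 when both genes move together, 0 otherwise.
phenotype = (base[:, 0] * base[:, 1] > 0).astype(int)

# Observed expression = underlying level + measurement noise.
expr = base + 0.1 * rng.standard_normal(base.shape)


def mean_difference(values, labels):
    """Difference in mean between phenotype 1 and phenotype 0."""
    return values[labels == 1].mean() - values[labels == 0].mean()


marginal_gene1 = mean_difference(expr[:, 0], phenotype)
marginal_gene2 = mean_difference(expr[:, 1], phenotype)
joint = mean_difference(expr[:, 0] * expr[:, 1], phenotype)

print(f"gene 1 alone: {marginal_gene1:+.3f}")
print(f"gene 2 alone: {marginal_gene2:+.3f}")
print(f"gene 1 x gene 2: {joint:+.3f}")
```

In this construction the two phenotype-0 groups sit on opposite sides of the phenotype-1 groups, so no single straight line (and no single-gene threshold) separates the classes.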

RESULTS

We describe an extension and application of a new unsupervised statistical learning technique, known as the Partition Decoupling Method (PDM), to gene expression microarray data. This method may be used to classify samples based on multi-gene expression patterns and to identify pathways associated with phenotype, without relying upon the differential expression of individual genes. The PDM uses iterated spectral clustering and scrubbing steps, revealing at each iteration progressively finer structure in the geometry of the data. Because spectral clustering has the ability to discern clusters that are not linearly separable, it is able to articulate relationships between samples that would be missed by distance- and tree-based classifiers. After projecting the data onto the cluster centroids and computing the residuals ("scrubbing"), one can repeat the spectral clustering, revealing clusters that were not discernible in the first layer. These iterations, each of which provides a partition of the data that is decoupled from the others, are carried forward until the structure in the residuals is indistinguishable from noise, preventing over-fitting. We describe the PDM in detail and apply it to three publicly available cancer gene expression data sets. By applying the PDM on a pathway-by-pathway basis and identifying those pathways that permit unsupervised clustering of samples that match known sample characteristics, we show how the PDM may be used to find sets of mechanistically related genes that may play a role in disease. An R package to carry out the PDM is available for download.
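The cluster-then-scrub loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' R package: the function names (`spectral_clusters`, `scrub`, `partition_decoupling`) are hypothetical, the affinity is a Gaussian kernel with a heuristic bandwidth, and the number of layers and of clusters per layer is fixed in advance rather than chosen by testing the residuals against noise. The toy data contain a strong partition on one gene set and a weaker, independent partition on another.

```python
import numpy as np


def kmeans(E, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [E[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels


def spectral_clusters(X, k):
    """Cluster the rows of X (samples) into k groups via spectral clustering."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / d2.mean())              # assumed Gaussian affinity
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))      # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    E = vecs[:, -k:]                         # k leading eigenvectors
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return kmeans(E, k)


def scrub(X, labels):
    """Remove each sample's projection onto the span of the cluster centroids."""
    centroids = np.array([X[labels == j].mean(axis=0) for j in np.unique(labels)])
    coef, *_ = np.linalg.lstsq(centroids.T, X.T, rcond=None)
    return X - (centroids.T @ coef).T


def partition_decoupling(X, k_per_layer):
    """Return one label vector per layer; each layer clusters the residuals."""
    layers, residual = [], X
    for k in k_per_layer:
        labels = spectral_clusters(residual, k)
        layers.append(labels)
        residual = scrub(residual, labels)
    return layers


def agreement(labels, truth):
    """Fraction of samples matching a two-class truth, up to label swapping."""
    match = np.mean(labels == (truth > 0).astype(int))
    return max(match, 1.0 - match)


# Toy data: 80 samples x 20 genes with two independent two-way partitions.
rng = np.random.default_rng(0)
a = np.repeat([1.0, 1.0, -1.0, -1.0], 20)    # strong partition (genes 0-9)
b = np.repeat([1.0, -1.0, 1.0, -1.0], 20)    # weaker partition (genes 10-19)
u = np.r_[np.ones(10), np.zeros(10)]
v = np.r_[np.zeros(10), np.ones(10)]
X = 5.0 * np.outer(a, u) + 2.0 * np.outer(b, v) + 0.3 * rng.standard_normal((80, 20))

layer1, layer2 = partition_decoupling(X, k_per_layer=[2, 2])
print("layer 1 vs strong partition:", agreement(layer1, a))
print("layer 2 vs weak partition:  ", agreement(layer2, b))
```

A pathway-by-pathway variant would call the same function on `X[:, gene_indices]` for each gene set and compare the resulting layers with known sample labels; the gene sets and the comparison statistic would have to be supplied by the user.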

CONCLUSIONS

We show that the PDM is a useful tool for the analysis of gene expression data from complex diseases, where phenotypes are not linearly separable and multi-gene effects are likely to play a role. Our results demonstrate that the PDM is able to distinguish cell types and treatments with higher accuracy than is obtained through other approaches, and that the Pathway-PDM application is a valuable technique for identifying disease-associated pathways.
