Employing gene set top scoring pairs to identify deregulated pathway-signatures in dilated cardiomyopathy from integrated microarray gene expression data.

Author information

Tan Aik Choon

Affiliation

Division of Medical Oncology, Department of Medicine, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Publication information

Methods Mol Biol. 2012;802:345-61. doi: 10.1007/978-1-61779-400-1_23.

Abstract

It is well accepted that a set of genes must act in concert to drive various cellular processes. However, under different biological phenotypes, not all members of a gene set participate in a given biological process. Hence, it is useful to construct a discriminative classifier by focusing on the core members (a subset) of a highly informative gene set. Such analyses can reveal which subsets of the same gene set correspond to different biological phenotypes. In this study, we propose the Gene Set Top Scoring Pairs (GSTSP) approach, which exploits the simple yet powerful relative expression reversal concept at the gene set level to achieve these goals. To illustrate the usefulness of GSTSP, we applied this method to five different human heart failure gene expression data sets. We take advantage of the direct data integration feature of the GSTSP approach to combine two data sets, identify a discriminative gene set from >190 predefined gene sets, and evaluate the predictive power of the GSTSP classifier derived from this informative gene set on three independent test sets (79.31% test accuracy). The discriminative gene pairs identified in this study may provide new biological understanding of the disturbed pathways involved in the development of heart failure. The GSTSP methodology is general purpose and is applicable to a variety of phenotypic classification problems using gene expression data.
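
The relative expression reversal idea can be illustrated with a small sketch. The abstract does not give the scoring details, so the code below assumes the commonly used top scoring pair score, |P(a < b | class 0) - P(a < b | class 1)|, and restricts the pair search to the members of one gene set. The function names, gene names, and toy expression values are all hypothetical and are not taken from the chapter.

```python
from itertools import combinations


def tsp_score(expr, labels, gene_a, gene_b):
    """Assumed score: |P(a < b | class 0) - P(a < b | class 1)|."""
    freq = {}
    for cls in (0, 1):
        idx = [k for k, y in enumerate(labels) if y == cls]
        hits = sum(1 for k in idx if expr[gene_a][k] < expr[gene_b][k])
        freq[cls] = hits / len(idx)
    return abs(freq[0] - freq[1])


def top_scoring_pair(expr, labels, gene_set):
    """Return (score, gene_a, gene_b) for the best pair inside one gene set."""
    members = [g for g in gene_set if g in expr]
    scored = (
        (tsp_score(expr, labels, a, b), a, b)
        for a, b in combinations(members, 2)
    )
    return max(scored, key=lambda t: t[0])


# Toy data (invented): rows are genes, columns are six samples.
expr = {
    "G1": [1.0, 2.0, 1.5, 5.0, 6.0, 5.5],
    "G2": [3.0, 4.0, 3.5, 2.0, 1.0, 2.5],
    "G3": [0.5, 3.0, 1.0, 5.2, 0.8, 9.0],
}
labels = [0, 0, 0, 1, 1, 1]  # phenotype label for each sample
toy_gene_set = ["G1", "G2", "G3"]  # hypothetical gene set

score, gene_a, gene_b = top_scoring_pair(expr, labels, toy_gene_set)
print(score, gene_a, gene_b)
```

Because the score depends only on the ordering of two genes within each sample, it is unchanged by any per-sample monotonic rescaling. That property is one plausible reason pair-based rules lend themselves to combining data sets directly, though the chapter itself should be consulted for how GSTSP handles integration.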
