Gene set analysis using sufficient dimension reduction.

Author information

Hsueh Huey-Miin, Tsai Chen-An

Affiliations

Department of Statistics, National Chengchi University, Zhinan Road, Taipei, 116, Taiwan.

Department of Agronomy, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei, 106, Taiwan.

Publication information

BMC Bioinformatics. 2016 Feb 6;17:74. doi: 10.1186/s12859-016-0928-6.

Abstract

BACKGROUND

Gene set analysis (GSA) aims to evaluate the association between the expression of biological pathways, or a priori defined gene sets, and a particular phenotype. Numerous GSA methods have been proposed to assess the enrichment of gene sets. However, most methods are developed with a specific alternative scenario in mind, such as a differential mean pattern or differential coexpression. Moreover, very few methods can handle binary, categorical, and continuous phenotypes alike. In this paper, we develop two novel GSA tests, called SDRs, based on the sufficient dimension reduction technique, which aims to capture sufficient information about the relationship between genes and the phenotype. The advantages of our proposed methods are that they accommodate both categorical and continuous phenotypes and that they can identify a variety of enriched gene sets.
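To make the idea of a dimension-reduction-based gene set test concrete, here is a minimal sketch. The abstract does not say which sufficient dimension reduction estimators the two SDR tests use, so this example swaps in sliced inverse regression (SIR) as an assumed stand-in and calibrates it by permutation; it should not be read as the authors' method. The function names `sir_statistic` and `sir_permutation_pvalue`, the ridge term, and the slicing rule are all hypothetical choices made for illustration. Note also that SIR responds only to shifts in the mean of expression across slices, so on its own it would not detect differential coexpression.

```python
import numpy as np


def sir_statistic(X, y, n_slices=5, ridge=1e-3):
    """Largest eigenvalue of the SIR kernel matrix.

    X: (n samples x p genes) expression matrix for one gene set.
    y: phenotype vector, categorical or continuous.
    The ridge term is an assumption added so the covariance stays
    invertible when p is close to or larger than n.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n + ridge * np.eye(p)
    # Whiten the expression data with the inverse square root of cov.
    evals, evecs = np.linalg.eigh(cov)
    Z = Xc @ (evecs @ np.diag(evals ** -0.5) @ evecs.T)
    # Slices: one per category for a discrete phenotype,
    # roughly equal-sized rank bins for a continuous one.
    levels = np.unique(y)
    if len(levels) <= n_slices:
        labels = np.searchsorted(levels, y)
    else:
        order = np.argsort(y, kind="stable")
        labels = np.empty(n, dtype=int)
        labels[order] = np.arange(n) * n_slices // n
    # Weighted covariance of the slice means of the whitened data.
    M = np.zeros((p, p))
    for s in np.unique(labels):
        in_slice = labels == s
        slice_mean = Z[in_slice].mean(axis=0)
        M += in_slice.mean() * np.outer(slice_mean, slice_mean)
    return np.linalg.eigvalsh(M)[-1]


def sir_permutation_pvalue(X, y, n_perm=200, seed=0, **kwargs):
    """Permutation p-value: shuffle y to break any gene-phenotype link."""
    rng = np.random.default_rng(seed)
    observed = sir_statistic(X, y, **kwargs)
    null = np.array(
        [sir_statistic(X, rng.permutation(y), **kwargs) for _ in range(n_perm)]
    )
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Because the slicing step treats categories and rank bins uniformly, the same call works for binary, multi-class, and continuous phenotypes, which is the kind of flexibility the abstract attributes to the SDR tests.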

RESULTS

Through simulation studies, we compared the type I error and power of SDRs with those of existing GSA methods for binary, three-class, and continuous phenotypes. We found that the SDR methods adequately control the type I error rate at the pre-specified nominal level, and that they have satisfactory power to detect gene sets with differential coexpression and to test non-linear associations between gene sets and a continuous phenotype. In addition, the SDR methods were compared with seven widely used GSA methods on two real microarray datasets for illustration.

CONCLUSIONS

We concluded that the SDR methods outperform the others because of their flexibility with regard to handling different kinds of phenotypes and their power to detect a wide range of alternative scenarios. Our real data analysis highlights the differences between GSA methods for detecting enriched gene sets.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d85e/4744442/5001d1439065/12859_2016_928_Fig1_HTML.jpg
