A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size.

Author information

Vasiliu Daniel, Clamons Samuel, McDonough Molly, Rabe Brian, Saha Margaret

Affiliations

Department of Mathematics, College of William and Mary, Williamsburg, Virginia, United States of America.

Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America.

Publication information

PLoS One. 2015 Mar 4;10(3):e0118198. doi: 10.1371/journal.pone.0118198. eCollection 2015.

Abstract

Global gene expression analysis using microarrays and, more recently, RNA-seq has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of the tens of thousands of publicly available gene expression datasets, and possibly many more unpublished ones. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data and rank genes by importance. In place of cross-validation, which is required by most similar methods but is not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method detected a significant number of differentially expressed genes in this dataset and suggested future directions for investigation. Our method is easily adaptable to the analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
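The ranking step described in the abstract (fit a penalized binomial regression classifier, then order genes by importance) can be illustrated with a minimal sketch. This is not the authors' PED algorithm: the abstract does not give the PED penalty, so an ordinary ridge (L2) penalty is swapped in as a stand-in, and the simulation-based list-building step is not shown. The toy data, the function names (`make_toy_data`, `fit_ridge_logistic`, `rank_genes`), and all parameter values are assumptions made for illustration only.

```python
import math
import random


def make_toy_data(n_per_group=3, n_genes=20, shift=2.0, seed=0):
    """Synthetic expression matrix (hypothetical): gene 0 is shifted in the treated group."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = control, 1 = treated
        for _ in range(n_per_group):
            row = [rng.uniform(-0.2, 0.2) for _ in range(n_genes)]
            if label == 1:
                row[0] += shift
            X.append(row)
            y.append(label)
    return X, y


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=200):
    """Gradient ascent on the L2-penalized binomial log-likelihood.

    Stand-in for PED: the ridge penalty here is an assumption, not the
    penalty used in the paper. The intercept b is left unpenalized.
    """
    n_genes = len(X[0])
    w = [0.0] * n_genes
    b = 0.0
    for _ in range(n_iter):
        # Residuals y_i - p_i under the current coefficients.
        residuals = []
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))
            residuals.append(label - p)
        # Simultaneous update of every gene coefficient, then the intercept.
        for j in range(n_genes):
            grad = sum(r * row[j] for r, row in zip(residuals, X)) - lam * w[j]
            w[j] += lr * grad
        b += lr * sum(residuals)
    return w, b


def rank_genes(w):
    """Gene indices ordered by decreasing absolute coefficient."""
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))


X, y = make_toy_data()
w, b = fit_ridge_logistic(X, y)
ranking = rank_genes(w)
print("Top-ranked gene indices:", ranking[:3])
```

With only three samples per group and twenty genes, the penalty is what keeps the fit well defined when genes outnumber samples. In this toy setting the shifted gene carries almost all of the class signal, so it ends up at the head of the ranking; real microarray data would be far noisier.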
