A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size.

Author information

Vasiliu Daniel, Clamons Samuel, McDonough Molly, Rabe Brian, Saha Margaret

Affiliations

Department of Mathematics, College of William and Mary, Williamsburg, Virginia, United States of America.

Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America.

Publication information

PLoS One. 2015 Mar 4;10(3):e0118198. doi: 10.1371/journal.pone.0118198. eCollection 2015.

DOI:10.1371/journal.pone.0118198

PMID:25738861

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4349782/

Abstract

Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
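The abstract describes two stages: rank genes with a penalized binomial (logistic) classifier, then grow a list of differentially expressed genes from that ranking while keeping the false discovery rate low. The sketch below illustrates only that overall shape, under stated assumptions. It does not implement PED: a lasso (L1) penalty fitted by proximal gradient descent is swapped in as the penalized binomial regression, and the paper's simulation-based step is replaced by a label-permutation estimate of the FDR, which is a different technique. All function names (`l1_logistic`, `rank_genes`, `permutation_fdr_list`) and parameter values (`lam`, `step`, `iters`, `alpha`, `max_perms`) are hypothetical, and the toy data are invented for illustration. Only the Python standard library is used.

```python
import math
import random
from itertools import combinations


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def standardize(X):
    """Center each gene (column) and scale it to unit variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mu) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def l1_logistic(X, y, lam=0.05, step=0.1, iters=300):
    """Lasso-penalized logistic regression fitted by proximal gradient (ISTA).

    Stand-in for PED (assumption): the penalty here is lam * sum|beta_j|,
    not the PED penalty. `step` must be small enough for ISTA to converge.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals p_i - y_i give the gradient of the mean negative log-likelihood.
        resid = [sigmoid(b0 + sum(X[i][j] * beta[j] for j in range(p))) - y[i]
                 for i in range(n)]
        b0 -= step * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            g = sum(resid[i] * X[i][j] for i in range(n)) / n
            z = beta[j] - step * g
            # Soft-thresholding: proximal operator of step * lam * |beta_j|.
            beta[j] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return beta


def rank_genes(X, y, **kw):
    """Rank genes by |coefficient| in the penalized classifier (largest first)."""
    score = [abs(b) for b in l1_logistic(standardize(X), y, **kw)]
    order = sorted(range(len(score)), key=lambda j: -score[j])
    return order, score


def permutation_fdr_list(X, y, alpha=0.1, max_perms=20, seed=0, **kw):
    """Add genes in rank order while a permutation FDR estimate stays <= alpha.

    Hypothetical stand-in for the paper's simulation-based step.
    """
    order, score = rank_genes(X, y, **kw)
    n, n1 = len(y), sum(y)
    # All relabelings with the same group sizes; with ultra-low n there are few.
    flipped = [1 - v for v in y]  # swapping the groups only flips coefficient signs
    labelings = [[1 if i in idx else 0 for i in range(n)]
                 for idx in map(set, combinations(range(n), n1))]
    labelings = [lab for lab in labelings if lab != list(y) and lab != flipped]
    if not labelings:
        raise ValueError("too few samples to build a permutation null")
    random.Random(seed).shuffle(labelings)
    null_scores = [rank_genes(X, lab, **kw)[1] for lab in labelings[:max_perms]]
    selected = []
    for j in order:
        t = score[j]
        if t == 0.0:
            break  # genes with a zero lasso coefficient are never added
        # Average number of null genes per permutation scoring at least t.
        false_hits = sum(sum(s >= t for s in null) for null in null_scores) / len(null_scores)
        if false_hits / (len(selected) + 1) > alpha:
            break
        selected.append(j)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: 4 vs 4 samples, 10 genes, gene 0 shifted in group 1.
    rng = random.Random(1)
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    X = [[rng.gauss(0, 1) + (4.0 * yi if j == 0 else 0.0) for j in range(10)]
         for yi in y]
    print(permutation_fdr_list(X, y, alpha=0.1))
```

The list is built additively from the top of the ranking and stops at the first gene whose estimated FDR exceeds `alpha`, so the result is always a prefix of the ranked list. With 4 vs 4 samples there are only C(8, 4) = 70 relabelings, 68 after dropping the observed labeling and its group swap, so the permutation null is coarse; treat this strictly as an illustration of the ranking-then-additive-selection idea.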

Similar articles

A unified framework for finding differentially expressed genes from microarray experiments.

BMC Bioinformatics. 2007 Sep 18;8:347. doi: 10.1186/1471-2105-8-347.

Practical FDR-based sample size calculations in microarray experiments.

Bioinformatics. 2005 Aug 1;21(15):3264-72. doi: 10.1093/bioinformatics/bti519. Epub 2005 Jun 2.

Differential gene expression detection and sample classification using penalized linear regression models.

Bioinformatics. 2006 Feb 15;22(4):472-6. doi: 10.1093/bioinformatics/bti827. Epub 2005 Dec 13.

Integration of RNA-Seq data with heterogeneous microarray data for breast cancer profiling.

BMC Bioinformatics. 2017 Nov 21;18(1):506. doi: 10.1186/s12859-017-1925-0.

A weighted sample size for microarray datasets that considers the variability of variance and multiplicity.

J Biosci Bioeng. 2009 Sep;108(3):252-8. doi: 10.1016/j.jbiosc.2009.03.017.

Methods for evaluating gene expression from Affymetrix microarray datasets.

BMC Bioinformatics. 2008 Jun 17;9:284. doi: 10.1186/1471-2105-9-284.

Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets.

BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2105-14-S9-S1. Epub 2013 Jun 28.

Coex-Rank: An approach incorporating co-expression information for combined analysis of microarray data.

J Integr Bioinform. 2012 Jul 30;9(1):208. doi: 10.2390/biecoll-jib-2012-208.

GEOlimma: differential expression analysis and feature selection using pre-existing microarray data.

BMC Bioinformatics. 2021 Feb 3;22(1):44. doi: 10.1186/s12859-020-03932-5.

Cited by

Comparative analysis of tissue-specific genes in maize based on machine learning models: CNN performs technically best, LightGBM performs biologically soundest.

Front Genet. 2023 May 9;14:1190887. doi: 10.3389/fgene.2023.1190887. eCollection 2023.

The Gene Family: From Embryo to Disease.

Front Mol Neurosci. 2021 Jun 28;14:672511. doi: 10.3389/fnmol.2021.672511. eCollection 2021.

Xenopus embryos show a compensatory response following perturbation of the Notch signaling pathway.

Dev Biol. 2020 Apr 15;460(2):99-107. doi: 10.1016/j.ydbio.2019.12.016. Epub 2019 Dec 30.

Genomic signature of parity in the breast of premenopausal women.

Breast Cancer Res. 2019 Mar 28;21(1):46. doi: 10.1186/s13058-019-1128-x.

Automated Classification of Benign and Malignant Proliferative Breast Lesions.

Sci Rep. 2017 Aug 29;7(1):9900. doi: 10.1038/s41598-017-10324-y.
