Sample size for detecting differentially expressed genes in microarray experiments.

Author information

Wei Caimiao, Li Jiangning, Bumgarner Roger E

Affiliation

Department of Microbiology, University of Washington, Seattle, WA 98195, USA.

Publication information

BMC Genomics. 2004 Nov 8;5:87. doi: 10.1186/1471-2164-5-87.

DOI: 10.1186/1471-2164-5-87

PMID: 15533245

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC533874/

Abstract

BACKGROUND

Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitant high false positive rates. While increasing sample size can increase statistical power and decrease error rates, with too many samples, valuable resources are not used efficiently. The issue of how many replicates are required in a typical experimental system needs to be addressed. Of particular interest is the difference in required sample sizes for similar experiments in inbred vs. outbred populations (e.g. mouse and rat vs. human).

RESULTS

We hypothesize that if all other factors (assay protocol, microarray platform, data pre-processing) were equal, fewer individuals would be needed for the same statistical power using inbred animals as opposed to unrelated human subjects, because genetic effects on gene expression are removed in inbred populations. We apply the same normalization algorithm and estimate the variance of gene expression for a variety of cDNA data sets (humans, inbred mice and rats), each comparing two conditions. Using one-sample, paired-sample, or two-independent-sample t-tests, we calculate the sample sizes required to detect 1.5-, 2-, and 4-fold changes in expression level as a function of false positive rate, power, and the percentage of genes that have a standard deviation below a given percentile.
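The kind of calculation described above can be illustrated with a minimal sketch. It is not the authors' procedure: it uses the standard normal approximation to the t-test sample size formula, works on log2 expression ratios, and the function name `samples_per_group` and the standard deviation values in the usage loop are hypothetical choices made only for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(fold_change, sd_log2, alpha=0.01, power=0.90,
                      design="two-sample"):
    """Approximate replicates needed to detect a given fold change.

    Normal approximation to the t-test (an assumption of this sketch):
        n = k * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
    with delta = log2(fold_change), k = 2 for two independent groups
    and k = 1 for one-sample or paired designs. For paired designs,
    sd_log2 is the standard deviation of the paired differences.
    """
    if fold_change <= 1 or sd_log2 <= 0:
        raise ValueError("fold_change must exceed 1 and sd_log2 must be positive")
    if design == "two-sample":
        k = 2.0
    elif design in ("one-sample", "paired"):
        k = 1.0
    else:
        raise ValueError(f"unknown design: {design}")

    std_normal = NormalDist()
    delta = math.log2(fold_change)               # effect size on the log2 scale
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    n = k * ((z_alpha + z_power) * sd_log2 / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical per-gene standard deviations (log2 scale), not taken from the paper.
    for sd in (0.25, 0.5, 1.0):
        for fc in (1.5, 2.0, 4.0):
            print(f"sd={sd}, fold={fc}: n per group = {samples_per_group(fc, sd)}")
```

Because the normal approximation ignores the extra uncertainty in estimating the variance, it tends to understate the required n when n is small; a noncentral t calculation would give slightly larger values.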

CONCLUSIONS

Factors that affect power and sample size calculations include variability of the population, the desired detectable differences, the power to detect the differences, and an acceptable error rate. In addition, experimental design, technical variability and data pre-processing play a role in the power of the statistical tests in microarrays. We show that the number of samples required for detecting a 2-fold change with 90% probability at a p-value threshold of 0.01 in humans is much larger than the number of samples commonly used in present-day studies, and that far fewer individuals are needed for the same statistical power when using inbred animals rather than unrelated human subjects.
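To see how the factors listed above interact, the following sketch computes the approximate power of a two-sample comparison for a fixed number of replicates. It again relies on a normal approximation rather than the exact t distribution, ignores the negligible contribution of the opposite tail, and uses a hypothetical function name (`approx_power`) and hypothetical standard deviation values.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, fold_change, sd_log2, alpha=0.01):
    """Approximate power of a two-sided, two-sample test on log2 expression.

    Normal approximation (an assumption of this sketch):
        power ~= Phi(delta / se - z_{1-alpha/2})
    where delta = log2(fold_change) and se = sd * sqrt(2 / n).
    """
    std_normal = NormalDist()
    delta = math.log2(fold_change)                 # detectable difference
    se = sd_log2 * math.sqrt(2.0 / n_per_group)    # standard error of the group difference
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # acceptable false positive rate
    return std_normal.cdf(delta / se - z_alpha)


if __name__ == "__main__":
    # Hypothetical variability levels: a lower sd stands in for a more
    # homogeneous (e.g. inbred) population, a higher sd for a more variable one.
    for sd in (0.3, 0.6, 1.0):
        for n in (3, 5, 10, 20):
            print(f"sd={sd}, n={n}: power for 2-fold = {approx_power(n, 2.0, sd):.2f}")
```

Under this approximation, power rises with the number of replicates and the size of the fold change, and falls as population variability increases or the significance threshold is tightened.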

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/173f/533874/3181f4718889/1471-2164-5-87-1.jpg
