Replicability of bulk RNA-Seq differential expression and enrichment analysis results for small cohort sizes.

Author information

Degen Peter Methys, Medo Matúš

Affiliations

Department for BioMedical Research, Radiation Oncology, University of Bern, Bern, Switzerland.

Department of Radiation Oncology, Inselspital Bern University Hospital, Bern, Switzerland.

Publication information

PLoS Comput Biol. 2025 May 5;21(5):e1011630. doi: 10.1371/journal.pcbi.1011630. eCollection 2025 May.

Abstract

The high-dimensional and heterogeneous nature of transcriptomics data from RNA sequencing (RNA-Seq) experiments poses a challenge to routine downstream analysis steps, such as differential expression analysis and enrichment analysis. Additionally, due to practical and financial constraints, RNA-Seq experiments are often limited to a small number of biological replicates. In light of recent studies on the low replicability of preclinical cancer research, it is essential to understand how the combination of population heterogeneity and underpowered cohort sizes affects the replicability of RNA-Seq research. Using 18,000 subsampled RNA-Seq experiments based on real gene expression data from 18 different data sets, we find that differential expression and enrichment analysis results from underpowered experiments are unlikely to replicate well. However, low replicability does not necessarily imply low precision of results, as data sets exhibit a wide range of possible outcomes. In fact, 10 out of 18 data sets achieve high median precision despite low recall and replicability for cohorts with more than five replicates. To assist researchers constrained by small cohort sizes in estimating the expected performance regime of their data sets, we provide a simple bootstrapping procedure that correlates strongly with the observed replicability and precision metrics. We conclude with practical recommendations to alleviate problems with underpowered RNA-Seq studies.
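The abstract separates replicability (how much results from repeated small-cohort experiments agree with each other) from precision and recall (how well each result matches a reference). The sketch below illustrates that distinction on gene sets. It is not the authors' code. Two assumptions are made here: the reference is taken to be the genes called significant in the full data set, and replicability is approximated as the mean pairwise Jaccard overlap between the results of subsampled runs. The paper's exact definitions may differ. The function names and the toy gene lists are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import median


def precision_recall(called, reference):
    """Precision and recall of called genes against a reference gene set.

    Assumption for this sketch: `reference` stands in for the genes called
    significant when the full data set is analysed.
    """
    called, reference = set(called), set(reference)
    true_pos = len(called & reference)
    precision = true_pos / len(called) if called else float("nan")
    recall = true_pos / len(reference) if reference else float("nan")
    return precision, recall


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def replicability(gene_sets):
    """Hypothetical replicability score: mean pairwise Jaccard overlap
    between the gene sets returned by repeated subsampled experiments."""
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        raise ValueError("need at least two subsampled experiments")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy gene lists invented for illustration; not data from the paper.
    reference = {"TP53", "MYC", "EGFR", "CDKN1A", "MDM2"}
    subsampled_runs = [
        {"TP53", "MYC", "GAPDH"},
        {"TP53", "CDKN1A"},
        {"MYC", "EGFR", "ACTB"},
    ]
    precisions = [precision_recall(run, reference)[0] for run in subsampled_runs]
    print("median precision:", round(median(precisions), 3))
    print("replicability:", round(replicability(subsampled_runs), 3))
```

In this toy case, each run is mostly correct: the median precision is about 0.67. However, the runs share few genes with one another, so the pairwise overlap is only 0.15. This mirrors the abstract's point that low replicability does not by itself imply low precision.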

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cfa/12077797/7aa9f43d9a80/pcbi.1011630.g001.jpg
