Replicability of bulk RNA-Seq differential expression and enrichment analysis results for small cohort sizes.

Authors

Degen Peter Methys, Medo Matúš

Affiliations

Department for BioMedical Research, Radiation Oncology, University of Bern, Bern, Switzerland.

Department of Radiation Oncology, Inselspital Bern University Hospital, Bern, Switzerland.

Publication information

PLoS Comput Biol. 2025 May 5;21(5):e1011630. doi: 10.1371/journal.pcbi.1011630. eCollection 2025 May.

DOI: 10.1371/journal.pcbi.1011630
PMID: 40324149
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12077797/
Abstract

The high-dimensional and heterogeneous nature of transcriptomics data from RNA sequencing (RNA-Seq) experiments poses a challenge to routine downstream analysis steps, such as differential expression analysis and enrichment analysis. Additionally, due to practical and financial constraints, RNA-Seq experiments are often limited to a small number of biological replicates. In light of recent studies on the low replicability of preclinical cancer research, it is essential to understand how the combination of population heterogeneity and underpowered cohort sizes affects the replicability of RNA-Seq research. Using 18'000 subsampled RNA-Seq experiments based on real gene expression data from 18 different data sets, we find that differential expression and enrichment analysis results from underpowered experiments are unlikely to replicate well. However, low replicability does not necessarily imply low precision of results, as data sets exhibit a wide range of possible outcomes. In fact, 10 out of 18 data sets achieve high median precision despite low recall and replicability for cohorts with more than five replicates. To assist researchers constrained by small cohort sizes in estimating the expected performance regime of their data sets, we provide a simple bootstrapping procedure that correlates strongly with the observed replicability and precision metrics. We conclude with practical recommendations to alleviate problems with underpowered RNA-Seq studies.
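
The subsampling idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the synthetic Gaussian data, the Welch t-statistic threshold standing in for a real differential expression tool, the use of the Jaccard index as the replicability measure, and the choice of the full-cohort result as the reference set are all assumptions made here for illustration, and every function name is hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, median, stdev


def jaccard(a, b):
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def precision_recall(called, truth):
    """Precision and recall of a called gene set against a reference set."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def call_de_genes(control, treated, threshold=4.0):
    """Toy DE call (assumption): |Welch t-statistic| above a fixed threshold.

    control / treated map gene name -> list of expression values per sample.
    """
    called = set()
    for gene in control:
        x, y = control[gene], treated[gene]
        se = (stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y)) ** 0.5
        if se > 0 and abs(mean(x) - mean(y)) / se > threshold:
            called.add(gene)
    return called


def subsample(data, idx):
    """Restrict every gene's values to the chosen sample indices."""
    return {gene: [values[i] for i in idx] for gene, values in data.items()}


def replicability_experiment(control, treated, n, trials, rng):
    """Draw `trials` cohorts of n samples per condition and summarise them."""
    n_control = len(next(iter(control.values())))
    n_treated = len(next(iter(treated.values())))
    truth = call_de_genes(control, treated)  # full cohort used as reference
    calls = []
    for _ in range(trials):
        ci = rng.sample(range(n_control), n)  # subsample without replacement
        ti = rng.sample(range(n_treated), n)
        calls.append(call_de_genes(subsample(control, ci), subsample(treated, ti)))
    overlaps = [jaccard(a, b) for a, b in combinations(calls, 2)]
    pr = [precision_recall(c, truth) for c in calls]
    return {
        "replicability": median(overlaps),
        "precision": median(p for p, _ in pr),
        "recall": median(r for _, r in pr),
    }


def simulate(n_genes, n_samples, n_true, effect, rng):
    """Synthetic data (assumption): the first n_true genes are shifted."""
    control, treated = {}, {}
    for g in range(n_genes):
        shift = effect if g < n_true else 0.0
        control[f"g{g}"] = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        treated[f"g{g}"] = [rng.gauss(shift, 1.0) for _ in range(n_samples)]
    return control, treated


rng = random.Random(0)
control, treated = simulate(n_genes=200, n_samples=40, n_true=20, effect=1.5, rng=rng)
results = {n: replicability_experiment(control, treated, n, 20, rng) for n in (3, 5, 10)}
for n, summary in results.items():
    print(n, summary)
```

Separating the three metrics makes the abstract's distinction concrete: the overlap between subsampled cohorts (replicability) and the fraction of reference genes recovered (recall) can both be low while the genes that are called still mostly belong to the reference set (precision).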

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cfa/12077797/7aa9f43d9a80/pcbi.1011630.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cfa/12077797/4e36da0c8c27/pcbi.1011630.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cfa/12077797/7e74f457ebc1/pcbi.1011630.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cfa/12077797/fe06dfb1b907/pcbi.1011630.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cfa/12077797/917ba62840b7/pcbi.1011630.g005.jpg

Similar articles

1
Replicability of bulk RNA-Seq differential expression and enrichment analysis results for small cohort sizes.
PLoS Comput Biol. 2025 May 5;21(5):e1011630. doi: 10.1371/journal.pcbi.1011630. eCollection 2025 May.
2
A comparison of strategies for generating artificial replicates in RNA-seq experiments.
Sci Rep. 2022 May 3;12(1):7170. doi: 10.1038/s41598-022-11302-9.
3
RNA-Seq in Nonmodel Organisms.
Methods Mol Biol. 2021;2243:143-167. doi: 10.1007/978-1-0716-1103-6_8.
4
RNA-Seq and Gene Set Enrichment Analysis (GSEA) in Peripheral Blood Mononuclear Cells (PBMCs).
Methods Mol Biol. 2025;2880:179-192. doi: 10.1007/978-1-0716-4276-4_8.
5
Short-Read RNA-Seq.
Methods Mol Biol. 2024;2822:245-262. doi: 10.1007/978-1-0716-3918-4_17.
6
Improving replicability in single-cell RNA-Seq cell type discovery with Dune.
BMC Bioinformatics. 2024 May 24;25(1):198. doi: 10.1186/s12859-024-05814-6.
7
Transcriptome size matters for single-cell RNA-seq normalization and bulk deconvolution.
Nat Commun. 2025 Feb 1;16(1):1246. doi: 10.1038/s41467-025-56623-1.
8
RAP: A Web Tool for RNA-Seq Data Analysis.
Methods Mol Biol. 2021;2284:393-415. doi: 10.1007/978-1-0716-1307-8_21.
9
Analysis of the Pattern of RNA Expression in the Skin of TR-Deficient Mice By RNA-seq.
Methods Mol Biol. 2025;2876:151-162. doi: 10.1007/978-1-0716-4252-8_10.
10
A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments.
BMC Bioinformatics. 2013 Aug 21;14:254. doi: 10.1186/1471-2105-14-254.
