Beyond standard pipeline and p < 0.05 in pathway enrichment analyses.

Author information

The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA.

Litwin-Zucker Center for the study of Alzheimer's Disease, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA; Division of Geriatric Psychiatry, Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, USA.

Publication information

Comput Biol Chem. 2021 Jun;92:107455. doi: 10.1016/j.compbiolchem.2021.107455. Epub 2021 Feb 12.

Abstract

A standard pathway/gene-set enrichment analysis, the over-representation analysis, is based on four values: the sizes of the two gene-sets, the size of their overlap, and the size of the gene universe from which the gene-sets are chosen. The standard result of such an analysis is based on the p-value of a statistical test. We supplement this standard pipeline with six cautions: (1) any p-value threshold used to distinguish enriched gene-sets from non-enriched ones is to a certain degree arbitrary; (2) genes in a gene-set may be correlated, which potentially overcounts the gene-set size; (3) any attempt to impose multiple testing correction will increase the false negative rate; (4) gene-sets in a gene-set database may be correlated, which potentially overcounts the factor used for multiple testing correction; (5) the discrete nature of the data makes it possible that a minimal change in counts leads to an abrupt change in the p-value threshold-based conclusion; (6) the two gene-sets may not be chosen from the universe of all human genes, but in fact from a subset of that universe, or even from two different subsets of all genes. Careful reconsideration of these issues can have an impact on the conclusion of an enrichment analysis. Some of our cautions mirror the call from statisticians that reaching a conclusion from data is not a simple matter of a p-value smaller than 0.05, but a thoughtful process requiring due diligence.
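To make the four-value setup concrete, here is a minimal Python sketch of an over-representation p-value. The abstract does not name the statistical test; this sketch assumes the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which is a common choice. The function name `ora_p_value` and all counts below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import comb


def ora_p_value(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: number of genes in the assumed gene universe
    set_a:    size of the first gene-set (e.g. a pathway)
    set_b:    size of the second gene-set (e.g. a list of hits)
    overlap:  number of genes shared by the two gene-sets
    """
    if not (0 <= set_a <= universe and 0 <= set_b <= universe):
        raise ValueError("gene-set sizes must lie between 0 and the universe size")
    max_overlap = min(set_a, set_b)
    min_overlap = max(0, set_a + set_b - universe)
    if not (min_overlap <= overlap <= max_overlap):
        raise ValueError("overlap is impossible for these set sizes")

    # Exact integer arithmetic: count the ways to draw set_b genes such that
    # k of them fall in set_a, summed over k = overlap .. max_overlap.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(universe, set_b)


# Hypothetical counts, for illustration only.
UNIVERSE, SET_A, SET_B = 20000, 200, 100

# Caution (5): a one-gene change in the overlap shifts the p-value in a
# discrete step, which can flip a threshold-based conclusion.
for k in (3, 4):
    print(f"overlap={k}: p={ora_p_value(UNIVERSE, SET_A, SET_B, k):.4g}")

# Caution (6): the same counts give a different p-value when the assumed
# gene universe is a smaller subset of all genes.
for n in (20000, 10000, 5000):
    print(f"universe={n}: p={ora_p_value(n, SET_A, SET_B, 4):.4g}")
```

The sketch uses exact integer binomial coefficients from the standard library rather than a statistics package, so it has no external dependencies; for large gene-set databases a vectorized implementation would likely be preferable.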
