Ziemann Mark, Schroeter Barry, Bora Anusuiya
Bioinformatics Working Group, Burnet Institute, Melbourne, VIC 3004, Australia.
School of Life and Environmental Sciences, Deakin University, Geelong, VIC 3216, Australia.
Bioinform Adv. 2024 Oct 21;4(1):vbae159. doi: 10.1093/bioadv/vbae159. eCollection 2024.
Overrepresentation analysis (ORA) is used widely to assess the enrichment of functional categories in a gene list compared to a background list. ORA is therefore a critical method in the interpretation of 'omics data, relating gene lists to biological functions and themes. Although ORA is hugely popular, we and others have noticed two potentially undesired behaviours of some ORA tools. The first one we call the 'background problem', because it involves the software eliminating large numbers of genes from the background list if they are not annotated as belonging to any category. The second one we call the 'false discovery rate problem', because some tools underestimate the true number of parallel tests conducted.
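The 'background problem' can be illustrated with a one-sided hypergeometric test, the test commonly used for ORA. The sketch below is not taken from any particular tool. All counts are invented for illustration, and the helper name `hypergeom_upper_tail` is hypothetical. It compares the P-value for one gene set when the background is every detected gene with the P-value when unannotated genes are silently dropped from both the background and the gene list.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes in the gene set,
    n: genes in the submitted list, k: list genes that fall in the gene set.
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented toy numbers (assumptions, not values from the paper).
detected_genes = 20000      # all genes detected in the experiment
annotated_genes = 10000     # detected genes with at least one annotation
list_size = 500             # genes in the list of interest
list_annotated = 300        # list genes with at least one annotation
set_size = 200              # members of the gene set under test
overlap = 15                # list genes that belong to the gene set

# Background = every detected gene.
p_full = hypergeom_upper_tail(overlap, detected_genes, set_size, list_size)

# Background and list both reduced to annotated genes only.
p_reduced = hypergeom_upper_tail(overlap, annotated_genes, set_size, list_annotated)

print(f"full background:    p = {p_full:.3g}")
print(f"reduced background: p = {p_reduced:.3g}")
```

With these particular counts the reduced background gives the larger P-value, because the list is richer in annotated genes than the background is. Other counts can shift the result in the opposite direction, so the sketch shows only that the choice of background changes the answer.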
Here, we demonstrate the impact of these issues on several real RNA-seq datasets and quantify it using simulated RNA-seq data. We show that the severity of these problems depends on the gene set library, the number of genes in the list, and the degree of noise in the dataset. These problems can be mitigated by switching to a different ORA package or website, or by switching to another approach such as functional class scoring.
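The 'false discovery rate problem' can be sketched in the same spirit. The example below assumes, purely for illustration, that a tool applies the Benjamini-Hochberg adjustment only to gene sets that share at least one gene with the list, although the whole library was tested. The P-values and library size are invented, and `bh_adjust` is a hypothetical helper written for this sketch.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: a library of 1000 gene sets, only 5 of which overlap the list.
library_size = 1000
overlapping_pvalues = [0.0004, 0.003, 0.02, 0.03, 0.3]

# Gene sets with no overlap have P(X >= 0) = 1 under a one-sided test.
non_overlapping = [1.0] * (library_size - len(overlapping_pvalues))

# Adjustment that counts only the 5 overlapping sets as tests.
adj_subset = bh_adjust(overlapping_pvalues)

# Adjustment that counts every gene set in the library as a test.
adj_full = bh_adjust(overlapping_pvalues + non_overlapping)[: len(overlapping_pvalues)]

hits_subset = sum(q < 0.05 for q in adj_subset)
hits_full = sum(q < 0.05 for q in adj_full)

print(f"significant when adjusting for 5 tests:    {hits_subset}")
print(f"significant when adjusting for 1000 tests: {hits_full}")
```

Counting fewer tests than were actually performed can only lower the adjusted values, so the subset adjustment reports more gene sets as significant than the full adjustment does.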
An R/Shiny tool has been provided at https://oratool.ziemann-lab.net/ and the supporting materials are available from Zenodo (https://zenodo.org/records/13823301).