Escuela Superior Politécnica de Chimborazo (ESPOCH), Research Group in Data Science CIDED, Panamericana Sur Km 1 1/2, Riobamba, Ecuador.
Department of Statistics and Operational Research, Faculty of Mathematics and Statistics, Universitat Politècnica de Catalunya, Barcelona, Spain.
BMC Bioinformatics. 2022 May 31;23(1):207. doi: 10.1186/s12859-022-04739-2.
In integrative bioinformatic analyses, it is of great interest to establish the equivalence between gene lists or, more generally, feature lists, up to a given level and in terms of their annotations in the Gene Ontology (GO). The aim of this article is to present an equivalence test based on the proportion of GO terms that are declared enriched in both lists simultaneously.
On the basis of these data, the dissimilarity between gene lists is measured by means of the Sorensen-Dice index. We present two flavours of the same test: one based on the asymptotic normality of the test statistic and the other based on the bootstrap method.
The accuracy of these tests is studied by means of simulation, and their possible interest is illustrated by applying them to two real datasets: a collection of gene lists related to cancer and a collection of gene lists related to kidney rejection after transplantation.
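As a rough illustration of the quantities named in the abstract, the following Python sketch computes a Sorensen-Dice dissimilarity from the counts of GO terms enriched in both lists, in only the first, or in only the second. It then derives an upper confidence limit with a plain percentile bootstrap. Everything here is an assumption made for illustration: the function names, the 2x2 count layout (n11, n10, n01, n00), the percentile bootstrap (which may differ from the bootstrap variant used in the article), and the equivalence threshold d0 are hypothetical and are not taken from the article.

```python
import random


def sorensen_dissimilarity(n11, n10, n01):
    """Sorensen-Dice dissimilarity from enrichment counts.

    n11: GO terms enriched in both lists
    n10: GO terms enriched only in the first list
    n01: GO terms enriched only in the second list
    """
    denom = 2 * n11 + n10 + n01
    if denom == 0:
        raise ValueError("no GO term is enriched in either list")
    return (n10 + n01) / denom


def bootstrap_upper_limit(n11, n10, n01, n00, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap upper confidence limit for the dissimilarity.

    Illustrative assumption: GO terms are resampled as independent draws
    from the observed 2x2 cell proportions (n00 = enriched in neither list).
    """
    rng = random.Random(seed)
    counts = [n11, n10, n01, n00]
    total = sum(counts)
    stats = []
    for _ in range(n_boot):
        # Draw `total` cell labels (0..3) with the observed cell frequencies.
        draws = rng.choices(range(4), weights=counts, k=total)
        b11, b10, b01 = (draws.count(i) for i in range(3))
        if 2 * b11 + b10 + b01 == 0:
            continue  # dissimilarity undefined for this resample
        stats.append(sorensen_dissimilarity(b11, b10, b01))
    if not stats:
        raise ValueError("dissimilarity undefined in every resample")
    stats.sort()
    index = min(len(stats) - 1, int((1 - alpha) * len(stats)))
    return stats[index]


def is_equivalent(upper_limit, d0):
    """Declare equivalence when the upper limit falls below the threshold d0."""
    return upper_limit < d0


if __name__ == "__main__":
    d = sorensen_dissimilarity(50, 5, 5)
    upper = bootstrap_upper_limit(50, 5, 5, 140)
    # d0 = 0.3 is an arbitrary illustrative threshold, not a recommended value.
    print(d, upper, is_equivalent(upper, d0=0.3))
```

The decision rule mirrors the usual logic of an equivalence test: the null hypothesis of non-equivalence (dissimilarity at or above d0) is rejected only when a one-sided upper confidence limit lies below d0. An analogous rule could use a limit based on the asymptotic normal approximation instead of the bootstrap.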