Douglas Cameron J, Seath Ciaran P
Department of Chemistry, Wertheim UF Scripps, Jupiter, Florida, 33418, United States.
The Skaggs Graduate School of Chemical and Biological Sciences, 120 Scripps Way, Jupiter, FL 33458, USA.
bioRxiv. 2025 May 10:2025.05.05.652253. doi: 10.1101/2025.05.05.652253.
Omics analysis has become an indispensable tool for researchers in the life sciences, enabling the study of DNA, RNA, and proteins and how they respond to cellular stimuli. Many methods of data analysis exist for the generation and characterization of gene lists; however, selection of genes for further investigation is still heavily influenced by prior knowledge, with practitioners often studying well-characterized genes, reinforcing bias in the literature. Here, we have developed an open-source R package for impartial ranking of gene lists derived from omics analysis that we term Deciphering Scientific Discoveries (DeSciDe). We applied a pipeline that sorts a gene list first by precedence, which we define as co-occurrence of the gene with pre-defined search terms in publications. We then rank gene lists by connectivity, an underutilized metric for how related a gene is to other enriched genes. The combination of these rankings by scatterplot provides a method for gene selection by simple visual analysis. We apply this analysis method to published omics datasets, identifying novel avenues for investigation. Further, using this method we have been able to assign a novel loss-of-function role for the histone mutation H2A E92K.
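The two-rank idea described above can be illustrated with a minimal sketch. This is not the DeSciDe implementation or its API (DeSciDe is an R package, and Python is used here only for illustration). The sketch assumes that precedence is supplied as a per-gene count of publications co-mentioning the gene and the search terms, and that connectivity is the number of distinct interaction partners a gene has within the list. The gene names, counts, edges, and the function names `rank_descending`, `connectivity_degree`, and `rank_genes` are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def rank_descending(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 goes to the highest score; ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def connectivity_degree(
    genes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Dict[str, int]:
    """Count distinct partners per gene, using only edges within the list."""
    degree = {gene: 0 for gene in genes}
    # Deduplicate undirected edges so (A, B) and (B, A) count once.
    unique_edges = {frozenset(edge) for edge in edges}
    for edge in unique_edges:
        if len(edge) == 2 and all(gene in degree for gene in edge):
            for gene in edge:
                degree[gene] += 1
    return degree


def rank_genes(
    precedence_counts: Dict[str, int], edges: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[int, int]]:
    """Return gene -> (precedence rank, connectivity rank)."""
    precedence_rank = rank_descending(precedence_counts)
    connectivity_rank = rank_descending(
        connectivity_degree(precedence_counts, edges)
    )
    return {
        gene: (precedence_rank[gene], connectivity_rank[gene])
        for gene in precedence_counts
    }


# Hypothetical toy input: placeholder genes, made-up counts and interactions.
toy_counts = {"GENE_A": 120, "GENE_B": 3, "GENE_C": 0, "GENE_D": 45}
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_B", "GENE_D"),
    ("GENE_A", "GENE_D"),
]

table = rank_genes(toy_counts, toy_edges)
for gene, (prec, conn) in sorted(table.items()):
    print(f"{gene}: precedence rank {prec}, connectivity rank {conn}")
```

Each (precedence rank, connectivity rank) pair is one point on the scatterplot. In this toy input, GENE_B is highly connected but rarely co-mentioned with the search terms, which is the kind of understudied candidate the visual analysis is meant to surface.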