Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges.

Authors

Das Samarendra, McClain Craig J, Rai Shesh N

Affiliations

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.

School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA.

Publication information

Entropy (Basel). 2020 Apr 10;22(4):427. doi: 10.3390/e22040427.

Abstract

Over the last decade, gene set analysis has become the first choice for gaining insight into the complex biology underlying diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the results obtained. Although gene set analysis approaches are used extensively in the analysis of gene expression and genome-wide association data, the statistical structure and steps common to these approaches have not yet been discussed comprehensively, which limits their utility. In this article, we provide a comprehensive overview of the gene set analysis approaches used for microarray, RNA-sequencing and genome-wide association data, together with their statistical structure and steps. We also classify gene set analysis approaches and tools by type of genomic study, null hypothesis, sampling model and nature of the test statistic, among other criteria. Rather than reviewing the approaches individually, we trace their generation-wise evolution for microarray, RNA-sequencing and genome-wide association studies and discuss their relative merits and limitations. We identify the key biological and statistical challenges in current gene set analysis, which statisticians and biologists will need to address collectively in order to develop the next generation of gene set analysis approaches. Finally, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing an appropriate gene set analysis approach based on several factors.
