Benchmarking enrichment analysis methods with the disease pathway network.

Authors

Buzzao Davide, Castresana-Aguirre Miguel, Guala Dimitri, Sonnhammer Erik L L

Affiliations

Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden.

K7 Department of Oncology-Pathology, Karolinska Institute, 171 77 Stockholm, Sweden.

Publication information

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae069.

Abstract

Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.
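As a rough illustration of the evaluation idea described above, the sketch below expands a single target pathway into a set of positives using a pathway network, then scores one method's output by sensitivity and specificity. The abstract does not say how the two are combined, so the arithmetic mean used here is an assumption, and every name in the code (`expand_targets`, `sensitivity_specificity`, `balanced_score`, the toy pathway labels and links) is hypothetical and not taken from the paper.

```python
# Minimal sketch, not the authors' implementation.
# Assumptions: positives = target pathway plus its direct network neighbours;
# the balanced score is the arithmetic mean of sensitivity and specificity.


def expand_targets(target, network):
    """Return the target pathway together with the pathways linked to it."""
    return {target} | set(network.get(target, ()))


def sensitivity_specificity(significant, positives, all_pathways):
    """Compute (sensitivity, specificity) for one method on one dataset."""
    negatives = all_pathways - positives
    true_positives = len(significant & positives)
    true_negatives = len(negatives - significant)
    tpr = true_positives / len(positives) if positives else 0.0
    tnr = true_negatives / len(negatives) if negatives else 0.0
    return tpr, tnr


def balanced_score(tpr, tnr):
    """Combine the two rates (assumed rule: arithmetic mean)."""
    return (tpr + tnr) / 2


# Toy data: labels and links are invented for illustration only.
network = {"pathway_A": ["pathway_B", "pathway_C"]}
all_pathways = {
    "pathway_A", "pathway_B", "pathway_C",
    "pathway_D", "pathway_E", "pathway_F",
}
significant = {"pathway_A", "pathway_B", "pathway_D"}  # reported by a method

positives = expand_targets("pathway_A", network)
tpr, tnr = sensitivity_specificity(significant, positives, all_pathways)
score = balanced_score(tpr, tnr)
```

With a single target pathway, sensitivity for a dataset can only be 0 or 1. Counting linked pathways as positives gives a graded value, which is the motivation the abstract gives for the network.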

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9f8/10939300/41674372b784/bbae069f1.jpg
