Probabilistic prioritization of candidate pathway association with pathway score.

Affiliations

Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, 10055, Taiwan.

Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, 10055, Taiwan.

Publication information

BMC Bioinformatics. 2018 Oct 24;19(1):391. doi: 10.1186/s12859-018-2411-z.

Abstract

BACKGROUND

Current methods for gene-set or pathway analysis are usually designed to test the enrichment of a single gene-set. Once the analysis has been carried out for each set under study, a list of significant sets can be obtained. However, no quantitative measure is available for further prioritizing these sets by importance or strength of association. Ranking the pathways by the magnitude of their p-values may not be appropriate, because a p-value does not measure the strength of an association. In addition, when each pathway is tested, these analyses are often implicitly affected by the number of differentially expressed genes included in the set and/or by the dependence among genes.
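The point about p-values can be illustrated with a small worked example. The sketch below is not taken from the paper: it assumes a textbook two-sample z-test with known unit variance, and the function name `two_sample_z_pvalue` is hypothetical. It shows that the same standardized effect size yields very different p-values at different sample sizes, which is why a p-value alone is a poor ranking criterion.

```python
import math

def two_sample_z_pvalue(effect_size, n_per_group):
    """Two-sided p-value of a two-sample z-test (assumed known unit variance).

    effect_size is the standardized mean difference between the two groups.
    """
    # Standard error of the difference of two means is sqrt(2 / n).
    z = effect_size * math.sqrt(n_per_group / 2.0)
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same effect size, different sample sizes.
p_small = two_sample_z_pvalue(0.3, 20)
p_large = two_sample_z_pvalue(0.3, 200)
print(p_small, p_large)
```

The effect is identical in both calls; only the amount of data changes, yet the second p-value is far smaller.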

RESULTS

Here we propose a two-stage procedure to prioritize the pathways/gene-sets. In the first stage we develop a pathway-level measure with three properties. First, it contains all genes in the set (differentially expressed or not) and summarizes their collective effect per sample. Second, the pathway score accounts for the correlation between genes by synchronizing their correlation directions. Third, the score includes a rank transformation to enhance the variation among samples and to avoid the influence of extreme heterogeneity among genes. In the second stage, all scores are included simultaneously in a Bayesian logistic regression model, which evaluates the strength of association for each set and ranks the sets by posterior probability. Simulations based on Gaussian distributions and on human microarray data, together with an RNA-Seq breast cancer study, are used for demonstration and for comparison with existing methods.
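The abstract does not give the formula for the pathway score, so the following is only a minimal sketch of the three stated properties, not the authors' definition. Everything specific here is an assumption made for illustration: the function name `pathway_score`, the choice of a "hub" gene as the reference for synchronizing correlation directions, and the use of the mean rank as the per-sample summary. It also assumes that no gene is constant across samples.

```python
import numpy as np

def pathway_score(expr):
    """Illustrative per-sample score for one gene-set.

    expr: array of shape (n_samples, n_genes), all genes in the set.
    Returns an array of length n_samples.
    """
    expr = np.asarray(expr, dtype=float)
    # Gene-by-gene correlation matrix across samples.
    corr = np.corrcoef(expr, rowvar=False)
    # Assumption: take the gene with the largest total absolute
    # correlation as the reference for the direction.
    ref = int(np.argmax(np.abs(corr).sum(axis=0)))
    # Flip genes that are negatively correlated with the reference,
    # so that all genes point in the same direction.
    signs = np.where(corr[ref] < 0, -1.0, 1.0)
    aligned = expr * signs
    # Rank-transform each gene across samples (1 = smallest value).
    # Ties are broken by position, which is a simplification.
    ranks = aligned.argsort(axis=0).argsort(axis=0) + 1
    # Summarize all genes per sample by the mean rank.
    return ranks.mean(axis=1)

# Toy data: 5 samples, 3 genes; the second gene runs opposite to the others.
expr = np.array([
    [1.0, -1.1, 2.0],
    [2.0, -1.9, 4.1],
    [3.0, -3.2, 5.9],
    [4.0, -3.8, 8.2],
    [5.0, -5.0, 10.0],
])
scores = pathway_score(expr)
print(scores)
```

Without the sign flip, the opposite-direction gene would partly cancel the other two and compress the scores toward the middle rank; aligning the directions keeps the full spread between samples.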

CONCLUSIONS

The proposed summary pathway score provides, for each sample, an overall evaluation of gene expression in a gene-set. It demonstrates the advantages of including all genes in the set and of synchronizing the correlation directions. Using all pathway-level scores simultaneously in a Bayesian model not only offers a probabilistic evaluation and ranking of the pathway associations but also shows good accuracy in identifying the top-ranking pathways. The resulting list of ranked pathways can serve as a reference for potential targeted therapy or for the future allocation of research resources.
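To make the idea of ranking pathways by posterior probability concrete, here is a hedged sketch. It does not reproduce the paper's Bayesian model or its prior; it swaps in a plain Gaussian prior with a Laplace (normal) approximation to the posterior, and ranks each pathway by the approximate posterior probability that its coefficient has the sign of its MAP estimate. The function names, the prior variance, the omission of an intercept, and the simulated data are all assumptions for illustration.

```python
import math
import numpy as np

def _hessian(X, beta, prior_var):
    """Negative Hessian of the log posterior at beta."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    return X.T @ (X * w[:, None]) + np.eye(X.shape[1]) / prior_var

def laplace_logistic(X, y, prior_var=1.0, n_iter=100):
    """MAP estimate and approximate posterior covariance (Laplace)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - mu) - beta / prior_var
        step = np.linalg.solve(_hessian(X, beta, prior_var), grad)
        beta = beta + step  # Newton update
        if np.max(np.abs(step)) < 1e-8:
            break
    cov = np.linalg.inv(_hessian(X, beta, prior_var))
    return beta, cov

def rank_pathways(X, y, prior_var=1.0):
    """Order pathways by approximate posterior sign probability."""
    beta, cov = laplace_logistic(X, y, prior_var)
    sd = np.sqrt(np.diag(cov))
    prob = np.array([
        0.5 * (1.0 + math.erf(abs(b) / (s * math.sqrt(2.0))))
        for b, s in zip(beta, sd)
    ])
    return np.argsort(-prob), prob

# Simulated pathway scores: only the first pathway drives the outcome.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))).astype(float)
order, prob = rank_pathways(X, y)
print(order, prob)
```

Because all pathway scores enter one model, each coefficient is evaluated conditionally on the others, which is the feature the text emphasizes; a full treatment would replace the Laplace step with the model and sampling scheme described in the paper itself.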
