A Bayesian approach for inducing sparsity in generalized linear models with multi-category response.

Author information

Madahian Behrouz, Roy Sujoy, Bowman Dale, Deng Lih Y, Homayouni Ramin

Publication information

BMC Bioinformatics. 2015;16 Suppl 13(Suppl 13):S13. doi: 10.1186/1471-2105-16-S13-S13. Epub 2015 Sep 25.

Abstract

BACKGROUND

The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables when sample sizes are small. In this study, we used a Generalized Double Pareto (GDP) prior distribution to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes.
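
To make the sparsity mechanism concrete, the following is a minimal Python sketch of a GDP prior density, not the authors' SBGG sampler. It assumes the parameterization of Armagan, Dunson and Lee, f(beta) = alpha / (2 * eta) * (1 + |beta| / eta)^-(alpha + 1), together with their representation of the GDP as a Gamma mixture of double exponential (Laplace) densities. The abstract does not state which parameterization or hyperparameters SBGG uses, so alpha and eta below are hypothetical illustration values, and the helper names (gdp_pdf, sample_gdp, gdp_abs_cdf) are likewise hypothetical. The sketch compares the GDP density with a Laplace density of the same height at zero and checks the mixture representation by simulation.

```python
import math
import random


def gdp_pdf(beta: float, alpha: float, eta: float) -> float:
    """Generalized double Pareto density (assumed parameterization):
    f(beta) = alpha / (2 * eta) * (1 + |beta| / eta) ** -(alpha + 1)."""
    return alpha / (2.0 * eta) * (1.0 + abs(beta) / eta) ** -(alpha + 1.0)


def laplace_pdf(beta: float, rate: float) -> float:
    """Double exponential (Laplace) density with location 0 and the given rate."""
    return 0.5 * rate * math.exp(-rate * abs(beta))


def sample_gdp(alpha: float, eta: float, rng: random.Random) -> float:
    """Draw one GDP variate through the assumed scale mixture:
    lambda ~ Gamma(shape=alpha, rate=eta), beta | lambda ~ Laplace(0, rate=lambda)."""
    lam = rng.gammavariate(alpha, 1.0 / eta)  # gammavariate takes a scale, so scale = 1 / rate
    magnitude = rng.expovariate(lam)          # |beta| given lambda is Exponential(rate=lambda)
    return magnitude if rng.random() < 0.5 else -magnitude  # random sign gives a Laplace draw


def gdp_abs_cdf(t: float, alpha: float, eta: float) -> float:
    """Closed-form P(|beta| <= t) under the GDP density above: 1 - (1 + t / eta) ** -alpha."""
    return 1.0 - (1.0 + t / eta) ** -alpha


if __name__ == "__main__":
    alpha, eta = 1.0, 1.0  # hypothetical hyperparameters, chosen only for illustration
    rate = alpha / eta     # Laplace rate that gives the same density at zero as the GDP
    for b in (0.0, 0.5, 2.0, 5.0):
        print(f"beta={b:4.1f}  GDP={gdp_pdf(b, alpha, eta):.5f}  Laplace={laplace_pdf(b, rate):.5f}")

    # Monte Carlo check that the Gamma-Laplace hierarchy reproduces the closed-form marginal.
    rng = random.Random(0)
    draws = [sample_gdp(alpha, eta, rng) for _ in range(100_000)]
    for t in (0.1, 1.0, 5.0):
        empirical = sum(abs(d) <= t for d in draws) / len(draws)
        print(f"t={t}: empirical={empirical:.3f}  closed form={gdp_abs_cdf(t, alpha, eta):.3f}")
```

With alpha = eta = 1, both densities equal 0.5 at zero, but at beta = 5 the GDP density is roughly four times the Laplace density. A sharp peak at zero combined with polynomial rather than exponential tails is the usual argument for why a GDP prior shrinks small coefficients strongly while leaving large coefficients comparatively unshrunk, in contrast to the double exponential priors used as a comparison in the study.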

RESULTS

A hierarchical Sparse Bayesian GLM using the GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, higher than that of a Support Vector Machine, a Random Forest, or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperforms the other three methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using the Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4, compared with 0.007 to 0.131 for the other methods.

CONCLUSIONS

Using the GDP prior in a Bayesian GLM applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets.
