Molecular pathway identification using biological network-regularized logistic models.

Publication information

BMC Genomics. 2013;14(Suppl 8):S7. doi: 10.1186/1471-2164-14-S8-S7. Epub 2013 Dec 9.

Abstract

BACKGROUND

Selecting genes and pathways indicative of disease is a central problem in computational biology. The problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L1-norm-based regularization (the lasso) and its extensions, the elastic net and the fused lasso, have been introduced to address this challenge. However, these approaches tend to ignore the large body of a priori biological network information curated in the literature.
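For reference, the penalized logistic objectives named above can be written in their standard textbook forms. The notation here is ours, not the paper's: ℓ(β) is the average negative log-likelihood of a logistic model over n samples (x_i, y_i) with y_i ∈ {0, 1} and p genes, and the intercept is omitted for brevity.

```latex
\begin{align*}
\ell(\beta) &= -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\,x_i^{\top}\beta - \log\big(1 + e^{x_i^{\top}\beta}\big)\Big] \\
\text{lasso:}\quad &\min_{\beta}\; \ell(\beta) + \lambda \lVert\beta\rVert_1 \\
\text{elastic net:}\quad &\min_{\beta}\; \ell(\beta) + \lambda_1 \lVert\beta\rVert_1 + \lambda_2 \lVert\beta\rVert_2^2 \\
\text{fused lasso:}\quad &\min_{\beta}\; \ell(\beta) + \lambda_1 \lVert\beta\rVert_1 + \lambda_2 \sum_{j=2}^{p} \lvert\beta_j - \beta_{j-1}\rvert
\end{align*}
```

None of these penalties uses a gene network: the fused lasso only couples coefficients that are adjacent in a fixed linear ordering of the features.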

RESULTS

We propose using graph Laplacian-regularized logistic regression to integrate biological networks into disease classification and pathway association problems. Simulation studies show that the proposed algorithm outperforms elastic net and lasso analyses. The utility of the algorithm is further validated by its ability to reliably differentiate breast cancer subtypes in a large breast cancer dataset recently generated by The Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are further supported by evidence published in the literature. Source code of the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet.
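The core idea can be illustrated with a small sketch. This is not the Logit-Lapnet code: it assumes the objective "mean logistic loss + λ2·βᵀLβ + λ1·‖β‖₁" with the unnormalized Laplacian L = D − A, solved by plain proximal gradient descent (ISTA) with a fixed step. For that Laplacian, βᵀLβ = Σ over edges (u, v) of w_uv·(β_u − β_v)², so linked genes are pulled toward similar coefficients while the L1 term sets the rest exactly to zero. The function names, edge format, step size, and toy data are all hypothetical; the published implementation may use a normalized Laplacian and a different solver.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical edge format: (gene index u, gene index v, weight w_uv), undirected.
Edge = Tuple[int, int, float]


def laplacian_matvec(edges: Sequence[Edge], beta: Sequence[float]) -> List[float]:
    """Compute L @ beta for the unnormalized Laplacian L = D - A."""
    out = [0.0] * len(beta)
    for u, v, w in edges:
        diff = beta[u] - beta[v]
        out[u] += w * diff
        out[v] -= w * diff
    return out


def laplacian_quadratic(edges: Sequence[Edge], beta: Sequence[float]) -> float:
    """beta^T L beta = sum over edges of w_uv * (beta_u - beta_v)^2."""
    return sum(w * (beta[u] - beta[v]) ** 2 for u, v, w in edges)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|; small values become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def predict_proba(b0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, x)))


def fit_laplacian_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    edges: Sequence[Edge],
    lam1: float,
    lam2: float,
    step: float = 0.1,
    n_iter: int = 2000,
) -> Tuple[float, List[float]]:
    """ISTA on: mean logistic loss + lam2 * beta^T L beta + lam1 * ||beta||_1.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = predict_proba(b0, beta, xi) - yi  # residual p_i - y_i
            g0 += r / n
            for j in range(p):
                grad[j] += r * xi[j] / n
        # The gradient of lam2 * beta^T L beta is 2 * lam2 * L beta.
        lb = laplacian_matvec(edges, beta)
        for j in range(p):
            grad[j] += 2.0 * lam2 * lb[j]
        b0 -= step * g0
        # Gradient step on the smooth part, then the L1 proximal step.
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam1) for j in range(p)]
    return b0, beta


if __name__ == "__main__":
    # Made-up toy data: genes 0 and 1 are linked and informative,
    # genes 2 and 3 are linked but uninformative.
    X = [[1.0, 0.9, 0.2, -0.1], [0.8, 1.1, -0.3, 0.2], [1.2, 1.0, 0.1, -0.2],
         [-1.0, -0.8, 0.2, 0.1], [-0.9, -1.2, -0.1, -0.2], [-1.1, -1.0, -0.2, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    edges = [(0, 1, 1.0), (2, 3, 1.0)]
    b0, beta = fit_laplacian_logistic(X, y, edges, lam1=0.05, lam2=0.5)
    print([round(b, 3) for b in beta])
```

On this toy data genes 0 and 1 receive similar positive weights and genes 2 and 3 stay at zero. The fixed step of 0.1 is small enough here; in general the step should not exceed the inverse Lipschitz constant of the smooth part, or a line search should be used.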

CONCLUSION

Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. As knowledge of biological regulatory networks expands rapidly, this approach will become more accurate and increasingly useful for mining data from transcriptomic, epigenomic, and other types of genome-wide association studies.
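Turning a fitted coefficient vector into candidate modules can be sketched as follows. This is a hypothetical post-processing step, not necessarily the authors' procedure: it keeps the genes with nonzero coefficients and reports the connected components of the network restricted to those genes. The function name, edge format, and example genes are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical edge format: (gene index u, gene index v, weight), undirected.
Edge = Tuple[int, int, float]


def selected_modules(
    beta: Sequence[float],
    edges: Sequence[Edge],
    names: Sequence[str],
    tol: float = 1e-8,
) -> List[List[str]]:
    """Group genes with |beta_j| > tol into connected components of the
    network restricted to those genes (a simple notion of 'module')."""
    selected = {j for j, b in enumerate(beta) if abs(b) > tol}
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v, _w in edges:
        # Keep only edges whose endpoints were both selected.
        if u in selected and v in selected:
            adj[u].add(v)
            adj[v].add(u)
    seen: Set[int] = set()
    modules: List[List[str]] = []
    for start in sorted(selected):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        modules.append(sorted(names[j] for j in component))
    return modules


if __name__ == "__main__":
    beta = [1.2, 0.9, 0.0, 0.4, 0.3, 0.7]
    edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0)]
    names = ["G0", "G1", "G2", "G3", "G4", "G5"]
    print(selected_modules(beta, edges, names))
    # [['G0', 'G1'], ['G3', 'G4'], ['G5']]
```

The edge between G1 and G2 is dropped because G2 has a zero coefficient, and an isolated selected gene such as G5 comes out as a singleton module.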


