Weighted overlapping group lasso for integrating prior network knowledge into gene set analysis.

Author information

Huang Dan, Jo Geunsu, Kim Kipoong, Sun Hokeun

Affiliations

Department of Statistics, Pusan National University, Busan, 46241, Korea.

Department of Statistics, Changwon National University, Changwon, 51140, Korea.

Publication information

BMC Bioinformatics. 2025 Sep 1;26(1):226. doi: 10.1186/s12859-025-06170-9.

Abstract

BACKGROUND

Gene set analysis aims to identify gene sets that contain genes differentially expressed between two experimental conditions. A representative example of a gene set is a gene regulatory network, in which multiple genes are linked to one another to regulate gene expression. Most statistical methods for gene set analysis were designed to capture group-based association signals and ignore the underlying genetic network structure. Consequently, they often fail to identify gene sets in which only a few genes are differentially expressed and the association signals are sparse.

RESULTS

We propose a new computational method that uses prior network knowledge for gene set analysis. The proposed method essentially incorporates the coefficient estimates of network-based regularization into overlapping group lasso. Network-based regularization can boost association signals among linked genes, while overlapping group lasso selects gene sets that include differentially expressed genes. We evaluated the performance of the proposed method against existing methods in an extensive simulation study. We also applied it to gene expression data from The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA) and identified cancer-related pathways that were missed by the existing methods.
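To make the idea of boosting signals among linked genes concrete, the following is a minimal sketch of one commonly used network-based penalty: an L1 term plus a quadratic term built from the normalized graph Laplacian. The abstract does not state which formulation the authors use, so the penalty form, the function names `normalized_laplacian` and `network_penalty`, the tuning parameters `lam1` and `lam2`, and the toy three-gene network are all assumptions made for illustration.

```python
import numpy as np

def normalized_laplacian(adj):
    """Normalized Laplacian of an undirected gene network (assumed formulation)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    # Isolated genes get a zero scaling factor, so they contribute nothing.
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def network_penalty(beta, adj, lam1, lam2):
    """L1 sparsity term plus Laplacian smoothness term over linked genes."""
    beta = np.asarray(beta, dtype=float)
    lap = normalized_laplacian(adj)
    return float(lam1 * np.abs(beta).sum() + lam2 * (beta @ lap @ beta))

# Hypothetical toy network: genes 0 and 1 are linked, gene 2 is isolated.
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]])

# Linked genes with matching coefficients incur no smoothness cost ...
agree = network_penalty([1.0, 1.0, 0.0], adj, lam1=0.5, lam2=1.0)
# ... while linked genes with opposing coefficients are penalized more.
disagree = network_penalty([1.0, -1.0, 0.0], adj, lam1=0.5, lam2=1.0)
```

Under this assumed penalty, the two coefficient vectors have the same L1 norm, so the difference between `agree` and `disagree` comes entirely from the network term, which favors similar estimates for linked genes.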

CONCLUSION

Overlapping group lasso is a regularization method for group selection that allows groups to share variables. Network-based regularization is a variable selection method that uses graph information among variables. The proposed weighted overlapping group lasso (wOGL) adopts the coefficient estimates of network-based regularization as the weights of overlapping group lasso. Consequently, it can identify gene sets containing differentially expressed genes while utilizing prior network knowledge.
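The weighting idea can be sketched with the latent-variable form of overlapping group lasso, in which each gene set has its own latent coefficient vector and a gene shared by several sets receives the sum of its latent coefficients. The abstract does not say how the network-based estimates are turned into weights, so the inverse-norm rule in `group_weights` below is purely a hypothetical choice, as are the function names, the gene sets, and the numbers in `beta_net`.

```python
import numpy as np

def group_weights(beta_net, groups, eps=1e-8):
    """Hypothetical rule: weight = 1 / (L2 norm of network-based estimates in the set)."""
    return {name: 1.0 / (np.linalg.norm(beta_net[idx]) + eps)
            for name, idx in groups.items()}

def assemble_beta(latent, groups, n_genes):
    """Sum the latent vectors so that genes in several sets accumulate contributions."""
    beta = np.zeros(n_genes)
    for name, idx in groups.items():
        beta[idx] += latent[name]
    return beta

def wogl_penalty(latent, weights):
    """Weighted sum of the L2 norms of the per-set latent vectors."""
    return float(sum(weights[name] * np.linalg.norm(v) for name, v in latent.items()))

# Two hypothetical gene sets that overlap at gene 1.
groups = {"set_A": [0, 1], "set_B": [1, 2, 3]}

# Stand-in for coefficient estimates from a network-based regularization fit.
beta_net = np.array([3.0, 4.0, 0.0, 0.0])
weights = group_weights(beta_net, groups)

# One candidate decomposition: all signal is attributed to set_A.
latent = {"set_A": np.array([1.0, 1.0]), "set_B": np.zeros(3)}
beta = assemble_beta(latent, groups, n_genes=4)
penalty = wogl_penalty(latent, weights)
```

With this assumed rule, a gene set whose members carry stronger network-based estimates receives a smaller weight and is therefore penalized less, which is one way such estimates could steer group selection.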

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f3c/12403420/caccb9acda5c/12859_2025_6170_Fig1_HTML.jpg
