Network-based logistic regression integration method for biomarker identification.

Author information

Zhang Ke, Geng Wei, Zhang Shuqin

Affiliations

School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China.

Center for Computational Systems Biology, Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China.

Publication information

BMC Syst Biol. 2018 Dec 31;12(Suppl 9):135. doi: 10.1186/s12918-018-0657-8.

Abstract

BACKGROUND

Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, biomarkers inferred from different datasets often lack reproducibility, because data generated on different platforms or in different laboratories are heterogeneous. This motivates us to develop robust biomarker identification methods that integrate multiple datasets.

METHODS

In this paper, we developed an integrative classification method based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. Minimizing the differences between the constant terms within the same dataset preserves both the homogeneity within each dataset and the heterogeneity across multiple datasets. The model is formulated as an optimization problem with a network penalty that measures the differences between the constant terms. For biomarker discovery, an L1 penalty, an elastic net penalty, and network-related penalties are added to the objective function. Algorithms based on the proximal Newton method are proposed to solve the optimization problem.
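
The optimization problem described above can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation:

- It assumes that every pair of samples from the same dataset is linked, giving a complete graph per dataset.
- It assumes the network penalty has the squared-difference form (gamma/2) times the sum over edges of (b_i - b_j)^2.
- The function names and default values (`fit_integrative_logistic`, `within_dataset_laplacian`, `lam`, `alpha`, `gamma`) are hypothetical.
- It replaces the paper's proximal Newton solver with a simpler block majorization-minimization scheme. Each iteration takes a step in the per-sample constant terms that treats the network penalty exactly, using the bound sigma(1 - sigma) <= 1/4 on the logistic curvature. This alternates with a proximal gradient (soft-thresholding) step in the coefficients.
- Only the elastic net penalty on the coefficients is included; the network-related penalties on the variables are omitted.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def within_dataset_laplacian(groups):
    # ASSUMPTION: every pair of samples from the same dataset is linked
    # (a complete graph per dataset); the paper's sample network may differ.
    groups = np.asarray(groups)
    adj = (groups[:, None] == groups[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)
    return np.diag(adj.sum(axis=1)) - adj


def objective(X, y, b, beta, lap, lam, alpha, gamma):
    # Mean logistic loss with one constant term b_i per sample.
    eta = b + X @ beta
    loss = np.mean(np.logaddexp(0.0, eta) - y * eta)
    # Network penalty: b' L b equals the sum over edges of (b_i - b_j)^2.
    network = 0.5 * gamma * (b @ lap @ b)
    # Elastic net on the shared coefficients (alpha = 1 gives the pure L1 penalty).
    enet = lam * (alpha * np.abs(beta).sum() + 0.5 * (1 - alpha) * (beta @ beta))
    return loss + network + enet


def fit_integrative_logistic(X, y, groups, lam=0.05, alpha=0.9, gamma=0.01,
                             n_iter=10000, tol=1e-9):
    # Block majorization-minimization (NOT the paper's proximal Newton solver).
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    lap = within_dataset_laplacian(groups)
    # The logistic curvature is at most 1/4, so I/(4n) + gamma * L bounds the
    # Hessian of the smooth part in b from above; it is fixed, so invert it once.
    m_inv = np.linalg.inv(np.eye(n) / (4 * n) + gamma * lap)
    # Step 1/Lip for beta, with Lip a Lipschitz bound on the smooth part's gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / (4 * n) + lam * (1 - alpha))
    b = np.zeros(n)       # one constant term per sample
    beta = np.zeros(p)    # shared coefficients; nonzeros are candidate biomarkers
    for _ in range(n_iter):
        # 1) Majorize-minimize step in b (the network penalty is handled exactly).
        grad_b = (sigmoid(b + X @ beta) - y) / n + gamma * (lap @ b)
        b_new = b - m_inv @ grad_b
        # 2) Proximal gradient step in beta; soft-thresholding is the L1 prox.
        resid = (sigmoid(b_new + X @ beta) - y) / n
        z = beta - step * (X.T @ resid + lam * (1 - alpha) * beta)
        beta_new = np.sign(z) * np.maximum(np.abs(z) - step * lam * alpha, 0.0)
        delta = max(np.abs(b_new - b).max(), np.abs(beta_new - beta).max())
        b, beta = b_new, beta_new
        if delta < tol:
            break
    return b, beta


if __name__ == "__main__":
    # Hypothetical simulated data: two datasets with different baseline shifts.
    rng = np.random.default_rng(0)
    n_per, p = 60, 20
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    X = rng.normal(size=(2 * n_per, p))
    groups = np.repeat([0, 1], n_per)
    shift = np.where(groups == 0, -1.0, 1.0)
    y = (rng.random(2 * n_per) < sigmoid(shift + X @ beta_true)).astype(float)
    b, beta = fit_integrative_logistic(X, y, groups)
    print("selected features:", np.flatnonzero(beta))
    print("mean constant term per dataset:", [b[groups == g].mean() for g in (0, 1)])
```

Nonzero entries of `beta` mark the candidate biomarkers. The sample graph links only samples within the same dataset, so `gamma` controls how much the constant terms may vary:

- A large `gamma` drives them toward one shared value per dataset while leaving differences between datasets unpenalized.
- A small `gamma` lets individual samples deviate.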

RESULTS

We first applied the proposed method to simulated datasets, where both the prediction AUC and the biomarker identification accuracy improved. We then applied the method to two breast cancer gene expression datasets. Integrating both datasets improved the prediction AUC over directly merging the datasets and over MetaLasso. The resulting AUC was also comparable to the best AUC obtained by biomarker identification on an individual dataset. The biomarkers identified using the network-related penalty on the variables were analyzed further, and meaningful subnetworks enriched for breast cancer were identified.

CONCLUSION

A network-based integrative logistic regression model is proposed in this paper. It improves both prediction accuracy and biomarker identification accuracy.
