Network-based logistic regression integration method for biomarker identification.

Author information

Zhang Ke, Geng Wei, Zhang Shuqin

Affiliations

School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China.

Center for Computational Systems Biology, Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, No.220 Handan Road, Shanghai, 200433, China.

Publication information

BMC Syst Biol. 2018 Dec 31;12(Suppl 9):135. doi: 10.1186/s12918-018-0657-8.

DOI:10.1186/s12918-018-0657-8

PMID:30598085

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6311907/

Abstract

BACKGROUND

Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, biomarkers inferred from different datasets often lack reproducibility because of heterogeneity in data generated on different platforms or in different laboratories. This motivates the development of robust biomarker identification methods that integrate multiple datasets.

METHODS

In this paper, we develop an integrative classification method based on logistic regression. The model assigns different constant terms (intercepts) to the samples to capture their heterogeneity. Minimizing the differences between constant terms within the same dataset preserves both the homogeneity within each dataset and the heterogeneity across datasets. The model is formulated as an optimization problem with a network penalty that measures the differences between the constant terms. For biomarker discovery, the L1 penalty, elastic net penalty and network-related penalties are added to the objective function. Algorithms based on the proximal Newton method are proposed to solve the optimization problem.
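The abstract does not state the exact objective function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes one intercept per sample, a squared-difference penalty over every pair of samples from the same dataset (written with a graph Laplacian), and a plain L1 penalty on the coefficients. For brevity it uses proximal gradient descent rather than the proximal Newton method named above. All function names and parameters (`fit`, `lam_l1`, `lam_net`, and so on) are hypothetical.

```python
import numpy as np

def dataset_laplacian(dataset_ids):
    """Laplacian of the graph linking every pair of samples in the same dataset.

    With this matrix, b @ L @ b equals the sum of (b_i - b_j)**2 over all
    unordered same-dataset pairs (an assumed form of the network penalty).
    """
    ids = np.asarray(dataset_ids)
    adj = (ids[:, None] == ids[None, :]).astype(float)
    np.fill_diagonal(adj, 0.0)
    return np.diag(adj.sum(axis=1)) - adj

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, L, w, b, lam_l1, lam_net):
    """Negative log-likelihood + L1 penalty on w + Laplacian penalty on b."""
    z = X @ w + b
    nll = np.sum(np.logaddexp(0.0, z) - y * z)
    return nll + lam_l1 * np.abs(w).sum() + 0.5 * lam_net * (b @ L @ b)

def fit(X, y, dataset_ids, lam_l1=2.0, lam_net=1.0, n_iter=3000):
    """Proximal gradient descent (a simpler stand-in for proximal Newton)."""
    n, p = X.shape
    L = dataset_laplacian(dataset_ids)
    w, b = np.zeros(p), np.zeros(n)
    # Upper bound on the Lipschitz constant of the smooth part's gradient:
    # logistic curvature is at most 1/4, and the largest eigenvalue of the
    # Laplacian of disjoint complete graphs is at most the largest group size.
    largest_group = np.unique(dataset_ids, return_counts=True)[1].max()
    lipschitz = (np.linalg.norm(X, 2) ** 2 + 1.0) / 4.0 + lam_net * largest_group
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y           # gradient w.r.t. the logits
        grad_w = X.T @ resid
        grad_b = resid + lam_net * (L @ b)       # pulls same-dataset intercepts together
        w = soft_threshold(w - step * grad_w, step * lam_l1)
        b = b - step * grad_b
    return w, b

# Toy usage: two simulated datasets whose baselines differ by a constant shift.
rng = np.random.default_rng(0)
n_per, p = 40, 10
X = rng.normal(size=(2 * n_per, p))
dataset_ids = np.repeat([0, 1], n_per)
true_w = np.zeros(p)
true_w[:2] = [2.0, -2.0]                          # only two informative features
offset = np.where(dataset_ids == 0, -1.0, 1.0)    # dataset-specific shift
y = (rng.random(2 * n_per) < sigmoid(X @ true_w + offset)).astype(float)

w_hat, b_hat = fit(X, y, dataset_ids)
print("nonzero coefficients:", np.flatnonzero(w_hat))
print("mean intercept per dataset:",
      [float(b_hat[dataset_ids == k].mean()) for k in (0, 1)])
```

In this sketch the Laplacian penalty leaves the mean intercept of each dataset unpenalized, so each dataset can keep its own baseline while the intercepts inside a dataset are drawn toward one another. A single step size makes plain proximal gradient slow when the penalty weights are poorly scaled, which is one reason a second-order scheme such as proximal Newton may be preferable in practice.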

RESULTS

We first applied the proposed method to simulated datasets, where both the prediction AUC and the biomarker identification accuracy improved. We then applied the method to two breast cancer gene expression datasets. Integrating the two datasets improved the prediction AUC over both directly merging the datasets and MetaLasso, and the result is comparable to the best AUC obtained when performing biomarker identification on an individual dataset. The biomarkers identified using the network-related penalty on variables were further analyzed, and meaningful subnetworks enriched for breast cancer were identified.

CONCLUSION

This paper proposes a network-based integrative logistic regression model that improves both prediction accuracy and biomarker identification accuracy.
