Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):312-321. doi: 10.1109/TCBB.2017.2767589. Epub 2017 Oct 30.

Abstract

In gene expression data analysis, the problems of cancer classification and gene selection are closely related. Successfully selecting informative genes will significantly improve classification performance. Various methods have been proposed to identify informative genes from a large number of candidate genes. However, gene expression data may contain important correlation structures, and some genes can be divided into groups based on their biological pathways. Many existing methods do not take the exact correlation structure within the data into account. Therefore, from both the knowledge discovery and the biological perspective, an ideal gene selection method should incorporate this structural information. Moreover, discovering the correlation structure within the data can yield better generalization performance. To discover structural information in the data and improve learning performance, we propose a structured penalized logistic regression model that simultaneously performs feature selection and model learning for gene expression data analysis. We develop an efficient coordinate descent algorithm to optimize the model. Numerical simulation studies demonstrate that our method is able to select highly correlated features. In addition, results on real gene expression datasets show that the proposed method performs competitively with previous approaches.
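The abstract does not specify the structured penalty. The following is therefore only a minimal sketch under a stated assumption, not the authors' model. As a stand-in for the structure term, it uses a network-constrained (graph-Laplacian) penalty, lam1·‖β‖₁ + (lam2/2)·Σ_{j<k} A_jk(β_j − β_k)². Here A is a hypothetical 0/1 graph linking genes assumed to belong together, for example because they share a pathway or are strongly correlated. The model is fitted with a glmnet-style loop: an outer IRLS quadratic approximation of the logistic loss and an inner cyclic coordinate descent with soft-thresholding. The names `fit_structured_logreg` and `soft_threshold`, the matrix `A`, and the default values of `lam1` and `lam2` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(a, t):
    """Soft-thresholding operator: sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * max(abs(a) - t, 0.0)


def fit_structured_logreg(X, y, A, lam1=0.05, lam2=0.1,
                          n_outer=50, n_inner=200, tol=1e-6):
    """Hypothetical graph-penalized logistic regression (illustration only).

    Minimizes
        -(1/n) * loglik(b0, beta) + lam1 * ||beta||_1
        + (lam2 / 2) * sum_{j<k} A[j, k] * (beta_j - beta_k) ** 2,
    where A is an ASSUMED symmetric 0/1 feature graph with zero diagonal.
    Outer loop: IRLS quadratic approximation; inner loop: cyclic coordinate descent.
    """
    n, p = X.shape
    deg = A.sum(axis=1)                 # d_j: number of neighbours of feature j
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_outer):
        eta = b0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)   # IRLS weights
        z = eta + (y - prob) / w                        # working response
        beta_outer = beta.copy()
        for _ in range(n_inner):
            beta_prev = beta.copy()
            b0 = np.sum(w * (z - X @ beta)) / np.sum(w)  # unpenalized intercept
            r = z - b0 - X @ beta                         # current residual
            for j in range(p):
                r += X[:, j] * beta[j]                    # partial residual without feature j
                # Graph term pulls beta_j toward its neighbours' current values.
                num = np.mean(w * X[:, j] * r) + lam2 * (A[j] @ beta)
                den = np.mean(w * X[:, j] ** 2) + lam2 * deg[j]
                beta[j] = soft_threshold(num, lam1) / den
                r -= X[:, j] * beta[j]                    # restore full residual
            if np.max(np.abs(beta - beta_prev)) < tol:
                break
        if np.max(np.abs(beta - beta_outer)) < tol:
            break
    return b0, beta


# Toy usage: features 0 and 1 are nearly collinear and linked in the graph.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)
y = (X[:, 0] + X[:, 1] + rng.standard_normal(n) > 0).astype(float)
A = np.zeros((p, p))
A[0, 1] = A[1, 0] = 1.0
b0, beta = fit_structured_logreg(X, y, A)
print(np.round(beta, 3))
```

On the design: the graph term adds lam2·d_j to each coordinate's curvature and shifts the update toward the neighbours' coefficients. As a result, linked, highly correlated features tend to enter the model together rather than one being picked arbitrarily. Setting `lam2=0` reduces the sketch to ordinary L1 (lasso) logistic regression.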

Similar articles

Double Selection Based Semi-Supervised Clustering Ensemble for Tumor Clustering from Gene Expression Profiles.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):727-40. doi: 10.1109/TCBB.2014.2315996.

Biomarker identification and cancer classification based on microarray data using Laplace naive Bayes model with mean shrinkage.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1649-62. doi: 10.1109/TCBB.2012.105.

Gene selection for microarray gene expression classification using Bayesian Lasso quantile regression.

Comput Biol Med. 2018 Jun 1;97:145-152. doi: 10.1016/j.compbiomed.2018.04.018. Epub 2018 Apr 27.

Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification.

BMC Bioinformatics. 2013 Jun 19;14:198. doi: 10.1186/1471-2105-14-198.

A centroid-based gene selection method for microarray data classification.

J Theor Biol. 2016 Jul 7;400:32-41. doi: 10.1016/j.jtbi.2016.03.034. Epub 2016 Apr 4.

Tuning parameter estimation in SCAD-support vector machine using firefly algorithm with application in gene selection and cancer classification.

Comput Biol Med. 2018 Dec 1;103:262-268. doi: 10.1016/j.compbiomed.2018.10.034. Epub 2018 Oct 31.

Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab097.

An efficient statistical feature selection approach for classification of gene expression data.

J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.

LogSum + L penalized logistic regression model for biomarker selection and cancer classification.

Sci Rep. 2020 Dec 17;10(1):22125. doi: 10.1038/s41598-020-79028-0.

Cited by

Machine learning-driven discovery of novel therapeutic targets in diabetic foot ulcers.

Mol Med. 2024 Nov 14;30(1):215. doi: 10.1186/s10020-024-00955-z.

Elucidating common biomarkers and pathways of osteoporosis and aortic valve calcification: insights into new therapeutic targets.

Sci Rep. 2024 Nov 13;14(1):27827. doi: 10.1038/s41598-024-78707-6.

Particle filter-based parameter estimation algorithm for prognostic risk assessment of progression in non-small cell lung cancer.

BMC Med Inform Decis Mak. 2023 Dec 20;23(1):296. doi: 10.1186/s12911-023-02373-3.

Prior information-assisted integrative analysis of multiple datasets.

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad452.

Improved Regularized Multi-class Logistic Regression for Gene Classification with Optimal Kernel PCA and HC Algorithm.

Adv Exp Med Biol. 2023;1424:273-279. doi: 10.1007/978-3-031-31982-2_31.
