Faculty of Information Technology, Macau University of Science and Technology, Macau, China.
Shaoguan University, Shaoguan, Guangdong, China.
Technol Health Care. 2021;29(S1):287-295. doi: 10.3233/THC-218026.
In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discriminative method that offers a clear statistical interpretation and yields a class-membership probability for each sample. However, on its own it cannot perform biomarker selection.
The aim of this paper is to give the logistic regression model an efficient gene selection capability.
In this paper, we propose a new logistic regression model with a log-sum penalty and network-based regularization for gene selection and cancer classification.
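The abstract does not state the objective function, so the following is only a minimal sketch of what a log-sum, network-regularized logistic regression objective could look like. The specific form (mean logistic loss, a log-sum term `log(1 + |beta_j| / eps)`, and a graph Laplacian quadratic term), the function names `logsum_network_objective` and `chain_laplacian`, and the parameters `lam1`, `lam2`, and `eps` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def logsum_network_objective(beta, X, y, laplacian, lam1=0.1, lam2=0.1, eps=1e-2):
    """Penalized logistic loss (assumed form, not taken from the paper)."""
    z = X @ beta
    # Mean logistic loss for labels y in {0, 1};
    # logaddexp(0, z) evaluates log(1 + exp(z)) in a numerically stable way.
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    # Log-sum sparsity penalty: sum_j log(1 + |beta_j| / eps), zero at beta = 0.
    sparsity = np.sum(np.log1p(np.abs(beta) / eps))
    # Network smoothness penalty: beta^T L beta, small when linked genes
    # have similar coefficients.
    smooth = beta @ laplacian @ beta
    return loss + lam1 * sparsity + lam2 * smooth

def chain_laplacian(p):
    """Laplacian of a path graph linking gene j to gene j+1 (toy network)."""
    A = np.zeros((p, p))
    idx = np.arange(p - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))          # 20 samples, 5 hypothetical genes
    y = rng.integers(0, 2, size=20)       # binary class labels
    L = chain_laplacian(5)
    print(logsum_network_objective(np.zeros(5), X, y, L))
```

In this sketch the log-sum term pushes individual coefficients toward zero (gene selection), while the Laplacian term encourages connected genes to receive similar coefficients. Because the log-sum term is non-convex, a real solver would need more care than plain gradient descent.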
Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved AUC values of 89.66% (training) and 90.02% (testing), which are, on average, 5.17% (training) and 4.49% (testing) higher than those of mainstream methods.
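For readers unfamiliar with the metric, AUC can be computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below is a generic pairwise implementation under that definition; the function name `auc_score` and the toy labels and scores are illustrative assumptions and are unrelated to the data reported above.

```python
import numpy as np

def auc_score(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]   # scores of positive samples
    neg = scores[y_true == 0]   # scores of negative samples
    # Compare every positive with every negative; ties count as half a win.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)

if __name__ == "__main__":
    # Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form costs O(n_pos x n_neg) memory, which is fine for small evaluations; a rank-based formula is preferable for large test sets.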
The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.