Sadiq Maryam, Alsadhan Nasser A, Shah Ramla, Younas Sidra, Rasheed Zahid
Department of Statistics, University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan.
Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
PLoS One. 2025 Jun 9;20(6):e0324395. doi: 10.1371/journal.pone.0324395. eCollection 2025.
Variable selection methods are widely used, especially in big-data settings with large numbers of predictors. These procedures improve the accuracy and performance of a model by eliminating irrelevant and redundant variables. The main contribution of this study is to couple a logit model with a novel variable selection approach, "Stability Competitive Adaptive Re-weighted Sampling", to handle a binary response. The efficiency of the proposed method is compared with that of the traditional logistic regression model on eight model assessment criteria, using real data on sexually transmitted infections in Indian men. Owing to its higher stability, the proposed method outperformed the traditional model, with a lower Akaike information criterion and Bayesian information criterion as well as higher R-squared measures. The final selected model identifies essential information about sexually transmitted infections in India for policymakers.
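The comparison criteria named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of the Akaike and Bayesian information criteria and uses McFadden's pseudo R-squared as one possible reading of "R-squared measures"; the abstract does not say which R-squared variants were used. All function names and the toy outcomes and fitted probabilities below are hypothetical and are not taken from the study.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under fitted probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logL (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def mcfadden_r2(log_lik, log_lik_null):
    """McFadden's pseudo R-squared: 1 - logL(model) / logL(null)."""
    return 1.0 - log_lik / log_lik_null


# Toy binary outcomes (made up for illustration only).
y = [1, 0, 1, 1, 0, 0, 1, 0]
n = len(y)

# Hypothetical fitted probabilities from two logit models:
# a "full" model with 6 parameters and a "selected" model with 3.
p_full = [0.80, 0.25, 0.70, 0.75, 0.30, 0.20, 0.65, 0.35]
p_selected = [0.78, 0.27, 0.72, 0.70, 0.33, 0.22, 0.66, 0.34]

# Null (intercept-only) model predicts the sample proportion for everyone.
p_null = [sum(y) / n] * n
ll_null = bernoulli_log_likelihood(y, p_null)

for name, p, k in [("full", p_full, 6), ("selected", p_selected, 3)]:
    ll = bernoulli_log_likelihood(y, p)
    print(name, aic(ll, k), bic(ll, k, n), mcfadden_r2(ll, ll_null))
```

Both information criteria penalize the parameter count `k`, so a model with fewer selected variables can score better even when its log-likelihood is slightly lower.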