Liu Zhiping, Sadiq Maryam, Li Zhiguang
School of Mathematics and Statistics, Shanxi Datong University, Datong, 037009, China.
Department of Statistics, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Pakistan.
Sci Rep. 2025 Jul 2;15(1):23045. doi: 10.1038/s41598-025-07865-y.
Uncovering important factors is a fundamental and technically demanding step with numerous applications in recent scientific research. This study focuses on improving factor selection techniques based on partial least squares for classification; traditional, recent, and proposed approaches are evaluated in terms of efficiency. All considered techniques are applied to a real data set on sexually transmitted infections (STIs) among men in Balochistan, Pakistan, using the Monte Carlo simulation method. The optimal model, selected by linear discriminant analysis and the area under the receiver operating characteristic curve (AUC-ROC), is used to determine the significant factors associated with STIs among men. The signal-to-noise ratio index coupled with Yule's Q-partial least squares is found to be the most accurate approach in terms of efficiency and frequency of the selected subset of factors. The suggested predictors provide important information about STIs and could be useful in related research. The findings also identify areas where further research is needed, such as understanding the drivers of STI transmission in rural areas using large data sets with multiple categories of STIs.
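The abstract names two association measures, the signal-to-noise ratio index and Yule's Q, without defining them. As a rough illustration only, the sketch below uses the common textbook forms: Yule's Q as (ad - bc) / (ad + bc) on a 2x2 table, and a two-class signal-to-noise ratio as the absolute difference in class means divided by the sum of class standard deviations. The function names `yules_q` and `snr_index` are hypothetical, and the exact definitions used in the study, as well as how they are combined with partial least squares, are assumptions not confirmed by this abstract.

```python
from statistics import mean, stdev


def yules_q(x, y):
    """Yule's Q for two binary (0/1) sequences, textbook form (ad - bc) / (ad + bc)."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    denom = a * d + b * c
    # Return 0.0 when the measure is undefined (assumed convention for this sketch).
    return (a * d - b * c) / denom if denom else 0.0


def snr_index(values, labels):
    """Two-class signal-to-noise ratio: |mean1 - mean0| / (sd1 + sd0).

    Assumes binary labels (0/1) and at least two observations per class.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    spread = stdev(pos) + stdev(neg)
    return abs(mean(pos) - mean(neg)) / spread if spread else 0.0


# Toy data, invented purely for illustration (not from the study).
factor = [1, 1, 1, 0, 0, 0]
outcome = [1, 1, 0, 0, 0, 1]
print(yules_q(factor, outcome))

scores = [1, 2, 3, 7, 8, 9]
groups = [0, 0, 0, 1, 1, 1]
print(snr_index(scores, groups))
```

In a filter-style selection step, scores like these would typically be computed per candidate factor and used to rank or threshold factors before fitting a classifier; whether the study proceeds this way is not stated in the abstract.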