Kordbagheri Alireza, Kordbagheri Mohammadreza, Tayim Natalie, Fakhrou Abdulnaser, Davoudi Mohammadreza
Department of Statistics, Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.
Department of Psychology, School of Social Sciences and Humanities, Doha Institute for Graduate Studies, Doha, Qatar.
Comput Biol Med. 2025 Jan;184:109372. doi: 10.1016/j.compbiomed.2024.109372. Epub 2024 Nov 12.
Existing methods for predicting academic majors from personality traits have notable gaps, including limited model complexity and generalizability. The current study aimed to use advanced Machine Learning (ML) algorithms with smoothing functions to predict completed academic majors based on personality subscales.
We analyzed reports from 59,413 individuals. All algorithms were implemented in R (version 4.1.3; R Core Team, 2021). All model parameters were optimized using resampling and cross-validation (CV). Models were compared using pseudo-R², a robust metric that, unlike the metrics used in most studies, accounts for the quality of the model-predicted probabilities.
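To illustrate how a pseudo-R-squared metric can reflect the quality of predicted probabilities rather than only the predicted labels, the following is a minimal sketch in Python. It assumes McFadden's definition (one minus the ratio of the model log-likelihood to the log-likelihood of a null model that predicts the observed class frequencies); the abstract does not state which pseudo-R-squared variant the authors used, and the study itself was run in R. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(y_true, probs, eps=1e-15):
    """Sum of log predicted probabilities assigned to the observed classes.

    y_true: list of class labels.
    probs:  list of dicts mapping each class label to its predicted probability.
    eps:    lower clamp to avoid log(0) for overconfident wrong predictions.
    """
    return sum(math.log(max(p[y], eps)) for y, p in zip(y_true, probs))


def mcfadden_pseudo_r2(y_true, probs):
    """McFadden's pseudo-R-squared (an assumed variant, not confirmed by the source).

    The null model predicts, for every case, the class frequencies
    observed in y_true.
    """
    n = len(y_true)
    counts = Counter(y_true)
    if len(counts) < 2:
        raise ValueError("pseudo-R-squared is undefined with a single class")
    # Log-likelihood of the frequency-only null model.
    ll_null = sum(c * math.log(c / n) for c in counts.values())
    ll_model = log_likelihood(y_true, probs)
    return 1.0 - ll_model / ll_null


# Toy example with two hypothetical majors.
y = ["stem", "stem", "arts", "arts"]
null_like = [{"stem": 0.5, "arts": 0.5}] * 4
informative = [
    {"stem": 0.7, "arts": 0.3},
    {"stem": 0.6, "arts": 0.4},
    {"stem": 0.4, "arts": 0.6},
    {"stem": 0.2, "arts": 0.8},
]
r2_null = mcfadden_pseudo_r2(y, null_like)
r2_informative = mcfadden_pseudo_r2(y, informative)
```

Under this definition, a model that only reproduces the class frequencies scores 0, and the score rises as the probabilities assigned to the observed classes move toward 1. Two models with identical predicted labels can therefore receive different scores.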
The advanced ML models outperformed logistic regression on both training and test data. On the test data, pseudo-R² and AUC were highest for advanced models such as kNN, GBE, and RF. Test-set pseudo-R² values ranged from .022 for logistic regression (the lowest) to .099 for kNN (the highest). The agreeableness subscale was the most influential component in predicting the completion of university education, followed by conscientiousness and emotional stability.
The potential of advanced methods to improve the accuracy and validity of predictions is a promising development in our field. Their performance, particularly on large data sets with complex patterns, gives reason for optimism about future research in this area.