Nguyen-Louie Tam, McCarthy Michael J, Coccaro Emil F, Meruelo Alejandro D
University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
University of California, San Diego, VA San Diego Healthcare System, 3350 La Jolla Village Dr, San Diego, CA, 92161, USA.
J Psychiatr Res. 2025 Jun 4;189:91-103. doi: 10.1016/j.jpsychires.2025.06.005.
Aggressive behavior in adolescents and young adults is a significant public health concern associated with adverse educational, social, and mental health outcomes. This study aimed to identify key predictors of aggression using a cross-sectional dataset from a large, longitudinal U.S. cohort. The outcome was self-reported aggressive behavior, and predictors spanned demographic, psychosocial, behavioral, and contextual domains, including adverse life events, impulsivity, family conflict, peer and school environments, and chronotype.
Multiple models were evaluated, including linear regression, a hyperparameter-tuned random forest, a tuned gradient boosting machine (GBM), XGBoost, and an ensemble model combining random forest and GBM predictions. All models were trained using five-fold cross-validation across five multiply imputed datasets. Linear regression achieved the highest r (r = 0.313; MSE = 40.76), followed closely by the random forest (r = 0.311), which had a marginally lower MSE (40.71). The ensemble and GBM models showed slightly lower performance. Across models, key predictors included adverse life events, delayed chronotype, peer network health, family cohesion, and normalized household income.
These findings underscore the contribution of environmental and psychological stressors to adolescent aggression, particularly the buffering role of cohesive peer and family relationships. Despite similar predictive accuracy across models, machine learning methods offered advantages for variable importance ranking and interaction discovery. Results highlight the utility of integrating diverse psychosocial, behavioral, and contextual measures to better understand complex behavioral outcomes and inform targeted prevention strategies.
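The evaluation scheme described in the abstract (five-fold cross-validation repeated across five imputed datasets, scored by r and MSE) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' pipeline: the data are synthetic, the five "imputed" datasets are independent random draws standing in for real imputations, the model is a one-predictor least-squares line, r is taken to be the Pearson correlation between predicted and observed scores, and fold-level scores are pooled by simple averaging. All function names are hypothetical.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return mean((p - o) ** 2 for p, o in zip(pred, obs))


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validate(xs, ys, k=5, seed=0):
    """Return a list of (r, MSE) pairs, one per held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        pred = [intercept + slope * xs[i] for i in held_out]
        obs = [ys[i] for i in held_out]
        scores.append((pearson_r(pred, obs), mse(pred, obs)))
    return scores


def make_synthetic_dataset(n, seed):
    """Hypothetical stand-in for one imputed dataset (not real cohort data)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 6) for x in xs]
    return xs, ys


# Five stand-in datasets, five folds each: 25 held-out evaluations in total.
imputed = [make_synthetic_dataset(500, seed=m) for m in range(5)]
all_scores = [s for xs, ys in imputed for s in cross_validate(xs, ys, k=5)]

# Assumption: pool by simple averaging over all folds and datasets.
pooled_r = mean(r for r, _ in all_scores)
pooled_mse = mean(e for _, e in all_scores)
print(f"pooled r = {pooled_r:.3f}, pooled MSE = {pooled_mse:.2f}")
```

Because r measures how well predictions track the ordering and spread of the outcome while MSE measures absolute error, two models can rank differently on the two metrics, as the linear regression and random forest figures above do.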