[Factors Influencing Sexual Experiences in Adolescents Using a Random Forest Model: Secondary Data Analysis of the 2019~2021 Korea Youth Risk Behavior Web-based Survey Data].

Author information

Yang Yoonseok, Kwon Ju Won, Yang Youngran

Affiliations

Research Center of Healthcare & Welfare Instrument for the Aged, Division of Biomedical Engineering, College of Engineering, Jeonbuk National University, Jeonju, Korea.

Department of Electrical Engineering and Computer Science, Daegu Gyeongbuk Institute of Science and Technology, Daegu, Korea.

Publication information

J Korean Acad Nurs. 2024 May;54(2):193-210. doi: 10.4040/jkan.23134.

Abstract

PURPOSE

The objective of this study was to develop a predictive model for the sexual experiences of adolescents using the random forest method and to identify the "variable importance."

METHODS

The study used data from the 2019 to 2021 Korea Youth Risk Behavior Web-based Survey, which included 86,595 male and 80,504 female participants. There were 44 independent variables. SPSS was used to conduct Rao-Scott χ² tests and complex sample t-tests. Modeling was performed using the random forest algorithm in Python. Each model's performance was evaluated using precision, recall, F1-score, the receiver operating characteristic curve, and the area under the curve, derived from the confusion matrix.
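As a rough illustration of the evaluation metrics named above, the following sketch computes the confusion-matrix counts, precision, recall, F1-score, and a rank-based area under the ROC curve for a binary outcome. The labels and scores are toy values, not survey data, and the function names (`confusion_counts`, `precision_recall_f1`, `roc_auc`) and the 0.5 decision threshold are assumptions made for illustration; the abstract does not state how the authors implemented these steps.

```python
# Illustrative only: toy labels and scores, not KYRBS data.
# All function names below are hypothetical helpers.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
auc = roc_auc(y_true, scores)
print(tp, fp, fn, tn, precision, recall, f1, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.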

RESULTS

The prevalence of sexual experiences initially decreased during the COVID-19 pandemic, but later increased. "Variable importance" for predicting sexual experiences, ranked in the top six, included week and weekday sedentary time and internet usage time, followed by ease of cigarette purchase, age at first alcohol consumption, smoking initiation, breakfast consumption, and difficulty purchasing alcohol.
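A ranking of this kind can be sketched as follows, assuming scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_` attribute. The abstract does not say which library or importance measure the authors used, so this is an assumption. The data are synthetic and the feature names are hypothetical placeholders, not the survey's variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: two informative features and two pure-noise features.
# Feature names are hypothetical and chosen only for readability.
rng = np.random.default_rng(0)
n = 2000
feature_names = ["sedentary_hours", "internet_hours", "noise_a", "noise_b"]
X = rng.normal(size=(n, 4))
signal = 1.5 * X[:, 0] + 1.0 * X[:, 1]
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Fit a random forest; hyperparameters here are arbitrary, not the authors'.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1; sort them in descending order.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importance describes how much each variable contributes to the model's splits; it does not by itself indicate the direction or causal nature of an association.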

CONCLUSION

Education and support programs for promoting adolescent sexual health, based on the top-ranking important variables, should be integrated with health behavior intervention programs addressing internet usage, smoking, and alcohol consumption. We recommend active utilization of the random forest analysis method to develop high-performance predictive models for effective disease prevention, treatment, and nursing care.

