Comparison of mortality prediction models for road traffic accidents: an ensemble technique for imbalanced data.

Affiliations

Department of Health Administration, Dankook University, Cheonan, 31116, South Korea.

Department of Healthcare Management, Eulji University, Seongnam, 13135, South Korea.

Publication information

BMC Public Health. 2022 Aug 2;22(1):1476. doi: 10.1186/s12889-022-13719-3.

DOI:10.1186/s12889-022-13719-3
PMID:35918672
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9344638/
Abstract

BACKGROUND

Injuries caused by road traffic accidents (RTAs) are classified under the International Classification of Diseases, 10th Revision (ICD-10) as 'S00-T99'. With a mortality rate of only 1.2% among all RTA victims, they form an imbalanced sample. To predict the characteristics of external causes of RTA injuries and mortality, we compared model performance across different correction and classification techniques for imbalanced samples.

METHODS

The present study used data spanning a 5-year period (2013-2017) from the Korean National Hospital Discharge In-depth Injury Survey (KNHDS), a national-level survey conducted by the Korea Disease Control and Prevention Agency. A total of eight variables were used in the prediction, covering patient, accident, and injury/disease characteristics. As the data were imbalanced, a sample consisting of only severe injuries was constructed and compared against the total sample. The samples were preprocessed according to their characteristics: because they contained many variables with different units, they were standardized first. Among the ensemble techniques for classification, the study used Random Forest, Extra-Trees, and XGBoost. Four over- and under-sampling techniques were applied, and the performance of the algorithms was compared using accuracy, precision, recall, F1, and the Matthews correlation coefficient (MCC).
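The five metrics named above can be illustrated with a minimal Python sketch. It is not the authors' code: the 1,000-victim cohort and the helper names `confusion_counts` and `classification_metrics` are assumptions made for illustration, with only the 1.2% death rate taken from the abstract. The sketch shows why accuracy alone is uninformative at this class ratio: a model that predicts survival for everyone scores 98.8% accuracy while recall and MCC are both zero.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = death)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, F1 and MCC; undefined ratios become 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical cohort (assumption): 1,000 victims, 12 deaths, i.e. 1.2%.
y_true = [1] * 12 + [0] * 988

# Trivial model that never predicts death.
always_survive = [0] * 1000
baseline = classification_metrics(*confusion_counts(y_true, always_survive))

# Perfect model, for comparison.
perfect = classification_metrics(*confusion_counts(y_true, y_true))

print(baseline)
print(perfect)
```

Because MCC uses all four cells of the confusion matrix, it stays at zero for the trivial model, which is one reason it is commonly reported alongside accuracy for imbalanced data.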

RESULTS

The results showed that XGBoost performed best among the prediction techniques. While the synthetic minority oversampling technique (SMOTE), a type of over-sampling, also demonstrated a certain level of performance, under-sampling performed best. Overall, prediction by the XGBoost model with samples using SMOTE produced the best results.
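The two resampling families compared here can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the authors' pipeline and not the imbalanced-learn implementation: the function names, the two-feature Gaussian toy data, and `k=5` are all hypothetical. Random under-sampling discards majority rows until the classes match; the SMOTE-style routine creates synthetic minority rows by interpolating between a minority point and one of its nearest minority neighbours.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_undersample(majority, minority, rng):
    """Keep every minority row and an equal-sized random subset of majority rows."""
    return rng.sample(majority, len(minority)), list(minority)


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic rows on segments joining minority nearest neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


rng = random.Random(0)
# Toy data (assumption): 988 survivors and 12 deaths with two standardized features.
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(988)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(12)]

under_major, under_minor = random_undersample(majority, minority, rng)
synthetic = smote_like(minority, len(majority) - len(minority), k=5, rng=rng)

print(len(under_major), len(under_minor))            # class sizes after under-sampling
print(len(majority), len(minority) + len(synthetic))  # class sizes after over-sampling
```

In practice, resampling of either kind is normally applied to the training split only, so that the evaluation data keep the original class ratio.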

CONCLUSION

This study presented an empirical comparison of sampling techniques and classification algorithms, used in combination, with respect to their effect on prediction accuracy for imbalanced samples. The findings could be used as reference data in classification analyses of imbalanced data in the medical field.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abc4/9344638/049b97a2ad4f/12889_2022_13719_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abc4/9344638/bc17150bd87d/12889_2022_13719_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abc4/9344638/b6f87693a7fe/12889_2022_13719_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abc4/9344638/d79c1215b4bc/12889_2022_13719_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abc4/9344638/350fbaba0816/12889_2022_13719_Fig5_HTML.jpg


