A Framework of Rebalancing Imbalanced Healthcare Data for Rare Events' Classification: A Case of Look-Alike Sound-Alike Mix-Up Incident Detection.

Affiliations

Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong.

Graduate School of Public Health, St. Luke's International University, Tokyo, Japan.

Publication information

J Healthc Eng. 2018 May 22;2018:6275435. doi: 10.1155/2018/6275435. eCollection 2018.

DOI: 10.1155/2018/6275435
PMID: 29951182
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5987310/
Abstract

Identifying rare but significant healthcare events in massive unstructured datasets has become a common task in healthcare data analytics. However, imbalanced class distribution in many practical datasets greatly hampers the detection of rare events, as most classification methods implicitly assume an equal occurrence of classes and are designed to maximize the overall classification accuracy. In this study, we develop a framework for learning healthcare data with imbalanced distribution via incorporating different rebalancing strategies. The evaluation results showed that the developed framework can significantly improve the detection accuracy of medical incidents due to look-alike sound-alike (LASA) mix-ups. Specifically, logistic regression combined with the synthetic minority oversampling technique (SMOTE) produces the best detection results, with a significant 45.3% increase in recall (recall = 75.7%) compared with pure logistic regression (recall = 52.1%).
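The abstract names SMOTE as the rebalancing step that worked best with logistic regression, and reports the recall gain as a relative figure. The sketch below is only an illustration of those two ideas, not the authors' implementation: it uses the Python standard library alone, the function names `smote` and `relative_increase` and the toy `minority` points are hypothetical, and the neighbour count `k` is an assumed parameter. It shows the core SMOTE idea (a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours) and checks that moving from 52.1% to 75.7% recall corresponds to a 45.3% relative increase.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (illustrative sketch, not an optimised implementation)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Toy minority-class feature vectors (made up for illustration).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))

# Recall figures quoted in the abstract: 52.1% -> 75.7%.
print(round(relative_increase(52.1, 75.7), 1))
```

In practice the synthetic samples would be appended to the training split only, so that recall is still measured on untouched, imbalanced test data.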


