Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning.

Authors

Sulaiman Ismat Mohd, Bulgiba Awang, Kareem Sameem Abdul, Latip Abdul Aziz

Affiliations

Health Informatics Centre, Planning Division, Ministry of Health Malaysia, Putrajaya, Malaysia.

Academy of Sciences, Kuala Lumpur, Malaysia.

Publication information

Methods Inf Med. 2024 Dec;63(5-06):195-202. doi: 10.1055/a-2521-4372. Epub 2025 Jan 22.

DOI:10.1055/a-2521-4372
PMID:39842453
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12196825/
Abstract

OBJECTIVE

This study presents the first Malaysian machine learning model to detect and disambiguate abbreviations in clinical notes. The model is designed to be incorporated into MyHarmony, a natural language processing system that extracts clinical information for health care management. It uses word embeddings so that it remains feasible for secondary analysis, rather than real-time use, within the constraints of low-resource settings.

METHODS

A Malaysian clinical word embedding, based on the Word2Vec model, was developed using 29,895 electronic discharge summaries. The embedding was compared against a conventional rule-based approach and a FastText embedding on two tasks: abbreviation detection and abbreviation disambiguation. Machine learning classifiers were applied to assess performance.
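To make the idea of embedding-based disambiguation concrete, here is a minimal sketch, not the authors' pipeline. It assumes a tiny invented 3-dimensional embedding (the paper's embedding has 100 dimensions), a hypothetical ambiguous abbreviation "ARF", and a nearest-centroid classifier swapped in for the classifiers the study evaluated. All vectors, example contexts, and function names are illustrative assumptions.

```python
import math

# Toy vectors standing in for a trained clinical word embedding.
# The values are invented for illustration only.
EMBEDDING = {
    "urine": [0.1, 0.9, 0.1],
    "kidney": [0.0, 0.8, 0.2],
    "dialysis": [0.1, 0.9, 0.0],
    "oxygen": [0.9, 0.1, 0.0],
    "ventilator": [0.8, 0.2, 0.1],
    "intubated": [0.9, 0.0, 0.2],
}
DIM = 3


def context_vector(tokens):
    """Average the vectors of known context tokens; unknown tokens are skipped."""
    vectors = [EMBEDDING[t] for t in tokens if t in EMBEDDING]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(examples):
    """Map each sense to the mean context vector of its labelled examples."""
    grouped = {}
    for tokens, sense in examples:
        grouped.setdefault(sense, []).append(context_vector(tokens))
    return {
        sense: [sum(col) / len(vecs) for col in zip(*vecs)]
        for sense, vecs in grouped.items()
    }


def disambiguate(tokens, centroids):
    """Return the sense whose centroid is most similar to the context."""
    vec = context_vector(tokens)
    return max(centroids, key=lambda sense: cosine(vec, centroids[sense]))


# Hypothetical labelled contexts for the abbreviation "ARF".
TRAINING = [
    (["urine", "kidney"], "acute renal failure"),
    (["dialysis", "kidney"], "acute renal failure"),
    (["oxygen", "ventilator"], "acute respiratory failure"),
    (["intubated", "oxygen"], "acute respiratory failure"),
]
CENTROIDS = train_centroids(TRAINING)

renal = disambiguate(["patient", "started", "dialysis"], CENTROIDS)
respiratory = disambiguate(["patient", "was", "intubated"], CENTROIDS)
```

In this sketch the only features are the averaged vectors of the words around the abbreviation, which is why an embedding trained on local clinical text can matter more than its size: the context words need to be in the vocabulary and placed near related clinical terms.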

RESULTS

The Malaysian clinical word embedding contained 7 million word tokens, a vocabulary of 24,352 unique words, and 100 dimensions. For abbreviation detection, the Decision Tree classifier augmented with the Malaysian clinical embedding showed the best performance (F-score of 0.9519). For abbreviation disambiguation, the classifier with the Malaysian clinical embedding had the best performance for most of the abbreviations (F-score of 0.9903).
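For readers unfamiliar with the metric, the F-score reported above is conventionally the harmonic mean of precision and recall. The sketch below computes it for one positive class on invented token labels; the abstract does not state which averaging scheme the authors used, so the single-class form here is an assumption, and the labels and function name are hypothetical.

```python
def f_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented labels: "abbr" marks a token that is an abbreviation.
gold = ["abbr", "abbr", "word", "abbr", "word"]
predicted = ["abbr", "word", "word", "abbr", "abbr"]

# Two true positives, one false positive, one false negative,
# so precision and recall are both 2/3.
detection_f1 = f_score(gold, predicted, positive="abbr")
```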

CONCLUSION

Despite its smaller vocabulary and lower dimensionality, our local clinical word embedding performed better than the larger nonclinical FastText embedding. Word embeddings combined with simple machine learning algorithms can decipher abbreviations well. This approach also requires fewer computational resources and is suitable for implementation in low-resource settings such as Malaysia. Integrating this model into MyHarmony will improve recognition of clinical terms, thus improving the information generated for monitoring Malaysian health care services and for policymaking.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/805e/12196825/7142ce04b502/10-1055-a-2521-4372-i24050005-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/805e/12196825/48d38d0a2d4f/10-1055-a-2521-4372-i24050005-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/805e/12196825/f897e0841247/10-1055-a-2521-4372-i24050005-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/805e/12196825/115d6509274e/10-1055-a-2521-4372-i24050005-4.jpg
