

Deciphering Abbreviations in Malaysian Clinical Notes Using Machine Learning.

Author information

Sulaiman Ismat Mohd, Bulgiba Awang, Kareem Sameem Abdul, Latip Abdul Aziz

Affiliations

Health Informatics Centre, Planning Division, Ministry of Health Malaysia, Putrajaya, Malaysia.

Academy of Sciences, Kuala Lumpur, Malaysia.

Publication information

Methods Inf Med. 2024 Dec;63(5-06):195-202. doi: 10.1055/a-2521-4372. Epub 2025 Jan 22.

Abstract

OBJECTIVE

This is the first Malaysian machine learning model to detect and disambiguate abbreviations in clinical notes. The model is designed to be incorporated into MyHarmony, a natural language processing system that extracts clinical information for health care management. It uses word embedding so that it remains feasible within the constraints of low-resource settings, where it is intended for secondary analysis rather than real-time use.

METHODS

A Malaysian clinical embedding, based on the Word2Vec model, was developed using 29,895 electronic discharge summaries. The embedding was compared against a conventional rule-based approach and a FastText embedding on two tasks: abbreviation detection and abbreviation disambiguation. Machine learning classifiers were applied to assess performance.
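To illustrate how a word embedding can feed the disambiguation task, the following is a minimal sketch and not the authors' pipeline. It assumes a tiny hand-written 3-dimensional embedding table (`TOY_EMBEDDING`), averages the vectors of the words surrounding an abbreviation, and assigns the sense whose labelled example contexts lie closest by cosine similarity. The nearest-centroid rule is a stand-in for the machine learning classifiers mentioned above; the vectors, the example abbreviation "pt", the sense labels, and all function names are illustrative assumptions, not values from the study.

```python
import math

# Hypothetical toy embedding (3 dimensions). A real embedding would be
# trained on clinical text and have far more dimensions.
TOY_EMBEDDING = {
    "admitted": [0.9, 0.1, 0.0],
    "complained": [0.8, 0.2, 0.1],
    "fever": [0.7, 0.3, 0.0],
    "inr": [0.1, 0.9, 0.2],
    "prolonged": [0.0, 0.8, 0.3],
    "coagulation": [0.1, 0.9, 0.1],
}


def context_vector(tokens):
    """Average the vectors of known context tokens; None if none are known."""
    vectors = [TOY_EMBEDDING[t] for t in tokens if t in TOY_EMBEDDING]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(examples):
    """Build one mean context vector per sense from (tokens, sense) pairs."""
    grouped = {}
    for tokens, sense in examples:
        vector = context_vector(tokens)
        if vector is not None:
            grouped.setdefault(sense, []).append(vector)
    return {
        sense: [sum(column) / len(vectors) for column in zip(*vectors)]
        for sense, vectors in grouped.items()
    }


def disambiguate(tokens, centroids):
    """Return the sense whose centroid is most similar to the context."""
    vector = context_vector(tokens)
    if vector is None:
        return None
    return max(centroids, key=lambda sense: cosine(vector, centroids[sense]))


# Hypothetical labelled contexts for the abbreviation "pt".
TRAINING = [
    (["admitted", "fever"], "patient"),
    (["complained", "fever"], "patient"),
    (["inr", "prolonged"], "prothrombin time"),
    (["coagulation", "prolonged"], "prothrombin time"),
]
CENTROIDS = train_centroids(TRAINING)

print(disambiguate(["admitted", "complained"], CENTROIDS))
print(disambiguate(["inr", "coagulation"], CENTROIDS))
```

In this toy setup the first context resolves to "patient" and the second to "prothrombin time". Replacing the centroid rule with a trained classifier over the same averaged vectors would be closer in spirit to the comparison described in the abstract.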

RESULTS

The Malaysian clinical word embedding comprised 7 million word tokens, a vocabulary of 24,352 unique words, and 100 dimensions. For abbreviation detection, the Decision Tree classifier augmented with the Malaysian clinical embedding showed the best performance (F-score of 0.9519). For abbreviation disambiguation, the classifier with the Malaysian clinical embedding had the best performance for most of the abbreviations (F-score of 0.9903).
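As a reminder of how an F-score is derived from classification counts, here is a small sketch. It assumes the balanced F1 form by default (the abstract does not state which F-measure variant was used), and the counts in the example call are made up for illustration; they are not results from the study.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Illustrative counts only: precision = recall = 0.9.
print(round(f_score(tp=90, fp=10, fn=10), 4))
```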

CONCLUSION

Despite its smaller vocabulary and lower dimensionality, our local clinical word embedding performed better than the larger nonclinical FastText embedding. Word embedding combined with simple machine learning algorithms can decipher abbreviations well. This approach also requires fewer computational resources and is suitable for implementation in low-resource settings such as Malaysia. Integrating this model into MyHarmony will improve recognition of clinical terms, thus improving the information generated for monitoring Malaysian health care services and for policymaking.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/805e/12196825/7142ce04b502/10-1055-a-2521-4372-i24050005-1.jpg
