
Use of n-grams and K-means clustering to classify data from free text bone marrow reports.

Author information

Xiang Richard F

Affiliation

Department of Pathology and Laboratory Medicine, Dalhousie University, Halifax, Nova Scotia, Canada.

Publication information

J Pathol Inform. 2024 Jan 4;15:100358. doi: 10.1016/j.jpi.2023.100358. eCollection 2024 Dec.

DOI:10.1016/j.jpi.2023.100358
PMID:38292072
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10825612/
Abstract

Natural language processing (NLP) has been used to extract information from and summarize medical reports. Currently, the most advanced NLP models require large training datasets of accurately labeled medical text. One approach to creating these large datasets is to use classical NLP algorithms, which have low resource requirements. In this manuscript, we examined how well an automated classical NLP algorithm could classify portions of bone marrow report text into their appropriate sections. A total of 1480 bone marrow reports were extracted from the laboratory information system of a tertiary healthcare network. The free text of these reports was preprocessed by separating each report into text blocks and then removing the section headers. An NLP algorithm combining n-grams and K-means clustering was used to classify the text blocks into their appropriate bone marrow sections. We assessed the impact of token replacement of numerical values, accession numbers, and clusters of differentiation; of varying the number of centroids (1-19) and the n-gram size (1-5); and of using an ensemble algorithm. The optimal NLP model was found to employ an ensemble algorithm that incorporated token replacement, used 1-grams (bag of words), and used 10 centroids for K-means clustering. This model classified text blocks with an accuracy of 89%, suggesting that classical NLP models can accurately classify portions of bone marrow report text.
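The preprocessing described in the abstract (splitting reports into text blocks, removing section headers, and replacing numerical values, accession numbers, and clusters of differentiation with tokens) could be sketched in Python as below. This is a minimal illustration, not the author's code: the blank-line block separator, the upper-case `HEADER:` layout, the accession-number pattern, and the placeholder token names (`NUM`, `ACCESSION`, `CDMARKER`) are all hypothetical assumptions.

```python
import re

# Hypothetical report layout: sections separated by blank lines, each opening
# with an upper-case header such as "ASPIRATE:". The accession format is invented.
HEADER_RE = re.compile(r"^\s*[A-Z][A-Z /&-]*:\s*")
ACCESSION_RE = re.compile(r"\b[A-Z]{1,3}\d{2}-\d{3,6}\b")
CD_RE = re.compile(r"\bCD\d+[a-z]?\b", re.IGNORECASE)
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def split_into_blocks(report: str) -> list[str]:
    """Split a free-text report into non-empty blocks separated by blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", report.strip()) if b.strip()]


def strip_header(block: str) -> tuple[str, str]:
    """Return (header, body); the removed header is kept as the block's reference label."""
    match = HEADER_RE.match(block)
    if match is None:
        return "", block
    return match.group(0).strip().rstrip(":").strip(), block[match.end():].strip()


def replace_tokens(text: str) -> str:
    """Swap accession numbers, CD markers and numbers for placeholder tokens.

    Order matters: accessions and CD markers contain digits, so they are
    replaced before the generic number pattern runs.
    """
    text = ACCESSION_RE.sub(" ACCESSION ", text)
    text = CD_RE.sub(" CDMARKER ", text)
    text = NUMBER_RE.sub(" NUM ", text)
    return " ".join(text.split())  # collapse the extra whitespace introduced above


if __name__ == "__main__":
    report = "ASPIRATE: Blasts 3.5% of 500 cells.\n\nFLOW CYTOMETRY: CD34 positive, case BM23-01234."
    for block in split_into_blocks(report):
        header, body = strip_header(block)
        print(header, "|", replace_tokens(body))
```

Keeping the stripped header next to the cleaned body gives each block a reference section label for scoring later; using the removed headers as ground truth in this way is an assumption about the study design.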

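The n-gram and K-means step might look like the following standard-library sketch. It builds word n-gram count vectors (n = 1 is the bag-of-words setting the abstract reports as optimal), clusters them with Lloyd's K-means, and names each cluster after the most common removed header among its blocks. The farthest-point centroid seeding, the L2 normalisation, and the majority-header cluster naming are assumptions; the abstract does not say how clusters were initialised or mapped to sections.

```python
import math
from collections import Counter


def ngrams(text: str, n: int = 1) -> Counter:
    """Count word n-grams in a text block; n=1 is a plain bag of words."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def vectorize(docs: list[str], n: int = 1) -> list[list[float]]:
    """Turn each block into an L2-normalised count vector over a shared n-gram vocabulary."""
    counts = [ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*counts))
    index = {gram: i for i, gram in enumerate(vocab)}
    vectors = []
    for c in counts:
        v = [0.0] * len(vocab)
        for gram, k in c.items():
            v[index[gram]] = float(k)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing an empty block by 0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: list[float], centroids: list[list[float]]) -> int:
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))


def kmeans(vectors: list[list[float]], k: int, iterations: int = 100) -> list[int]:
    """Lloyd's K-means; centroids are seeded by farthest-point selection (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign: list[int] = []
    for _ in range(iterations):
        new_assign = [nearest(v, centroids) for v in vectors]
        if new_assign == assign:
            break  # converged: no block changed cluster
        assign = new_assign
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def label_clusters(assign: list[int], headers: list[str]) -> dict[int, str]:
    """Name each cluster after the most common removed header among its blocks."""
    by_cluster: dict[int, Counter] = {}
    for cluster, header in zip(assign, headers):
        by_cluster.setdefault(cluster, Counter())[header] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in by_cluster.items()}


def classify(assign: list[int], labels: dict[int, str]) -> list[str]:
    """Predict each block's section as the label of the cluster it fell into."""
    return [labels[a] for a in assign]
```

The study varied the number of centroids from 1 to 19 and n from 1 to 5; with this sketch that would mean looping over `k` and `n` and scoring each configuration. Naming clusters and scoring on the same blocks gives an optimistic estimate, so a held-out split would be needed for a fair comparison.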

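The abstract reports that the best model used an ensemble algorithm and reached 89% accuracy, but does not describe how the ensemble combined its members. A simple majority vote over several models' per-block section predictions, together with the accuracy measure, could be sketched as follows; the voting rule and the tie-breaking are assumptions, not the published method.

```python
from collections import Counter


def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Combine several models' per-block section labels by majority vote.

    predictions[m][i] is model m's label for block i. Ties go to the label
    voted for by the earliest model in the list (an arbitrary, assumed rule).
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(predicted: list[str], truth: list[str]) -> float:
    """Fraction of text blocks assigned to their true section."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

The ensemble members could be, for example, clustering models run with different centroid counts or n-gram sizes; which members the study actually combined is not stated in the abstract.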
Similar articles

1
Use of n-grams and K-means clustering to classify data from free text bone marrow reports.
J Pathol Inform. 2024 Jan 4;15:100358. doi: 10.1016/j.jpi.2023.100358. eCollection 2024 Dec.
2
Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports.
Acta Oncol. 2020 Dec;59(12):1455-1460. doi: 10.1080/0284186X.2020.1819563. Epub 2020 Sep 12.
3
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4
Prediction of Stroke Outcome Using Natural Language Processing-Based Machine Learning of Radiology Report of Brain MRI.
J Pers Med. 2020 Dec 16;10(4):286. doi: 10.3390/jpm10040286.
5
A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.
J Med Internet Res. 2022 Mar 23;24(3):e27210. doi: 10.2196/27210.
6
Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology.
J Digit Imaging. 2020 Aug;33(4):1002-1008. doi: 10.1007/s10278-020-00327-z.
7
Use of Natural Language Processing Tools to Identify and Classify Periprosthetic Femur Fractures.
J Arthroplasty. 2019 Oct;34(10):2216-2219. doi: 10.1016/j.arth.2019.07.025. Epub 2019 Jul 24.
8
Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports.
JCO Clin Cancer Inform. 2019 Apr;3:1-9. doi: 10.1200/CCI.18.00138.
9
The language of proteins: NLP, machine learning & protein sequences.
Comput Struct Biotechnol J. 2021 Mar 25;19:1750-1758. doi: 10.1016/j.csbj.2021.03.022. eCollection 2021.
10
Data for registry and quality review can be retrospectively collected using natural language processing from unstructured charts of arthroplasty patients.
Bone Joint J. 2020 Jul;102-B(7_Supple_B):99-104. doi: 10.1302/0301-620X.102B7.BJJ-2019-1574.R1.
