RUBY: Natural Language Processing of French Electronic Medical Records for Breast Cancer Research.

Author information

Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France.

Cervico-facial Oncology Surgical Department, University Institute of Face and Neck, University of Côte d'Azur, Nice, France.

Publication information

JCO Clin Cancer Inform. 2022 Jul;6:e2100199. doi: 10.1200/CCI.21.00199.

DOI:10.1200/CCI.21.00199

PMID:35960900

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9470144/

Abstract

PURPOSE

Electronic medical records are a valuable source of information about patients' clinical status, but they are often free-text documents that require laborious manual review before they can be used. Computational techniques have been investigated, but the literature has paid little attention to non-English texts. We developed RUBY, a tool designed in collaboration with IBM-France, to automatically structure clinical information from French medical records of patients with breast cancer.

MATERIALS AND METHODS

RUBY, which combines state-of-the-art Named Entity Recognition models with keyword extraction and postprocessing rules, was applied to clinical texts. We evaluated the precision of RUBY in extracting the target information.
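The keyword extraction and postprocessing steps can be illustrated with a minimal sketch. The abstract does not describe RUBY's actual rules or name its Named Entity Recognition models, so everything below is an assumption made for illustration: the field names, the regular expressions, and the helpers `extract_keywords` and `postprocess` are hypothetical, and the NER step is left out entirely.

```python
import re

# Hypothetical keyword patterns; RUBY's real rules are not given in the abstract.
KEYWORD_PATTERNS = {
    "laterality": re.compile(r"\bsein\s+(gauche|droit)\b", re.IGNORECASE),
    "tumor_size_mm": re.compile(r"\b(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE),
}


def extract_keywords(text):
    """Return the captured groups of the first match for each pattern."""
    found = {}
    for field, pattern in KEYWORD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[field] = match.groups()
    return found


def postprocess(raw):
    """Normalize raw matches: English laterality, sizes in millimetres."""
    result = {}
    if "laterality" in raw:
        side = raw["laterality"][0].lower()
        result["laterality"] = {"gauche": "left", "droit": "right"}[side]
    if "tumor_size_mm" in raw:
        value, unit = raw["tumor_size_mm"]
        size = float(value.replace(",", "."))  # French decimal comma
        result["tumor_size_mm"] = size * 10 if unit.lower() == "cm" else size
    return result


# Invented example sentence, not taken from any real report.
report = "Tumeur du sein gauche mesurant 1,5 cm."
structured = postprocess(extract_keywords(report))
print(structured)
```

Keeping normalization in a separate postprocessing function means the matching rules stay simple, and unit or vocabulary conventions can change without touching the patterns.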

RESULTS

RUBY has an average precision of 92.8% for the Surgery report, 92.7% for the Pathology report, 98.1% for the Biopsy report, and 81.8% for the Consultation report.
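For readers unfamiliar with the metric, precision is the share of extracted values that are correct, TP / (TP + FP). The abstract does not say how the per-report average is formed; the sketch below assumes an unweighted mean over extracted fields, and the field names and counts are invented placeholders, not data from the study.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); 0.0 when nothing was extracted."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def average_precision(counts_by_field):
    """Unweighted mean of per-field precision (an assumed averaging scheme)."""
    scores = [precision(tp, fp) for tp, fp in counts_by_field.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical (TP, FP) counts for two fields of one report type.
toy_counts = {"laterality": (3, 1), "tumor_size_mm": (1, 1)}
toy_average = average_precision(toy_counts)
print(toy_average)
```

Precision alone does not capture values the tool failed to extract; that would require recall, which the abstract does not report.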

CONCLUSION

These results show that the automatic approach has the potential to extract clinical knowledge effectively from an extensive set of electronic medical records, reducing the manual effort required and saving a significant amount of time. Deeper semantic analysis and a better understanding of context in the text, the use of ontologies, and training on a larger and more recent set of reports, including those containing highly variable entities, could further improve the results.
