Developing and testing a framework for coding general practitioners' free-text diagnoses in electronic medical records - a reliability study for generating training data in natural language processing.

Affiliation

Institute of Primary Care, University and University Hospital Zurich, Pestalozzistr. 24, Zürich, 8091, Switzerland.

Publication information

BMC Prim Care. 2024 Jul 16;25(1):257. doi: 10.1186/s12875-024-02514-1.

PMID: 39014311
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11251376/
Abstract

BACKGROUND

Diagnoses entered by general practitioners into electronic medical records have great potential for research and practice, but unfortunately, they are often recorded in an uncoded format, making them of little use. Natural language processing (NLP) could assist in coding free-text diagnoses, but NLP models require local training data to unlock their potential. The aim of this study was to develop a framework of research-relevant diagnostic codes, to test the framework using free-text diagnoses from a Swiss primary care database, and to generate training data for NLP modelling.

METHODS

The framework of diagnostic codes was developed based on input from local stakeholders and consideration of epidemiological data. After pre-testing, the framework contained 105 diagnostic codes. Two raters then applied these codes, independently coding randomly drawn lines of free text (LoFT) from diagnosis lists extracted from the electronic medical records of 3000 patients of 27 general practitioners. Coding frequencies and mean occurrence rates (n and %) were calculated, and the inter-rater reliability (IRR) of coding was assessed using Cohen's kappa (κ).
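The abstract does not spell out how agreement was computed for each code. A minimal sketch, assuming each rater's decision for a given code is recorded as present or absent on every line of free text, and assuming the "mean occurrence rate" averages the two raters' counts; the function names `cohen_kappa` and `mean_occurrence` and the toy labels are hypothetical illustrations, not the authors' pipeline:

```python
from collections import Counter
from typing import Sequence, Tuple


def cohen_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters' present/absent calls on the same lines (LoFT)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of lines")
    n = len(rater_a)
    # Observed agreement: share of lines on which both raters made the same call.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's own marginal rate of "present" vs "absent".
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[v] / n) * (freq_b[v] / n) for v in (True, False))
    if p_e == 1.0:
        # Both raters gave every line the same single label: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def mean_occurrence(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> Tuple[float, float]:
    """Assumed definition: mean of both raters' counts (n) and that mean as % of lines."""
    n_mean = (sum(rater_a) + sum(rater_b)) / 2
    return n_mean, 100 * n_mean / len(rater_a)


# Invented toy labels for one code over 10 lines of free text (not study data).
a = [True, True, False, False, False, False, False, False, False, True]
b = [True, True, False, False, False, False, False, False, True, True]
print(round(cohen_kappa(a, b), 3))  # 0.783
print(mean_occurrence(a, b))        # (3.5, 35.0)
```

Kappa discounts the raw agreement (here 9 of 10 lines) by the agreement expected by chance from each rater's own coding rate. This matters for rare codes, where two raters would agree on "absent" for almost every line even if they coded at random.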

RESULTS

The sample consisted of 26,980 LoFT; for 56.3% of them, no code could be assigned because the line did not contain a specific diagnosis. The most common diagnostic codes were 'dorsopathies' (3.9%, a code covering all types of back problems, including non-specific lower back pain, scoliosis, and others) and 'other diseases of the circulatory system' (3.1%). Raters were in almost perfect agreement (κ ≥ 0.81) for 69 of the 105 diagnostic codes, and 28 codes showed substantial agreement (κ between 0.61 and 0.80). Both high coding frequency and almost perfect agreement were found for 37 codes, including codes that are particularly difficult to identify from components of the electronic medical record, such as those for musculoskeletal conditions, cancer or tobacco use.
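The agreement thresholds quoted in the results correspond to the top two of the conventional Landis and Koch bands for kappa. The sketch below shows how codes could be labelled by band and then filtered to those that are both frequent and almost perfectly reliable. It assumes a hypothetical frequency cut-off `min_pct` (the abstract does not define "high coding frequency"), and the code names and values are invented for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodeStats:
    name: str              # diagnostic code label (hypothetical names below)
    occurrence_pct: float  # share of LoFT carrying the code, in percent
    kappa: float           # Cohen's kappa between the two raters


def agreement_band(kappa: float) -> str:
    """Landis and Koch agreement bands for Cohen's kappa."""
    k = round(kappa, 2)  # assumes kappas are reported to two decimals, as in the abstract
    if k >= 0.81:
        return "almost perfect"
    if k >= 0.61:
        return "substantial"
    if k >= 0.41:
        return "moderate"
    if k >= 0.21:
        return "fair"
    if k >= 0.0:
        return "slight"
    return "poor"


def training_targets(stats: List[CodeStats], min_pct: float) -> List[str]:
    """Codes that are frequent (>= min_pct, a hypothetical cut-off) and almost perfectly reliable."""
    return [
        s.name
        for s in stats
        if s.occurrence_pct >= min_pct and agreement_band(s.kappa) == "almost perfect"
    ]


# Invented example values, not results from the study.
stats = [
    CodeStats("code_A", occurrence_pct=2.5, kappa=0.90),
    CodeStats("code_B", occurrence_pct=0.1, kappa=0.95),
    CodeStats("code_C", occurrence_pct=3.0, kappa=0.70),
]
print(training_targets(stats, min_pct=1.0))  # ['code_A']
print(agreement_band(0.70))                  # substantial
```

Here `code_B` is reliable but too rare to yield much training data, and `code_C` is frequent but only substantially reliable, so only `code_A` passes both filters.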

CONCLUSION

The coding framework was characterised by a subset of very frequent and highly reliable diagnostic codes, which will be the most valuable targets for training NLP models for automated disease classification based on free-text diagnoses from Swiss general practice.
