
Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis.

Authors

Zhou Shang-Ming, Fernandez-Gutierrez Fabiola, Kennedy Jonathan, Cooksey Roxanne, Atkinson Mark, Denaxas Spiros, Siebert Stefan, Dixon William G, O'Neill Terence W, Choy Ernest, Sudlow Cathie, Brophy Sinead

Affiliations

Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom.

UCL Institute of Health Informatics and Farr Institute of Health Informatics Research, London, United Kingdom.

Publication

PLoS One. 2016 May 2;11(5):e0154515. doi: 10.1371/journal.pone.0154515. eCollection 2016.

DOI:10.1371/journal.pone.0154515
PMID:27135409
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4852928/
Abstract

OBJECTIVES

1) To use a data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis using primary care EHRs.

METHODS

This study linked routine primary and secondary care EHRs in Wales, UK. A machine learning-based scheme was used to identify patients with rheumatoid arthritis (RA) from primary care EHRs via the following steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset between disease cases and non-disease controls (disease status defined by the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from a decision tree model. The proposed method was then extensively validated on an independent dataset, and its performance was compared with that of two existing deterministic algorithms for RA that had been developed using expert clinical knowledge.
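Step i) above can be illustrated with a minimal sketch. The abstract does not say how the relative frequencies were compared, so the ratio threshold (`min_ratio`), the function names, and the toy code labels below are all assumptions made for illustration, not the authors' implementation. Steps ii) and iii) (Random Forest reduction and decision tree rule induction) are not shown.

```python
from collections import Counter


def relative_frequencies(records):
    """Fraction of patients whose record contains each code at least once."""
    counts = Counter()
    for codes in records:
        counts.update(set(codes))  # count each code once per patient
    n = len(records)
    return {code: c / n for code, c in counts.items()}


def select_candidate_codes(case_records, control_records, min_ratio=2.0):
    """Keep codes at least `min_ratio` times more frequent in cases than controls.

    `min_ratio` is a hypothetical threshold; the source does not specify one.
    """
    case_freq = relative_frequencies(case_records)
    control_freq = relative_frequencies(control_records)
    selected = {}
    for code, f_case in case_freq.items():
        f_ctrl = control_freq.get(code, 0.0)
        # A code never seen in controls is treated as infinitely enriched.
        ratio = f_case / f_ctrl if f_ctrl > 0 else float("inf")
        if ratio >= min_ratio:
            selected[code] = ratio
    return selected


# Toy data: labels are placeholders, not real Read codes.
cases = [["RA_DX", "DMARD", "NSAID"], ["RA_DX", "DMARD"], ["DMARD", "NSAID"], ["RA_DX"]]
controls = [["NSAID"], ["NSAID", "BP_CHECK"], ["BP_CHECK"], ["DMARD", "BP_CHECK"]]

candidates = select_candidate_codes(cases, controls)
print(sorted(candidates))
```

In this toy example the placeholder codes `RA_DX` and `DMARD` are retained because they are enriched in cases, while `NSAID` is dropped because it is equally common in both groups. In a real pipeline the retained codes would then be passed on to the Random Forest reduction step.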

RESULTS

Primary care EHRs were available for 2,238,360 patients over the age of 16, and of these 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were found more frequently in those with RA than in those without. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes (such as those for disease-modifying anti-rheumatic drugs), and the absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the expert clinical knowledge-based methods.
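A comparison of this kind can be sketched as follows. The abstract does not name the performance measures used, so sensitivity, specificity, and positive predictive value are assumed here as typical choices, with the secondary care diagnosis as the reference standard. The data are invented toy labels, and `evaluate` is a hypothetical helper.

```python
def evaluate(predicted, reference):
    """Compare binary predictions against a reference standard."""
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif not p and r:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Toy labels for 10 patients (True = RA); not data from the study.
reference = [True, True, True, True, False, False, False, False, False, False]
data_driven = [True, True, True, False, False, False, False, False, False, True]
expert_rule = [True, True, False, True, False, False, False, False, True, False]

results = {
    "data_driven": evaluate(data_driven, reference),
    "expert_rule": evaluate(expert_rule, reference),
}
for name, metrics in results.items():
    print(name, metrics)
```

In this toy example the two classifiers misclassify different patients yet reach identical summary metrics, which is one way two algorithms can "perform as well as" each other without agreeing on every case.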

CONCLUSION

Data-driven schemes, such as ensemble machine learning methods, have the potential to identify the most informative predictors quickly and cost-effectively, and thus to classify rheumatoid arthritis or other complex medical conditions accurately and reliably in primary care EHRs.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5afb/4852928/fd3a0f992f5e/pone.0154515.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5afb/4852928/6577617d3b6b/pone.0154515.g001.jpg

