Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework.

Author information

Fernández-Gutiérrez Fabiola, Kennedy Jonathan I, Cooksey Roxanne, Atkinson Mark, Choy Ernest, Brophy Sinead, Huo Lin, Zhou Shang-Ming

Affiliations

Swansea University Medical School, Swansea University, Swansea SA2 8PP, UK.

Arthritis Research UK CREATE Centre, Division Infection and Immunity, Cardiff University, Cardiff CF10 3NB, UK.

Publication information

Diagnostics (Basel). 2021 Oct 15;11(10):1908. doi: 10.3390/diagnostics11101908.

Abstract

(1) Background: We aimed to develop a transparent machine-learning (ML) framework that automatically identifies patients with a given condition from electronic health records (EHRs) using a parsimonious set of features. (2) Methods: We linked multiple EHR sources from between 2002 and 2012, including 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries, to generate a unique dataset. We then treated patient identification as a text-classification problem and proposed a transparent disease-phenotyping framework. The framework comprises generation of patient representations, feature selection, and development of an optimal phenotyping algorithm that tackles the imbalanced nature of the data. It was evaluated extensively on identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: When applied to the linked dataset of 9657 patients, including 1484 RA cases and 204 AS cases, the framework achieved an accuracy of 86.19% and a positive predictive value (PPV) of 88.46% for RA, and 99.23% and 97.75%, respectively, for AS. These results are comparable with those of expert knowledge-driven methods. (4) Conclusions: This framework could potentially serve as an efficient tool for identifying patients with a condition of interest from EHRs, supporting clinicians in the clinical decision-support process.
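The abstract frames patient identification as text classification, with feature selection followed by evaluation on accuracy and PPV. It does not say which representation, feature-selection criterion, or classifier the authors used. The following minimal sketch fills those gaps with illustrative assumptions, not the paper's method. It treats each patient's record as a "document" of clinical code tokens, ranks codes with a 2x2 chi-square statistic (presence versus case/control label), and computes accuracy and PPV from predictions. The code tokens (`RA_DX`, `DMARD_RX`, `BP_CHECK`, `FLU_VAX`) and all helper function names are hypothetical.

```python
from collections import Counter

# Assumption (not from the paper): each patient's EHR is reduced to a list of
# clinical code tokens, so one patient = one "document". Codes are hypothetical.


def bag_of_codes(records):
    """Map each patient's code list to a term-count vector (bag of codes)."""
    return [Counter(codes) for codes in records]


def chi_square_scores(docs, labels):
    """Chi-square statistic of code presence vs. case/control label (2x2 table).

    chi2 = N * (a*d - b*c)^2 / ((a+b)(c+d)(a+c)(b+d)).
    Only presence/absence is used here, not the counts themselves.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y)
    vocab = set().union(*docs)
    scores = {}
    for code in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if code in doc and y)      # present, case
        b = sum(1 for doc, y in zip(docs, labels) if code in doc and not y)  # present, control
        c = n_pos - a          # absent, case
        d = (n - n_pos) - b    # absent, control
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero margin (e.g. a code that every patient has) carries no signal.
        scores[code] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring codes; ties are broken alphabetically."""
    return sorted(scores, key=lambda code: (-scores[code], code))[:k]


def accuracy_and_ppv(y_true, y_pred):
    """Accuracy = (TP + TN) / N; PPV = TP / (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    accuracy = (tp + tn) / len(pairs)
    # PPV is undefined when nothing is predicted positive; report 0.0 here.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, ppv


if __name__ == "__main__":
    records = [
        ["RA_DX", "DMARD_RX", "BP_CHECK"],  # case
        ["RA_DX", "BP_CHECK"],              # case
        ["BP_CHECK", "FLU_VAX"],            # control
        ["BP_CHECK"],                       # control
    ]
    labels = [1, 1, 0, 0]
    scores = chi_square_scores(bag_of_codes(records), labels)
    print(select_top_k(scores, 1))  # ['RA_DX']
    print(accuracy_and_ppv([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Chi-square is only one of several possible filter criteria. The metrics function shows why PPV is reported alongside accuracy on imbalanced data. With 204 AS cases among 9657 patients, a classifier that labels everyone as non-AS would already reach about 97.9% accuracy (9453/9657) while identifying no cases at all.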
