Detecting rare diseases in electronic health records using machine learning and knowledge engineering: Case study of acute hepatic porphyria.

Affiliations

Department of Medical Informatics & Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, Oregon, United States of America.

Alnylam Pharmaceuticals, Cambridge, Massachusetts, United States of America.

Publication information

PLoS One. 2020 Jul 2;15(7):e0235574. doi: 10.1371/journal.pone.0235574. eCollection 2020.

Abstract

BACKGROUND

With the growing adoption of the electronic health record (EHR) worldwide over the last decade, new opportunities exist for leveraging EHR data for detection of rare diseases. Rare diseases often go undiagnosed, or are diagnosed late, because clinicians encounter them infrequently. One such rare disease that may be amenable to EHR-based detection is acute hepatic porphyria (AHP). AHP consists of a family of rare metabolic diseases characterized by potentially life-threatening acute attacks and chronic debilitating symptoms. The goal of this study was to apply machine learning and knowledge engineering to a large extract of EHR data to determine whether they could be effective in identifying patients not previously tested for AHP who should receive a proper diagnostic workup for AHP.

METHODS AND FINDINGS

We used an extract of the complete EHR data of 200,000 patients from an academic medical center and enriched it with records from an additional 5,571 patients containing any mention of porphyria in the record. After manually reviewing the records of all 47 unique patients with the ICD-10-CM code E80.21 (Acute intermittent [hepatic] porphyria), we identified 30 patients who were positive cases for our machine learning models, with the rest of the patients used as negative cases. We parsed the record into features, which were scored by frequency of appearance and filtered using univariate feature analysis. We manually chose features not directly tied to provider attributes or suspicion of the patient having AHP. We trained on the full dataset, with the best cross-validation performance coming from a support vector machine (SVM) algorithm using a radial basis function (RBF) kernel. The trained model was applied back to the full dataset and patients were ranked by margin distance. The top 100 ranked negative cases were manually reviewed for symptom complexes similar to AHP, finding four patients where AHP diagnostic testing was likely indicated and 18 patients where AHP diagnostic testing was possibly indicated. From the top 100 ranked cases of patients with mention of porphyria in their record, we identified four patients for whom AHP diagnostic testing was possibly indicated and had not been previously performed. Based solely on the reported prevalence of AHP, we would have expected only 0.002 cases out of the 200 patients manually reviewed.
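The ranking step described above (univariate feature filtering, an RBF-kernel SVM, then ordering negative cases by margin distance) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, uses synthetic count data in place of EHR features, and picks a chi-squared scorer, `k=10`, and balanced class weights purely for illustration, since the abstract does not specify any of them.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for parsed EHR data (assumption): rows are patients,
# columns are non-negative feature frequencies.
n_neg, n_pos, n_features = 300, 30, 50
X_neg = rng.poisson(1.0, size=(n_neg, n_features))
X_pos = rng.poisson(1.0, size=(n_pos, n_features))
X_pos[:, :5] += rng.poisson(3.0, size=(n_pos, 5))  # make 5 features informative
X = np.vstack([X_neg, X_pos]).astype(float)
y = np.array([0] * n_neg + [1] * n_pos)

# Univariate filter followed by an RBF-kernel SVM. The chi2 scorer, k=10 and
# class_weight="balanced" are illustrative choices, not taken from the paper.
model = make_pipeline(
    SelectKBest(chi2, k=10),
    SVC(kernel="rbf", gamma="scale", class_weight="balanced"),
)

# Cross-validated performance estimate; the filter is refit inside each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

# Fit on the full dataset, then score every patient by signed distance
# to the decision boundary (larger means more similar to positive cases).
model.fit(X, y)
margin = model.decision_function(X)

# Rank only the negative cases, most positive-looking first, and keep the
# top 100 as candidates for manual chart review.
neg_idx = np.flatnonzero(y == 0)
ranked_negatives = neg_idx[np.argsort(-margin[neg_idx])]
top_candidates = ranked_negatives[:100]

print("mean cross-validated AUC:", cv_auc.mean())
print("first ten candidate row indices:", top_candidates[:10])
```

Scoring the same patients the model was trained on is what the abstract describes: the aim is to surface labeled-negative patients who resemble confirmed cases, not to estimate generalization, which is why the cross-validation step is kept separate.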

CONCLUSIONS

The application of machine learning and knowledge engineering to EHR data may facilitate the diagnosis of rare diseases such as AHP. Further work will recommend clinical investigation to the clinicians of the identified patients, evaluate more patients, assess additional feature selection and machine learning algorithms, and apply this methodology to other rare diseases. This work provides strong evidence that population-level informatics can be applied to rare diseases, greatly improving our ability to identify undiagnosed patients and, in the future, improving the care of these patients and our ability to study these diseases. The next step is to learn how best to apply these EHR-based machine learning approaches to benefit individual patients, through a clinical study that provides diagnostic testing and clinical follow-up for those identified as possibly having undiagnosed AHP.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5ec/7331997/7852a00eed2b/pone.0235574.g001.jpg
