Bhasuran Balu, Schmolly Katharina, Kapoor Yuvraaj, Jayakumar Nanditha Lakshmi, Doan Raymond, Amin Jigar, Meninger Stephen, Cheng Nathan, Deering Robert, Anderson Karl, Beaven Simon W, Wang Bruce, Rudrapatna Vivek A
Bakar Computational Health Sciences Institute, San Francisco, CA, 94143.
David Geffen School of Medicine & Pfleger Liver Institute, University of California Los Angeles, Los Angeles, CA 90095.
medRxiv. 2023 Aug 31:2023.08.30.23293130. doi: 10.1101/2023.08.30.23293130.
Importance: Acute hepatic porphyria (AHP) is a group of rare but treatable conditions associated with diagnostic delays averaging fifteen years. The availability of electronic health record (EHR) data and machine learning (ML) may improve the timely recognition of rare diseases like AHP. However, prediction models can be difficult to train given the limited case numbers, unstructured EHR data, and selection biases intrinsic to healthcare delivery.
Objective: To train and characterize models for identifying patients with AHP.
Design, Setting, and Participants: This diagnostic study used structured and notes-based EHR data from two University of California centers: UCSF (2012-2022) and UCLA (2019-2022). The data were split into two cohorts (referral, diagnosis) and used to develop models that predict: 1) who will be referred for acute porphyria testing, among those who presented with abdominal pain (a cardinal symptom of AHP), and 2) who will test positive, among those referred. The referral cohort consisted of 747 patients referred for testing and 99,849 contemporaneous patients who were not. The diagnosis cohort consisted of 72 confirmed AHP cases and 347 patients who tested negative. Cases were predominantly female and 6-75 years old at the time of diagnosis. Candidate models used a range of architectures. Feature selection was semi-automated and incorporated publicly available data from knowledge graphs.
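The cohort sizes above imply a strong class imbalance, which is one reason accuracy alone would be a poor yardstick. The sketch below only redoes that arithmetic from the counts stated in the abstract; the helper name `prevalence` is hypothetical and is not taken from the study's code.

```python
def prevalence(n_positive: int, n_negative: int) -> float:
    """Fraction of a cohort that belongs to the positive class."""
    return n_positive / (n_positive + n_negative)


# Counts as reported in the abstract.
referral_rate = prevalence(747, 99_849)  # referred vs. not referred
positive_rate = prevalence(72, 347)      # confirmed AHP vs. tested negative

# A trivial classifier that never predicts "referred" is right for every
# non-referred patient, so its accuracy equals one minus the prevalence.
always_negative_accuracy = 1 - referral_rate

print(f"referral cohort, share referred:      {referral_rate:.2%}")
print(f"diagnosis cohort, share positive:     {positive_rate:.2%}")
print(f"accuracy of never predicting referral: {always_negative_accuracy:.2%}")
```

Under these counts, fewer than 1% of the referral cohort was referred and roughly 17% of the diagnosis cohort tested positive, so a model that never predicts the positive class would still score above 99% accuracy on the referral task.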
Main Outcomes and Measures: F-score on an outcome-stratified test set.
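As a rough illustration of this outcome measure, the following standard-library sketch builds an outcome-stratified split and computes an F-score. The abstract does not state the test fraction, the random seed, or which F-beta variant was used, so the 20% fraction, `seed=0`, and `beta=1.0` (F1) here are assumptions, and both function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each outcome class keeps about the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels (1 = positive); beta=1.0 gives F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Diagnosis cohort sizes from the abstract: 72 positive, 347 negative.
labels = [1] * 72 + [0] * 347
train_idx, test_idx = stratified_split(labels)
test_positives = sum(labels[i] for i in test_idx)
```

Stratifying on the outcome keeps the small number of positive cases represented in the held-out set, which matters when only 72 confirmed cases are available.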
Results: The best center-specific referral models achieved F-scores of 86-91%. The best diagnosis model achieved an F-score of 92%. To further test our models, we contacted 372 current patients who lack an AHP diagnosis but were predicted by our models as potentially having it (≥ 10% probability of referral, ≥ 50% probability of testing positive). However, we were only able to recruit 10 of these patients for biochemical testing, all of whom tested negative. Nonetheless, evaluations suggested that these models could identify 71% of cases earlier than their diagnosis date, saving 1.2 years.
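The outreach criteria can be pictured as a two-stage filter over model outputs. This is a minimal sketch that assumes a patient had to meet both thresholds at once, which the abstract implies but does not spell out; the `Patient` record, its field names, and `flag_for_outreach` are hypothetical.

```python
from dataclasses import dataclass

REFERRAL_THRESHOLD = 0.10   # >= 10% predicted probability of referral
DIAGNOSIS_THRESHOLD = 0.50  # >= 50% predicted probability of testing positive


@dataclass
class Patient:
    patient_id: str
    p_referral: float  # output of a referral model (hypothetical field)
    p_positive: float  # output of a diagnosis model (hypothetical field)


def flag_for_outreach(patients,
                      referral_threshold=REFERRAL_THRESHOLD,
                      diagnosis_threshold=DIAGNOSIS_THRESHOLD):
    """Return IDs of patients whose scores meet both thresholds (assumed AND rule)."""
    return [
        p.patient_id
        for p in patients
        if p.p_referral >= referral_threshold and p.p_positive >= diagnosis_threshold
    ]


# Made-up scores, purely to show the filter's behaviour.
example = [
    Patient("a", p_referral=0.05, p_positive=0.90),  # fails referral threshold
    Patient("b", p_referral=0.10, p_positive=0.50),  # meets both (boundaries inclusive)
    Patient("c", p_referral=0.40, p_positive=0.30),  # fails diagnosis threshold
]
flagged = flag_for_outreach(example)

# Recruitment yield reported in the abstract: 10 of 372 contacted patients.
recruitment_rate = 10 / 372
```

With the reported figures, only about 2.7% of contacted patients were recruited for biochemical testing, which is why the all-negative result cannot by itself confirm or refute the models.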
Conclusions and Relevance: ML can reduce diagnostic delays in AHP and other rare diseases. Robust recruitment strategies and multicenter coordination will be needed to validate these models before they can be deployed.