Burns Gully, Kauffman Carey, Manion Michele, Pai Ruth-Anne, Milla Carlos, O'Connor Michael G, Shapiro Adam J, Bjornson-Pennell Heidi
Chan Zuckerberg Initiative, PO BOX 8040, Redwood City, CA 94063.
The Primary Ciliary Dyskinesia Foundation, Minneapolis, MN, USA.
medRxiv. 2025 Apr 20:2025.04.18.25326065. doi: 10.1101/2025.04.18.25326065.
Diagnostic delays are common in primary ciliary dyskinesia (PCD), a rare disease that is substantially underdiagnosed. Scalable screening methods could improve early identification and health outcomes.
Can machine learning (ML) be used to screen for PCD in pediatric patients?
We evaluated the feasibility of a random forest model to screen for PCD using data from the PCD Foundation Registry and a national claims database. We identified a cohort of pediatric patients with diagnostic codes for conditions potentially associated with PCD, and studied diagnostic, procedural, and pharmaceutical codes associated with PCD to develop ML features. Models were trained on composite claims data from three groups: patients with confirmed PCD; patients with a Q34.8 code (other specified congenital malformations of the respiratory system) diagnosed within six months of an electron microscopy procedure (Q34.8+EM); and a randomly selected, matched control group. Model performance was tested through 5-fold cross-validation.
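As a rough illustration of the 5-fold cross-validation step, the sketch below assigns samples to five folds while keeping the case-to-control ratio similar in each fold. The abstract does not say whether the folds were stratified, so stratification is an assumption here, and `stratified_folds` is a hypothetical helper rather than the authors' code. The label vector is a placeholder built only from the cohort sizes reported in this abstract (82 cases, 4,161 controls); it carries no patient data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Placeholder labels: 1 = confirmed PCD case, 0 = matched control.
labels = [1] * 82 + [0] * 4161
folds = stratified_folds(labels, k=5)

for i, held_out in enumerate(folds):
    n_cases = sum(labels[j] for j in held_out)
    print(f"fold {i}: {len(held_out)} held out, {n_cases} cases")
```

In each round, one fold would be held out for testing and the model trained on the other four. With so few cases, an unstratified split could leave a test fold with very few positives, which is one reason to consider stratifying.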
Using 82 confirmed PCD cases and 4,161 matched controls, the model's performance was variable (positive predictive value [PPV] 0.45-0.73, sensitivity 0.75-0.94). Synthetic data augmentation did not improve results (PPV 0.45-0.67, sensitivity 0.71-1.00). Expanding the dataset to include 319 Q34.8+EM patients and 8,214 controls improved performance (PPV 0.51-0.54, sensitivity 0.82-0.90) to a level suitable for screening. In a cohort of 1.32 million pediatric patients, 7,705 were classified as positive, consistent with the estimated prevalence of PCD (1:7,554).
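For readers less familiar with the two metrics reported here, the sketch below computes positive predictive value and sensitivity from confusion-matrix counts and summarizes them as a range across test folds. The per-fold counts are made up purely for illustration and are not results from the study; the function and variable names are likewise assumptions.

```python
def ppv(tp, fp):
    """Positive predictive value: share of positive calls that are true cases."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def sensitivity(tp, fn):
    """Sensitivity (recall): share of true cases that the model flags."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical (true positive, false positive, false negative) counts per fold.
# These numbers are illustrative only and do not come from the study.
fold_counts = [(14, 10, 3), (15, 12, 2), (13, 9, 3), (12, 11, 4), (14, 8, 2)]

ppv_by_fold = [ppv(tp, fp) for tp, fp, _ in fold_counts]
sens_by_fold = [sensitivity(tp, fn) for tp, _, fn in fold_counts]

print(f"PPV range: {min(ppv_by_fold):.2f}-{max(ppv_by_fold):.2f}")
print(f"Sensitivity range: {min(sens_by_fold):.2f}-{max(sens_by_fold):.2f}")
```

For a screening tool, high sensitivity matters most because missed cases go untested, while a moderate positive predictive value can be acceptable when flagged patients are referred for confirmatory diagnostic testing.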
This study demonstrates the feasibility of using ML to screen for PCD using claims data, even in the absence of a specific International Classification of Diseases (ICD) code. Such screening approaches may help identify individuals who could benefit from timely diagnostic testing and targeted interventions.