Ali Waqar, Williams Jonathan, Xiong Betty, Zou James, Daneshjou Roxana
UCB Pharma, Slough, United Kingdom.
UCB Pharma, Brussels, Belgium.
JID Innov. 2025 Mar 10;5(3):100362. doi: 10.1016/j.xjidi.2025.100362. eCollection 2025 May.
Patients with hidradenitis suppurativa (HS) are often misdiagnosed and may wait up to 10 years to receive a diagnosis. This study aimed to predict HS diagnosis prior to actual diagnosis on the basis of previous medical history using models developed with insurance claims data. Three machine learning models were compared with a model using features selected by a dermatologist (clinical baseline model). The study analyzed the insurance records of 5,900,000 individuals in the United States over 13.5 years. The population included 13,886 patients with HS who had at least 1 claim in each of the 2 years prior to their first HS diagnosis and 69,428 control patients with no HS diagnosis. The models aimed to classify HS diagnosis status on the basis of clinical features observed over 2 years. Model performance was assessed by area under the receiver operating characteristic curve (AUROC), F1-score, precision, and recall. The machine learning models (logistic regression, random forest, and XGBoost) showed a higher AUROC than the clinical baseline model (logistic regression = 0.75, random forest = 0.79, XGBoost = 0.80, clinical = 0.71). In both the clinical model and the best-performing XGBoost model, the top features associated with diagnosis were patient age at prediction and sex. The top features of the XGBoost model also included the use of sulfamethoxazole/trimethoprim and clindamycin phosphate as well as obesity.
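For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how the area under the receiver operating characteristic curve, precision, recall, and F1-score can be computed for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the area under the curve is computed here through its rank-based (Mann-Whitney) interpretation rather than by tracing the curve.

```python
# Illustrative sketch only: toy data and function names are hypothetical,
# not taken from the study.

def auroc(y_true, scores):
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen positive case scores higher than a randomly
    chosen negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1-score for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (assumed values): 3 cases and 5 controls with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]

# Assumed decision threshold of 0.5; the abstract does not report one.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("AUROC:", auroc(y_true, scores))
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
```

The area under the curve depends only on how the scores rank cases against controls, whereas precision, recall, and F1-score depend on the chosen threshold and on the ratio of cases to controls in the evaluation sample.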