Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA.
Komodo Health, New York, NY, USA.
Sci Rep. 2024 Apr 17;14(1):8890. doi: 10.1038/s41598-024-58719-y.
Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We used claims data from the Komodo Healthcare Map database to develop a machine-learning model that identifies potential HoFH patients. We tokenized patients enrolled in MyRARE (a patient support program for those prescribed evinacumab-dgnb in the United States) and linked them with their Komodo claims. A true positive HoFH cohort (n = 331) was formed from MyRARE patients and patients with prescriptions for evinacumab-dgnb or lomitapide. The negative cohort (n = 1423) comprised patients with or at risk for cardiovascular disease. We divided the cohort into training (80%) and testing (20%) sets. Overall, 10,616 candidate features were investigated; 87 were selected for their clinical relevance and their importance to prediction performance. Several machine-learning algorithms were explored, and fast interpretable greedy-tree sums (FIGS) was selected as the final model because it performed satisfactorily and is easy to interpret. The model identified four useful features and yielded a precision (positive predictive value) of 0.98, recall (sensitivity) of 0.88, area under the receiver operating characteristic curve of 0.98, and accuracy of 0.97. The model performed well in identifying HoFH patients in the testing set, providing a useful tool to facilitate HoFH screening and diagnosis via healthcare claims data.
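To make the evaluation setup concrete, the following is a minimal Python sketch of an 80/20 split of the reported cohort sizes and of how precision, recall, and accuracy are defined. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether the split was stratified by class (stratification is assumed here), and the function names `stratified_split` and `classification_metrics` and the toy predictions are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, keeping the class ratio in both sets.

    Assumption: the abstract reports an 80/20 split but does not state
    whether it was stratified; stratification is used here for illustration.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def classification_metrics(y_true, y_pred):
    """Compute precision (PPV), recall (sensitivity), and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Cohort sizes taken from the abstract: 331 positives, 1423 negatives.
labels = [1] * 331 + [0] * 1423
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Toy labels and predictions (hypothetical, NOT study data) to show the metrics.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(classification_metrics(toy_true, toy_pred))
```

With the class imbalance in this cohort (roughly one positive for every four negatives), accuracy alone can look high even when positives are missed, which is why precision and recall are reported alongside it.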