Kronzer Vanessa L, Williamson Katrina A, Hanson Andrew C, Sletten Jennifer A, Sparks Jeffrey A, Davis John M, Crowson Cynthia S
Division of Rheumatology, Mayo Clinic, Rochester, Minnesota, USA.
Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA.
Semin Arthritis Rheum. 2025 Jun;72:152668. doi: 10.1016/j.semarthrit.2025.152668. Epub 2025 Feb 22.
To quantify and improve the performance of standard rheumatoid arthritis (RA) algorithms in a biobank setting.
This retrospective cohort study within the Mayo Clinic (MC) Biobank and MC Tapestry Study identified potential RA cases by the presence of at least two RA codes or positive anti-cyclic citrullinated peptide antibodies (CCP) plus a disease-modifying anti-rheumatic drug (DMARD) prescription as of July 18, 2022. Rheumatology physicians manually verified all RA cases using RA criteria and/or rheumatology physician diagnosis plus DMARD use. All other biobank participants served as non-RA controls. We defined seropositivity as rheumatoid factor and/or anti-CCP positivity. We assessed rules-based and Electronic Medical Records and Genomics (eMERGE) RA algorithms using positive predictive value (PPV). Finally, we developed a novel RA algorithm using a LASSO-based machine learning approach with five-fold cross-validation.
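A rules-based algorithm of the kind assessed here can be thought of as a boolean rule applied to each participant record, with PPV computed against the chart-review result. The Python sketch below is illustrative only: the `Participant` fields, the rule functions, and the toy records are hypothetical and are not the study's data or code.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    ra_codes: int       # number of RA diagnosis codes (hypothetical field)
    dmard: bool         # any DMARD prescription
    seropositive: bool  # rheumatoid factor and/or anti-CCP positive
    confirmed_ra: bool  # manual chart-review verdict (reference standard)

def codes_plus_dmard(p, min_codes=2):
    """Rule: enough RA codes and a DMARD prescription."""
    return p.ra_codes >= min_codes and p.dmard

def codes_plus_dmard_plus_sero(p, min_codes=2):
    """Rule: the above, restricted to seropositive participants."""
    return codes_plus_dmard(p, min_codes) and p.seropositive

def ppv(participants, rule):
    """Share of rule-flagged participants who are confirmed RA cases."""
    flagged = [p for p in participants if rule(p)]
    if not flagged:
        return None  # PPV is undefined when the rule flags nobody
    return sum(p.confirmed_ra for p in flagged) / len(flagged)

# Toy records, made up for illustration.
toy = [
    Participant(3, True, True, True),
    Participant(2, True, False, False),
    Participant(2, False, False, False),
    Participant(0, False, False, False),
]
```

In this toy cohort, `ppv(toy, codes_plus_dmard)` flags two participants, one of whom is confirmed, while adding the seropositivity requirement flags only the confirmed case. A stricter rule raises PPV but flags fewer participants.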
We identified 1,316 confirmed RA cases (968 MC Biobank, 348 Tapestry; 70% seropositive) and 82,123 non-RA controls (mean age 65, 61% female). The PPV of 3 RA codes was 43%, of codes plus DMARD was 54%, and of codes plus DMARD plus seropositivity was 85%. The PPV of eMERGE was 77%. Self-reported RA (PPV 10%), which was available in the MC Biobank, only minimally improved algorithm performance (PPV from 83% to 85%), whereas family history of RA (PPV 3%) worsened performance. At 90% PPV, the novel RA algorithm, which incorporated key variables such as anti-CCP and DMARD use, increased sensitivity by 4-11% compared to eMERGE.
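Comparing algorithms "at 90% PPV" means fixing the PPV target and asking how many true cases each algorithm can still capture. One way to sketch this for a score-based algorithm is to sweep the score threshold and keep the best sensitivity among thresholds that meet the target. The function name, scores, and labels below are hypothetical, and this is not necessarily how the authors performed the comparison.

```python
def sensitivity_at_ppv(scores, labels, target_ppv=0.90):
    """Best sensitivity over score thresholds whose PPV meets the target.

    scores: predicted risk per participant (higher = more likely RA)
    labels: 1 for confirmed RA, 0 for control
    Returns 0.0 if no threshold reaches the target PPV.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    best = 0.0
    for t in sorted(set(scores)):
        # Participants flagged at this threshold (never empty, since t is a score).
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        if tp / len(flagged) >= target_ppv:
            best = max(best, tp / total_pos)
    return best

# Toy scores and labels, made up for illustration.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3]
labels = [1, 1, 1, 0, 1, 0]
```

With these toy values, the lowest threshold that still meets a 90% PPV target is 0.8, which captures three of the four true cases.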
Rules-based and eMERGE RA algorithms performed worse in the biobank setting than in administrative settings. Our novel RA algorithm outperformed these standard algorithms.