Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, United Kingdom.
MD Anderson Center for INSPiRED Cancer Care, University of Texas, Houston, TX, United States.
J Med Internet Res. 2021 Jul 30;23(7):e26412. doi: 10.2196/26412.
Computerized adaptive testing (CAT) has been shown to deliver short, accurate, and personalized versions of the CLEFT-Q patient-reported outcome measure for children and young adults born with a cleft lip and/or palate. Decision trees may integrate clinician-reported data (eg, age, gender, cleft type, and planned treatments) to make these assessments even shorter and more accurate.
We aimed to create decision tree models incorporating clinician-reported data into adaptive CLEFT-Q assessments and compare their accuracy to traditional CAT models.
We used relevant clinician-reported data and patient-reported item responses from the CLEFT-Q field test to train and test decision tree models using recursive partitioning. We compared the prediction accuracy of decision trees to CAT assessments of similar length. Participant scores from the full-length questionnaire were used as ground truth. Accuracy was assessed through Pearson's correlation coefficient of predicted and ground truth scores, mean absolute error, root mean squared error, and a two-tailed Wilcoxon signed-rank test comparing squared error.
Decision trees demonstrated poorer accuracy than CAT comparators and generally made data splits based on item responses rather than clinician-reported data.
When predicting CLEFT-Q scores, individual item responses are generally more informative than clinician-reported data. Decision trees that make binary splits are at risk of underfitting polytomous patient-reported outcome measure data and demonstrated poorer performance than CATs in this study.
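The accuracy measures named in the methods (Pearson's correlation, mean absolute error, root mean squared error, and a Wilcoxon signed-rank comparison of squared errors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the scores are invented, the function names are hypothetical, and the code is not the authors' analysis pipeline. The Wilcoxon function returns only the test statistic; in practice a statistics package (for example `scipy.stats.wilcoxon`) would be used to obtain the two-tailed p value.

```python
import math

def pearson_r(pred, truth):
    """Pearson's correlation between predicted and ground-truth scores."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_p * var_t)

def mean_absolute_error(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def root_mean_squared_error(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

def squared_errors(pred, truth):
    return [(p - t) ** 2 for p, t in zip(pred, truth)]

def wilcoxon_signed_rank_statistic(a, b):
    """Return W = min(W+, W-) for paired samples a and b.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus)

# Invented example scores (not study data).
truth = [50, 62, 71, 45, 80]        # full-length questionnaire scores
tree_pred = [54, 58, 75, 40, 74]    # hypothetical decision tree predictions
cat_pred = [51, 61, 73, 44, 78]     # hypothetical CAT predictions

for name, pred in (("tree", tree_pred), ("CAT", cat_pred)):
    print(name,
          round(pearson_r(pred, truth), 3),
          round(mean_absolute_error(pred, truth), 3),
          round(root_mean_squared_error(pred, truth), 3))

w = wilcoxon_signed_rank_statistic(squared_errors(tree_pred, truth),
                                   squared_errors(cat_pred, truth))
print("Wilcoxon W:", w)
```

Comparing squared errors pairwise, participant by participant, is what makes a paired test such as the signed-rank test appropriate here: both models predict the same participant's score, so the two error samples are not independent.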