University of Exeter Medical School. Address: Clinical and Biomedical Sciences, RILD Building, Royal Devon & Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK.
University of Dundee. Address: Division of Population Health & Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD1 9SY, UK.
BMC Med Res Methodol. 2024 Jun 4;24(1):128. doi: 10.1186/s12874-024-02239-w.
Clinical prediction models can help identify high-risk patients and facilitate timely interventions. However, developing such models for rare diseases is challenging because few affected patients are available for model development and calibration. Methods that pool information from multiple sources can help address this.
We compared three approaches for developing clinical prediction models for population screening, using the example of discriminating a rare form of diabetes (maturity-onset diabetes of the young, MODY) from the more common Type 1 diabetes (T1D) in insulin-treated patients. Two datasets were used: a case-control dataset (278 T1D, 177 MODY) and a population-representative dataset (1418 patients, 96 tested for MODY after biomarker testing, 7 MODY positive). To build a population-level prediction model, we compared three methods for recalibrating models developed in case-control data: prevalence adjustment ("offset"), shrinkage recalibration in the population-level dataset ("recalibration"), and refitting the model to the population-level dataset ("re-estimation"). We then developed a Bayesian hierarchical mixture model combining shrinkage recalibration with additional informative biomarker data available only in the population-representative dataset. We also developed a method for handling missing biomarker and outcome information, using prior information from the literature and other data sources to ensure the clinical validity of predictions for certain biomarker combinations.
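The "offset" and "recalibration" ideas can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the population prevalence of 0.005 is an illustrative assumption rather than a figure from the study, and the recalibration step simply applies a given intercept and slope on the logit scale instead of estimating them with the Bayesian shrinkage model described above. Only the case-control counts (278 T1D, 177 MODY) are taken from the text.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def offset_adjust(p_cc, cc_prevalence, pop_prevalence):
    """Prevalence ("offset") adjustment, hypothetical helper.

    Shifts the log-odds predicted by a case-control model by the
    difference between population and case-control log-odds of disease.
    """
    shift = logit(pop_prevalence) - logit(cc_prevalence)
    return inv_logit(logit(p_cc) + shift)


def recalibrate(p_cc, intercept, slope):
    """Logistic recalibration, hypothetical helper.

    Applies an intercept and slope to the case-control log-odds.
    In practice both would be estimated in the population dataset.
    """
    return inv_logit(intercept + slope * logit(p_cc))


# Case-control proportion of MODY, from the counts given in the text.
cc_prevalence = 177 / (278 + 177)

# ASSUMPTION: illustrative population prevalence, not a study result.
pop_prevalence = 0.005

for p in (0.1, 0.5, 0.9):
    adjusted = offset_adjust(p, cc_prevalence, pop_prevalence)
    print(f"case-control p={p:.2f} -> offset-adjusted p={adjusted:.4f}")
```

Because the offset changes only the intercept, the ranking of patients is unchanged; recalibration with a slope below 1 would additionally shrink predictions towards the population average.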
The offset, re-estimation, and recalibration methods showed good calibration in the population-representative dataset. The offset and recalibration methods displayed the lowest predictive uncertainty due to borrowing information from the fitted case-control model. We demonstrate the potential of a mixture model for incorporating informative biomarkers, which significantly enhanced the model's predictive accuracy, reduced uncertainty, and showed higher stability in all ranges of predictive outcome probabilities.
We have compared several approaches that could be used to develop prediction models for rare diseases. Our findings highlight the recalibration mixture model as the optimal strategy if a population-level dataset is available. This approach offers the flexibility to incorporate additional predictors and informed prior probabilities, contributing to enhanced prediction accuracy for rare diseases. It also allows predictions without these additional tests, providing additional information on whether a patient should undergo further biomarker testing before genetic testing.