Kristensen Simon Bang, Clausen Anne, Skjødt Michael Kriegbaum, Søndergaard Jens, Abrahamsen Bo, Möller Sören, Rubin Katrine Hass
Research Unit OPEN, Department of Clinical Research, University of Southern Denmark, Heden 16, Odense C, 5000, Denmark.
OPEN - Open Patient data Explorative Network, Odense University Hospital, Odense, Denmark.
Diagn Progn Res. 2023 Oct 3;7(1):19. doi: 10.1186/s41512-023-00158-w.
Osteoporosis poses a growing healthcare challenge owing to its rising prevalence and a significant treatment gap: patients are widely underdiagnosed and consequently undertreated, leaving them at high risk of osteoporotic fracture. Several tools aim to improve case-finding in osteoporosis. One such tool is the Fracture Risk Evaluation Model (FREM), which, in contrast to other tools, focuses on imminent fracture risk and holds potential for automation because it relies solely on data that are routinely collected in the Danish healthcare registers. The present article is an analysis protocol for a prediction model intended as a modified version of FREM. The aim is to improve the identification of subjects at high imminent risk of fracture by including pharmacological exposures and using more advanced statistical methods than the original FREM. The main purposes of the protocol are to document and motivate the choices made in data management and statistical analysis.
The model will be developed using logistic regression with grouped LASSO regularization as the primary statistical approach and gradient-boosted classification trees as a secondary approach. Hyperparameter choices and computational considerations for both approaches are investigated in an unsupervised data review (i.e., blinded to the outcome), which also investigates and handles multicollinearity among the included exposures. In addition, the analysis code is tested for speed and robustness in a remote analysis environment. The data review and code tests are used to adjust the analysis plans in a blinded manner, so as not to increase the risk of overfitting in the proposed methods.
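To illustrate the primary approach named above, the following is a minimal sketch of logistic regression with a grouped LASSO penalty, fitted by proximal gradient descent. It is not the protocol's implementation: the function names `group_soft_threshold` and `fit_group_lasso_logistic`, the simulated data, the `sqrt(group size)` penalty weights, and the value of `lam` are all assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2: shrinks a whole group toward zero."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w

def fit_group_lasso_logistic(X, y, groups, lam, n_iter=500):
    """Minimize mean logistic loss + lam * sum_g sqrt(p_g) * ||beta_g||_2.

    Hypothetical helper: `groups` assigns each column of X to a group
    (e.g. the dummy variables of one categorical exposure share a label).
    The intercept is left unpenalized.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    # Step size 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the curvature
    # of the mean logistic loss.
    X_aug = np.hstack([np.ones((n, 1)), X])
    step = 4.0 * n / np.linalg.norm(X_aug, 2) ** 2
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = prob - y
        # Gradient step on the smooth logistic loss.
        intercept -= step * resid.mean()
        beta = beta - step * (X.T @ resid) / n
        # Proximal step: group-wise shrinkage, weighted by sqrt(group size).
        for g in np.unique(groups):
            idx = groups == g
            beta[idx] = group_soft_threshold(
                beta[idx], step * lam * np.sqrt(idx.sum())
            )
    return intercept, beta

# Simulated example (assumed data): group 0 carries signal, group 1 does not.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
groups = np.array([0, 0, 1, 1, 1])
true_beta = np.array([1.5, -1.0, 0.0, 0.0, 0.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

intercept, beta = fit_group_lasso_logistic(X, y, groups, lam=0.05)
print("intercept:", intercept)
print("coefficients by group:", {int(g): beta[groups == g] for g in np.unique(groups)})
```

The grouped penalty acts on the Euclidean norm of each coefficient group, so for a sufficiently large `lam` all coefficients in a group are set to zero together. This is why the approach suits exposures that enter the model as several related columns.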
This protocol specifies the planned tool development to ensure transparency in the modeling approach, hence improving the validity of the enhanced tool to be developed. Through an unsupervised data review, it is further documented that the planned statistical approaches are feasible and compatible with the data employed.