Hunsdieck Berit, Bender Christian, Ickstadt Katja, Mielke Johanna
Computational Biology, Bayer AG, Wuppertal, Germany.
Department of Statistics, TU Dortmund University, Dortmund, Germany.
BioData Min. 2025 May 13;18(1):35. doi: 10.1186/s13040-025-00450-z.
Over the past decade, office-based physicians and hospitals have been reported to make increasing use of electronic health record (EHR) data. However, these data come with challenges regarding completeness and data quality, and it is unclear, especially for more complex models, how these characteristics influence model performance.
In this paper, we focus on joint models, which combine longitudinal modelling with survival modelling to incorporate all available information. The aim is to establish simulation-based guidelines for the quality that longitudinal EHR data must have for joint models to outperform Cox models. We conducted an extensive simulation study, systematically and transparently varying characteristics of data quality such as measurement frequency, noise, and heterogeneity between patients. We then applied the joint models and evaluated their performance relative to traditional Cox survival modelling techniques.
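To make the simulation idea concrete, the following is a minimal sketch of how one such data set could be generated. It is not the authors' simulation design: it assumes a linear biomarker trajectory with patient-specific random intercept and slope, a constant baseline hazard, and a hazard that depends on the current true biomarker value. The function names (`simulate_joint_data`, `event_time`) and all parameter values are hypothetical. The arguments `visit_gap`, `noise_sd`, and `slope_sd` stand in for measurement frequency, noise, and heterogeneity between patients.

```python
import math
import numpy as np


def event_time(intercept, slope, alpha, h0, target):
    """Invert the cumulative hazard of h(t) = h0 * exp(alpha * (intercept + slope * t)).

    `target` is an Exp(1) draw; returns math.inf if the event never occurs.
    """
    scale = h0 * math.exp(alpha * intercept)
    rate = alpha * slope
    if abs(rate) < 1e-12:          # hazard is constant over time
        return target / scale
    arg = 1.0 + rate * target / scale
    return math.log(arg) / rate if arg > 0 else math.inf


def simulate_joint_data(n_patients=200, visit_gap=0.5, noise_sd=0.5,
                        slope_sd=0.2, alpha=0.5, max_follow_up=10.0, seed=0):
    """Simulate longitudinal biomarker values and right-censored event times."""
    rng = np.random.default_rng(seed)
    beta0, beta1, h0 = 1.0, 0.3, 0.02   # hypothetical population-level values

    # Patient-specific trajectories: slope_sd controls heterogeneity.
    intercepts = beta0 + rng.normal(0.0, 0.5, n_patients)
    slopes = beta1 + rng.normal(0.0, slope_sd, n_patients)
    targets = -np.log(rng.uniform(size=n_patients))

    ids, times, values, obs_time, event = [], [], [], [], []
    for i in range(n_patients):
        t_event = event_time(intercepts[i], slopes[i], alpha, h0, targets[i])
        t_obs = min(t_event, max_follow_up)      # administrative censoring
        obs_time.append(t_obs)
        event.append(int(t_event <= max_follow_up))

        # Visits on a regular grid until the event or censoring time.
        visits = np.arange(0.0, t_obs, visit_gap)
        truth = intercepts[i] + slopes[i] * visits
        noisy = truth + rng.normal(0.0, noise_sd, size=len(visits))
        ids.extend([i] * len(visits))
        times.extend(visits)
        values.extend(noisy)

    longitudinal = {"id": np.array(ids), "time": np.array(times),
                    "value": np.array(values)}
    survival = {"time": np.array(obs_time), "event": np.array(event)}
    return longitudinal, survival
```

Calling `simulate_joint_data` repeatedly over a grid of `visit_gap`, `noise_sd`, and `slope_sd` values would give the kind of data sets on which a joint model and a Cox model could then be fitted and compared.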
Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.