Department of Statistics, Florida State University, Tallahassee, FL, 32304, USA.
Comput Biol Med. 2024 Jul;177:108665. doi: 10.1016/j.compbiomed.2024.108665. Epub 2024 May 27.
Longitudinal data in health informatics studies often present challenges due to sparse observations from each subject, limiting the application of contemporary deep learning for prediction. This issue is particularly relevant in predicting birthweight, a crucial factor in identifying conditions such as macrosomia and large-for-gestational age (LGA). Previous approaches have relied on empirical formulas for estimated fetal weights (EFWs) from ultrasound measurements and mixed-effects models for interim predictions.
The proposed supervised longitudinal learning procedure has three steps. First, EFWs are generated from ultrasound measurements using empirical formulas. Second, nonlinear mixed-effects models are applied to create augmented sequences of EFWs at daily gestational timepoints. This augmentation transforms sparse longitudinal data into dense parallel sequences suitable for training recurrent neural networks (RNNs). Third, a tailored RNN architecture is devised to incorporate the augmented sequential EFWs along with non-sequential maternal characteristics.
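The augmentation idea in the second step can be illustrated with a minimal sketch. It is not the authors' method: a per-subject ordinary least-squares fit of log(EFW) against gestational day stands in for the nonlinear mixed-effects model, so no information is shared across subjects. The function names, the log-linear growth form, and the scan values are all hypothetical.

```python
import math


def fit_log_linear(days, efws):
    """Least-squares fit of log(EFW) = a + b * day for one subject.

    Simplified stand-in for a nonlinear mixed-effects model; it needs
    at least two scans taken on distinct gestational days.
    """
    n = len(days)
    logs = [math.log(w) for w in efws]
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def augment_daily(days, efws, start_day, end_day):
    """Turn sparse scans into one fitted EFW per gestational day."""
    a, b = fit_log_linear(days, efws)
    return [math.exp(a + b * d) for d in range(start_day, end_day + 1)]


# Hypothetical subject: three scans (gestational day, EFW in grams).
scan_days = [140, 196, 245]
scan_efws = [330.0, 1200.0, 2500.0]

# Dense daily sequence from day 140 to day 280, inclusive.
daily = augment_daily(scan_days, scan_efws, 140, 280)
print(len(daily), round(daily[0]), round(daily[-1]))
```

Every subject ends up with a sequence of the same length on the same daily grid, which is what makes the augmented data usable as parallel input sequences for an RNN.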
The RNNs are trained on the augmented data to predict birthweights, which are then classified for macrosomia and LGA. Applying this supervised longitudinal learning procedure to the Successive Small-for-Gestational-Age Births study yields improved classification performance. Specifically, sensitivity, area under the receiver operating characteristic curve, and Youden's Index all improve, indicating that the proposed approach is effective in overcoming sparsity challenges in longitudinal health informatics data.
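As a small illustration of how predicted birthweights can be turned into a classification and scored, the sketch below computes sensitivity, specificity, and Youden's Index (sensitivity + specificity - 1). The 4000 g macrosomia cutoff is a commonly used convention and is an assumption here, since the abstract does not state the threshold. The function names and birthweight values are hypothetical.

```python
MACROSOMIA_GRAMS = 4000.0  # assumed cutoff; not stated in the abstract


def classify(weights, threshold=MACROSOMIA_GRAMS):
    """Label each birthweight 1 if at or above the threshold, else 0."""
    return [1 if w >= threshold else 0 for w in weights]


def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and Youden's Index."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
    }


# Hypothetical birthweights in grams, for illustration only.
true_weights = [3600.0, 4300.0, 4050.0, 3800.0, 4100.0]
predicted_weights = [3500.0, 4200.0, 4100.0, 3700.0, 3900.0]

metrics = classification_metrics(
    classify(true_weights), classify(predicted_weights)
)
print(metrics)
```

Youden's Index ranges from -1 to 1 and weights sensitivity and specificity equally, so it rewards a classifier that detects more macrosomia cases without adding false alarms.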
Integrating mixed-effects models for temporal data augmentation with RNNs trained on the augmented sequences proves effective in predicting birthweights, particularly for identifying excessive fetal growth conditions.