Mendis Lochana, Karmakar Debjyoti, Palaniswami Marimuthu, Brownfoot Fiona, Keenan Emerson
Department of Electrical and Electronic Engineering, The University of Melbourne, Parkville, VIC 3010, Australia.
Obstetric Diagnostics and Therapeutics Group, Department of Obstetrics and Gynaecology, The University of Melbourne, Heidelberg, VIC 3084, Australia.
IEEE J Transl Eng Health Med. 2025 Mar 5;13:123-135. doi: 10.1109/JTEHM.2025.3548401. eCollection 2025.
Continuous monitoring of fetal heart rate (FHR) and uterine contractions (UC), known as cardiotocography (CTG), is often used to assess the risk of fetal compromise during labor. However, visual interpretation of CTG recordings is challenging for clinicians because of the complexity of CTG patterns, which leads to poor sensitivity. Efforts to address this issue have focused on data-driven deep-learning methods that detect fetal compromise automatically. Progress has been impeded by limited CTG training datasets and by the absence of a standardized evaluation workflow, which hinders comparison between algorithms. In this study, we use a private dataset of 9,887 CTG recordings with pH measurements and 552 CTG recordings from the open-access CTU-UHB dataset to conduct a cross-database evaluation of six deep-learning models for fetal compromise detection. We explore the impact of input signal selection (FHR and UC), signal pre-processing, downsampling frequency, and the removal of intermediate pH samples from the training dataset. Our findings reveal that using only FHR, and pre-processing FHR with artefact removal and interpolation, significantly improves classification performance for some model architectures, whereas excluding intermediate pH samples did not significantly improve performance for any model. Of the six models compared, ResNet exhibited the strongest fetal compromise classification performance across both databases at a downsampling rate of 1 Hz. Finally, class activation maps from the ResNet model highlighted signal regions that aligned with clinical knowledge of compromised FHR patterns, supporting the model's interpretability. These insights may serve as a standardized reference for developing and comparing future works in this domain.
Clinical and Translational Impact: This study provides a standardized workflow for comparing deep-learning methods for CTG classification. Ensuring new methods show generalizability and interpretability will improve their robustness and applicability in clinical settings.
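The FHR pre-processing named in the abstract (artefact removal, interpolation, downsampling to 1 Hz) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `preprocess_fhr`, the 50 to 200 bpm validity range, the 4 Hz input rate (the native rate of CTU-UHB recordings), and the use of block averaging for downsampling are all assumptions made for illustration.

```python
import numpy as np


def preprocess_fhr(fhr, fs_in=4, fs_out=1, lo=50.0, hi=200.0):
    """Hypothetical FHR clean-up: artefact removal, interpolation, downsampling.

    lo/hi (bpm) and fs_in are illustrative assumptions, not values
    taken from the study.
    """
    fhr = np.asarray(fhr, dtype=float).copy()

    # Artefact removal: treat dropouts (often stored as 0), NaNs, and
    # out-of-range samples as missing.
    bad = ~np.isfinite(fhr) | (fhr < lo) | (fhr > hi)
    if bad.all():
        raise ValueError("no valid FHR samples in recording")

    # Linear interpolation across the missing samples; np.interp holds
    # the nearest valid value at the edges of the recording.
    idx = np.arange(fhr.size)
    fhr[bad] = np.interp(idx[bad], idx[~bad], fhr[~bad])

    # Downsample by averaging non-overlapping blocks (4 samples -> 1 at
    # the default rates); any incomplete trailing block is dropped.
    step = fs_in // fs_out
    n = (fhr.size // step) * step
    return fhr[:n].reshape(-1, step).mean(axis=1)
```

Block averaging is chosen here only because it is simple and acts as a crude anti-aliasing filter; a filtered decimation would be an equally plausible choice.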