A hybrid artificial intelligence model leverages multi-centric clinical data to improve fetal heart rate pregnancy prediction across time-lapse systems.
Affiliations
ImVitro, Paris, France.
INOVIE Fertilité, Institut de Fertilité La Croix Du Sud, Toulouse, France.
Publication details
Hum Reprod. 2023 Apr 3;38(4):596-608. doi: 10.1093/humrep/dead023.
STUDY QUESTION
Can artificial intelligence (AI) algorithms developed to assist embryologists in evaluating embryo morphokinetics be enriched with multi-centric clinical data to better predict clinical pregnancy outcome?
SUMMARY ANSWER
Training algorithms on multi-centric clinical data significantly increased the AUC (area under the receiver operating characteristic curve) compared with algorithms that analyzed only time-lapse system (TLS) videos.
WHAT IS KNOWN ALREADY
Several AI-based algorithms have been developed to predict pregnancy, most of them based solely on analysis of time-lapse recordings of embryo development. It remains unclear, however, whether incorporating numerous clinical features can improve the predictive performance of time-lapse-based embryo evaluation.
STUDY DESIGN, SIZE, DURATION
A dataset of 9986 embryos (95.60% with a known clinical pregnancy outcome, 32.47% frozen transfers) from 5226 patients at 14 European fertility centers in two countries, recorded with three different TLSs, was used to train and validate the algorithms. A total of 31 clinical factors were collected. A separate test set (447 videos) was used to compare the performance of embryologists and the algorithm.
PARTICIPANTS/MATERIALS, SETTING, METHODS
Clinical pregnancy outcome (clinical pregnancy being defined as a pregnancy leading to a fetal heartbeat) was first predicted using a 3D convolutional neural network that analyzed videos of embryonic development up to Day 2 or 3 (33% of the database) or up to Day 5 or 6 (67% of the database). The resulting video score was then fed, together with the clinical features, into a gradient boosting algorithm that generated a second score corresponding to the hybrid model. The AUC of both models was computed across the 7 folds of the validation dataset. These predictions were compared with those made by 13 senior embryologists on the test dataset.
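The two-stage design described above, in which a video score is fed together with clinical features into a gradient boosting model, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3D CNN is replaced by a precomputed synthetic `video_score` column, the two clinical features (age, endometrial thickness) and all data are hypothetical, and the booster is a bare-bones gradient boosting of depth-1 trees (stumps) on the logistic loss rather than a production library. Per-fold AUC uses the rank-based (Mann-Whitney) formula, computed for both the raw video score and the hybrid score.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_stump(X, r, n_cuts=16):
    """Fit a depth-1 regression tree to residuals r; returns (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        vals = sorted(row[j] for row in X)
        # Candidate thresholds at roughly evenly spaced quantiles of feature j.
        cuts = {vals[q * (len(vals) - 1) // n_cuts] for q in range(1, n_cuts)}
        for t in cuts:
            sl = sr = 0.0
            nl = nr = 0
            for row, ri in zip(X, r):
                if row[j] <= t:
                    sl, nl = sl + ri, nl + 1
                else:
                    sr, nr = sr + ri, nr + 1
            if nl == 0 or nr == 0:
                continue
            # Maximizing this term is equivalent to minimizing the two-leaf squared error.
            gain = sl * sl / nl + sr * sr / nr
            if best is None or gain > best[0]:
                best = (gain, j, t, sl / nl, sr / nr)
    return best[1:]

def fit_boosted_stumps(X, y, n_rounds=30, lr=0.3):
    """Gradient boosting on the logistic loss: each round fits a stump to y - p."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # start from the log-odds of the base rate
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        F = [fi + lr * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    return f0, lr, stumps

def predict(model, X):
    f0, lr, stumps = model
    return [sigmoid(f0 + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps))
            for row in X]

def make_synthetic(n, seed=0):
    """Hypothetical data: column 0 stands in for the CNN video score, columns 1-2 for
    two clinical features (age in years, endometrial thickness in mm)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        video, age, endo = rng.random(), rng.uniform(25, 45), rng.uniform(5, 14)
        logit = 3 * (video - 0.5) - 0.15 * (age - 35) + 0.2 * (endo - 9)
        X.append([video, age, endo])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y

def kfold_compare(X, y, k=7):
    """Per-fold AUC of the raw video score versus the hybrid (boosted) score."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    video_aucs, hybrid_aucs = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_boosted_stumps([X[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        video_aucs.append(auc(y_test, [X[i][0] for i in test]))
        hybrid_aucs.append(auc(y_test, predict(model, [X[i] for i in test])))
    return video_aucs, hybrid_aucs

X, y = make_synthetic(210, seed=0)
video_aucs, hybrid_aucs = kfold_compare(X, y, k=7)
print("video  mean AUC:", round(sum(video_aucs) / len(video_aucs), 3))
print("hybrid mean AUC:", round(sum(hybrid_aucs) / len(hybrid_aucs), 3))
```

Because the video score enters the second stage as just another input column, the booster is free to weigh it against the clinical variables; this same structure is what allows a SHAP analysis to rank the video score alongside the clinical features.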
MAIN RESULTS AND THE ROLE OF CHANCE
The average AUC of the hybrid model across all 7 folds was significantly higher than that of the video model (0.727 versus 0.684, respectively; P = 0.015, Wilcoxon test). A SHapley Additive exPlanations (SHAP) analysis of the hybrid model showed that the six most important features for predicting pregnancy were embryo morphokinetics (video score), oocyte age, total gonadotrophin dose administered, number of embryos generated, number of oocytes retrieved, and endometrial thickness. The hybrid model outperformed embryologists on several metrics, including balanced accuracy (P ≤ 0.003; Wilcoxon test). The likelihood of pregnancy increased linearly with the hybrid score, with increasing odds ratios (maximum P-value = 0.001), demonstrating the ranking capacity of the model. Training individual hybrid models did not improve predictive performance. A clinic hold-out experiment yielded AUCs ranging from 0.63 to 0.73. The performance of the hybrid model did not differ between TLSs or between subgroups of embryos transferred on different days of embryonic development. The hybrid model performed better for patients older than 35 years (P < 0.001; Mann-Whitney test) and for fresh transfers (P < 0.001; Mann-Whitney test).
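The fold-wise comparison of the two models is a paired test on per-fold AUCs, which can be illustrated with an exact Wilcoxon signed-rank test. The sketch below is stdlib-only and hedged: the per-fold AUC values in the usage lines are hypothetical placeholders (the abstract reports only the means), and the abstract does not state whether the exact or the normal-approximation version of the test was used.

```python
from itertools import product

def signed_ranks(diffs):
    """1-based ranks of |d|, with tied magnitudes given their average rank."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y.

    Zero differences are dropped. Returns (W+, p_value), where the p-value is the
    fraction of the 2**n equally likely sign patterns whose min(W+, W-) is at most
    the observed one.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = signed_ranks(diffs)
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((False, True), repeat=n):  # enumerate every sign pattern
        wp = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(wp, total - wp) <= observed + 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical per-fold AUCs (placeholders, not the study's values).
hybrid = [0.71, 0.74, 0.72, 0.73, 0.70, 0.75, 0.74]
video = [0.68, 0.69, 0.67, 0.70, 0.66, 0.71, 0.68]
w_plus, p = wilcoxon_exact(hybrid, video)
print(w_plus, p)  # → 28.0 0.015625
```

With seven paired folds there are only 2^7 = 128 equally likely sign patterns under the null hypothesis, so the smallest attainable two-sided exact p-value is 2/128 ≈ 0.0156, reached only when one model wins on every fold.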
LIMITATIONS, REASONS FOR CAUTION
Participating centers were located in only two countries, which limits the generalization of our conclusions to wider patient populations. Not all clinical features were available for all embryos, which limited the performance of the hybrid model in some instances.
WIDER IMPLICATIONS OF THE FINDINGS
Our study suggests that incorporating clinical data improves pregnancy prediction performance and that there is no need to retrain algorithms at the clinic level unless clinics follow strikingly different practices. This study characterizes a versatile AI algorithm with similar performance across different time-lapse microscopes and for embryos transferred at different developmental stages. It can also assist with patients of different ages and treatment protocols, although its performance varies, presumably because predicting a fetal heartbeat is harder in some clinical contexts than in others. This AI model can be made widely available and can help embryologists standardize their practices in a wide range of clinical scenarios.
STUDY FUNDING/COMPETING INTEREST(S)
Funding for the study was provided by ImVitro, with grant funding received in part from BPIFrance (Bourse French Tech Emergence (DOS0106572/00), Paris Innovation Amorçage (DOS0132841/00), and Aide au Développement DeepTech (DOS0152872/00)). A.B.-C. is a co-owner of, and holds stock in, ImVitro SAS. A.B.-C. and F.D.M. hold a patent for 'Devices and processes for machine learning prediction of in vitro fertilization' (EP20305914.2). A.D., N.D., M.M.F., and F.D.M. are or have been employees of ImVitro and have been granted stock options. X.P.-V. has been paid as a consultant to ImVitro and has been granted stock options in ImVitro. L.C.-D. and C.G.-S. have undertaken paid consultancy for ImVitro SAS. The remaining authors have no conflicts to declare.
TRIAL REGISTRATION NUMBER
N/A.