Division of Reproductive Endocrinology and Infertility, Department of Obstetrics and Gynecology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; Department of Medicine, Harvard Medical School, Boston, Massachusetts.
Division of Engineering in Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.
Fertil Steril. 2020 Apr;113(4):781-787.e1. doi: 10.1016/j.fertnstert.2019.12.004.
OBJECTIVE: To evaluate the consistency and objectivity of deep neural networks in scoring embryos and making disposition decisions for biopsy and cryopreservation, compared with grading by highly trained embryologists.
DESIGN: Prospective double-blind study using retrospective data.
SETTING: U.S.-based large academic fertility center.
PATIENT(S): Not applicable.
INTERVENTION(S): Embryo images (748 recorded at 70 hours postinsemination [hpi] and 742 at 113 hpi) were used to evaluate embryologists and neural networks in embryo grading. The performance of 10 embryologists and a neural network was also evaluated in disposition decision making using 56 embryos.
MAIN OUTCOME MEASURE(S): Coefficients of variation (%CV) and measures of consistency were compared.
RESULT(S): Embryologists exhibited a high degree of variability (%CV averages: 82.84% for 70 hpi and 44.98% for 113 hpi) in grading embryos. When selecting blastocysts for biopsy or cryopreservation, embryologists had an average consistency of 52.14% and 57.68%, respectively. The neural network outperformed the embryologists in selecting blastocysts for biopsy and cryopreservation, with a consistency of 83.92%. Cronbach's α analysis revealed an α coefficient of 0.60 for the embryologists and 1.00 for the network.
CONCLUSION(S): The results of our study show a high degree of interembryologist and intraembryologist variability in scoring embryos, likely due to the subjective nature of traditional morphology grading. This may ultimately lead to less precise disposition decisions and discarding of viable embryos. The application of a deep neural network, as shown in our study, can introduce improved reliability and high consistency during the process of embryo selection and disposition, potentially improving outcomes in an embryology laboratory.
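For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how a coefficient of variation (%CV) and Cronbach's α are conventionally computed from a table of grades. It is not the authors' analysis code: the function names, the use of the sample (n - 1) standard deviation and variance, and the layout of one row per embryo and one column per rater are all assumptions made for illustration, and any input data would be hypothetical.

```python
from statistics import mean, stdev, variance


def percent_cv(scores):
    """Coefficient of variation as a percentage: 100 * SD / mean.

    `scores` holds the grades that several raters gave to ONE embryo.
    Assumption: the sample (n - 1) standard deviation is used.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * stdev(scores) / m


def cronbach_alpha(ratings):
    """Cronbach's alpha for a ratings table.

    `ratings` is a list of rows: one row per embryo, one column per rater
    (hypothetical layout). Uses the standard formula
        alpha = k / (k - 1) * (1 - sum(rater variances) / variance(row totals))
    where k is the number of raters.
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("at least two raters are required")
    # Variance of each rater's column across all embryos.
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the per-embryo sum of all raters' grades.
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)
```

Under this formula, raters who always assign identical grades yield α = 1, which is the value a fully deterministic grader would produce across repeated evaluations of the same images; disagreement among raters lowers α.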