Division of Reproductive Medicine, Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women's Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Hum Reprod. 2017 Aug 1;32(8):1604-1611. doi: 10.1093/humrep/dex229.
STUDY QUESTION: How does automated time-lapse annotation (Eeva™) compare to manual annotation of the same video images performed by embryologists certified in measuring durations of the 2-cell (P2; time to the 3-cell minus time to the 2-cell, or t3-t2) and 3-cell (P3; time to 4-cell minus time to the 3-cell, or t4-t3) stages?
SUMMARY ANSWER: Manual annotation was superior to the automated annotation provided by Eeva™ version 2.2, because manual annotation assigned a rating to a higher proportion of embryos and yielded a greater sensitivity for blastocyst prediction than automated annotation.
WHAT IS KNOWN ALREADY: While use of the Eeva™ test has been shown to improve an embryologist's ability to predict blastocyst formation compared to Day 3 morphology alone, the accuracy of the automated image analysis employed by the Eeva™ system has never been compared to manual annotation of the same time-lapse markers by a trained embryologist.
STUDY DESIGN, SIZE, DURATION: We conducted a prospective cohort study of embryos (n = 1477) cultured in the Eeva™ system (n = 8 microscopes) at our institution from August 2014 to February 2016.
PARTICIPANTS/MATERIALS, SETTING, METHODS: Embryos were assigned a blastocyst prediction rating of High (H), Medium (M), Low (L), or Not Rated (NR) by Eeva™ version 2.2 according to P2 and P3. An embryologist from a team of 10 then manually annotated each embryo, and if the automated and manual ratings differed, a second embryologist independently annotated the embryo. If both embryologists disagreed with the automated Eeva™ rating, the rating was classified as discordant. If the second embryologist agreed with the automated Eeva™ rating, the rating was not considered discordant. Spearman's correlation (ρ), weighted kappa statistics and intra-class correlation coefficients (ICC) with 95% confidence intervals (CI) between Eeva™ and manual annotation were calculated, as were the proportions of discordant embryos and the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of each method for blastocyst prediction.
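The discordance rule and the four diagnostic measures named above can be illustrated with a minimal Python sketch. This is not the study's analysis code: the function names (`is_discordant`, `diagnostic_metrics`) are hypothetical, and the abstract does not state which ratings were counted as a positive blastocyst prediction, so the second function simply takes a boolean prediction per embryo and leaves that mapping to the caller.

```python
from typing import Optional, Sequence


def is_discordant(eeva: str, first: str, second: Optional[str] = None) -> bool:
    """Apply the two-reader rule described in the methods.

    Ratings are "H", "M", "L" or "NR". A second annotation is only
    needed when the first embryologist differs from the automated rating.
    """
    if first == eeva:
        return False  # first reader agrees, so no second read is required
    if second is None:
        raise ValueError("second annotation required when the first differs")
    # Discordant only if the second reader also disagrees with Eeva.
    return second != eeva


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(predicted: Sequence[bool], formed: Sequence[bool]) -> dict:
    """Sensitivity, specificity, PPV and NPV for blastocyst prediction.

    `predicted` is True where a method predicts blastocyst formation
    (the rating-to-prediction mapping is an assumption left to the caller);
    `formed` is True where the embryo actually reached the blastocyst stage.
    """
    if len(predicted) != len(formed):
        raise ValueError("inputs must have the same length")
    tp = sum(p and f for p, f in zip(predicted, formed))
    fp = sum(p and not f for p, f in zip(predicted, formed))
    fn = sum(f and not p for p, f in zip(predicted, formed))
    tn = sum(not p and not f for p, f in zip(predicted, formed))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

Computing the same metrics once from the automated ratings and once from the manual ratings, against the same observed outcomes, would give the kind of method-by-method comparison the abstract describes.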
MAIN RESULTS AND THE ROLE OF CHANCE: The distribution of H, M and L ratings differed by annotation method (P < 0.0001). The correlation between Eeva™ and manual annotation was higher for P2 (ρ = 0.75; ICC = 0.82; 95% CI 0.82-0.83) than for P3 (ρ = 0.39; ICC = 0.20; 95% CI 0.16-0.26). Eeva™ was more likely than an embryologist to rate an embryo as NR (11.1% vs. 3.0%, P < 0.0001). Discordance occurred in 30.0% (443/1477) of all embryos and was not associated with factors such as Day 3 cell number, fragmentation, symmetry or presence of abnormal cleavage. Rather, discordance was associated with direct cleavage (P2 ≤ 5 h) and short P3 (≤0.25 h), and also factors intrinsic to the Eeva™ system, such as the automated rating (proportion of discordant embryos by rating: H: 9.3%; M: 18.1%; L: 41.3%; NR: 31.4%; P < 0.0001), microwell location (peripheral: 31.2%; central: 23.8%; P = 0.02) and Eeva™ microscope (n = 8; range 22.9-42.6%; P < 0.0001). Manual annotation upgraded 82.6% of all discordant embryos from a lower to a higher rating, and improved the sensitivity for predicting blastocyst formation.
LIMITATIONS, REASONS FOR CAUTION: One team of embryologists performed the manual annotations; however, the study staff was trained and certified by the company sponsor. Only two time-lapse markers were evaluated, so the results are not generalizable to other parameters; likewise, the results are not generalizable to future versions of Eeva™ or other automated image analysis systems.
WIDER IMPLICATIONS OF THE FINDINGS: Based on the proportion of discordance and the improved performance of manual annotation, clinics using the Eeva™ system should consider manual annotation of P2 and P3 to confirm the automated ratings generated by Eeva™.
STUDY FUNDING/COMPETING INTEREST(S): These data were acquired in a study funded by Progyny, Inc. There are no competing interests.
TRIAL REGISTRATION NUMBER: N/A.