Cao Ping, Acharya Ganesh, Salumets Andres, Zamani Esteki Masoud
Department of Clinical Genetics, Maastricht University Medical Center+ (MUMC+), Maastricht, The Netherlands.
Department of Genetics and Cell Biology, GROW Research Institute for Oncology and Reproduction, Faculty of Health, Medicine and Life Sciences (FHML), Maastricht University, Maastricht, The Netherlands.
Acta Obstet Gynecol Scand. 2025 Jan;104(1):6-12. doi: 10.1111/aogs.14989. Epub 2024 Oct 28.
We evaluated the efficacy of large language models (LLMs), specifically generative pre-trained transformer 4 (GPT-4), in predicting pregnancy following in vitro fertilization (IVF) treatment, and compared its accuracy with the results of an original published study. Our findings revealed that GPT-4 can autonomously develop and refine advanced machine learning models for pregnancy prediction with minimal human intervention. The prediction accuracy was 0.79 and the area under the receiver operating characteristic curve (AUROC) was 0.89, matching or exceeding the metrics reported in the original study (0.78 for accuracy and 0.87 for AUROC). The results suggest that LLMs can facilitate data processing, optimize machine learning models for predicting IVF success rates, and provide methods for interpreting data. This capacity can help bridge the knowledge gap between data scientists and medical personnel in solving the most pressing clinical challenges. However, further experiments on larger and more diverse datasets are needed to validate these findings and support broader applications of LLMs in assisted reproduction.
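The abstract compares models by two metrics, accuracy and AUROC. The following minimal Python sketch shows how each is typically computed for a binary pregnancy/no-pregnancy classifier. The labels and scores are made-up toy values, not data from the study. The 0.5 decision threshold used for accuracy is also an assumption, because the abstract does not say how predictions were thresholded. AUROC is computed here in its rank form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 default threshold is an illustrative assumption.
    """
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(int(s >= threshold) == t for t, s in zip(y_true, y_score))
    return correct / len(y_true)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative).

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pregnancy, 0 = no pregnancy.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.3, 0.6, 0.4, 0.2, 0.7, 0.8, 0.1]
print(f"accuracy={accuracy(y_true, y_score):.2f}, AUROC={auroc(y_true, y_score):.3f}")
# prints "accuracy=0.75, AUROC=0.875"
```

Accuracy depends on the chosen threshold. AUROC summarizes ranking quality across all thresholds, which is why studies such as this one usually report both. The pairwise loop is O(n²) and is meant only for illustration. For real datasets, a library routine such as scikit-learn's `roc_auc_score` is the usual choice.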