Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
Front Endocrinol (Lausanne). 2023 Mar 15;14:1087429. doi: 10.3389/fendo.2023.1087429. eCollection 2023.
Early detection of ovarian aging is of great importance, although no ideal marker or widely accepted evaluation system currently exists. The purpose of this study was to develop a better prediction model to assess and quantify ovarian reserve using machine learning methods.
This multicenter, nationwide, population-based study included a total of 1,020 healthy women. Their ovarian reserve was quantified as ovarian age, which was assumed to equal their chronological age. Least absolute shrinkage and selection operator (LASSO) regression was used to select the features for model construction. Seven machine learning methods were each applied to construct a prediction model: artificial neural network (ANN), support vector machine (SVM), generalized linear model (GLM), K-nearest neighbors regression (KNN), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). Pearson's correlation coefficient (PCC), mean absolute error (MAE), and mean squared error (MSE) were used to compare the efficiency and stability of these models.
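The Python functions below are a minimal sketch of how the three evaluation metrics compare a model's predicted ovarian age with chronological age, which serves as the label here. The function names and the example ages are hypothetical and only illustrate the calculation. The abstract does not describe the study's own implementation.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and paired element by element.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def pearson_cc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson's correlation coefficient between labels and predictions."""
    _check(y_true, y_pred)
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sums of products and squares; the 1/n factors cancel in the ratio.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / math.sqrt(var_t * var_p)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the label (years)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, in squared years."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical chronological ages and model-predicted ovarian ages.
ages = [25, 30, 35, 40]
predicted = [27, 29, 36, 38]
print(round(pearson_cc(ages, predicted), 3))  # 0.97
print(mae(ages, predicted))  # 1.5
print(mse(ages, predicted))  # 2.5
```

PCC rewards predictions that track the ordering and spread of ages. MAE and MSE measure how far off the predictions are in years, and MSE penalizes large misses more heavily.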
Anti-Müllerian hormone (AMH) and antral follicle count (AFC) had the highest absolute PCC values with age (0.45 and 0.43, respectively), and their age distribution curves were similar. A ranking analysis combining PCC, MAE, and MSE values identified LightGBM as the most suitable model for ovarian age. The LightGBM model obtained PCC values of 0.82, 0.56, and 0.70 for the training set, the test set, and the entire dataset, respectively. It also had the lowest MAE and cross-validated MSE values. When the women were divided into two age groups (20-35 and >35 years), the LightGBM model obtained the lowest MAE value (2.88) for women aged 20 to 35 years and the second-lowest MAE value (5.12) for women over 35 years.
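The abstract does not say how PCC, MAE, and MSE were combined in the ranking analysis. For illustration only, this sketch assumes rank-sum aggregation, a common approach. Each model is ranked on each metric, where a higher PCC is better and a lower MAE or MSE is better, and the models are then ordered by the sum of their ranks. The `rank_models` function and the metric values in the example are hypothetical and are not results from the study.

```python
from typing import Dict, List


def rank_models(metrics: Dict[str, Dict[str, float]]) -> List[str]:
    """Order models by the sum of their per-metric ranks (rank 1 = best).

    metrics maps a model name to {"pcc": ..., "mae": ..., "mse": ...}.
    Hypothetical helper: rank-sum aggregation is an assumed scheme,
    not necessarily the ranking analysis used in the study.
    """
    models = list(metrics)
    rank_sum = {m: 0 for m in models}
    # (metric key, whether a larger value is better)
    criteria = (("pcc", True), ("mae", False), ("mse", False))
    for key, higher_is_better in criteria:
        # Best model first; ties keep input order because sorted() is stable.
        ordered = sorted(models, key=lambda m: metrics[m][key], reverse=higher_is_better)
        for rank, model in enumerate(ordered, start=1):
            rank_sum[model] += rank
    # Smallest rank sum first; equal sums are broken alphabetically by name.
    return sorted(models, key=lambda m: (rank_sum[m], m))


# Illustrative values only, not results from the study.
example = {
    "model_a": {"pcc": 0.70, "mae": 3.0, "mse": 15.0},
    "model_b": {"pcc": 0.60, "mae": 3.5, "mse": 18.0},
    "model_c": {"pcc": 0.65, "mae": 2.9, "mse": 20.0},
}
print(rank_models(example))  # ['model_a', 'model_c', 'model_b']
```

Rank sums ignore the size of the gaps between models. A real analysis might instead average tied ranks or weight the metrics differently.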
Machine learning methods combining multiple features were reliable in assessing and quantifying ovarian reserve. The LightGBM method gave the best results, especially in the child-bearing age group of 20 to 35 years.