Tian Lu, Zeng Yan, Zheng Helin, Cai Jinhua
Department of Radiology, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatric Metabolism and Inflammatory Diseases, Children's Hospital of Chongqing Medical University, Chongqing, 400014, China.
Department of Endocrinology, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing Key Laboratory of Pediatric Metabolism and Inflammatory Diseases, Children's Hospital of Chongqing Medical University, Chongqing, 400014, China.
BMC Endocr Disord. 2025 Jul 1;25(1):159. doi: 10.1186/s12902-025-01983-4.
This study aimed to develop interpretable machine learning models for identifying idiopathic central precocious puberty (ICPP) in girls without the gonadotropin-releasing hormone (GnRH) stimulation test, which is the current gold standard for diagnosing ICPP but is expensive and time-consuming.
A total of 246 female paediatric patients who developed secondary sexual characteristics before 8 years of age and underwent a GnRH stimulation test were randomly divided into a training set (172 patients, 70%) and a validation set (74 patients, 30%). Characteristic parameters were extracted from readily available clinical data and analysed statistically. The least absolute shrinkage and selection operator (LASSO) method was used to select the essential characteristic parameters associated with ICPP, which were then used to construct a logistic regression (LR) model and five machine learning (ML) models: support vector machine (SVM), Gaussian naive Bayes (GaussianNB), extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (kNN). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, false-positive and false-negative values, Youden's index, accuracy, positive and negative likelihood ratios, calibration plots, and decision curve analysis (DCA). Finally, the SHapley Additive exPlanations (SHAP) package was used to interpret the best-performing model.
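Most of the threshold-based metrics listed above derive from the four cells of a confusion matrix, and the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below illustrates these standard definitions using only the standard library. The function names, the 0.5 decision threshold, the label coding (1 = ICPP, 0 = not ICPP), and the toy data are illustrative assumptions, not the study's code or data.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_score: List[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); a score >= threshold counts as a positive call.

    Assumed coding: 1 = ICPP, 0 = not ICPP. The 0.5 default is illustrative.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(y_true: List[int], y_score: List[float],
                       threshold: float = 0.5) -> Dict[str, float]:
    """Standard threshold-based metrics; assumes both classes are present."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    sens = tp / (tp + fn)  # sensitivity = true-positive rate
    spec = tn / (tn + fp)  # specificity = true-negative rate
    return {
        "sensitivity": sens,
        "specificity": spec,
        "youden_index": sens + spec - 1,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Likelihood ratios become infinite when their denominator is zero.
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec if spec > 0 else float("inf"),
    }


def auroc(y_true: List[int], y_score: List[float]) -> float:
    """AUROC as the normalised Mann-Whitney U statistic (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical labels and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(diagnostic_metrics(labels, scores))
print(auroc(labels, scores))
```

With the toy data, sensitivity and specificity are both 2/3, so Youden's index is 1/3, the positive likelihood ratio is 2 and the negative likelihood ratio is 0.5 (up to floating-point rounding); the AUROC is 8/9 because eight of the nine positive-negative pairs are ranked correctly.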
Four essential characteristic parameters were selected using the LASSO method: uterine volume, the ratio of bone age to chronological age (BA/CA), basal follicle-stimulating hormone (FSH), and basal luteinizing hormone (LH). Based on these parameters, the LR model and the five ML models achieved AUROC values of 0.72 to 0.96 in the training set and 0.65 to 0.90 in the validation set for diagnosing ICPP. Among these six models, the XGBoost model performed best, achieving the highest AUROC, accuracy, specificity, and sensitivity in both the training and validation sets. Moreover, calibration plots and DCA confirmed that this model had the best calibration and clinical utility.
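Decision curve analysis plots a model's net benefit against the threshold probability p_t at which a clinician would act on a positive prediction. Under the usual definition, net benefit is TP/n - (FP/n) * p_t / (1 - p_t), and the model is compared with a "treat all" strategy and a "treat none" strategy (net benefit 0). The following is a minimal standard-library sketch of that definition; the function names, thresholds, and toy data are hypothetical and not taken from the study.

```python
from typing import List, Tuple


def net_benefit(y_true: List[int], y_prob: List[float], pt: float) -> float:
    """Net benefit of acting on every case with predicted probability >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    # pt / (1 - pt) is the weight of a false positive relative to a true positive.
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true: List[int], pt: float) -> float:
    """Net benefit of acting on every case, ignoring the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def decision_curve(y_true: List[int], y_prob: List[float],
                   thresholds: List[float]) -> List[Tuple[float, float, float, float]]:
    """Rows of (pt, model, treat-all, treat-none) net benefit; requires 0 < pt < 1."""
    return [(pt, net_benefit(y_true, y_prob, pt),
             net_benefit_treat_all(y_true, pt), 0.0) for pt in thresholds]


# Toy example with hypothetical labels and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
for row in decision_curve(labels, probs, [0.1, 0.25, 0.5, 0.75]):
    print(row)
```

A model shows clinical utility over a range of thresholds where its curve lies above both reference strategies; in the toy data, for example, at p_t = 0.25 the model's net benefit (4/9) exceeds that of treating everyone (1/3).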
An accurate and interpretable ML-based model was developed to help clinicians diagnose ICPP and support clinical decision-making.
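The interpretability referred to here comes from SHAP, which attributes an individual prediction to its input features using Shapley values from cooperative game theory. With only a few inputs, such as the four LASSO-selected parameters, Shapley values can be computed exactly by enumerating every feature coalition. The sketch below does this by brute force against a single reference patient, which is a simplification: the SHAP package provides optimised explainers (for tree ensembles such as XGBoost, `shap.TreeExplainer`) that handle the reference distribution differently. The function name `exact_shapley`, the toy scoring function, and its coefficients are hypothetical and are not the study's fitted model.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    m = len(x)

    def value(coalition: Set[int]) -> float:
        # Coalition members keep the patient's value; the rest use the reference.
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):  # coalitions of size 0..m-1 drawn from the other features
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical logistic score over four standardised inputs
# (for instance uterine volume, BA/CA, basal FSH, basal LH); coefficients are invented.
def toy_model(z: List[float]) -> float:
    logit = -0.5 + 1.2 * z[0] + 0.8 * z[1] - 0.3 * z[2] + 1.5 * z[3]
    return 1.0 / (1.0 + exp(-logit))


patient = [1.0, 0.5, -0.2, 0.8]
reference = [0.0, 0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, patient, reference)
print(phi)
# Efficiency property: attributions sum to f(patient) - f(reference).
print(sum(phi), toy_model(patient) - toy_model(reference))
```

Brute-force enumeration costs on the order of 2^m model evaluations per feature, which is trivial for four parameters but is why dedicated explainers are used for larger feature sets.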