Li S J, Wang Y, Huang R, Yang L M, Lyu X D, Huang X W, Peng X B, Song D M, Ma N, Xiao Y, Zhou Q Y, Guo Y, Liang N, Liu S, Gao K, Yan Y N, Xia E L
Hysteroscopy Center, Fuxing Hospital, Capital Medical University, Beijing 100038, China.
Center for Applied Statistics, Renmin University of China, School of Statistics, Renmin University of China, Beijing 100086, China.
Zhonghua Yi Xue Za Zhi. 2025 Aug 12;105(30):2551-2557. doi: 10.3760/cma.j.cn112137-20250302-00493.
To develop a machine learning diagnostic model for T-shaped uterus based on quantitative parameters from three-dimensional (3D) transvaginal ultrasound. This retrospective cross-sectional study enrolled 304 patients who visited the Hysteroscopy Center of Fuxing Hospital, Beijing, China, between July 2021 and June 2024 for infertility, recurrent pregnancy loss, or other adverse obstetric histories. Twelve experts (seven clinicians and five sonographers) from Fuxing Hospital, Beijing Obstetrics and Gynecology Hospital of Capital Medical University, Peking University People's Hospital, and Beijing Hospital independently and anonymously assessed each case for T-shaped uterus using a modified Delphi method. Based on the consensus results, 56 cases were assigned to the T-shaped uterus group and 248 to the non-T-shaped uterus group. Seven clinical features and 14 sonographic features were initially included. Features contributing to the diagnosis were selected using least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation. Four machine learning algorithms [logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM)] were then used to build diagnostic models for T-shaped uterus. Using the Python random module, the patient dataset was randomly divided into five subsets, each preserving the original class distribution (T-shaped uterus : non-T-shaped uterus ≈ 1∶4), with samples of both categories evenly distributed across the subsets. Five-fold cross-validation was performed, with four subsets used for training and one for validation in each round, to improve the reliability of model evaluation. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, precision, and F1-score.
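The stratified five-way split described above can be sketched with only the Python standard library. This is a minimal illustration, not the authors' code: the function name `stratified_folds`, the seed, and the round-robin dealing strategy are assumptions; only the group sizes (56 and 248) and the use of the `random` module come from the abstract.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, preserving the class ratio in each."""
    rng = random.Random(seed)  # seed is an assumption; the abstract gives none
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so per-fold counts differ by at most one.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Group sizes from the abstract: 56 T-shaped (1) and 248 non-T-shaped (0).
labels = [1] * 56 + [0] * 248
folds = stratified_folds(labels)

for i, val_idx in enumerate(folds):
    # Four folds are used for training; the held-out fold is used for validation.
    train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
    n_pos = sum(labels[j] for j in val_idx)
    print(f"fold {i}: train={len(train_idx)}, val={len(val_idx)}, "
          f"T-shaped in val={n_pos}")
```

Shuffling within each class before dealing keeps the assignment random while ensuring that every fold holds 11 or 12 T-shaped cases, close to the overall 1∶4 ratio.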
In the RF model, feature importance was assessed by the mean decrease in Gini impurity attributed to each variable. The 304 patients had a mean age of (35±4) years: (35±5) years in the T-shaped uterus group and (34±4) years in the non-T-shaped uterus group. LASSO regression selected eight features with non-zero coefficients: average lateral wall indentation width, average lateral wall indentation angle, upper cavity depth, endometrial thickness, uterine cavity area, cavity width at the level of lateral wall indentation, angle formed by the bilateral lateral walls, and average cornual angle (coefficients: 0.125, -0.064, -0.037, -0.030, -0.026, -0.025, -0.025, and -0.024, respectively). The RF model showed the best diagnostic performance: in the training set, AUC was 0.986 (95% CI: 0.980-0.992), sensitivity 0.978, specificity 0.946, precision 0.802, and F1-score 0.881; in the testing set, AUC was 0.948 (95% CI: 0.911-0.985), sensitivity 0.873, specificity 0.919, precision 0.716, and F1-score 0.784. Feature importance analysis of the RF model showed that average lateral wall indentation width, upper cavity depth, and average lateral wall indentation angle were the top three features, together accounting for over 65% of total importance. The machine learning models developed in this study, particularly the RF model, are promising for the diagnosis of T-shaped uterus and may offer new perspectives and technical support for clinical practice.
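For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, precision, and F1-score follow from a binary confusion matrix. The helper `binary_metrics` and the counts passed to it are hypothetical and are not data from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the abstract's threshold-based metrics from confusion counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true cases detected
    specificity = tn / (tn + fp)   # share of non-cases correctly excluded
    precision = tp / (tp + fp)     # share of positive calls that are correct
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for one validation fold (not taken from the study).
metrics = binary_metrics(tp=10, fp=4, tn=45, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a class ratio near 1∶4, precision tends to fall below specificity, because even a few false positives are large relative to the small number of true positives.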