Li Dengke, Chang Baoyuan, Huang Qunlian
Department of Urology, The First Affiliated Hospital of Wannan Medical College, Yijishan Hospital, Wuhu, 241001, Anhui, People's Republic of China.
Department of Urology, Suzhou Hospital of Anhui Medical University (Suzhou Municipal Hospital of Anhui Province), Suzhou, 237000, Anhui, People's Republic of China.
Sci Rep. 2025 Jan 9;15(1):1532. doi: 10.1038/s41598-025-85963-7.
To develop a pre-biopsy diagnostic tool for patients with prostate-specific antigen (PSA) levels < 20 ng/ml, so as to minimize prostate biopsy-related discomfort and risks. Data from 655 patients who underwent transperineal prostate biopsy at the First Affiliated Hospital of Wannan Medical College from July 2021 to January 2023 were collected and analyzed. The training set was first class-balanced with the Synthetic Minority Over-sampling TEchnique (SMOTE). Least Absolute Shrinkage and Selection Operator (LASSO) feature selection was then used to identify the significant variables, and multiple machine learning models were constructed. The best-performing model was selected and evaluated through tenfold cross-validation to ensure interpretability. Finally, its performance was validated on the test set. Based on the LASSO regression, age, prostate-specific antigen mass ratio (PSAMR), Prostate Imaging-Reporting and Data System (PI-RADS) score, and prostate volume were selected as the model variables. The areas under the receiver operating characteristic (ROC) curve (AUCs) of the models in the validation set were: XGBoost, 0.93 (0.88-0.97); logistic regression, 0.89 (0.83-0.95); LightGBM, 0.87 (0.80-0.93); AdaBoost, 0.90 (0.85-0.96); Gaussian naive Bayes (GNB), 0.88 (0.82-0.95); complement naive Bayes (CNB), 0.79 (0.71-0.87); multilayer perceptron (MLP), 0.78 (0.69-0.86); and support vector machine (SVM), 0.81 (0.73-0.89). XGBoost was selected as the best model and rebuilt with tenfold cross-validation on the training data. This gave AUCs of 0.995 (0.991-0.999) in the training set, 0.945 (0.885-0.997) in the validation set, and 0.920 (0.868-0.972) in the test set. The Kolmogorov-Smirnov curve, calibration curve, and learning curve all yielded favorable results. The decision curve demonstrated that patients with threshold probabilities ranging from 10% to 95% can benefit from the model. We developed an XGBoost machine learning model based on the PSAMR indicator and interpreted it using the SHapley Additive exPlanations (SHAP) method.
The model offered a high-performance non-invasive technique to diagnose prostate cancer in patients with PSA levels < 20 ng/ml.
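The SMOTE class-balancing step can be illustrated with a minimal pure-Python sketch. It assumes each minority-class (cancer) sample is a list of numeric features. For each synthetic point, it picks a random minority sample as the base, as common library implementations do. The function name `smote` and its parameters are hypothetical and do not represent the authors' code.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating between a
    random minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Balance a toy training set: add synthetic positives until they match negatives.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_negatives = 10
new_points = smote(positives, n_negatives - len(positives), k=2, seed=1)
print(len(positives) + len(new_points))  # 10
```

Consistent with the abstract, only the training set would be oversampled. The test set keeps its original class ratio.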
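LASSO selects variables by shrinking some coefficients exactly to zero. The sketch below is a hedged illustration of cyclic coordinate descent with soft-thresholding for the least-squares LASSO objective (1/(2n)) * sum((y - X*beta)^2) + lambda * sum(|beta_j|) on standardized features. The abstract does not state the exact LASSO formulation. A logistic-loss LASSO is common for binary outcomes, so treating the 0/1 label as a continuous response here is a simplifying assumption. `lasso_cd` is a hypothetical name.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Least-squares LASSO by cyclic coordinate descent (hypothetical helper).
    Returns coefficients on the standardized-feature scale."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # population SD; constant columns get SD 1 so they standardize to all zeros
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals at beta = 0 (intercept = mean of y)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col_sq = sum(Z[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue  # constant feature: coefficient stays 0
            # correlation of feature j with the partial residual (feature j added back)
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= Z[i][j] * delta
            beta[j] = new
    return beta


# Toy data: y depends on the first feature only.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(lasso_cd(X, y, lam=0.1))  # the second coefficient is shrunk to exactly 0.0
```

Features whose coefficients remain non-zero are the ones retained. Larger `lam` values drop more features.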
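Tenfold cross-validation splits the training data into ten folds. The model is fitted on nine folds and scored on the held-out fold, rotating until every fold has been held out once. A minimal sketch follows. Stratifying folds by outcome class, so that each fold keeps the cancer/no-cancer ratio, is an assumption, since the abstract does not say whether stratification was used. `stratified_kfold` is a hypothetical helper.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Hypothetical helper: yield (train_indices, test_indices) for k folds,
    dealing each class round-robin across folds to keep class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = [1] * 20 + [0] * 80  # toy outcome vector, 20% positive
train, test = next(stratified_kfold(labels, k=10))
print(len(train), len(test), sum(labels[i] for i in test))  # 90 10 2
```

When SMOTE is combined with cross-validation, oversampling is usually applied within each training split only. This keeps synthetic points out of the held-out fold.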
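The AUC values reported above can be computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. The abstract does not state how its intervals were obtained; DeLong's method and the percentile bootstrap are both common. The sketch below assumes a percentile bootstrap. Both function names are hypothetical.

```python
import random


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (labels must be 0/1)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("AUC needs both classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc(y, s))  # 0.75
```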
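The Kolmogorov-Smirnov (KS) curve plots the true-positive and false-positive rates against the score threshold. Its summary statistic is the largest vertical gap between the two curves, that is, the maximum of |TPR - FPR|. A minimal sketch with a hypothetical helper name:

```python
def ks_statistic(y_true, scores):
    """Maximum |TPR - FPR| over all thresholds 'score >= t' (labels 0/1)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS statistic needs both classes")
    best = 0.0
    # Each distinct score is a candidate threshold; above the maximum, TPR = FPR = 0.
    for t in sorted(set(scores)):
        tpr = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


print(ks_statistic([0, 1, 0, 1], [0.1, 0.2, 0.3, 0.4]))  # 0.5
```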
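Decision curve analysis compares a model's net benefit at each threshold probability p_t with two reference strategies: "biopsy everyone" and "biopsy no one" (NB = 0). The model's net benefit is NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t). A model is usually called beneficial over the threshold range where its net benefit exceeds both references. The sketch below uses this standard formula. The function names and toy data are hypothetical.

```python
def net_benefit(y_true, probs, pt):
    """Net benefit of intervening when the predicted probability >= pt (0 < pt < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= pt)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of intervening in every patient (e.g. biopsying all)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5, 0.9):
    # compare the model against "treat all" and "treat none" (net benefit 0)
    print(pt, net_benefit(y, p, pt), net_benefit_treat_all(y, pt))
```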
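SHAP attributes each prediction to the input features using Shapley values. With only four inputs (age, PSAMR, PI-RADS score, prostate volume), exact Shapley values can be enumerated over all feature coalitions. This sketch makes a simplifying assumption not taken from the abstract: features outside a coalition are replaced by a single baseline value. By contrast, SHAP's tree explainer for XGBoost estimates these expectations from the tree structure and training or background data. The toy risk function and helper names are hypothetical.

```python
import math
from itertools import combinations


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features absent from a coalition take baseline values."""
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return f(z)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # weight |S|! (p - |S| - 1)! / p! for each coalition S not containing j
            w = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += w * (value(s | {j}) - value(s))
    return phi


# Hypothetical risk score with an interaction between features 0 and 1;
# feature 2 has no effect on the output.
def toy_model(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[1] - 0.3 * z[3]


x = [2.0, 1.0, 5.0, 1.0]
base = [0.0, 0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
# Efficiency: the attributions add up to f(x) - f(baseline).
print(abs(sum(phi) - (toy_model(x) - toy_model(base))) < 1e-12)  # True
```

Enumeration costs 2^(p-1) model calls per feature. This is trivial for four variables, but it is why SHAP uses tree-specific algorithms for larger models.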