Zhang Xinru, Feng Chao, Bai Xiao, Peng Xufeng, Guo Qian, Chen Lei, Xue Jingdong
Department of Urology, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 200233 Shanghai, China.
Department of Rheumatology and Immunology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 102218 Beijing, China.
Arch Esp Urol. 2023 Sep;76(7):494-503. doi: 10.56434/j.arch.esp.urol.20237607.61.
Innovative strategies are necessary to enhance prostate cancer diagnosis whilst reducing unnecessary and invasive repeat biopsies. This study aimed to determine the significant parameters affecting repeat prostate biopsy outcomes and develop an optimal machine learning algorithm for predicting positive repeat prostate biopsy results.
We analysed data from 174 men who underwent repeat prostate biopsies between January 2008 and December 2022. Systematic multiple-core, ultrasound-targeted prostate biopsies were performed, with two samples each obtained bilaterally from the prostatic transition zone and the peripheral zone. The clinical characteristics collected comprised the patient's age; the initial prostate volume, prostate-specific antigen (PSA) level, free PSA (fPSA)/PSA ratio, number of biopsy cores and pathological result; the time interval between the first and latest prostate biopsies; the latest PSA level, fPSA/PSA ratio and number of biopsy cores; and the final pathological diagnosis. Six feature selection methods, namely, variable ranking, correlation matrix, random forest regression, recursive feature elimination, cross-validation and forward selection, were employed to identify the key factors influencing repeat biopsy outcomes. Subsequently, the performance of seven machine learning algorithms, namely, multivariable logistic regression (LR), K-nearest neighbour search (KNN), support vector classification (SVC), decision tree (DT), random forest classifier (RF), naïve Bayes classifier (NBC) and gradient boosting tree (GB), was assessed on the basis of accuracy, misclassification rate, recall, specificity, precision and area under the receiver operating characteristic (ROC) curve (AUC). About 70% of the patients were used as the training dataset and the remaining 30% as the validation dataset.
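The threshold-based evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' code: the names `ConfusionCounts`, `confusion_from_labels` and `binary_metrics` are hypothetical, the definitions are the standard textbook ones, and the counts in the usage example are invented for illustration and are not taken from the study. AUC is omitted because it requires continuous classifier scores rather than hard labels.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (positive = cancer on repeat biopsy)."""
    tp: int  # predicted positive, pathology positive
    fp: int  # predicted positive, pathology negative
    tn: int  # predicted negative, pathology negative
    fn: int  # predicted negative, pathology positive


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Count the four cells from 0/1 ground-truth and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the five threshold-based measures."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no predictions to score")

    def ratio(num: int, den: int) -> float:
        # An undefined ratio (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    accuracy = (c.tp + c.tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1.0 - accuracy,        # complement of accuracy
        "recall": ratio(c.tp, c.tp + c.fn),         # sensitivity
        "specificity": ratio(c.tn, c.tn + c.fp),
        "precision": ratio(c.tp, c.tp + c.fp),      # positive predictive value
    }


if __name__ == "__main__":
    # Hypothetical validation-set counts, for illustration only.
    example = ConfusionCounts(tp=4, fp=2, tn=34, fn=12)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions the misclassification rate is the complement of accuracy when both are computed on the same set of predictions, and recall depends only on the patients with confirmed cancer, so a classifier can reach a high accuracy on an imbalanced cohort while still missing most positive cases.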
Fifty-two patients were ultimately diagnosed with prostate cancer on final pathological examination; the remaining 122 patients were negative. Amongst the six feature selection methods, variable ranking emerged as the most effective for identifying the essential factors influencing repeat biopsy results. Amongst the machine learning algorithms, SVC demonstrated superior accuracy (0.7365), a low recall rate (0.2500) and a low misclassification rate (0.2093) for both patients with cancer and patients without cancer. Meanwhile, the ROC curve of SVC showed a relatively high AUC (0.6871).
We developed an SVC-based machine learning algorithm for predicting positive repeat prostate biopsy results. Our analysis revealed that initial and latest prostate volumes, initial and latest PSA levels, latest fPSA/PSA ratio and age are significant factors for this model.