Tang Tiffany M, Zhang Yuping, Kenney Ana M, Xie Cassie, Xiao Lanbo, Siddiqui Javed, Srivastava Sudhir, Sanda Martin G, Wei John T, Feng Ziding, Tosoian Jeffrey J, Zheng Yingye, Chinnaiyan Arul M, Yu Bin
Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
Cancer Biomark. 2025 Jan;42(1):18758592241308755. doi: 10.1177/18758592241308755. Epub 2025 Mar 20.
The limited diagnostic accuracy of prostate-specific antigen screening for prostate cancer (PCa) has prompted innovative solutions, such as the state-of-the-art 18-gene urine test for clinically significant PCa, MyProstateScore2.0 (MPS2). We aim to develop a non-invasive biomarker test, the simplified MPS2 (sMPS2), that achieves state-of-the-art accuracy similar to that of MPS2 for predicting high-grade PCa but requires substantially fewer genes than the 18-gene MPS2, improving its accessibility for routine clinical care. We grounded the development of sMPS2 in the Predictability, Computability, and Stability (PCS) framework for veridical data science. Under this framework, we stress-tested the development of sMPS2 across various data preprocessing and modeling choices and developed a stability-driven PCS ranking procedure to select the most predictive and robust genes for use in sMPS2. The final sMPS2 model consisted of 7 genes and achieved an area under the receiver operating characteristic curve (AUROC) of 0.784 (95% confidence interval, 0.742-0.825) for predicting high-grade PCa on a blinded external validation cohort. This AUROC is only 2.3% lower than that of the 18-gene MPS2, a gap similar in magnitude to the 1-2% uncertainty induced by different data preprocessing choices. The 7-gene sMPS2 provides a unique opportunity to expand the reach and adoption of non-invasive PCa screening.
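The abstract does not spell out how the stability-driven PCS ranking works. As an illustration only, the sketch below assumes one plausible form. Each combination of preprocessing and modeling choices (a "pipeline") reports a validation AUROC and an importance score per gene. A predictability screen keeps only pipelines whose AUROC is within a tolerance of the best one. Genes are then ranked within each surviving pipeline, and those ranks are averaged so that consistently important genes rise to the top. The function name `pcs_rank_genes`, the `auroc_tolerance` threshold, the averaging rule, and the toy gene names are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def pcs_rank_genes(pipelines, auroc_tolerance=0.01, top_k=7):
    """Hypothetical stability-driven gene ranking (illustrative, not the authors' method).

    Each pipeline is a dict with an "auroc" (validation AUROC) and an
    "importance" dict mapping gene -> importance score. All pipelines are
    assumed to score the same set of genes.
    """
    # Predictability screen: keep pipelines whose AUROC is close to the best.
    best = max(p["auroc"] for p in pipelines)
    kept = [p for p in pipelines if p["auroc"] >= best - auroc_tolerance]

    # Stability: rank genes within each kept pipeline (1 = most important).
    ranks = {}
    for p in kept:
        ordered = sorted(p["importance"], key=p["importance"].get, reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            ranks.setdefault(gene, []).append(rank)

    # Lower average rank means the gene is consistently important across
    # pipelines; ties are broken by gene name for a deterministic order.
    avg_rank = {gene: mean(rs) for gene, rs in ranks.items()}
    return sorted(avg_rank, key=lambda g: (avg_rank[g], g))[:top_k]


# Toy example with made-up pipelines and gene names.
toy_pipelines = [
    {"auroc": 0.800, "importance": {"A": 0.9, "B": 0.5, "C": 0.1}},
    {"auroc": 0.795, "importance": {"A": 0.4, "B": 0.6, "C": 0.2}},
    {"auroc": 0.700, "importance": {"A": 0.0, "B": 0.1, "C": 0.9}},
]
print(pcs_rank_genes(toy_pipelines, top_k=2))  # → ['A', 'B']
```

In the toy example, the weakly predictive third pipeline is screened out. Loosening the tolerance so that it is retained changes the top-2 order to `['B', 'A']`. This illustrates why the abstract pairs predictive performance with stability across preprocessing and modeling choices when selecting genes.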