Dascena Inc., Houston, Texas, USA.
Montera Inc., San Francisco, CA, USA.
Cancer Med. 2023 Jan;12(1):379-386. doi: 10.1002/cam4.4934. Epub 2022 Jun 25.
Prostate cancer (PCa) screening is not routinely conducted in men aged 55 and younger, although this age group accounts for more than 10% of cases. Applying polygenic risk scores (PRSs) and patient data to the early prediction of PCa may lead to earlier intervention and increased survival. We developed machine learning (ML) models to predict PCa risk in men aged 55 and under using PRSs combined with patient data.
We conducted a retrospective study of 91,106 male patients aged 35-55 using the UK Biobank database. Five gradient boosting models were developed and validated using routine screening data, PRSs, additional clinical data, or combinations of the three.
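As a rough illustration of training one gradient boosting model per feature set, the following minimal sketch boosts one-split decision stumps under logistic loss. It is not the study's implementation: the data are synthetic, the feature-set names and helper functions (`fit_gbm`, `make_synthetic`, `FEATURE_SETS`) are hypothetical, and only three illustrative feature sets are shown because the exact composition of the five models is not specified here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the one-split stump (feature, threshold, left mean, right mean)
    that minimizes squared error against the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_gbm(X, y, n_rounds=15, lr=0.3):
    """Gradient boosting for binary labels: each round fits a stump to the
    negative gradient of the logistic loss (label minus predicted probability)."""
    p = sum(y) / len(y)
    base = math.log(p / (1.0 - p))  # start from the log-odds of the prevalence
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + lr * (lv if row[j] <= t else rv) for row, s in zip(X, scores)]
    return base, lr, stumps


def predict_proba(model, row):
    base, lr, stumps = model
    score = base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)


def log_loss(y, probs):
    return -sum(math.log(p) if yi == 1 else math.log(1.0 - p)
                for yi, p in zip(y, probs)) / len(y)


def make_synthetic(n=150, seed=0):
    """Synthetic stand-in data (assumption): column 0 mimics a routine
    screening feature, column 1 mimics a polygenic risk score."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        screening, prs = rng.gauss(0, 1), rng.gauss(0, 1)
        risk = sigmoid(-1.0 + screening + prs)
        X.append([screening, prs])
        y.append(1 if rng.random() < risk else 0)
    return X, y


# Hypothetical feature sets: column indices into the synthetic matrix.
FEATURE_SETS = {"screening_only": [0], "prs_only": [1], "screening_plus_prs": [0, 1]}


def select(X, cols):
    return [[row[c] for c in cols] for row in X]


X, y = make_synthetic()
models = {name: fit_gbm(select(X, cols), y) for name, cols in FEATURE_SETS.items()}
for name, cols in FEATURE_SETS.items():
    probs = [predict_proba(models[name], row) for row in select(X, cols)]
    print(name, "training log loss:", round(log_loss(y, probs), 3))
```

The printed values are training losses on synthetic data and say nothing about the study's results. A real comparison would evaluate each model on held-out patients, and in practice an established gradient boosting library would replace the hand-written stump learner.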
Models combining PRSs and patient data outperformed models that used PRSs or patient data alone, and the highest-performing models achieved an area under the receiver operating characteristic curve (AUROC) of 0.788. Our models demonstrated a substantially lower false positive rate (35.4%) than standard screening using prostate-specific antigen (PSA; 60%-67%).
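For readers less familiar with the two metrics reported above, the sketch below shows how AUROC and the false positive rate can be computed from labels and predicted risk scores. The labels, scores, and the 0.5 decision threshold are made-up illustrative values; the operating threshold behind the reported 35.4% is not stated here, so none is assumed.

```python
def auroc(labels, scores):
    """Probability that a randomly chosen positive case scores higher than a
    randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def false_positive_rate(labels, scores, threshold):
    """FP / (FP + TN), calling a case positive when its score >= threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one negative label")
    return sum(1 for s in neg if s >= threshold) / len(neg)


# Toy data (illustrative only): 1 = cancer, 0 = no cancer.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]

print(auroc(labels, scores))                     # 0.875
print(false_positive_rate(labels, scores, 0.5))  # 0.25
```

AUROC is independent of any threshold, whereas the false positive rate depends on the chosen cutoff, so comparing false positive rates between two screening approaches is only meaningful at stated operating points.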
This study provides the first preliminary evidence for the use of PRSs with patient data in an ML algorithm for PCa risk prediction in men aged 55 and under, for whom screening is not standard practice.