Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY 14620, USA.
Sci Rep. 2017 Feb 24;7:43381. doi: 10.1038/srep43381.
The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method yielded better predictive performance than existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis were used to identify key biological processes and proteins, which are plausible in light of other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs) that yield clinically useful risk stratification and identify important biological processes underlying radiation damage and tissue repair. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.
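The abstract names the two stages of PRFR without detailing them. What follows is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. The pre-conditioning stage is assumed to work like supervised principal components: rank SNPs by univariate correlation with the binary endpoint, take the first principal component of the top-ranked SNPs, and replace the 0/1 outcome with its least-squares fit on that component. The second stage is an ordinary random forest regression on all SNPs, using that continuous outcome as the target. The function names (`precondition`, `build_tree`, `fit_forest`, `predict_forest`), the parameters (`n_top`, `n_trees`, `max_depth`, `min_leaf`), and the toy cohort are hypothetical illustrations, not the paper's settings.

```python
import random
import statistics

def precondition(X, y, n_top=4, n_iter=50):
    """Hypothetical supervised-PC preconditioning; returns (new outcome, SNPs used)."""
    n, p = len(X), len(X[0])
    y_bar = statistics.fmean(y)
    means = [statistics.fmean([row[j] for row in X]) for j in range(p)]
    syy = sum((yi - y_bar) ** 2 for yi in y)

    def abs_corr(j):  # univariate |Pearson r| between SNP j and the 0/1 outcome
        sxy = sum((row[j] - means[j]) * (yi - y_bar) for row, yi in zip(X, y))
        sxx = sum((row[j] - means[j]) ** 2 for row in X)
        return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

    top = sorted(range(p), key=abs_corr, reverse=True)[:n_top]
    Z = [[row[j] - means[j] for j in top] for row in X]  # centered top SNPs
    # Power iteration on Z^T Z converges to the first principal component v.
    v = [len(top) ** -0.5] * len(top)
    for _ in range(n_iter):
        s = [sum(z * w for z, w in zip(row, v)) for row in Z]
        u = [sum(Z[i][k] * s[i] for i in range(n)) for k in range(len(top))]
        norm = sum(c * c for c in u) ** 0.5
        if norm == 0:
            break
        v = [c / norm for c in u]
    s = [sum(z * w for z, w in zip(row, v)) for row in Z]  # PC1 scores, mean ~0
    # Least-squares fit of y on the PC1 score = continuous "pre-conditioned" outcome.
    ss = sum(c * c for c in s)
    beta = sum(si * (yi - y_bar) for si, yi in zip(s, y)) / ss if ss > 0 else 0.0
    return [y_bar + beta * si for si in s], top

def _sse(values):
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, rng, mtry, depth, max_depth, min_leaf):
    """CART regression tree: leaves are floats, inner nodes (snp, cut, left, right)."""
    leaf = statistics.fmean(targets)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf
    best = None
    for j in rng.sample(range(len(rows[0])), mtry):  # random SNP subset per split
        for t in sorted({r[j] for r in rows})[:-1]:  # genotype cut points
            left = [yv for r, yv in zip(rows, targets) if r[j] <= t]
            right = [yv for r, yv in zip(rows, targets) if r[j] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return leaf
    _, j, t = best
    kids = []
    for side in (True, False):  # True = left child (genotype <= cut)
        keep = [i for i, r in enumerate(rows) if (r[j] <= t) == side]
        kids.append(build_tree([rows[i] for i in keep], [targets[i] for i in keep],
                               rng, mtry, depth + 1, max_depth, min_leaf))
    return (j, t, kids[0], kids[1])

def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, target, n_trees=50, max_depth=4, min_leaf=5, seed=0):
    """Bagged trees trying mtry = p // 3 SNPs per split (regression-forest default)."""
    rng = random.Random(seed)
    n, mtry = len(X), max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample of patients
        forest.append(build_tree([X[i] for i in idx], [target[i] for i in idx],
                                 rng, mtry, 0, max_depth, min_leaf))
    return forest

def predict_forest(forest, x):
    return statistics.fmean([predict_tree(tree, x) for tree in forest])

# Hypothetical toy cohort: genotypes coded 0/1/2; SNPs 0-3 are noisy proxies
# (as if in linkage disequilibrium) for a latent risk genotype g; SNPs 4-29 are noise.
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    g = rng.choice((0, 1, 2))
    row = [g if rng.random() < 0.9 else rng.choice((0, 1, 2)) for _ in range(4)]
    X.append(row + [rng.choice((0, 1, 2)) for _ in range(26)])
    y.append(1 if rng.random() < 0.1 + 0.3 * g else 0)  # binary complication flag

y_pre, used = precondition(X, y)
forest = fit_forest(X, y_pre)
low = predict_forest(forest, [0] * 4 + [1] * 26)
high = predict_forest(forest, [2] * 4 + [1] * 26)
print(sorted(used), round(low, 2), round(high, 2))
```

On this synthetic cohort the first stage should pick out the four proxy SNPs, and the forest should assign a higher predicted risk to a patient carrying two risk alleles at each proxy SNP than to one carrying none. Under the assumption above, the first stage lets the forest regress on a smoothed continuous target rather than raw 0/1 labels. The pure-Python trees are only for self-containment; a GWAS-scale analysis would use an optimized random forest library and held-out validation.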