de Maturana Evangelina López, Chanok Stephen J, Picornell Antoni C, Rothman Nathaniel, Herranz Jesús, Calle M Luz, García-Closas Montserrat, Marenne Gaëlle, Brand Angela, Tardón Adonina, Carrato Alfredo, Silverman Debra T, Kogevinas Manolis, Gianola Daniel, Real Francisco X, Malats Núria
Spanish National Cancer Research Center (CNIO), Madrid, Spain.
Genet Epidemiol. 2014 Jul;38(5):467-76. doi: 10.1002/gepi.21809. Epub 2014 May 5.
To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes for 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented, using: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to nonsmokers only, to male smokers, and to extreme phenotypes, the last to accentuate the genetic component of UCB. The area under the ROC curve (AUC) was used to evaluate the predictive ability of each model under 10-fold cross-validation. Smoking status showed the highest predictive ability for UCB risk (AUCtest = 0.62). In contrast, the model using all genetic variants had a lower AUC (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved by 15%. This study represents a first attempt to build a predictive model for UCB risk combining genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, and relatively low statistical power may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data and suggests that findings from this type of data have limited translational potential for public health interventions.
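The evaluation described in the abstract (AUC on held-out data, averaged over 10 cross-validation folds) can be illustrated with a minimal sketch. This is not the authors' Bayesian threshold model: the function names `auc`, `kfold_indices`, and `cross_validated_auc`, and the `fit` callback, are hypothetical and introduced only for illustration. Any model that returns a risk score could be plugged in as `fit`.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > d else 0.5 if c == d else 0.0
        for c in cases
        for d in controls
    )
    return wins / (len(cases) * len(controls))


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(fit, X, y, k=10, seed=0):
    """Mean test-fold AUC. `fit(X_train, y_train)` is a hypothetical callback
    that must return a function mapping one sample to a risk score."""
    fold_aucs = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train], [y[i] for i in train])
        fold_aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
    return mean(fold_aucs)
```

On this scale 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls, which is the context for the reported values of 0.62 and 0.53. Note that `auc` raises an error if a test fold happens to contain only cases or only controls; a stratified split would avoid that, but is omitted here for brevity.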