Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA.
Genet Epidemiol. 2011 Jul;35(5):389-97. doi: 10.1002/gepi.20587. Epub 2011 Apr 25.
Current evidence suggests that the genetic risk of breast cancer may be caused primarily by rare variants. However, while classifying protein-truncating mutations as deleterious is relatively straightforward, classifying the large number of rare missense variants as deleterious or neutral remains a difficult, ongoing task. In this article, we present one approach to this problem: hierarchical statistical modeling of data from a case-control study of contralateral breast cancer (CBC) in which all participants were genotyped for variants in BRCA1 and BRCA2. Hierarchical modeling uses observed correlations between the characteristics of groups of variants and case-control status to infer the risks of individual rare variants with greater precision. A total of 181 distinct rare missense variants were identified among the 705 cases with CBC and the 1,398 controls with unilateral breast cancer. The model identified three bioinformatic hierarchical covariates, align-GV, align-GD, and SIFT scores, each of which was modestly associated with risk. Collectively, the 11 variants that were classified as adverse by all three bioinformatic predictors demonstrated a stronger risk signal. This group included five of the six missense variants that were classified as deleterious at the outset by conventional criteria. The remaining six variants in the group can be considered plausibly deleterious and deserving of further investigation (BRCA1 R866C; BRCA2 G1529R, D2665G, W2626C, E2663V, and R3052W). Hierarchical modeling is a promising strategy for interpreting the evidence from future association studies that involve sequencing of known or suspected cancer genes.
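The abstract does not give the model specification, so the following is only a minimal sketch of the general idea behind hierarchical modeling of rare variants, not the authors' method. It assumes a simple normal-normal setup in which each variant's noisy log odds ratio is pulled toward a prior mean predicted from variant-level covariates. The function names, the binary covariate coding, the coefficients, and all numbers are hypothetical and chosen purely for illustration.

```python
def prior_mean_from_covariates(covariates, coefficients):
    """Second-stage prediction: prior mean log odds ratio for a variant,
    computed as a linear combination of its variant-level covariates.
    (Hypothetical helper; the coding of covariates is an assumption.)"""
    return sum(z * pi for z, pi in zip(covariates, coefficients))


def shrink_log_odds(beta_hat, se, prior_mean, tau2):
    """Normal-normal posterior mean for one variant's log odds ratio.

    beta_hat   -- variant-specific estimate from the case-control data
    se         -- its standard error (large for rare variants)
    prior_mean -- mean predicted from the variant's covariates
    tau2       -- assumed residual variance around that prior mean
    """
    if se <= 0 or tau2 < 0:
        raise ValueError("se must be positive and tau2 non-negative")
    # Weight on the variant's own data; small when the estimate is noisy.
    weight = tau2 / (tau2 + se ** 2)
    return weight * beta_hat + (1.0 - weight) * prior_mean


# Illustrative values only: a variant flagged as adverse by all three
# predictors (coded 1, 1, 1), with made-up second-stage coefficients.
covariates = [1, 1, 1]
coefficients = [0.3, 0.3, 0.3]
prior_mean = prior_mean_from_covariates(covariates, coefficients)

# A rare variant seen in few carriers: imprecise first-stage estimate.
posterior = shrink_log_odds(beta_hat=2.0, se=1.5, prior_mean=prior_mean, tau2=0.25)
```

With these made-up inputs the weight on the variant's own estimate is 0.1, so the posterior log odds ratio sits close to the covariate-based prior mean (about 1.01 rather than 2.0). This is the sense in which group-level information can sharpen inference for individual rare variants; a real analysis would estimate the second-stage coefficients and variance from the data rather than fixing them.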