Fife James D, Cassa Christopher A
Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.
Harvard Medical School, Boston, Massachusetts.
medRxiv. 2023 Jan 9:2023.01.06.23284281. doi: 10.1101/2023.01.06.23284281.
While pathogenic variants significantly increase disease risk in many genes, it remains challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as or , large cohort studies find no significant association between breast cancer and rare germline missense variants collectively. Here we introduce REGatta, a method to improve the estimation of clinical risk in gene segments. We define gene regions using the density of pathogenic diagnostic reports, and then calculate the relative risk in each of these regions using 109,581 exome sequences from women in the UK Biobank. We apply this method to seven established breast cancer genes and identify regions in each gene with statistically significant differences in breast cancer incidence for rare missense carriers. Even in genes with no significant difference at the gene level, this approach significantly separates rare missense variant carriers at higher or lower risk ( regional model OR=1.46 [1.12, 1.79], p=0.0036 vs. gene model OR=0.96 [0.85, 1.07], p=0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare REGatta with existing methods and with the use of protein domains (Pfam) as regions, and find that REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors that could be used to improve risk assessment and clinical management.
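As an illustration of the kind of per-region comparison described above, the following minimal sketch computes an odds ratio with a Wald 95% confidence interval from a 2x2 table of carrier status by breast cancer status. It is not the REGatta implementation: the function name `odds_ratio_wald`, its parameters, and the example counts are hypothetical, and the abstract does not state which estimator or interval method the authors used.

```python
import math


def odds_ratio_wald(case_carriers, control_carriers,
                    case_noncarriers, control_noncarriers, z=1.96):
    """Odds ratio and Wald confidence interval for one gene region.

    Hypothetical helper for illustration only. The four arguments are the
    cells of a 2x2 table: rare missense carriers vs. non-carriers, split by
    breast cancer cases vs. controls. z=1.96 gives an approximate 95% CI.
    """
    a, b = case_carriers, control_carriers
    c, d = case_noncarriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        # The Wald interval is undefined when any cell is empty.
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts for a single region, not data from the study.
    or_, lo, hi = odds_ratio_wald(30, 70, 20, 80)
    print(f"OR={or_:.2f} [{lo:.2f}, {hi:.2f}]")
```

Running the same function once on gene-level counts and once on the counts for each region would give the side-by-side comparison the abstract reports, under the assumption that a simple unadjusted 2x2 analysis is acceptable; a real analysis would likely adjust for covariates such as age and ancestry.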