Incorporating Non-Coding Annotations into Rare Variant Analysis.
Author information
Richardson Tom G, Campbell Colin, Timpson Nicholas J, Gaunt Tom R
Affiliations
MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom.
Intelligent Systems Laboratory, University of Bristol, Bristol, United Kingdom.
Publication information
PLoS One. 2016 Apr 29;11(4):e0154181. doi: 10.1371/journal.pone.0154181. eCollection 2016.
BACKGROUND
The success of collapsing methods which investigate the combined effect of rare variants on complex traits has so far been limited. The manner in which variants within a gene are selected prior to analysis has a crucial impact on this success, which has resulted in analyses conventionally filtering variants according to their consequence. This study investigates whether an alternative approach to filtering, using annotations from recently developed bioinformatics tools, can aid these types of analyses in comparison to conventional approaches.
METHODS & RESULTS
We conducted a candidate gene analysis using the UK10K sequence and lipids data, filtering according to functional annotations using the resource CADD (Combined Annotation-Dependent Depletion) and contrasting results with 'nonsynonymous' and 'loss of function' consequence analyses. Using CADD allowed the inclusion of potentially deleterious intronic variants, which was not possible when filtering by consequence. Overall, different filtering approaches provided similar evidence of association, although filtering according to CADD identified evidence of association between ANGPTL4 and high-density lipoprotein (P = 0.02, N = 3,210) which was not observed in the other analyses. We also undertook genome-wide analyses to determine how filtering in this manner compared to conventional approaches for gene regions. Results suggested that filtering by annotations according to CADD, as well as other tools known as FATHMM-MKL and DANN, identified association signals not detected when filtering by variant consequence and vice versa.
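The contrast between the two filtering strategies can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, the consequence groupings, the CADD cut-off of 15, the 1% minor allele frequency ceiling, and all variant data below are hypothetical values chosen only for illustration, and the simple allele-count burden score stands in for whichever collapsing test a study actually uses.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    consequence: str   # predicted consequence label (hypothetical vocabulary)
    cadd_phred: float  # scaled CADD score (made-up values)
    maf: float         # minor allele frequency


# Assumed groupings of consequence labels; real analyses define these differently.
CONSEQUENCE_SETS = {
    "nonsynonymous": {"missense", "stop_gained", "frameshift", "splice_site"},
    "loss_of_function": {"stop_gained", "frameshift", "splice_site"},
}


def filter_by_consequence(variants, label, maf_max=0.01):
    """Keep rare variants whose consequence falls in the named set."""
    allowed = CONSEQUENCE_SETS[label]
    return [v for v in variants if v.maf <= maf_max and v.consequence in allowed]


def filter_by_cadd(variants, threshold=15.0, maf_max=0.01):
    """Keep rare variants at or above a CADD threshold, whatever their consequence."""
    return [v for v in variants if v.maf <= maf_max and v.cadd_phred >= threshold]


def burden_scores(genotypes, selected):
    """Collapse selected variants into one per-sample count of minor alleles."""
    ids = {v.variant_id for v in selected}
    return {
        sample: sum(count for vid, count in calls.items() if vid in ids)
        for sample, calls in genotypes.items()
    }


# Hypothetical variants within one gene region.
variants = [
    Variant("v1", "missense", 22.1, 0.004),
    Variant("v2", "intronic", 18.3, 0.002),
    Variant("v3", "stop_gained", 35.0, 0.001),
    Variant("v4", "synonymous", 3.2, 0.006),
    Variant("v5", "missense", 8.9, 0.050),  # excluded: not rare
]

# Hypothetical minor allele counts per sample.
genotypes = {
    "s1": {"v1": 1, "v2": 1},
    "s2": {"v3": 1},
    "s3": {"v4": 2},
}

nonsyn = filter_by_consequence(variants, "nonsynonymous")
lof = filter_by_consequence(variants, "loss_of_function")
cadd = filter_by_cadd(variants)

nonsyn_burden = burden_scores(genotypes, nonsyn)
cadd_burden = burden_scores(genotypes, cadd)
```

In this toy data the intronic variant `v2` enters the collapsed score only under the annotation-based filter, which mirrors the point made above about intronic variants; the resulting per-sample scores would then be tested for association with the trait.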
CONCLUSION
Incorporating variant annotations from non-coding bioinformatics tools should prove to be a valuable asset for rare variant analyses in the future. Filtering by variant consequence is only possible in coding regions of the genome, whereas utilising non-coding bioinformatics annotations provides an opportunity to discover unknown causal variants in non-coding regions as well. This should allow studies to uncover a greater number of causal variants for complex traits and help elucidate their functional role in disease.