Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria 3010, Australia.
BMC Bioinformatics. 2013 Feb 25;14:65. doi: 10.1186/1471-2105-14-65.
Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in distinguishing genuine genetic variants from the millions of artefactual signals.
FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines and to help resolve some of the issues that arise when analysing the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and the annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools.
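The comparator-based filtering idea described above can be illustrated with a minimal sketch. This is not the FAVR implementation: the names `Variant`, `count_comparators_with_signal`, `filter_by_comparators` and the threshold `max_comparators` are hypothetical, and the per-sample sets of observed signals stand in for evidence that a real tool would extract from comparator alignment files. The sketch only shows the general principle that a candidate rare variant whose signal also appears in several comparator samples is more likely to be an artefact or a common variant.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """A candidate variant call (hypothetical minimal representation)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def count_comparators_with_signal(
    variant: Variant, comparator_signals: Dict[str, Set[Variant]]
) -> int:
    """Count comparator samples in which the same variant signal is seen."""
    return sum(1 for signals in comparator_signals.values() if variant in signals)


def filter_by_comparators(
    variants: Iterable[Variant],
    comparator_signals: Dict[str, Set[Variant]],
    max_comparators: int = 1,  # assumed threshold, chosen for illustration only
) -> List[Variant]:
    """Keep variants whose signal appears in at most `max_comparators` samples."""
    return [
        v
        for v in variants
        if count_comparators_with_signal(v, comparator_signals) <= max_comparators
    ]


# Toy data: the first call is seen in two comparators, the second in none.
calls = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")]
comparators = {
    "sample_1": {calls[0]},
    "sample_2": {calls[0]},
    "sample_3": set(),
}
kept = filter_by_comparators(calls, comparators, max_comparators=1)
print(kept)
```

With the toy data, only the `chr2` call survives, because the `chr1` signal recurs in two comparator samples and exceeds the assumed threshold. In practice the threshold would depend on the number of comparators and on how closely related they are to the sample under study.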
FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes.