School of Medicine, University of Leeds, Leeds, United Kingdom.
Hum Mutat. 2013 Jul;34(7):945-52. doi: 10.1002/humu.22322. Epub 2013 Apr 29.
Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA; these represent either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's sequence and the human genome reference sequence, are then filtered according to various quality parameters. The variants are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in an effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer both to generate NGS data and to align them to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to use this technology. However, the ability to filter and assess sequence variants effectively remains an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis.
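The two filtering steps described in the abstract (quality filtering, then screening against known polymorphisms) can be illustrated with a minimal Python sketch. This is not the Agile Suite's implementation. The `Variant` record, the `filter_candidates` function, the thresholds `min_quality` and `min_depth`, and the toy data are all hypothetical, chosen only to show the general idea; a real pipeline would read variant calls from a file and draw known polymorphisms from a resource such as dbSNP.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Hypothetical variant identifier: (chromosome, 1-based position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float  # caller-reported quality score (assumed Phred-like)
    depth: int      # read depth at the site

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_candidates(
    variants: Iterable[Variant],
    known_polymorphisms: Set[VariantKey],
    min_quality: float = 30.0,  # illustrative threshold, not from the source
    min_depth: int = 10,        # illustrative threshold, not from the source
) -> List[Variant]:
    """Keep variants that pass quality checks and are not known polymorphisms."""
    candidates = []
    for v in variants:
        # Step 1: discard low-confidence calls.
        if v.quality < min_quality or v.depth < min_depth:
            continue
        # Step 2: discard variants already catalogued as polymorphisms.
        if v.key in known_polymorphisms:
            continue
        candidates.append(v)
    return candidates


# Toy data for illustration only.
calls = [
    Variant("1", 1000, "A", "G", quality=50.0, depth=40),  # passes both steps
    Variant("1", 2000, "C", "T", quality=12.0, depth=35),  # fails quality
    Variant("2", 3000, "G", "A", quality=60.0, depth=4),   # fails depth
    Variant("2", 4000, "T", "C", quality=55.0, depth=30),  # known polymorphism
]
known = {("2", 4000, "T", "C")}

candidates = filter_candidates(calls, known)
print([v.key for v in candidates])
```

Storing the known polymorphisms in a set keyed on position and alleles keeps each lookup constant-time, which matters when screening exome-scale call sets against large catalogues.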