Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA.
BMC Bioinformatics. 2010 Sep 20;11:471. doi: 10.1186/1471-2105-11-471.
The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research.
SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed in a web browser, downloaded as a tab-delimited text file, or uploaded directly in BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these datasets ranged from 0.17 seconds to 28 minutes 49.8 seconds.
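As an illustration of the BED export step mentioned above, the following is a minimal sketch of how annotated variants might be written as a UCSC custom track. It is not SeqAnt's own code: the `Variant` record, its `label` field, and the function names are hypothetical, and the input positions are assumed to be 1-based. The only format facts relied on are that BED start coordinates are 0-based and end coordinates are exclusive.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical annotated-variant record (not SeqAnt's internal type)."""
    chrom: str
    pos: int    # assumed 1-based position of the first reference base
    ref: str    # reference allele
    alt: str    # alternate allele
    label: str  # assumed annotation label, e.g. "missense"


def variant_to_bed(v: Variant) -> str:
    """Format one variant as a 4-column BED line (chrom, start, end, name)."""
    start = v.pos - 1         # BED start is 0-based
    end = start + len(v.ref)  # BED end is exclusive
    name = f"{v.ref}>{v.alt}:{v.label}"
    return f"{v.chrom}\t{start}\t{end}\t{name}"


def to_bed(variants: Iterable[Variant], track_name: str = "annotated_variants") -> str:
    """Build a BED custom track: one track line, then sorted variant lines."""
    ordered = sorted(variants, key=lambda v: (v.chrom, v.pos))
    lines = [f'track name="{track_name}"']
    lines.extend(variant_to_bed(v) for v in ordered)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Variant("chr2", 500, "ACT", "A", "deletion"),
        Variant("chr1", 100, "A", "G", "missense"),
    ]
    print(to_bed(demo), end="")
```

The resulting text could be pasted into the UCSC genome browser's custom track form. The name column here simply concatenates the alleles and the label, which is a design choice for this sketch rather than a documented SeqAnt convention.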
SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.