CASCAD: a database of annotated candidate single nucleotide polymorphisms associated with expressed sequences.
Author information
Guryev Victor, Berezikov Eugene, Cuppen Edwin
Affiliation
Hubrecht Laboratory, Netherlands Institute for Developmental Biology, Uppsalalaan 8, 3584CT, Utrecht, The Netherlands.
Publication information
BMC Genomics. 2005 Jan 27;6:10. doi: 10.1186/1471-2164-6-10.
BACKGROUND
With the recent progress in large-scale genome sequencing projects, a vast amount of novel data is becoming available. Comparative sequence analysis, exploiting sequence information from various resources, can be used to uncover hidden information such as genetic variation. Although an enormous number of SNPs for a wide variety of organisms have been submitted to NCBI dbSNP and annotated in most genome assembly viewers, such as Ensembl and the UCSC Genome Browser, these platforms do not easily allow for extensive annotation or for incorporation of the experimental data supporting a polymorphism. Such information is nevertheless very important for selecting the most promising and useful candidate polymorphisms for experimental work.
DESCRIPTION
The CASCAD database is designed for the presentation and querying of candidate SNPs retrieved by in silico mining of high-throughput sequencing data. Currently, the database provides collections of laboratory rat (Rattus norvegicus) and zebrafish (Danio rerio) candidate SNPs. It stores detailed information about the raw data supporting each candidate, extensive annotation and links to external databases (e.g. GenBank, Ensembl, UniGene, and LocusLink), verification information, and predictions of the potential effect of non-synonymous polymorphisms in coding regions. The CASCAD website allows searches based on an arbitrary combination of 27 different parameters related to characteristics such as candidate SNP quality, genomic localization, and sequence data source or strain. In addition, the database can be queried with any custom nucleotide sequence of interest. The interface is cross-linked to other public databases and tightly coupled with primer design and local genome assembly interfaces in order to facilitate experimental verification of candidates.
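As an illustration of what searching on an "arbitrary combination" of parameters can look like, the following is a minimal Python sketch in which each search parameter is a small predicate and a query is the conjunction of whichever predicates the user supplies. It is not the CASCAD implementation or schema: the record fields, the helper names (`organism_is`, `min_quality`, `in_region`), the quality scale, and the toy records are all assumptions made for the example, and only three of the 27 parameters are mimicked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CandidateSNP:
    # Hypothetical record layout; the real CASCAD schema is not described here.
    snp_id: str
    organism: str
    chromosome: str
    position: int
    quality: float  # assumed score in [0, 1]
    strain: str


# A filter is any function that accepts or rejects one candidate.
Filter = Callable[[CandidateSNP], bool]


def organism_is(name: str) -> Filter:
    return lambda c: c.organism == name


def min_quality(threshold: float) -> Filter:
    return lambda c: c.quality >= threshold


def in_region(chromosome: str, start: int, end: int) -> Filter:
    return lambda c: c.chromosome == chromosome and start <= c.position <= end


def search(candidates: Iterable[CandidateSNP], *filters: Filter) -> List[CandidateSNP]:
    """Return the candidates that satisfy every supplied filter (logical AND)."""
    return [c for c in candidates if all(f(c) for f in filters)]


# Invented toy records, for demonstration only.
EXAMPLE_CANDIDATES = [
    CandidateSNP("toy-1", "Rattus norvegicus", "1", 1000, 0.95, "strain-A"),
    CandidateSNP("toy-2", "Rattus norvegicus", "1", 5000, 0.60, "strain-B"),
    CandidateSNP("toy-3", "Danio rerio", "5", 2000, 0.92, "strain-C"),
]

if __name__ == "__main__":
    hits = search(
        EXAMPLE_CANDIDATES,
        organism_is("Rattus norvegicus"),
        min_quality(0.9),
    )
    for hit in hits:
        print(hit.snp_id, hit.chromosome, hit.position)
```

Because every parameter is an independent predicate, any subset can be combined without special-casing, and a call with no filters returns all candidates. A production database would more plausibly translate the same combination into a single parameterized SQL query.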
CONCLUSIONS
The CASCAD database discloses detailed information on rat and zebrafish candidate SNPs, including the raw data underlying their discovery. An advanced web-based search interface (http://cascad.niob.knaw.nl) provides universal access to the database content and supports a variety of queries for many types of research utilizing single nucleotide polymorphisms.