Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville, Duthie Center, Room 208, Louisville, KY, USA.
BMC Genomics. 2010 Jun 1;11:347. doi: 10.1186/1471-2164-11-347.
Sequencing of the approximately 1.7 billion bases of the zebrafish genome is currently underway. To date, few high-resolution genetic maps exist for the zebrafish genome, and those that do are based mainly on single nucleotide polymorphisms (SNPs) and short microsatellite repeats. As a step toward a higher-resolution genetic map, we constructed a database of tandemly repeating elements within the zebrafish Zv8 assembly.
Exact tandem repeats with a repeat unit of at least three bases and a copy number of at least 10 were reported. Repeats with a total length of 250 bases or fewer, together with their flanking regions, were masked for known vertebrate repeats. Optimal primer pairs were then computationally designed in the regions flanking the detected repeats. The resulting database of exact tandem repeats is intended as a resource for molecular biologists interested in experimentally testing variable number tandem repeats (VNTRs) within a zebrafish population.
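The detection criteria above (exact copies, a unit of at least three bases, at least 10 copies) can be illustrated with a minimal Python sketch. It is not the authors' software, which the abstract does not describe; the function names, the 60-base upper bound on unit length, and the handling of lowercase soft-masking and `N` assembly gaps are assumptions made for illustration. The idea is that, for each candidate unit length k, a maximal run of positions where every base equals the base k positions downstream marks an exact tandem repeat with period k.

```python
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int      # 0-based offset of the first copy
    unit_len: int   # length of the repeating unit
    copies: int     # number of complete, exact tandem copies
    unit: str       # sequence of one copy


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'CAGCAG' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq: str, min_unit: int = 3, max_unit: int = 60,
                              min_copies: int = 10) -> List[TandemRepeat]:
    """Hypothetical sketch: report maximal exact tandem repeats meeting the thresholds."""
    seq = seq.upper()  # assumption: treat soft-masked (lowercase) bases like uppercase
    n = len(seq)
    hits: List[TandemRepeat] = []
    for k in range(min_unit, max_unit + 1):
        j = 0
        while j + k < n:
            if seq[j] != seq[j + k]:
                j += 1
                continue
            # Extend a maximal run of positions that match the base k downstream;
            # such a run means seq[start : j + k] is periodic with period k.
            start = j
            while j + k < n and seq[j] == seq[j + k]:
                j += 1
            copies = (j + k - start) // k  # complete copies only
            unit = seq[start:start + k]
            # Skip runs with too few copies, units made of a shorter period
            # (those are reported at the shorter period), and assembly gaps (N).
            if copies >= min_copies and is_primitive(unit) and "N" not in unit:
                hits.append(TandemRepeat(start, k, copies, unit))
    return hits
```

The scan costs O(n × max_unit) character comparisons, which is fine for illustrating the criteria but far too slow in pure Python for a 1.7-billion-base genome; a production tool would use a faster index-based method. The subset of repeats with a total length of 250 bases or fewer could then be selected with `[h for h in hits if h.unit_len * h.copies <= 250]`.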
A total of 116,915 repeats with a unit length of at least three nucleotides were detected. The longest repeat unit was 54 bases, present in fourteen tandem copies. A significant number of repeats with unit lengths of 18, 24, 27 and 30 bases were detected, many of them in potentially novel proline-rich coding regions.
Detection of exact tandem repeats in the zebrafish genome yields a wealth of information on potential polymorphic VNTR sites. The association of many of these repeats with potentially novel yet similar coding regions makes them promising leads for identifying disease-associated genes. A web interface for querying repeats is available at http://bioinformatics.louisville.edu/zebrafish/. The portal allows users to search for repeats of a selected unit size within any valid specified region of the 25 linkage groups.
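The portal query described above (a unit size plus a region on one of the 25 linkage groups) can be sketched as a simple filter over repeat records. The record layout, the field names, the 1-based inclusive coordinates and the choice to return only repeats lying entirely inside the region are all hypothetical; the actual web interface may store and match repeats differently.

```python
from dataclasses import dataclass
from typing import Iterable, List

NUM_LINKAGE_GROUPS = 25  # zebrafish linkage groups, as stated in the text


@dataclass(frozen=True)
class RepeatRecord:
    """Hypothetical database row for one exact tandem repeat."""
    lg: int        # linkage group, 1..25
    start: int     # 1-based start coordinate of the first copy
    unit_len: int  # repeat unit length in bases
    copies: int    # number of exact tandem copies

    @property
    def end(self) -> int:
        # Inclusive 1-based end coordinate of the last copy.
        return self.start + self.unit_len * self.copies - 1


def query_repeats(records: Iterable[RepeatRecord], lg: int, region_start: int,
                  region_end: int, unit_len: int) -> List[RepeatRecord]:
    """Return repeats of the given unit length lying wholly inside the region (assumed rule)."""
    if not 1 <= lg <= NUM_LINKAGE_GROUPS:
        raise ValueError(f"linkage group must be 1..{NUM_LINKAGE_GROUPS}, got {lg}")
    if region_start < 1 or region_end < region_start:
        raise ValueError("region must satisfy 1 <= start <= end")
    hits = [r for r in records
            if r.lg == lg and r.unit_len == unit_len
            and r.start >= region_start and r.end <= region_end]
    return sorted(hits, key=lambda r: r.start)


if __name__ == "__main__":
    # Made-up example records, not data from the database.
    records = [RepeatRecord(1, 100, 3, 12), RepeatRecord(1, 5000, 3, 10),
               RepeatRecord(2, 100, 3, 12), RepeatRecord(1, 200, 54, 14)]
    print(query_repeats(records, lg=1, region_start=1, region_end=1000, unit_len=54))
    # prints [RepeatRecord(lg=1, start=200, unit_len=54, copies=14)]
```

Validating the linkage group and region before filtering mirrors the text's restriction to "valid" regions; whether overlapping (rather than fully contained) repeats should also be returned is a design choice left open here.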