Missirlis Perseus I, Mead Carri-Lyn R, Butland Stefanie L, Ouellette B F Francis, Devon Rebecca S, Leavitt Blair R, Holt Robert A
Genome Sciences Centre, BC Cancer Agency, Suite 100, 570 West 7th Ave, Vancouver, BC, V5Z 4S6, Canada.
BMC Bioinformatics. 2005 Jun 10;6:145. doi: 10.1186/1471-2105-6-145.
To date, 35 human diseases, some of which also exhibit anticipation, have been associated with unstable repeats. Anticipation has also been reported in a number of other diseases in which repeat expansion may play a role in etiology. Despite the growing importance of unstable repeats in disease, no resource currently exists for prioritizing repeats. Here we present Satellog, a database that catalogs all pure satellite repeats with 1-16 bp repeat units in the human genome, along with supplementary data. Satellog analyzes each pure repeat within UniGene clusters for evidence of repeat polymorphism.
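The abstract does not spell out how pure repeats are found or how polymorphism is called, so the following is only a minimal sketch under stated assumptions. It scans a sequence for uninterrupted tandem copies of a 1-16 bp unit, then treats a repeat as polymorphic if sequences from one UniGene cluster show at least two different copy numbers between the same flanks. The names `find_pure_repeats`, `copy_numbers`, `is_polymorphic`, the `Repeat` record, and the `min_copies=3` threshold are all hypothetical and are not Satellog's actual code or cutoffs.

```python
import re
from collections import namedtuple

# Hypothetical record type for illustration; not Satellog's schema.
Repeat = namedtuple("Repeat", ["start", "unit", "copies"])


def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(unit)
    return all(unit != unit[:p] * (n // p) for p in range(1, n) if n % p == 0)


def find_pure_repeats(seq, max_period=16, min_copies=3):
    """Scan left to right for uninterrupted tandem copies of a 1..max_period unit.

    At each position the longest pure run (over all periods) is kept and the scan
    jumps past it. min_copies is an assumed threshold, not Satellog's cutoff.
    """
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for p in range(1, max_period + 1):
            unit = seq[i:i + p]
            # Skip truncated units at the sequence end and non-primitive units like 'AA'.
            if len(unit) < p or not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * p > best.copies * len(best.unit)):
                best = Repeat(i, unit, copies)
        if best:
            hits.append(best)
            i += best.copies * len(best.unit)
        else:
            i += 1
    return hits


def copy_numbers(cluster_seqs, left_flank, unit, right_flank):
    """Copies of `unit` between the given flanks in each cluster sequence (None if absent)."""
    pattern = re.compile(
        re.escape(left_flank) + "((?:" + re.escape(unit) + ")+)" + re.escape(right_flank)
    )
    counts = []
    for s in cluster_seqs:
        m = pattern.search(s.upper())
        counts.append(len(m.group(1)) // len(unit) if m else None)
    return counts


def is_polymorphic(counts):
    """Call a repeat polymorphic if at least two distinct copy numbers are observed."""
    return len({c for c in counts if c is not None}) > 1
```

For example, `find_pure_repeats("TTCAGCAGCAGCAGTT")` returns `[Repeat(start=2, unit='CAG', copies=4)]`. Requiring primitive units means an `AT` run is reported once as `AT` rather than again as `ATAT`. A production tool would also need to handle rotations of the same unit (e.g. `CAG` vs `GCA`) and the reverse strand, which this sketch ignores.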
A total of 5,546 polymorphic repeats were identified, providing the first indication of many novel polymorphic sites in the genome. Overall, polymorphic repeats were over-represented in 3'-UTR sequence relative to 5'-UTR and coding sequence. Interestingly, we observed that repeat polymorphism within coding sequence is restricted to trinucleotide repeats, whereas UTR sequence tolerated polymorphic repeats across a wider range of repeat periods. For each pure repeat we also calculate its repeat length percentile rank, its location within or adjacent to EnsEMBL genes, and the expression profile of the associated gene in normal tissues according to the GeneNote database.
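The abstract names a repeat length percentile rank without defining its reference population. As a hedged sketch, the code below assumes the rank is computed among all repeats that share the same repeat unit, using the "percentage of lengths at or below this one" definition. Both the grouping and the tie handling are assumptions, and `percentile_ranks` is a hypothetical helper.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile_ranks(repeats):
    """Percentile rank of each repeat's length among repeats with the same unit.

    `repeats` is a list of (repeat_id, unit, length_bp) tuples (hypothetical layout).
    Rank = 100 * (number of same-unit repeats with length <= this length) / group size.
    """
    by_unit = defaultdict(list)
    for _, unit, length in repeats:
        by_unit[unit].append(length)
    for lengths in by_unit.values():
        lengths.sort()  # bisect requires sorted input
    return {
        rid: 100.0 * bisect_right(by_unit[unit], length) / len(by_unit[unit])
        for rid, unit, length in repeats
    }
```

With CAG lengths of 12, 15, 15 and 30 bp, the 12 bp repeat ranks 25.0, both 15 bp repeats rank 75.0, and the 30 bp repeat ranks 100.0. A rank near 100 flags a repeat as unusually long for its unit, which is the kind of property one might use when prioritizing candidates for expansion.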
Satellog can dynamically prioritize repeats based on any of their characteristics (e.g. repeat unit, class, period, length, repeat length percentile rank, genomic coordinates), their polymorphism profile within UniGene, their proximity to or presence within gene regions (e.g. CDS, UTR, 15 kb upstream), metadata for the genes in which they are detected, and gene expression profiles in normal human tissues. Unstable repeats associated with 31 diseases were analyzed in Satellog to evaluate their common repeat properties. The utility of Satellog was highlighted by prioritizing repeats for Huntington's disease and schizophrenia. Satellog is available online at http://satellog.bcgsc.ca.
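Satellog itself is a web database, so the following is not its interface. It is a hedged, in-memory sketch of the kind of multi-criteria query described above: filter on period, gene region, polymorphism, percentile rank and tissue expression, then rank the survivors. The `RepeatRecord` fields, the region labels such as `"upstream_15kb"`, and the `prioritize` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatRecord:
    # Hypothetical record; field names are illustrative, not Satellog's schema.
    repeat_id: str
    unit: str
    length_bp: int
    percentile_rank: float
    region: str          # e.g. "cds", "5utr", "3utr", "upstream_15kb", "intergenic"
    polymorphic: bool    # evidence of length variation within a UniGene cluster
    tissues: set = field(default_factory=set)  # normal tissues expressing the host gene

    @property
    def period(self):
        return len(self.unit)


def prioritize(records, period=None, regions=None, polymorphic_only=False,
               min_percentile=0.0, tissue=None):
    """Keep records meeting every given criterion; rank by percentile rank, highest first."""
    hits = [
        r for r in records
        if (period is None or r.period == period)
        and (regions is None or r.region in regions)
        and (not polymorphic_only or r.polymorphic)
        and r.percentile_rank >= min_percentile
        and (tissue is None or tissue in r.tissues)
    ]
    return sorted(hits, key=lambda r: r.percentile_rank, reverse=True)
```

For instance, `prioritize(records, period=3, regions={"cds", "5utr", "3utr"}, polymorphic_only=True, min_percentile=90, tissue="brain")` would return long, polymorphic trinucleotide repeats in genes expressed in brain. Criteria left as `None` are simply not applied, mirroring the idea of querying on any subset of repeat characteristics.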