Subramanian Subbaya, Madgula Vamsi M, George Ranjan, Mishra Rakesh K, Pandit Madhusudhan W, Kumar Chanderashekar S, Singh Lalji
Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad, 500 007, India.
Bioinformatics. 2003 Mar 22;19(5):549-52. doi: 10.1093/bioinformatics/btg029.
Simple sequence repeats (SSRs), or microsatellite repeats, are abundant in many prokaryotic and eukaryotic genomes. Among SSRs, triplet repeats are of special significance because some of them have been linked to various genetic disorders. The objective of this study was to analyze triplet repeats across the complete human genome and to identify the genes that contain triplet repeats in their coding regions. This analysis will help identify candidate genes with potential for repeat expansion.
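The abstract does not say how triplet repeats were detected. As a minimal sketch, assuming "repeat" means a perfect (uninterrupted) tandem run of one three-base unit, a regular expression with a backreference is enough to scan a sequence. The function name `find_triplet_repeats` and the default `min_units` threshold are hypothetical choices for illustration, not the authors' method.

```python
import re
from typing import List, Tuple


def find_triplet_repeats(seq: str, min_units: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, unit, n_units) for perfect tandem triplet repeats.

    Hypothetical sketch: assumes uninterrupted repeats only; min_units is an
    illustrative threshold, not the paper's stated criterion.
    """
    seq = seq.upper()
    # One 3-base unit followed by at least (min_units - 1) exact copies of itself.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        # Skip homopolymer units such as AAA: they are mononucleotide runs,
        # not triplet repeats.
        if len(set(unit)) == 1:
            continue
        hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits
```

For example, calling `find_triplet_repeats(cds_seq, min_units=10)` on a hypothetical coding sequence `cds_seq` would flag runs of at least ten units, the cutoff the abstract uses to single out its 171 genes; the paper's actual detection rules may differ.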
We have analyzed triplet repeats in the complete human genome using publicly available sequences. Our analysis revealed that AGC and CCG repeats were predominant in the coding regions of the genome, while UTRs and upstream sequences were relatively rich in CCG repeats. Analysis of triplet repeat density (bp/Mb) revealed that AAT and AAC were the most abundant repeats, whereas ACT and ACG were the rarest in the human genome. We identified about 2135 known or predicted genes associated with at least one triplet repeat type. A large proportion of putative transcripts identified by gene-finding programs were associated with triplet repeats. These transcripts are candidate genes for analyzing triplet repeat expansion and its possible association with disease phenotypes. The 171 genes that contain at least ten repeat units will be of particular interest for future studies correlating them with disease phenotypes, given the expansion potential of their repeats. The list of genes and other details of the analysis are given in the online supplementary data (http://www.ingenovis.com/tripletrepeats).
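Repeat types such as AGC, CCG, AAT and ACG are usually reported as classes that pool a unit with its rotations and its reverse complement (so CAG, GCA, CTG and GCT all count as AGC). Taking the alphabetically first of these six variants reproduces the class names used above. The sketch below, assuming that convention (the abstract does not state it) and using hypothetical function names, shows how such classes and a bp/Mb density could be computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_class(unit: str) -> str:
    """Assumed convention: alphabetically first rotation of the unit or its reverse complement."""
    unit = unit.upper()
    rc = unit.translate(COMPLEMENT)[::-1]  # reverse complement
    variants = [s[i:] + s[:i] for s in (unit, rc) for i in range(3)]
    return min(variants)


def density_bp_per_mb(repeats: Iterable[Tuple[str, int]], seq_length_bp: int) -> Dict[str, float]:
    """Hypothetical helper: repeat bp per megabase, per class.

    repeats holds (unit, n_units) pairs; each unit contributes 3 bp.
    """
    totals: Dict[str, int] = defaultdict(int)
    for unit, n_units in repeats:
        totals[canonical_class(unit)] += 3 * n_units
    mb = seq_length_bp / 1_000_000
    return {cls: bp / mb for cls, bp in totals.items()}
```

Normalizing to bp/Mb makes densities comparable across sequences of different length, which is why the abstract reports abundance in those units rather than as raw counts.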