Warburton Peter E, Giordano Joti, Cheung Fanny, Gelfand Yefgeniy, Benson Gary
Department of Human Genetics, Mount Sinai School of Medicine, New York, New York 10029, USA.
Genome Res. 2004 Oct;14(10A):1861-9. doi: 10.1101/gr.2542904.
We have performed the first genome-wide analysis of the Inverted Repeat (IR) structure in the human genome, using a novel and efficient software package called Inverted Repeats Finder (IRF). After masking of known repetitive elements, IRF detected 22,624 human IRs with arm sizes from 25 bp to >100 kb, at least 75% identity between arms, and spacer lengths up to 100 kb. This analysis required 6 h on a desktop PC. In all, 166 IRs had arm lengths >8 kb. From this set, IRs were excluded if they were in unfinished/unassembled regions of the genome or clustered with other closely related IRs, yielding a set of 96 large IRs. Of these, 24 (25%) occurred on the X chromosome, although it represents only approximately 5% of the genome. Of the X-chromosome IRs, 83.3% were ≥99% identical, compared with 28.8% of autosomal IRs. Eleven IRs from Chromosome X, one from Chromosome 11, and seven already described from Chromosome Y contain genes predominantly expressed in testis. PCR analysis of eight of these IRs correctly amplified the corresponding region in the human genome, and six were also confirmed in gorilla or chimpanzee genomes. Similarity dot-plots revealed that 22 IRs contained further secondary homologous structures, some of which fell into three distinct patterns. The prevalence of large, highly homologous IRs containing testis-expressed genes on the X and Y chromosomes suggests a possible role in male germ-line gene expression and/or in maintaining sequence integrity by gene conversion.
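To make the terms above concrete (left arm, spacer, right arm, percent identity), the sketch below tests whether given coordinates describe an inverted repeat: the right arm, read as its reverse complement, is compared base by base with the left arm, and the 75% identity and 100 kb spacer limits from the abstract are applied. This is only a naive illustration of the definition, not IRF's search algorithm, which scans whole genomes without known coordinates. The function names, the ungapped identity measure, and the coordinate convention are assumptions made for illustration.

```python
"""Naive illustration of an inverted repeat (IR): a left arm, a spacer, and a
right arm that matches the reverse complement of the left arm.
Hypothetical helper names; NOT the IRF algorithm."""

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences (an assumption;
    real aligners also allow insertions and deletions)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x.upper() == y.upper())
    return 100.0 * matches / len(a)


def check_inverted_repeat(seq: str, left_start: int, arm_len: int,
                          spacer_len: int, min_identity: float = 75.0,
                          max_spacer: int = 100_000) -> tuple[bool, float]:
    """Test whether seq holds an IR at hypothetical, caller-supplied coordinates.

    Left arm:  seq[left_start : left_start + arm_len]
    Spacer:    the next spacer_len bases
    Right arm: the arm_len bases after the spacer
    Returns (is_ir, identity_percent).
    """
    left = seq[left_start:left_start + arm_len]
    right_start = left_start + arm_len + spacer_len
    right = seq[right_start:right_start + arm_len]
    if len(left) != arm_len or len(right) != arm_len:
        raise ValueError("an arm runs past the end of the sequence")
    # An IR's right arm reads like the left arm on the opposite strand.
    identity = percent_identity(left, reverse_complement(right))
    is_ir = identity >= min_identity and spacer_len <= max_spacer
    return is_ir, identity


if __name__ == "__main__":
    arm = "AAGCTTGCAT"
    perfect = arm + "CCCC" + reverse_complement(arm)
    print(check_inverted_repeat(perfect, 0, 10, 4))
```

Running the script prints `(True, 100.0)`. The 10 bp toy arm is shorter than the 25 bp minimum reported above and serves only to keep the example readable; changing one base of the right arm (for example, ending it in `CTA` instead of `CTT`) drops the identity to 90.0%, which still passes the 75% threshold.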