Kirsch Stefan, Weiss Birgit, Miner Tracie L, Waterston Robert H, Clark Royden A, Eichler Evan E, Münch Claudia, Schempp Werner, Rappold Gudrun
Institute of Human Genetics, University of Heidelberg, INF 366, 69120 Heidelberg, Germany.
Genome Res. 2005 Feb;15(2):195-204. doi: 10.1101/gr.3302705. Epub 2005 Jan 14.
Basic medical research critically depends on the finished human genome sequence. Two types of gaps are known to exist in the human genome: those associated with heterochromatic sequences and those embedded within euchromatin. We identified and analyzed a euchromatic island within the pericentromeric repeats of the human Y chromosome. This 450-kb island, although not recalcitrant to subcloning and present in all 100 tested males of different ethnic origins, was not detected in, and is not contained within, the published Y chromosomal sequence. The 450-kb interval consists almost entirely of duplicated sequence, predominantly from interchromosomal duplication events rather than the intrachromosomal events that usually prevail on the Y chromosome. We defined the modular structure of this interval and detected a total of 128 underlying pairwise alignments (≥90% sequence identity and ≥1 kb in length) to various autosomal pericentromeric and ancestral pericentromeric regions. We also analyzed the putative gene content of this region by a combination of in silico gene prediction and paralogy analysis. We show that even this exceptionally duplicated region of the Y chromosome harbors eight putative genes with open reading frames, including fusion transcripts formed by splicing exons from two different duplication modules, as well as members of the DUX homeobox gene family.
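The alignment criterion quoted above (at least 90% identity and at least 1 kb in length) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `PairwiseAlignment` record, the function names, and the example values are hypothetical and serve only to show how such a two-part threshold might be applied to a list of alignments.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract; everything else here is illustrative.
MIN_IDENTITY_PERCENT = 90.0
MIN_LENGTH_BP = 1000  # 1 kb


@dataclass(frozen=True)
class PairwiseAlignment:
    """Hypothetical record for one pairwise alignment."""
    query: str
    target: str
    percent_identity: float
    length_bp: int


def passes_thresholds(aln: PairwiseAlignment) -> bool:
    """True if the alignment meets both the identity and the length cutoff."""
    return (aln.percent_identity >= MIN_IDENTITY_PERCENT
            and aln.length_bp >= MIN_LENGTH_BP)


def filter_alignments(alignments):
    """Keep only the alignments that satisfy both cutoffs, preserving order."""
    return [aln for aln in alignments if passes_thresholds(aln)]


# Made-up example records (not data from the study).
EXAMPLE_ALIGNMENTS = [
    PairwiseAlignment("island", "region_A", 95.2, 12000),  # passes both
    PairwiseAlignment("island", "region_B", 88.0, 5000),   # identity too low
    PairwiseAlignment("island", "region_C", 99.1, 800),    # too short
    PairwiseAlignment("island", "region_D", 90.0, 1000),   # exactly at both cutoffs
]

kept = filter_alignments(EXAMPLE_ALIGNMENTS)
for aln in kept:
    print(aln.target, aln.percent_identity, aln.length_bp)
```

Both comparisons are inclusive, matching the "≥" in the stated criterion, so an alignment sitting exactly on a cutoff is retained.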