Volfovsky Natalia, Oleksyk Taras K, Cruz Kristine C, Truelove Ann L, Stephens Robert M, Smith Michael W
Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick, National Cancer Institute at Frederick, Frederick, MD 21702, USA.
BMC Genomics. 2009 Jan 26;10:51. doi: 10.1186/1471-2164-10-51.
Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.
Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After excluding those in repetitive DNA, 1,429 indels (23%) remained. This group was characterized according to local or genome-wide repetitive nature, size, location relative to genes, and other genomic features. Using local structure analysis, we defined three major classes of these indels: (i) unique indels, with no additional copies of the indel sequence in the surrounding (10 kb) region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Many of these indels (683 of 1,429) were in proximity to known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected by chance, suggesting that selection is a factor in limiting their persistence. A subset of indels from coding regions was experimentally validated, and their impacts were predicted based on direct sequencing in several human populations as well as in chimpanzees, bonobos, gorillas, and two subspecies of orangutans.
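The three-class scheme described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_indel`, the ungapped sliding-window comparison, and the 0.9 identity threshold are all assumptions made for illustration. The sketch also assumes that `flank` holds the surrounding sequence (for example the 10 kb region) with the indel itself already removed.

```python
def classify_indel(indel_seq, flank, min_identity=0.9):
    """Assign an indel to one of three illustrative classes.

    'exact_copy'   : the flank contains an identical copy of the indel
    'similar_copy' : the flank contains an ungapped window whose identity
                     to the indel is >= min_identity (hypothetical cutoff)
    'unique'       : neither of the above

    Assumption: `flank` excludes the indel itself.
    """
    indel_seq = indel_seq.upper()
    flank = flank.upper()
    n = len(indel_seq)

    # Class (ii): at least one exact copy nearby.
    if indel_seq in flank:
        return "exact_copy"

    # Class (iii): a similar but not identical copy nearby.
    # Slide a window of the indel's length along the flank and
    # count matching positions (ungapped comparison only).
    for start in range(len(flank) - n + 1):
        window = flank[start:start + n]
        matches = sum(a == b for a, b in zip(indel_seq, window))
        if matches / n >= min_identity:
            return "similar_copy"

    # Class (i): no copy found in the local region.
    return "unique"
```

A real analysis would likely use a gapped aligner and search both strands; the ungapped window here only keeps the example short.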
Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are substantial differences in the representation of indel classes among genomic elements, most likely caused by differences in their evolutionary histories. Using local sequence context, we predicted the origins and phylogenetic relationships of gene-impacting indels in primate species. These results suggest that genome plasticity is a major force behind the speciation events separating the great ape lineages.
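One way to check for this kind of under-representation is a one-sided binomial test: if indels fell at random, the number landing in coding sequence would follow a binomial distribution whose success probability is the coding fraction of the aligned region. The abstract does not say which test the authors used, so the sketch below is only an assumed approach, and the function name `binomial_lower_tail` is hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_lower_tail(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    k : observed number of indels in the feature (e.g. coding sequence)
    n : total number of indels considered
    p : assumed fraction of the aligned region covered by the feature

    Terms are computed in log space to avoid overflow for large n.
    """
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p)
            + (n - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)
```

For example, `binomial_lower_tail(k, 1429, p)` would give the probability of seeing `k` or fewer of the 1,429 indels in coding sequence under random placement; the values of `k` and `p` must come from the actual data and are not given in the abstract.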