Chen Feng-Chi, Chen Chueng-Jong, Li Wen-Hsiung, Chuang Trees-Juen
Division of Biostatistics and Bioinformatics, National Health Research Institute, Miaoli County 350, Taiwan.
Genome Res. 2007 Jan;17(1):16-22. doi: 10.1101/gr.5429606. Epub 2006 Nov 9.
It has been suggested that insertions and deletions (indels) have contributed more to the sequence divergence between the human and chimpanzee genomes than nucleotide changes have (3% vs. 1.2%). However, although large indels between the two genomes have been studied, no systematic analysis of small indels (i.e., indels ≤100 bp) has been published. In this study, we first estimated that the false-positive rate of small indels inferred from human-chimpanzee pairwise sequence alignments is quite high, suggesting that the chimpanzee genome draft is not sufficiently accurate for our purpose. We therefore inferred only human-specific indels, using multiple sequence alignments of mammalian genomes. We identified >840,000 "small" indels, which affect >7000 UCSC-annotated human genes (>11,000 transcripts). These indels, however, amount to a sequence change of only approximately 0.21% in the human lineage for the regions compared, whereas in pseudogenes indels account for a sequence divergence of 1.40%, suggesting that most of the indels that occurred in genic regions have been eliminated. Functional analysis reveals that the genes whose coding exons have been affected by human-specific indels are enriched in transcription and translation regulatory activities but are underrepresented in catalytic and transporter activities, cellular and physiological processes, and extracellular region/matrix. This functional bias suggests that human-specific indels might have contributed to uniquely human traits by causing changes at the RNA and protein levels.
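The idea of inferring human-specific indels from a multiple alignment, rather than from a human-chimpanzee pairwise alignment, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a simple rule in which a gap is assigned to the human lineage only when chimpanzee and every other aligned species agree with each other and disagree with human. The function names, the `-` gap character, and the toy sequences are hypothetical; only the 100 bp size cutoff comes from the abstract.

```python
from itertools import groupby

GAP = "-"
MAX_SMALL_INDEL = 100  # size cutoff for "small" indels, taken from the abstract


def classify_column(human, others):
    """Label one alignment column from the human lineage's point of view."""
    others_gapped = [base == GAP for base in others]
    if human == GAP and not any(others_gapped):
        return "deletion"   # present in every other species, absent in human
    if human != GAP and all(others_gapped):
        return "insertion"  # present only in human
    return None             # no gap, or a gap not specific to human


def human_specific_indels(human, others, max_len=MAX_SMALL_INDEL):
    """Return (start_column, length, kind) for each run of human-specific gap columns.

    `human` is the aligned human row; `others` is a list of aligned rows for
    chimpanzee and at least one further species (an assumption of this sketch).
    """
    if any(len(row) != len(human) for row in others):
        raise ValueError("all aligned rows must have the same length")
    labels = [classify_column(h, col) for h, col in zip(human, zip(*others))]
    indels, pos = [], 0
    for kind, run in groupby(labels):
        length = len(list(run))
        if kind is not None and length <= max_len:
            indels.append((pos, length, kind))
        pos += length
    return indels


# Toy alignment (hypothetical sequences, not data from the study).
human = "ACGT--ACGTTTACGT"
chimp = "ACGTGGACG---ACGT"
mouse = "ACGTGGACG---ACGT"
print(human_specific_indels(human, [chimp, mouse]))
```

A gap that appears in chimpanzee alone is ignored by this rule, which mirrors the abstract's choice to report only human-specific events. A real analysis would also need to deal with alignment errors, missing data, and incomplete lineage sorting, none of which this sketch attempts.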