Britten Roy J
California Institute of Technology, 101 Dahlia Avenue, Corona del Mar, CA 92625, USA.
Proc Natl Acad Sci U S A. 2006 Dec 12;103(50):19027-32. doi: 10.1073/pnas.0608796103. Epub 2006 Dec 4.
Results of protein sequence comparison under an open criterion reveal a very large number of relationships that have, up to now, gone unreported. These relationships suggest many ancient gene duplication events. It is well known that gene duplication has been a major process in the evolution of genomes. A collection of human genes of known function has been examined for a history of gene duplication, detected through amino acid sequence similarity by using BLASTp with an expectation of two or less (the open criterion). Because the collection of genes in build 35 includes sets of transcript variants, all genes of known function were collected and only the longest transcript variant of each was included, yielding a 13,298-member library called KGMV (for known genes maximum variant). When matches of all lengths are accepted, >97% of human genes show significant matches to each other. Many match a large number of other, different proteins, indicating that most genes are made up of parts of many others as a result of ancient duplication events. To support the use of the open criterion, all members of the KGMV library were twice replaced with random protein sequences of the same length and average composition, and all were compared with each other with BLASTp at an expectation of two or less. The number of matches averaged 0.35% of that observed for the KGMV set of proteins.
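The two library-construction steps described in the abstract (keeping only the longest transcript variant per gene, and building a random control library of the same lengths and average composition) might be sketched as follows. This is a minimal illustration, not the author's code: the function names, the toy records, and the reading of "average composition" as residue frequencies pooled over the whole library are all assumptions. The BLASTp all-against-all comparison itself is not shown.

```python
import random
from collections import Counter


def longest_variant_per_gene(records):
    """Keep the longest sequence for each gene.

    `records` is an iterable of (gene_id, protein_sequence) pairs, where
    several pairs may share a gene_id (transcript variants). Hypothetical
    input format, chosen for illustration.
    """
    best = {}
    for gene_id, seq in records:
        if gene_id not in best or len(seq) > len(best[gene_id]):
            best[gene_id] = seq
    return best


def average_composition(sequences):
    """Residue frequencies pooled over all sequences (an assumed reading
    of "average composition")."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def random_library(library, rng):
    """Replace every protein with a random sequence of the same length,
    drawing residues independently from the library-wide composition."""
    comp = average_composition(library.values())
    residues = sorted(comp)
    weights = [comp[aa] for aa in residues]
    return {
        gene_id: "".join(rng.choices(residues, weights=weights, k=len(seq)))
        for gene_id, seq in library.items()
    }


# Toy data standing in for the real gene collection (hypothetical).
records = [
    ("geneA", "MKTLLV"),
    ("geneA", "MKTLLVAGRE"),  # longer variant of geneA, should be kept
    ("geneB", "MSDNE"),
]
library = longest_variant_per_gene(records)

# The abstract reports two independent random replacements of the library.
controls = [random_library(library, random.Random(seed)) for seed in (1, 2)]
```

Each control library would then be compared against itself at the same expectation threshold as the real library, and the match counts compared; the abstract reports that the random sets averaged 0.35% of the matches seen for KGMV.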