Ding Guohui, Sun Yan, Li Hong, Wang Zhen, Fan Haiwei, Wang Chuan, Yang Dan, Li Yixue
Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, P. R. China.
Nucleic Acids Res. 2008 Jan;36(Database issue):D255-62. doi: 10.1093/nar/gkm924. Epub 2007 Nov 5.
Gene duplication is common in all three domains of life, especially in eukaryotic genomes. The duplicates provide new material for the action of evolutionary forces such as selection or genetic drift. Here we describe a sophisticated procedure to extract duplicated genes (paralogs) from 26 available eukaryotic genomes, to pre-calculate several evolutionary indexes (evolutionary rate, synonymous distance/clock, transition redundant exchange clock, etc.) for each paralog family, and to identify block or segmental duplications (paralogons). We also constructed an internet-accessible Eukaryotic Paralog Group Database (EPGD; http://epgd.biosino.org/EPGD/). The database is gene-centered and organized by paralog family. It focuses on paralogs and evolutionary duplication events. The paralog families and paralogons can be searched by text or sequence, and can be downloaded from the website as plain text files. The database will be very useful for both experimentalists and bioinformaticians interested in the study of duplication events or paralog families.
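The abstract does not say how pairwise paralog hits are merged into families. One common generic approach is single-linkage clustering of similarity hits with a union-find structure, sketched below. This is not EPGD's actual pipeline. The function name `group_paralog_families`, the `(gene_a, gene_b, identity)` hit format, and the `min_identity=0.5` cutoff are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_paralog_families(
    hits: Iterable[Tuple[str, str, float]],
    min_identity: float = 0.5,  # hypothetical cutoff, not taken from EPGD
) -> List[List[str]]:
    """Single-linkage clustering of paralog hits into families via union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for gene_a, gene_b, identity in hits:
        # Skip self-hits and pairs below the (hypothetical) similarity cutoff
        if gene_a != gene_b and identity >= min_identity:
            union(gene_a, gene_b)

    families: Dict[str, List[str]] = defaultdict(list)
    for gene in list(parent):
        families[find(gene)].append(gene)
    # Sort genes within families, then families, for deterministic output
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    hits = [("g1", "g2", 0.8), ("g2", "g3", 0.6), ("g4", "g5", 0.9), ("g3", "g6", 0.3)]
    print(group_paralog_families(hits))  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Single linkage lets g1 and g3 join one family through g2 even without a direct hit. This transitivity is convenient but can chain unrelated genes together through multidomain proteins, so real pipelines usually add coverage or domain-architecture checks.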