INRA, UMR1348 PEGASE, Saint-Gilles, France.
PLoS One. 2012;7(11):e50653. doi: 10.1371/journal.pone.0050653. Epub 2012 Nov 28.
There has been a surge in studies linking genome structure and gene expression, with a special focus on duplicated genes. Although they originate from the same sequence, duplicated genes can diverge strongly over the course of evolution and acquire different functions or differently regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose.
Nine species were included in the DGD. For each species, BLAST analyses were conducted on the peptide sequences of genes mapped to the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and on the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available.
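The grouping and co-expression steps described above can be illustrated with a minimal sketch. It is not the DGD pipeline: the input layout (tuples of two gene identifiers and a BLAST E-value), the `max_evalue` cutoff, the function names, and the use of connected components to form groups are all assumptions made for illustration, since the abstract does not state the actual thresholds or grouping rule.

```python
from math import sqrt


def group_duplicates(hits, chrom, max_evalue=1e-5):
    """Group genes linked by pairwise BLAST hits on the same chromosome.

    hits:       iterable of (gene_a, gene_b, e_value) tuples (assumed layout)
    chrom:      dict mapping gene id -> chromosome name
    max_evalue: hypothetical significance cutoff, not taken from the paper
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b, e_value in hits:
        # Skip self hits, weak hits, and hits between different chromosomes.
        if a == b or e_value > max_evalue or chrom[a] != chrom[b]:
            continue
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), []).append(gene)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(members) for members in groups.values())


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Returns None when either profile has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)
```

In this sketch, `group_duplicates` would be called once per species with that species' hits, and `pearson` would then be applied to each pair of genes within a group that has expression data. A production analysis would more likely rely on an established routine such as `scipy.stats.pearsonr`.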
The Duplicated Genes Database provides a list of co-localised duplicated genes for several species, together with the gene co-expression level and the semantic similarity of functional annotations where available. Adding these data to the groups of duplicated genes provides biological information that can prove useful in gene expression analyses. The Duplicated Genes Database can be freely accessed through the DGD website at http://dgd.genouest.org.