Bergman Casey M, Pfeiffer Barret D, Rincón-Limas Diego E, Hoskins Roger A, Gnirke Andreas, Mungall Chris J, Wang Adrienne M, Kronmiller Brent, Pacleb Joanne, Park Soo, Stapleton Mark, Wan Kenneth, George Reed A, de Jong Pieter J, Botas Juan, Rubin Gerald M, Celniker Susan E
Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA.
Genome Biol. 2002;3(12):RESEARCH0086. doi: 10.1186/gb-2002-3-12-research0086. Epub 2002 Dec 30.
It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.
We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes than in previously studied genes. Conserved non-coding sequences (CNCSs) tend to be spatially clustered, with conserved spacing between them, and clusters of CNCSs can be used to predict enhancer sequences.
Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.
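The idea that clusters of CNCSs can flag candidate enhancers can be illustrated with a minimal sketch. It assumes CNCSs are already available as (start, end) coordinates on one sequence and groups neighbours separated by at most a chosen gap. The function name `cluster_cncs` and the thresholds `max_gap` and `min_blocks` are hypothetical choices for illustration only; the abstract does not state the procedure or parameters the authors used.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) coordinates on one sequence


def cluster_cncs(
    blocks: List[Interval],
    max_gap: int = 500,    # assumed threshold, not taken from the paper
    min_blocks: int = 3,   # assumed threshold, not taken from the paper
) -> List[Interval]:
    """Merge conserved blocks into clusters by single-linkage on distance.

    Two consecutive blocks join the same cluster when the gap between
    them is at most max_gap. Clusters with fewer than min_blocks blocks
    are discarded. Returns the (start, end) span of each kept cluster.
    """
    clusters: List[Interval] = []
    span_start = span_end = 0
    count = 0
    for start, end in sorted(blocks):
        if count and start - span_end > max_gap:
            # Gap too large: close the current cluster and start a new one.
            if count >= min_blocks:
                clusters.append((span_start, span_end))
            count = 0
        if count == 0:
            span_start, span_end = start, end
        else:
            span_end = max(span_end, end)
        count += 1
    if count >= min_blocks:
        clusters.append((span_start, span_end))
    return clusters


# Made-up coordinates, purely for illustration.
example_blocks = [
    (100, 150), (200, 260), (400, 430),
    (5000, 5050),
    (9000, 9040), (9100, 9150), (9300, 9320),
]
candidate_regions = cluster_cncs(example_blocks)
print(candidate_regions)
```

With these made-up coordinates, the isolated block at 5000 is dropped and the two dense groups are reported as candidate regions. In practice the thresholds would need to be calibrated against known enhancers.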