Odenwald Ward F, Rasband Wayne, Kuzin Alexander, Brody Thomas
Neural Cell-Fate Determinants Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.
Proc Natl Acad Sci U S A. 2005 Oct 11;102(41):14700-5. doi: 10.1073/pnas.0506915102. Epub 2005 Oct 3.
Here, we describe a multigenomic DNA sequence-analysis tool, EvoPrinter, that facilitates the rapid identification of evolutionarily conserved sequences within the context of a single species. The EvoPrinter output identifies multispecies-conserved DNA sequences as they exist in a reference DNA. This identification is accomplished by superimposing multiple pairwise BLAT (BLAST-like alignment tool) readouts, each aligning the reference DNA against one test genome, to identify the reference nucleotides that are shared by all orthologous DNAs. EvoPrinter analysis of well-characterized genes reveals that most, if not all, of the conserved sequences are essential for gene function. For example, analysis of orthologous genes that are shared by many vertebrates identifies conserved DNA in both protein-encoding sequences and noncoding cis-regulatory regions, including enhancers and mRNA microRNA-binding sites. In Drosophila, the combined mutational histories of five or more species afford near-base-pair resolution of conserved transcription factor DNA-binding sites, and essential amino acids are revealed by the nucleotide flexibility of their codon-wobble position(s). Conserved small peptide-encoding genes, which had gone undetected by conventional gene-prediction algorithms, are identified by the codon-wobble signatures of invariant amino acids. EvoPrinter also allows one to assess the degree of evolutionary divergence between orthologous DNAs by highlighting differences between a selected species and the other test species.
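The superimposition step described in the abstract can be illustrated with a minimal Python sketch. It is not the tool's actual implementation. It assumes that each pairwise readout has already been projected onto the reference, so that it has the same length as the reference and marks matching bases in uppercase and non-matching bases in lowercase. The function name `superimpose` and this input convention are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def superimpose(reference: str, readouts: Iterable[str]) -> str:
    """Return the reference with bases conserved in every readout in uppercase.

    Assumption (not from the paper): each readout is the reference projected
    through one pairwise alignment, with the same length as the reference,
    uppercase where the test genome matches and lowercase where it does not.
    """
    conserved = [True] * len(reference)
    for readout in readouts:
        if len(readout) != len(reference):
            raise ValueError("each readout must be as long as the reference")
        for i, base in enumerate(readout):
            # A base stays conserved only if every test genome matches it.
            if not base.isupper():
                conserved[i] = False
    return "".join(
        base.upper() if keep else base.lower()
        for base, keep in zip(reference, conserved)
    )


# Toy example: three hypothetical test-genome readouts of one reference.
reference = "ACGTTGCA"
readouts = [
    "ACGTtGCA",  # mismatch at position 4
    "ACgTTGCA",  # mismatch at position 2
    "ACGTTGCa",  # mismatch at position 7
]
print(superimpose(reference, readouts))  # prints "ACgTtGCa"
```

A position remains uppercase only when it matches in every readout, so adding more test species can only shrink the conserved set. This is consistent with the abstract's observation that five or more species give near-base-pair resolution.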