Evolutionary Medicine Unit, University of the Witwatersrand and National Health Laboratory Service, Johannesburg, South Africa.
BMC Bioinformatics. 2010 Mar 24;11:151. doi: 10.1186/1471-2105-11-151.
Sequence alignments form part of many investigations in molecular biology, including the determination of phylogenetic relationships, the prediction of protein structure and function, and the measurement of evolutionary rates. However, to obtain meaningful results, a significant degree of sequence similarity is required to ensure that the alignments are accurate and the inferences correct. Limitations arise when sequence similarity is low, which is particularly problematic when working with fast-evolving genes, evolutionarily distant taxa, genomes with nucleotide biases, and cases of convergent evolution.
A novel approach was conceptualized to address the "low sequence similarity" alignment problem. We developed an alignment algorithm termed FIRE (Functional Inference using the Rates of Evolution), which aligns sequences using the evolutionary rate at codon sites, as measured by the dN/dS ratio, rather than nucleotide or amino acid residues. FIRE was used to test two hypotheses: that evolutionary rates can be used to align sequences, and that the resulting alignments can be used to infer protein domain function. Using a range of test data, we found that aligning domains based on evolutionary rates was possible even when sequence similarity was very low (for example, antibody variable regions). Furthermore, the alignment has the potential to infer protein domain function, indicating that domains with similar functions are subject to similar evolutionary constraints. These data suggest that an evolutionary rate-based approach to sequence analysis (particularly when combined with structural data) may be used to study cases of convergent evolution or sequences with very low similarity. However, when aligning homologous gene sets with sequence similarity, FIRE did not perform as well as the best traditional alignment algorithms, indicating that the conventional approach of aligning residues, as opposed to evolutionary rates, remains the method of choice in these cases.
FIRE provides proof of concept that it is possible to align sequences and infer domain function by using evolutionary rates rather than residue similarity. This represents a new approach to sequence analysis with a wide range of potential applications in molecular biology.
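To make the idea of aligning by evolutionary rate concrete, the following is a minimal sketch and not the FIRE implementation, whose scoring scheme is not described in this abstract. It assumes each sequence has already been reduced to a list of per-codon dN/dS estimates, and it swaps in a standard Needleman-Wunsch global alignment over those numbers. The function name `align_rate_profiles`, the similarity score `1 - |a - b|`, and the gap penalty of -0.5 are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def align_rate_profiles(
    rates_a: List[float],
    rates_b: List[float],
    gap_penalty: float = -0.5,  # hypothetical value, not taken from the paper
) -> Tuple[List[Optional[float]], List[Optional[float]], float]:
    """Globally align two per-codon dN/dS profiles (Needleman-Wunsch style).

    Returns the two aligned profiles (None marks a gap) and the total score.
    """
    n, m = len(rates_a), len(rates_b)

    def site_score(x: float, y: float) -> float:
        # Assumed scoring: 1.0 for identical rates, lower as rates diverge.
        return 1.0 - abs(x - y)

    # score[i][j] = best score for aligning rates_a[:i] with rates_b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        score[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + site_score(rates_a[i - 1], rates_b[j - 1]),
                score[i - 1][j] + gap_penalty,  # gap in profile B
                score[i][j - 1] + gap_penalty,  # gap in profile A
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a: List[Optional[float]] = []
    aligned_b: List[Optional[float]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (
            i > 0
            and j > 0
            and score[i][j]
            == score[i - 1][j - 1] + site_score(rates_a[i - 1], rates_b[j - 1])
        ):
            aligned_a.append(rates_a[i - 1])
            aligned_b.append(rates_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap_penalty):
            aligned_a.append(rates_a[i - 1])
            aligned_b.append(None)
            i -= 1
        else:
            aligned_a.append(None)
            aligned_b.append(rates_b[j - 1])
            j -= 1
    aligned_a.reverse()
    aligned_b.reverse()
    return aligned_a, aligned_b, score[n][m]


# Toy profiles (made-up numbers): one fast-evolving site flanked by conserved ones.
profile_a = [0.1, 0.2, 2.0, 0.1]
profile_b = [0.1, 2.0, 0.1]
aligned_a, aligned_b, total = align_rate_profiles(profile_a, profile_b)
print(aligned_a)
print(aligned_b)
print(total)
```

In this toy case the fast-evolving site (dN/dS of 2.0) in each profile is placed in the same column, with a gap inserted opposite the extra conserved site, even though no residue information is used. A realistic scoring function would likely need to account for the skewed distribution of dN/dS values, for example by comparing rates on a log scale.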