Zhao Jizhen, Che Dongsheng, Cai Liming
Department of Computer Science, University of Georgia, Athens, GA 30602, USA.
Pac Symp Biocomput. 2007:496-507.
Template-based comparative analysis is a viable approach to the prediction and annotation of pathways in genomes. Methods based solely on sequence similarity may not be effective enough; functional and structural information, such as protein-DNA interactions and operons, can improve prediction accuracy. In this paper, we present a novel approach that predicts pathways by seeking high overall sequence similarity together with functional and structural consistency between the predicted pathways and their templates. In particular, the prediction problem is formulated as finding a maximum independent set (MIS) in a graph constructed from operon or interaction structures and from the homologous relationships of the genes involved. On such graphs, the MIS problem is solved efficiently via non-trivial tree decomposition. The algorithm is evaluated by annotating 40 pathways in Escherichia coli (E. coli) K12, using those in Bacillus subtilis (B. subtilis) 168 as templates. Its overall accuracy exceeds that of methods based solely on sequence similarity and of methods that use structural information of the genome with integer programming.
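To illustrate why a tree-like structure makes the independent set problem tractable, the following minimal sketch solves the weighted variant on a plain tree by dynamic programming. This is only the simplest special case (treewidth 1) and is not the algorithm of the paper, which works on tree decompositions of general graphs. The function name, the vertex labels, and the weights are all hypothetical; the weights merely stand in for some per-vertex score.

```python
from collections import defaultdict


def max_weight_independent_set_tree(weights, edges, root):
    """Return (best_weight, chosen_vertices) for a tree.

    weights: dict mapping vertex -> non-negative score (hypothetical values)
    edges:   list of (u, v) pairs forming a tree
    root:    any vertex of the tree
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Iterative DFS: record each vertex's parent and a parent-before-child order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    # incl[u]: best weight in u's subtree when u is selected.
    # excl[u]: best weight in u's subtree when u is not selected.
    incl, excl = {}, {}
    for u in reversed(order):  # children are processed before their parent
        children = [v for v in adj[u] if v != parent[u]]
        incl[u] = weights[u] + sum(excl[c] for c in children)
        excl[u] = sum(max(incl[c], excl[c]) for c in children)

    # Trace back one optimal selection from the root downwards.
    chosen = set()
    stack = [(root, False)]  # (vertex, was the parent selected?)
    while stack:
        u, parent_taken = stack.pop()
        take = (not parent_taken) and incl[u] > excl[u]
        if take:
            chosen.add(u)
        for v in adj[u]:
            if v != parent[u]:
                stack.append((v, take))

    return max(incl[root], excl[root]), chosen


# Toy path a - b - c - d with made-up scores.
weights = {"a": 3, "b": 5, "c": 4, "d": 3}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
best, chosen = max_weight_independent_set_tree(weights, edges, "a")
print(best, sorted(chosen))
```

Each vertex is visited a constant number of times, so the running time is linear in the size of the tree. Dynamic programming over a tree decomposition generalizes this idea by keeping a table per bag of vertices instead of two numbers per vertex, at a cost that grows exponentially in the width of the decomposition.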