Wang Shuqin, Wang Yan, Du Wei, Sun Fangxun, Wang Xiumei, Zhou Chunguang, Liang Yanchun
College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Changchun 130012, China.
Artif Intell Med. 2007 Oct;41(2):151-9. doi: 10.1016/j.artmed.2007.07.010. Epub 2007 Sep 14.
The prediction of operons is critical to the reconstruction of regulatory networks at the whole-genome level. Multiple genome features have been used for predicting operons. However, in the literature these features are usually all handled with a single method. The aim of this paper is to develop a combined method for operon prediction that preprocesses each genome feature with a different method, so as to exploit the distinct characteristics of each feature.
A novel multi-approach-guided genetic algorithm for operon prediction is presented. We apply different methods to intergenic distance, clusters of orthologous groups (COG) gene functions, metabolic pathways, and microarray expression data. A novel local-entropy-minimization method is proposed to partition intergenic distances. Our program can be applied to other newly sequenced genomes by transferring the knowledge obtained from Escherichia coli data. We calculate the log-likelihood for COG gene functions and the Pearson correlation coefficient for microarray expression data. The genetic algorithm is used to integrate the four types of data.
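As a rough illustration of two of the preprocessing scores named above, the following Python sketch computes a Pearson correlation coefficient between two expression profiles and a smoothed log-likelihood ratio for a feature observed in gene pairs. The abstract does not give the exact formulas, so the log-odds form, the additive smoothing, and the function names `pearson` and `log_likelihood` are assumptions made for this example, not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a flat profile carries no co-expression signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def log_likelihood(count_operon, total_operon,
                   count_boundary, total_boundary, pseudo=1.0):
    """Log ratio of P(feature | same operon) to P(feature | operon boundary).

    Assumed form: a log-odds score with additive smoothing (pseudo-counts),
    estimated from counts of known within-operon and boundary gene pairs.
    """
    p_operon = (count_operon + pseudo) / (total_operon + 2 * pseudo)
    p_boundary = (count_boundary + pseudo) / (total_boundary + 2 * pseudo)
    return math.log(p_operon / p_boundary)
```

A positive `log_likelihood` value indicates that the feature is more common among adjacent gene pairs within the same operon than among pairs spanning an operon boundary. Scores of this kind could then be combined by a search procedure such as a genetic algorithm.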
The proposed method is evaluated on the E. coli K12, Bacillus subtilis, and Pseudomonas aeruginosa PAO1 genomes. The prediction accuracies for these three genomes are 85.9987%, 88.296%, and 81.2384%, respectively.
Simulation results demonstrate that preprocessing the genome data with multiple approaches before applying the genetic algorithm ensures effective use of the different biological characteristics. The results also show that the proposed method is applicable to operon prediction in prokaryotes.