Lin Hao, Li Qian-Zhong
Key Laboratory for NeuroInformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
Theory Biosci. 2011 Jun;130(2):91-100. doi: 10.1007/s12064-010-0114-8. Epub 2010 Nov 3.
Promoters are modular DNA structures containing the complex regulatory elements required for the initiation of gene transcription. Identifying promoters with machine learning approaches is therefore important for improving genome annotation and understanding transcriptional regulation. In recent years, many methods have been proposed for predicting eukaryotic and prokaryotic promoters, but their performance is still far from satisfactory. In this article, we develop a hybrid approach (called IPMD) that combines a position correlation score function and the increment of diversity with a modified Mahalanobis discriminant to predict eukaryotic and prokaryotic promoters. Applying the proposed method to Drosophila melanogaster, Homo sapiens, Caenorhabditis elegans, Escherichia coli, and Bacillus subtilis promoter sequences, we achieve sensitivities and specificities of 90.6% and 97.4% for D. melanogaster, 88.1% and 94.1% for H. sapiens, 83.3% and 95.2% for C. elegans, 84.9% and 91.4% for E. coli, and 80.4% and 91.3% for B. subtilis. These high accuracies indicate that IPMD is an efficient method for identifying eukaryotic and prokaryotic promoters. The approach can also be extended to predict promoters in other species.
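The abstract names the increment of diversity as one component of IPMD but gives no formula. As a rough illustration only, the sketch below uses the commonly cited definition, which is assumed here rather than taken from this record: the diversity of a count vector is D(X) = N log N - sum(n_i log n_i), and the increment of diversity between two sources is ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names and the use of k-mer counts as the feature are hypothetical choices for the example and are not confirmed as the authors' implementation.

```python
import math
from collections import Counter
from itertools import product


def diversity(counts):
    """Diversity measure D(X) = N log N - sum(n_i log n_i), with 0 log 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x_counts, y_counts):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    merged = [a + b for a, b in zip(x_counts, y_counts)]
    return diversity(merged) - diversity(x_counts) - diversity(y_counts)


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count overlapping k-mers of seq in a fixed order (assumed feature choice)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [observed[m] for m in kmers]


if __name__ == "__main__":
    # Toy sequences, invented for illustration; not data from the paper.
    query = kmer_counts("TATAAATATATAAGC", 2)
    reference = kmer_counts("TATATAAATAGCTATA", 2)
    print(increment_of_diversity(query, reference))
```

Under this definition, a query sequence would be compared against promoter and non-promoter training sets, and the resulting increment values could then be passed as features to a discriminant step. How IPMD actually combines them with the position correlation score is not described in the abstract.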