Robinson Mark, Sun Yi, Boekhorst Rene Te, Kaye Paul, Adams Rod, Davey Neil, Rust Alistair G
Science and Technology Research Institute, University of Hertfordshire, College Lane, Hatfield, Hertfordshire AL10 9AB, UK.
Pac Symp Biocomput. 2006:391-402.
The locations of cis-regulatory binding sites determine the connectivity of genetic regulatory networks and therefore constitute a natural focal point for research into the many biological systems controlled by such networks. Accurate computational prediction of these binding sites would facilitate research into a multitude of key areas, including embryonic development, evolution, pharmacogenomics, cancer and many other transcriptional diseases, and is likely to be an important precursor to the reverse engineering of genome-wide genetic regulatory networks. Many algorithmic strategies have been developed for the computational prediction of cis-regulatory binding sites, but all current approaches are prone to high rates of false-positive predictions, and many depend heavily on additional information, which limits their usefulness as research tools. In this paper we present an approach for improving the accuracy of a selection of established prediction algorithms. Firstly, it is shown that species-specific optimization of algorithmic parameters can, in some cases, significantly improve the accuracy of algorithmic predictions. Secondly, it is demonstrated that using non-linear classification algorithms to integrate predictions from multiple sources can yield more accurate predictions. Finally, it is shown that further gains in prediction accuracy can be achieved with biologically inspired post-processing of predictions.
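The abstract does not say which classifiers or post-processing rules are used, so the following is only a minimal sketch of the general idea under stated assumptions. Each sequence position is assumed to be described by a vector of binary votes from several base prediction algorithms (1 = "inside a binding site"). A simple non-linear classifier, k-nearest neighbours with Hamming distance, is swapped in here for illustration and integrates those votes. A hypothetical post-processing rule then discards predicted sites shorter than a minimum length, on the biological grounds that real binding sites span several bases. The names `knn_predict`, `remove_short_runs`, `min_len` and the toy data are all illustrative assumptions, not the authors' code.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical encoding: one tuple per sequence position, holding the binary
# votes (1 = "part of a binding site") of several base prediction algorithms.
Vector = Tuple[int, ...]


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[int],
                query: Vector, k: int = 3) -> int:
    """Label one position by majority vote of its k nearest training
    vectors under Hamming distance (an illustrative non-linear classifier)."""
    dists = sorted(
        (sum(a != b for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def remove_short_runs(labels: List[int], min_len: int) -> List[int]:
    """Hypothetical post-processing: erase any run of positive positions
    shorter than min_len, since genuine binding sites span several bases."""
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] == 1:
            j = i
            while j < len(out) and out[j] == 1:
                j += 1
            if j - i < min_len:
                for p in range(i, j):
                    out[p] = 0
            i = j
        else:
            i += 1
    return out


# Toy training data: votes from three base algorithms with known labels.
train_x = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
train_y = [0, 0, 0, 1, 1, 1]

# Votes for eight consecutive positions of an unseen sequence.
seq = [(0, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1),
       (0, 0, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]

raw = [knn_predict(train_x, train_y, v) for v in seq]
cleaned = remove_short_runs(raw, min_len=2)
print(raw)      # [0, 1, 1, 1, 0, 1, 0, 0]
print(cleaned)  # [0, 1, 1, 1, 0, 0, 0, 0]
```

In this toy run the classifier keeps a three-position site and an isolated one-position hit, and the length filter removes the isolated hit as a likely false positive. A real pipeline would use a trained classifier of the authors' choosing and a minimum length matched to the binding sites of the species studied.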