Center for Computational Biology and Bioinformatics and Department of Biomedical Informatics, Columbia University, Irving Cancer Research Center, New York, New York, USA.
Nat Chem Biol. 2010 Jan;6(1):34-40. doi: 10.1038/nchembio.266. Epub 2009 Nov 22.
With the increasing role of computational tools in the analysis of sequenced genomes, there is an urgent need to maintain high accuracy of functional annotations. Misannotations can be easily generated and propagated through databases by functional transfer based on sequence homology. We developed and optimized an automatic policing method to detect biochemical misannotations using context genomic correlations. The method works by finding genes with unusually weak genomic correlations in their assigned network positions. We demonstrate the accuracy of the method using a cross-validated approach. In addition, we show that the method identifies a significant number of potential misannotations in Bacillus subtilis, including metabolic assignments already shown to be incorrect experimentally. The experimental analysis of the mispredicted genes forming the leucine degradation pathway in B. subtilis demonstrates that computational policing tools can generate important biological hypotheses.
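The core idea stated above, flagging genes whose genomic context correlations are unusually weak at their assigned network positions, can be illustrated with a minimal sketch. The abstract does not specify the scoring function or the decision rule, so this example substitutes a simple stand-in: the mean correlation between a gene and its network neighbors, compared against a fixed threshold. All names (`NEIGHBORS`, `CORR`, `neighbor_score`, `flag_weak_genes`), the toy gene labels, and the numeric values are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical toy network: each gene maps to the genes assigned to
# adjacent positions in the metabolic network.
NEIGHBORS = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneA", "geneB", "geneD"],
    "geneD": ["geneC"],
}

# Hypothetical pairwise context correlations (for example, derived from
# phylogenetic profiles or chromosomal clustering), keyed by unordered pair.
CORR = {
    frozenset(("geneA", "geneB")): 0.8,
    frozenset(("geneA", "geneC")): 0.7,
    frozenset(("geneB", "geneC")): 0.9,
    frozenset(("geneC", "geneD")): 0.05,
}


def neighbor_score(gene, neighbors, corr):
    """Mean context correlation between a gene and its network neighbors.

    Pairs with no recorded correlation are assumed to contribute 0.0.
    """
    values = [corr.get(frozenset((gene, other)), 0.0) for other in neighbors[gene]]
    return mean(values) if values else 0.0


def flag_weak_genes(neighbors, corr, threshold):
    """Return genes whose neighbor score falls below the threshold, sorted by name."""
    return sorted(
        gene for gene in neighbors
        if neighbor_score(gene, neighbors, corr) < threshold
    )


if __name__ == "__main__":
    # The threshold is an assumed cutoff chosen for this toy example only.
    for gene in flag_weak_genes(NEIGHBORS, CORR, threshold=0.3):
        print(gene, round(neighbor_score(gene, NEIGHBORS, CORR), 3))
```

In this toy data only `geneD` is flagged, because its single neighbor correlation is far below that of the other genes. A realistic version would need a principled cutoff, for instance one calibrated by cross-validation as the abstract describes, rather than a hand-picked threshold.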