Using functional and organizational information to improve genome-wide computational prediction of transcription units on pathway-genome databases.

Author information

Romero P R, Karp P D

Affiliation

Bioinformatics Research Group, Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.

Publication information

Bioinformatics. 2004 Mar 22;20(5):709-17. doi: 10.1093/bioinformatics/btg471. Epub 2004 Jan 29.

Abstract

MOTIVATION

The prediction of transcription units (TUs, which are similar to operons) is an important problem that has been tackled using many different approaches. The availability of complete microbial genomes has made genome-wide TU predictions possible. Pathway-genome databases (PGDBs) add metabolic and other organizational (i.e. protein complexes) information to the annotated genome, and are able to capture TU organization information. These characteristics of PGDBs make them a suitable framework for the development and implementation of TU predictors.

RESULTS

We implemented a TU predictor that uses only intergenic distance and functional classification of genes to predict TU boundaries, and applied it to EcoCyc, our PGDB of Escherichia coli. To this original predictor, we added information on metabolic pathways, protein complexes and transporters, all readily available in EcoCyc, to generate an enhanced predictor. The enhanced predictor correctly predicted 80% of the known E. coli TUs (69% of the known operons), a moderate improvement over the original predictor's performance (75% of TUs and 65% of operons correctly predicted), demonstrating that the extra information available in the PGDB does indeed improve prediction performance. We tested this E. coli-based predictor on a different genome using BsubCyc, our computationally generated PGDB for Bacillus subtilis, for which a set of 100 known operons is available. Prediction accuracy decreased substantially (46% of the known operons correctly predicted). This was due in part to missing information in BsubCyc, which prevented full use of the predictor's features. The enhanced predictor has been implemented as part of our Pathway Tools software suite, and can be used to populate a PGDB with predicted TUs.
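The general idea of combining intergenic distance with functional links can be illustrated with a toy sketch. This is not the authors' scoring method or the Pathway Tools API: the `Gene` record, the function names, and both distance thresholds (`max_gap`, `max_gap_related`) are hypothetical values chosen only for illustration. Adjacent same-strand genes are joined into one TU if the gap between them is small, or if they share a pathway or protein complex and the gap is still moderate.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical record; a real PGDB stores far richer annotations.
    name: str
    start: int
    end: int
    strand: str
    pathways: frozenset = frozenset()
    complexes: frozenset = frozenset()


def same_tu(a, b, max_gap=50, max_gap_related=200):
    """Toy rule (assumed thresholds): decide whether adjacent genes share a TU."""
    if a.strand != b.strand:
        return False
    gap = b.start - a.end - 1  # intergenic distance in base pairs
    related = bool(a.pathways & b.pathways) or bool(a.complexes & b.complexes)
    # Functionally related genes tolerate a wider gap than unrelated ones.
    return gap <= (max_gap_related if related else max_gap)


def predict_tus(genes, max_gap=50, max_gap_related=200):
    """Group genes, ordered by start coordinate, into predicted TUs."""
    tus = []
    for gene in sorted(genes, key=lambda g: g.start):
        if tus and same_tu(tus[-1][-1], gene, max_gap, max_gap_related):
            tus[-1].append(gene)
        else:
            tus.append([gene])
    return [[g.name for g in tu] for tu in tus]


def fraction_correct(predicted, known):
    """Fraction of known TUs that were reproduced exactly."""
    predicted_set = {tuple(tu) for tu in predicted}
    hits = sum(1 for tu in known if tuple(tu) in predicted_set)
    return hits / len(known) if known else 0.0


# Made-up genes, not taken from EcoCyc or BsubCyc.
genes = [
    Gene("a", 1, 100, "+", frozenset({"p1"})),
    Gene("b", 120, 300, "+", frozenset({"p1"})),
    Gene("c", 450, 600, "+", frozenset({"p1"})),
    Gene("d", 1000, 1200, "+"),
    Gene("e", 1210, 1400, "-"),
]
predicted = predict_tus(genes)
accuracy = fraction_correct(predicted, [["a", "b", "c"], ["d", "e"]])
```

Here genes `b` and `c` are too far apart for the distance rule alone, but the shared pathway joins them, which mirrors in spirit how the extra PGDB information can recover TUs that distance alone would split. The exact-match accuracy measure is likewise an assumption about how "correctly predicted" might be counted.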

AVAILABILITY

The TU predictor is included in version 7.0 of the Pathway Tools software suite. Pathway Tools 7.0 is available free of charge to academic institutions and for a fee to commercial enterprises. It runs on Sun Solaris 8, Linux and Windows. TUs predicted on the Caulobacter crescentus and Mycobacterium tuberculosis (H37Rv) genomes can be found in our CauloCyc and MtbrvCyc databases at the BioCyc web site (http://biocyc.org). To obtain version 7.0 of Pathway Tools, follow the directions on our web site, http://biocyc.org/download.shtml.
