Goldovsky Leon, Janssen Paul, Ahrén Dag, Audit Benjamin, Cases Ildefonso, Darzentas Nikos, Enright Anton J, López-Bigas Núria, Peregrin-Alvarez José M, Smith Mike, Tsoka Sophia, Kunin Victor, Ouzounis Christos A
Computational Genomics Group, The European Bioinformatics Institute EMBL, Cambridge Outstation, Cambridge CB10 1SD, UK.
Bioinformatics. 2005 Oct 1;21(19):3806-10. doi: 10.1093/bioinformatics/bti579.
CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility.
CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction.
CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses and also for manual browsing.
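As an illustration of how a derived resource such as a phylogenetic profile could in principle be computed from stored pairwise similarities, the following is a minimal sketch. It is not the ProXSim or ProfUse implementation: the hit tuple layout `(query_id, hit_genome, evalue)`, the function name `phylogenetic_profiles` and the `max_evalue` cutoff are all assumptions made for this example.

```python
from collections import defaultdict


def phylogenetic_profiles(hits, genomes, max_evalue=1e-10):
    """Build presence/absence profiles from pairwise similarity hits.

    hits    -- iterable of (query_id, hit_genome, evalue) tuples
               (hypothetical layout, not the actual ProXSim schema)
    genomes -- ordered list of genome names defining the profile columns
    Returns {query_id: tuple of 0/1 flags, one per genome}.
    """
    present = defaultdict(set)
    for query_id, hit_genome, evalue in hits:
        # Keep only hits at or below the assumed significance cutoff.
        if evalue <= max_evalue:
            present[query_id].add(hit_genome)
    return {
        query_id: tuple(int(g in found) for g in genomes)
        for query_id, found in present.items()
    }


# Toy data: two proteins compared against three genomes.
genomes = ["genomeA", "genomeB", "genomeC"]
hits = [
    ("p1", "genomeA", 1e-50),
    ("p1", "genomeB", 1e-20),
    ("p1", "genomeC", 0.5),   # too weak, ignored
    ("p2", "genomeC", 1e-30),
]
profiles = phylogenetic_profiles(hits, genomes)
```

Proteins with identical or near-identical profiles across many genomes are the usual starting point for profile-based inference. A production system would also need to handle proteins with no significant hits, which this sketch omits from the result.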