Suter-Crazzolara C, Kurapkat G
LION bioscience, Waldhofer Strasse 98, 69123 Heidelberg, Germany.
Genome Inform Ser Workshop Genome Inform. 2000;11:24-32.
Current genome projects are producing a flood of sequence data. Interpretation of these sequences is lagging behind, and optimized data analysis strategies need to be developed. Much can be learned from comparing different genomes, as the genomes of distant organisms may still encode proteins with high sequence similarity. The order of genes (colinearity) in genomes may also be conserved to some extent. We have used both of these observations to create a multi-functional computational analysis system (genomeSCOUT) that allows rapid identification and functional characterization of genes and proteins through genome comparison. Using a number of independent algorithms, information about different levels of protein homology (e.g. paralogs, orthologs and clusters of orthologous groups, COGs) and about gene order is collected and stored in several value-added databases. These databases are then used for interactive comparison of genomes and subsequent analysis. The application is based on the well-established data integration system SRS. This ensures (1) fast handling of large genomic data sets, (2) straightforward access to a multitude of biological databases, (3) unique linking functions between these databases, (4) highly efficient collection of information on genes and proteins, and (5) fully integrated and user-friendly graphical representations of search results. The application can be used for projects as diverse as the correct annotation of genomes, the optimization of (micro)organisms for industrial production, and the identification of drug targets.
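To make the gene-order (colinearity) idea in the abstract concrete, the following is a minimal Python sketch of how conserved gene order between two genomes could be detected once ortholog pairs are known. It is not the genomeSCOUT implementation, which the abstract does not describe in detail; the function name `colinear_blocks`, the gene identifiers, and the precomputed ortholog mapping are all hypothetical, and the sketch only considers runs in the same orientation (inversions are ignored).

```python
from typing import Dict, List, Optional, Tuple


def colinear_blocks(
    order_a: List[str],
    order_b: List[str],
    orthologs: Dict[str, str],
    min_len: int = 2,
) -> List[List[Tuple[str, str]]]:
    """Find runs of adjacent genes in genome A whose orthologs are also
    adjacent, in the same order, in genome B.

    order_a, order_b: gene identifiers in chromosomal order (hypothetical input).
    orthologs: assumed precomputed mapping from a gene in A to its ortholog in B.
    min_len: minimum number of gene pairs for a run to be reported.
    """
    # Position of every gene along genome B.
    pos_b = {gene: i for i, gene in enumerate(order_b)}

    blocks: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    prev_pos: Optional[int] = None

    for gene in order_a:
        partner = orthologs.get(gene)
        pos = pos_b.get(partner) if partner is not None else None

        if pos is not None and prev_pos is not None and pos == prev_pos + 1:
            # The ortholog directly follows the previous one in B: extend the run.
            current.append((gene, partner))
        else:
            # The run is broken: keep it if long enough, then start a new one.
            if len(current) >= min_len:
                blocks.append(current)
            current = [(gene, partner)] if pos is not None else []
        prev_pos = pos

    if len(current) >= min_len:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    genome_a = ["a1", "a2", "a3", "a4", "a5"]
    genome_b = ["b3", "b1", "b2", "b5", "b4"]
    pairs = {f"a{i}": f"b{i}" for i in range(1, 6)}
    print(colinear_blocks(genome_a, genome_b, pairs))
```

In this toy input only `a1`, `a2` and their orthologs `b1`, `b2` keep the same neighbouring order in both genomes, so a single block of two gene pairs is reported. A production system would also need to handle inverted segments, gaps, and genes with several candidate orthologs.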