Park Daniel, Singh Rohit, Baym Michael, Liao Chung-Shou, Berger Bonnie
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Nucleic Acids Res. 2011 Jan;39(Database issue):D295-300. doi: 10.1093/nar/gkq1234.
We describe IsoBase, a database identifying functionally related proteins across five major eukaryotic model organisms: Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus and Homo sapiens. Nearly all existing algorithms for orthology detection are based on sequence comparison. Although these methods have had some success in orthology prediction, we seek to go beyond them by integrating sequence data with protein-protein interaction (PPI) networks to help identify proteins that are truly functionally related. With that motivation, we introduce IsoBase, the first publicly available ortholog database that focuses on functionally related proteins. The groupings were computed using the IsoRankN algorithm, which uses spectral methods to combine sequence and PPI data and produce clusters of functionally related proteins. These clusters compare favorably with those from existing approaches: proteins within an IsoBase cluster are more likely to share similar Gene Ontology (GO) annotation. A total of 48,120 proteins were clustered into 12,693 functionally related groups. The IsoBase database may be browsed for functionally related proteins across two or more species and may also be queried by accession number, species-specific identifier, gene name or keyword. The database is freely available for download at http://isobase.csail.mit.edu/.
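To illustrate how sequence similarity and PPI network topology can be combined into a single score, the following is a minimal sketch of an IsoRank-style power iteration for one pair of networks. It is not the published IsoRankN implementation, which additionally clusters proteins across several species; the function name `isorank_style_scores`, the weight `alpha`, the iteration count and the toy networks and similarity values are all assumptions made for illustration.

```python
def isorank_style_scores(adj1, adj2, seq_sim, alpha=0.6, iterations=100):
    """Score every cross-network protein pair (hypothetical helper).

    adj1, adj2: undirected PPI networks as dict[node, set of neighbors],
                with no isolated nodes.
    seq_sim:    dict[(node1, node2), non-negative sequence similarity].
    alpha:      weight on network topology; (1 - alpha) goes to sequence.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total = sum(seq_sim.get(p, 0.0) for p in pairs)
    if total == 0:
        raise ValueError("seq_sim must contain at least one positive score")
    # Sequence prior, normalized to sum to 1.
    prior = {p: seq_sim.get(p, 0.0) / total for p in pairs}
    # Start from a uniform score vector.
    scores = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iterations):
        updated = {}
        for i, j in pairs:
            # A pair scores highly when its neighbor pairs score highly;
            # each neighbor pair spreads its score evenly over its own
            # neighborhood.
            topo = sum(
                scores[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            updated[(i, j)] = alpha * topo + (1 - alpha) * prior[(i, j)]
        norm = sum(updated.values())
        scores = {p: s / norm for p, s in updated.items()}
    return scores


# Toy data (invented for illustration): two three-protein path networks.
network_a = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"}}
network_b = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
similarity = {
    ("a1", "b1"): 1.0,
    ("a2", "b2"): 1.0,
    ("a3", "b3"): 1.0,
    ("a1", "b3"): 0.2,
}

scores = isorank_style_scores(network_a, network_b, similarity)
best_for_a2 = max(network_b, key=lambda j: scores[("a2", j)])
print(best_for_a2)
```

In this sketch, `alpha` sets the balance between the two data sources: values near 1 let network topology dominate, while values near 0 reduce the scores to normalized sequence similarity.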