Persson Emma, Sonnhammer Erik L L
Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden. Electronic address: https://twitter.com/eriksonnhammer.
Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden.
J Mol Biol. 2023 Jul 15;435(14):168001. doi: 10.1016/j.jmb.2023.168001. Epub 2023 Feb 9.
Prediction of orthologs is an important bioinformatics pursuit that is frequently used for inferring protein function and for evolutionary analyses. The InParanoid database is a well-known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains, and these may be orthologous to domains in other proteins in a way that differs from the full-length protein case. To capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. Here we present InParanoiDB 9, which covers 640 species and provides orthologs for both protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm for orthology analysis, together with Domainoid and Pfam to infer orthologous domains. It is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion predicted ortholog groups. A new website has been built for the database, providing multiple search options as well as visualization of groups of orthologs and orthologous domains. This release constitutes a major upgrade of the InParanoid database, both in the number of species and in the new capability to operate at the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.
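The idea of complementing full-length orthologs with domain-level orthologs can be illustrated with a minimal sketch. This is not the InParanoid or Domainoid algorithm; it only shows, on made-up toy data, how domain-level ortholog pairs might reveal protein relations that a full-length analysis does not report. All protein names, domain labels, and function names below are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# A protein-level ortholog pair, stored order-independently.
ProteinPair = FrozenSet[str]
# A domain is identified here by (protein name, domain label); both are hypothetical.
Domain = Tuple[str, str]
DomainPair = Tuple[Domain, Domain]


def protein_pairs_from_domains(domain_pairs: Set[DomainPair]) -> Set[ProteinPair]:
    """Collapse domain-level ortholog pairs to the proteins that carry the domains."""
    return {
        frozenset((prot_a, prot_b))
        for (prot_a, _dom_a), (prot_b, _dom_b) in domain_pairs
    }


def domain_only_pairs(
    full_length: Set[ProteinPair], domain_pairs: Set[DomainPair]
) -> Set[ProteinPair]:
    """Protein pairs supported by domain-level orthology but absent at full length."""
    return protein_pairs_from_domains(domain_pairs) - full_length


# Toy data (assumed, not taken from InParanoiDB).
full_length = {frozenset(("humanA", "mouseA"))}
domain_pairs = {
    (("humanA", "dom1"), ("mouseA", "dom1")),  # agrees with the full-length pair
    (("humanB", "dom2"), ("mouseC", "dom2")),  # seen only at the domain level
}

extra = domain_only_pairs(full_length, domain_pairs)
print(sorted(sorted(pair) for pair in extra))
```

In this toy case the pair humanB/mouseC appears only through a shared orthologous domain, which is the kind of relation a domain-level analysis is meant to add to the full-length one.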