Lakshman Aidan H, Wright Erik S
Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.
Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA.
Nat Commun. 2025 Apr 24;16(1):3878. doi: 10.1038/s41467-025-59175-6.
The known universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ignoring unstudied proteins. Coevolutionary approaches hold promise for injecting new information into our knowledge of the protein universe by linking proteins through 'guilt-by-association'. However, existing coevolutionary algorithms have insufficient accuracy and scalability to connect the entire universe of proteins. We present EvoWeaver, a method that weaves together 12 signals of coevolution to quantify the degree of shared evolution between genes. EvoWeaver accurately identifies proteins involved in protein complexes or separate steps of a biochemical pathway. We show the merits of EvoWeaver by partly reconstructing known biochemical pathways without any prior knowledge other than that available from genomic sequences. Applying EvoWeaver to 1545 gene groups from 8564 genomes reveals missing connections in popular databases and potentially undiscovered links between proteins.
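To make the 'guilt-by-association' idea concrete, the sketch below scores pairs of gene groups by how similar their presence/absence patterns are across genomes (phylogenetic profiling). This is a single, classic coevolutionary signal chosen for illustration. It is not EvoWeaver's implementation, and the abstract does not specify how the 12 signals are computed or combined. The gene names, the toy profiles, and the helper functions `jaccard` and `rank_pairs` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each gene group maps to a
# presence (1) / absence (0) vector across the same six genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 1],
    "geneD": [1, 0, 0, 1, 0, 1],
}


def jaccard(p, q):
    """Jaccard similarity of two presence/absence vectors of equal length."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    # Two gene groups absent from every genome carry no shared signal.
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Return (score, gene, gene) tuples, most similar pair first."""
    scores = [
        (jaccard(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores, key=lambda t: (-t[0], t[1], t[2]))


if __name__ == "__main__":
    for score, g, h in rank_pairs(profiles):
        print(f"{g} {h} {score:.2f}")
```

Pairs that are gained and lost together across genomes rank highest, which is the intuition behind linking an unstudied protein to a studied partner. A real analysis would also have to account for the phylogenetic relatedness of the genomes, which this toy score ignores.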