Sledzieski Samuel, Versavel Charlotte, Singh Rohit, Ocitti Faith, Devkota Kapil, Kumar Lokender, Shpilker Polina, Roger Liza, Yang Jinkyu, Lewinski Nastassja, Putnam Hollie, Berger Bonnie, Klein-Seetharaman Judith, Cowen Lenore
Center for Computational Biology, Flatiron Institute, New York, NY, USA.
Department of Computer Science, Tufts University, Medford, MA, USA.
bioRxiv. 2025 Jan 14:2024.10.25.620267. doi: 10.1101/2024.10.25.620267.
Protein-protein interaction (PPI) networks are a fundamental resource for modeling cellular and molecular function, and a large and sophisticated toolbox has been developed to leverage their structure and topological organization to predict the functional roles of under-studied genes, proteins, and pathways. However, the overwhelming majority of experimentally determined interactions from which such networks are constructed come from a small number of well-studied model organisms. Indeed, most species lack even a single experimentally determined interaction in these databases, much less a network that would enable analysis of cellular function, and computational PPI prediction methods are too noisy to apply directly. We introduce PHILHARMONIC, a novel computational approach that couples deep learning network inference with robust unsupervised spectral clustering algorithms to uncover functional relationships and high-level organization in non-model organisms. Our clustering approach allows us to denoise the predicted network, producing highly informative functional modules. We also develop a novel algorithm called ReCIPE, which aims to reconnect disconnected clusters, increasing functional enrichment and biological interpretability. We perform remote homology-based functional annotation by leveraging hmmscan and GODomainMiner to assign initial functions to proteins at large evolutionary distances. Our clusters then enable us to assign new functions to uncharacterized proteins through "function by association." We demonstrate the ability of PHILHARMONIC to recover clusters with significant functional coherence in the reef-building coral, its algal symbiont, and the well-annotated fruit fly.
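The idea of "function by association" can be illustrated with a minimal sketch. This is not PHILHARMONIC's implementation. It assumes that clusters are already available as disjoint sets of protein IDs and that homology-derived annotations are a mapping from protein ID to a set of GO terms. The function name `function_by_association` and the `min_fraction` transfer threshold are hypothetical choices made for illustration. A term is proposed for every unannotated member of a cluster when at least `min_fraction` of that cluster's annotated members carry the term.

```python
from collections import Counter


def function_by_association(clusters, annotations, min_fraction=0.5):
    """Propose GO terms for unannotated proteins from their cluster-mates.

    clusters: list of sets of protein IDs (assumed output of a clustering step).
    annotations: dict mapping protein ID -> set of GO terms (e.g. from homology).
    min_fraction: hypothetical threshold; a term is transferred if at least this
        fraction of the cluster's annotated members carry it.
    Returns a dict mapping each unannotated protein ID -> set of proposed terms.
    """
    predictions = {}
    for cluster in clusters:
        annotated = [p for p in cluster if annotations.get(p)]
        if not annotated:
            continue  # no annotated members, so nothing can be transferred
        # Count how many annotated members carry each term.
        counts = Counter(term for p in annotated for term in annotations[p])
        shared = {
            term for term, c in counts.items()
            if c / len(annotated) >= min_fraction
        }
        if not shared:
            continue
        # Transfer the shared terms to every member lacking annotations.
        for p in cluster:
            if not annotations.get(p):
                predictions[p] = set(shared)
    return predictions
```

For example, with two hypothetical clusters `[{"P1", "P2", "P3", "U1"}, {"P4", "U2"}]` and annotations `{"P1": {"termA"}, "P2": {"termA", "termB"}, "P3": {"termB"}, "P4": {"termC"}}`, the uncharacterized protein `U1` is assigned `{"termA", "termB"}` and `U2` is assigned `{"termC"}`. Raising `min_fraction` trades coverage for stricter consensus.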
We further analyze the network in depth, showing that PHILHARMONIC clusters correlate strongly with gene co-expression, and we investigate several clusters that participate in temperature regulation in the coral, including the first putative functional annotations of several previously uncharacterized proteins. Easy to run end-to-end and requiring only a sequenced proteome, PHILHARMONIC is an engine for biological hypothesis generation and discovery in non-model organisms.
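One simple way to check whether clusters agree with co-expression is to compare the average Pearson correlation of expression profiles for protein pairs in the same cluster against pairs in different clusters. The sketch below does this, but it is not the analysis used in the paper. It assumes the clusters are disjoint and that `expression` maps each protein ID to an equal-length list of expression values. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant profiles; treated as uncorrelated here
    return cov / (sx * sy)


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def coexpression_contrast(clusters, expression):
    """Return (mean within-cluster r, mean between-cluster r).

    Assumes clusters are disjoint; proteins without expression data are skipped.
    """
    label = {p: i for i, c in enumerate(clusters) for p in c if p in expression}
    within, between = [], []
    for p, q in combinations(sorted(label), 2):
        r = pearson(expression[p], expression[q])
        (within if label[p] == label[q] else between).append(r)
    return _mean(within), _mean(between)
```

If the clusters capture co-regulated modules, the within-cluster mean should clearly exceed the between-cluster mean. A real analysis would also test significance, for example against randomly shuffled cluster labels.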