Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA.
Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA.
Proteins. 2024 Apr;92(4):567-580. doi: 10.1002/prot.26648. Epub 2023 Dec 5.
Cells detect changes in their external environment and communicate with each other through proteins on their surfaces. These cell surface proteins form a complex network of interactions to carry out their functions. Because the interactions are highly dynamic, they are difficult to detect with traditional experimental techniques. Here, we address this challenge with a computational framework. The framework focuses on new tools for identifying interactions between domains that adopt the immunoglobulin (Ig) fold, the most abundant domain family in cell surface proteins. These interactions can form between ligands and receptors on different cells or between proteins on the same cell surface. We collected all structural data on Ig domain interactions and converted them into a library of interface fragment pairs. For a given pair of query protein sequences, a high-dimensional profile is then constructed from this library. Multiple machine learning models read this profile to predict the probability that the query proteins interact. We tested the models on an experimentally derived dataset of 564 human cell surface proteins. Cross-validation shows that the models identify the protein-protein interactions (PPIs) in this dataset with an accuracy above 70%. We then applied the method to a group of 46 cell surface proteins in Caenorhabditis elegans, screening every possible interaction between them. Many of the interactions recognized by our machine learning classifiers have been experimentally confirmed in the literature. In conclusion, our computational platform is a useful tool for identifying potential new interactions between cell surface proteins, complementing current state-of-the-art experimental techniques. The tool is freely accessible to the scientific community. Moreover, the general machine learning classification framework can be extended to study the interactions of proteins in other domain superfamilies.
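To give a sense of the scale of the all-against-all screen described for the 46 C. elegans proteins, the following is a minimal sketch of how such a screen could be enumerated and filtered. It is not the authors' implementation. The protein names, the `predict_probability` callable, the 0.5 threshold, and the option to include homophilic (self) pairs are all illustrative assumptions; the abstract does not state whether self pairs were screened.

```python
from itertools import combinations, combinations_with_replacement


def candidate_pairs(proteins, include_self=False):
    """Return every unordered pair of proteins to screen.

    include_self=True also yields homophilic pairs (a protein paired
    with itself); whether the original study did so is an assumption.
    """
    pairing = combinations_with_replacement if include_self else combinations
    return list(pairing(proteins, 2))


def screen(proteins, predict_probability, threshold=0.5, include_self=False):
    """Score every candidate pair and keep those at or above the threshold.

    predict_probability is a hypothetical stand-in for a trained
    classifier: it takes two protein identifiers and returns a float
    between 0 and 1.
    """
    hits = []
    for a, b in candidate_pairs(proteins, include_self):
        p = predict_probability(a, b)
        if p >= threshold:
            hits.append((a, b, p))
    # Highest predicted probability first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Placeholder identifiers for the 46 proteins (hypothetical names).
proteins = [f"protein_{i:02d}" for i in range(1, 47)]

n_hetero = len(candidate_pairs(proteins))                   # 46 * 45 / 2 = 1035
n_all = len(candidate_pairs(proteins, include_self=True))   # 1035 + 46 = 1081
```

Under these assumptions, 46 proteins give 1035 distinct heterophilic pairs, or 1081 pairs if each protein is also tested against itself, which is small enough to score exhaustively with any trained classifier.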