Looxid Labs, Seoul, 06628, Republic of Korea.
BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea.
Sci Rep. 2021 May 5;11(1):9543. doi: 10.1038/s41598-021-88623-8.
G protein-coupled receptor (GPCR) proteins belong to diverse families that are defined at multiple hierarchical levels. Inspecting the relationships between GPCR proteins within this hierarchy is important, because the characteristics of a protein can be inferred from proteins with a similar hierarchical classification. However, GPCR families have so far been modeled separately at each of the family, subfamily, and sub-subfamily levels. These approaches ignore the relationships between GPCR proteins, because they process the protein information with several disconnected models. In this study, we propose DeepHier, a deep learning model that learns representations of the GPCR family hierarchy directly from protein sequences, simultaneously across all levels, with a single unified model. A novel loss term based on metric learning is introduced to incorporate the hierarchical relations between proteins. We tested our approach on a public GPCR sequence dataset. Metric distances in the deep feature space corresponded to the hierarchical family relations between GPCR proteins. Furthermore, we demonstrated that downstream tasks such as phylogenetic reconstruction and motif discovery are feasible in the constructed embedding space. These results show that the hierarchical relations between sequences were successfully captured from both technical and biological perspectives.
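To make the idea of a hierarchy-aware metric loss concrete, the following is a minimal sketch, not the DeepHier loss itself, whose form the abstract does not specify. It assumes a simple pairwise distance-matching objective: the Euclidean distance between two embeddings is pushed toward a target proportional to how far apart the two proteins sit in the family tree. The function names, the `scale` parameter, and the toy labels ("A", "A1", "A1a", and so on) are all hypothetical.

```python
import numpy as np


def tree_distance(a, b):
    """Number of hierarchy levels below the deepest ancestor shared by a and b.

    Labels are tuples ordered from the coarsest level to the finest,
    e.g. (family, subfamily, sub-subfamily).
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return len(a) - shared


def hierarchical_pair_loss(embeddings, labels, scale=1.0):
    """Mean squared gap between embedding distance and scaled tree distance.

    This is an illustrative stand-in for a metric-learning loss term,
    assumed for this sketch rather than taken from the paper.
    """
    n = len(labels)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_embed = np.linalg.norm(embeddings[i] - embeddings[j])
            d_tree = scale * tree_distance(labels[i], labels[j])
            total += (d_embed - d_tree) ** 2
            pairs += 1
    return total / pairs


# Hypothetical three-level labels: (family, subfamily, sub-subfamily).
labels = [
    ("A", "A1", "A1a"),
    ("A", "A1", "A1b"),
    ("A", "A2", "A2a"),
    ("B", "B1", "B1a"),
]

# A collapsed embedding (all points identical) ignores the hierarchy entirely.
collapsed = np.zeros((4, 2))
print(hierarchical_pair_loss(collapsed, labels))
```

With these toy labels the collapsed embedding scores 6.0, the mean of the squared tree distances over the six pairs. An embedding that separates families more than subfamilies, and subfamilies more than sub-subfamilies, would score lower, which is the behaviour the abstract reports for distances in the learned feature space.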