Greener Joe G, Jamali Kiarash
Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom.
Bioinform Adv. 2025 Mar 5;5(1):vbaf042. doi: 10.1093/bioadv/vbaf042. eCollection 2025.
Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation, and protein classification. Fast and accurate methods to search with structures will be essential to make use of the vast databases that have recently become available, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network using supervised contrastive learning to learn a low-dimensional embedding of protein domains.
The method, called Progres, is available as software at https://github.com/greener-group/progres and as a web server at https://progres.mrc-lmb.cam.ac.uk. It has accuracy comparable to the best current methods and can search the TED domains of the AlphaFold database in a tenth of a second per query on CPU.
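Once each domain has a fixed-length embedding, a search reduces to comparing vectors, which is why it can run quickly on CPU. The following is a minimal sketch of that idea using cosine similarity in NumPy. It does not use the Progres API or its trained model: the function names `normalize` and `search`, the embedding dimension of 128, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def normalize(embeddings):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def search(query, database, top_k=5):
    """Return (indices, scores) of the top_k database rows most similar to query.

    query:    1D array of shape (dim,)
    database: 2D array of shape (n_domains, dim)
    """
    db = normalize(database)
    q = normalize(query[None, :])[0]
    scores = db @ q                      # cosine similarity to every domain
    top_k = min(top_k, len(scores))
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Toy stand-in for precomputed domain embeddings (hypothetical data).
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 128))

# A query that is a slightly perturbed copy of database entry 42.
query = database[42] + 0.01 * rng.normal(size=128)

indices, scores = search(query, database, top_k=3)
print(indices, scores)
```

In practice the database embeddings would be computed and normalized once and then reused for every query, so the per-query cost is a single matrix-vector product followed by a sort.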