Independent Scientist, Corte Madera, CA 94925, United States.
Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae687.
Recent breakthroughs in protein fold prediction from amino acid sequences have unleashed a deluge of new structures, presenting new opportunities and challenges to bioinformatics.
Reseek is a novel protein structure alignment algorithm based on sequence alignment where each residue in the protein backbone is represented by a letter in a "mega-alphabet" of 85 899 345 920 (∼10^11) distinct states. Reseek achieves substantially improved sensitivity to remote homologs compared to state-of-the-art methods including DALI, TMalign, and Foldseek, with comparable speed to Foldseek, the fastest previous method. Scaling to large databases of AI-predicted folds is analyzed. Foldseek E-values are shown to be under-estimated by several orders of magnitude, while Reseek E-values are in good agreement with measured error rates.
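The abstract does not say how the mega-alphabet is built. As a minimal sketch of one way a very large per-residue alphabet can arise, the Python below packs several small discrete features into a single composite letter by mixed-radix encoding, so that the alphabet size is the product of the feature sizes. The feature sizes, `encode`, and `decode` are hypothetical illustrations, not Reseek's actual features or code; the only figure taken from the abstract is the reported number of states.

```python
from math import log10, prod

# Hypothetical per-feature alphabet sizes (toy values, NOT Reseek's features).
FEATURE_SIZES = (16, 3, 4)

# Number of mega-alphabet states reported in the abstract.
REPORTED_STATES = 85_899_345_920


def encode(states, sizes=FEATURE_SIZES):
    """Pack one state per feature into a single composite letter (mixed radix)."""
    if len(states) != len(sizes):
        raise ValueError("need exactly one state per feature")
    letter = 0
    for state, size in zip(states, sizes):
        if not 0 <= state < size:
            raise ValueError("state out of range for its feature")
        letter = letter * size + state
    return letter


def decode(letter, sizes=FEATURE_SIZES):
    """Recover the per-feature states from a composite letter."""
    states = []
    for size in reversed(sizes):
        letter, state = divmod(letter, size)
        states.append(state)
    return tuple(reversed(states))


# The composite alphabet size is the product of the feature sizes.
TOY_ALPHABET_SIZE = prod(FEATURE_SIZES)

# Order of magnitude of the reported state count (just under 11).
REPORTED_LOG10 = log10(REPORTED_STATES)
```

With these toy sizes the composite alphabet has only 192 letters. The point of the sketch is that the size grows multiplicatively with each added feature, which is how a count on the order of 10^11 becomes plausible for a per-residue state.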