Department of Statistics.
Department of Chemistry, Florida State University, Tallahassee, FL 32306, USA.
Bioinformatics. 2020 May 1;36(10):3056-3063. doi: 10.1093/bioinformatics/btaa076.
Global protein surface comparison (GPSC) has received less attention than other work on protein structure alignment and comparison, owing to a lack of real applications for GPSC. However, technological advances in cryo-electron tomography (CET) have made methods that identify proteins from their surface shapes extremely useful.
In this study, we developed a new method for GPSC called Farthest point sampling (FPS)-enhanced Triangulation-based Iterative-closest-Point (ICP) (FTIP), and applied it to protein classification using only surface shape information. Our method first extracts a set of feature points from each protein surface using FPS and then uses an efficient triangulation-based ICP algorithm to align the feature points of the two proteins being compared. Tested on a benchmark dataset of 2329 proteins using nearest-neighbor classification, FTIP outperformed the state-of-the-art method for GPSC based on 3D Zernike descriptors. Using real and simulated cryo-EM data, we show that FTIP could be applied in the future to address problems in protein identification in CET experiments.
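The two stages described above, feature-point extraction by FPS followed by ICP alignment, can be illustrated with a minimal sketch. This is not the FTIP implementation: it uses plain point-to-point ICP with brute-force nearest neighbours and a Kabsch (SVD) rigid fit in place of the triangulation-based variant, and the function names `farthest_point_sampling`, `kabsch` and `icp`, along with their parameters, are hypothetical names chosen for illustration. NumPy is assumed.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices; each new point is the one farthest from all chosen so far."""
    points = np.asarray(points, dtype=float)
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping paired 3D points src onto dst."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iters=50, tol=1e-9):
    """Plain point-to-point ICP (assumption: NOT the triangulation-based variant).

    Returns the transformed copy of src and the final RMSD to matched dst points.
    """
    cur = np.asarray(src, dtype=float).copy()
    dst = np.asarray(dst, dtype=float)
    prev, rmsd = np.inf, np.inf
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for every current point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        R, t = kabsch(cur, dst[nn])
        cur = cur @ R.T + t
        rmsd = float(np.sqrt(((cur - dst[nn]) ** 2).sum(axis=1).mean()))
        if abs(prev - rmsd) < tol:
            break
        prev = rmsd
    return cur, rmsd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    surface_a = rng.normal(size=(500, 3))          # stand-in for protein surface points
    surface_b = surface_a + 0.01                   # slightly shifted copy
    feat_a = surface_a[farthest_point_sampling(surface_a, 50)]
    feat_b = surface_b[farthest_point_sampling(surface_b, 50)]
    _, score = icp(feat_a, feat_b)
    print("alignment RMSD:", score)
```

In a nearest-neighbor classification setting such as the one described, a residual like the RMSD returned here could serve as the dissimilarity between a query surface and each labelled surface, with the query assigned the label of the closest match. That scoring choice is an assumption of this sketch rather than a detail stated in the abstract.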
The programs and scripts developed and used in this study are available at http://ani.stat.fsu.edu/~yuan/index.fld/FTIP.tar.bz2.
Supplementary data are available at Bioinformatics online.