Marsico Annalisa, Labudde Dirk, Sapra Tanuj, Muller Daniel J, Schroeder Michael
Biotec, TU Dresden, Germany.
Bioinformatics. 2007 Jan 15;23(2):e231-6. doi: 10.1093/bioinformatics/btl293.
Misfolding of membrane proteins plays an important role in many human diseases, such as retinitis pigmentosa, hereditary deafness and diabetes insipidus. Little is known about membrane proteins because very few high-resolution structures are available. Single-molecule force spectroscopy is a novel technique that measures the force necessary to pull a protein out of a membrane. The resulting force curves contain valuable information on protein structure, conformation, and inter- and intra-molecular forces. High-throughput force spectroscopy experiments generate hundreds of force curves, comprising spurious curves and good curves that correspond to different unfolding pathways. Manual analysis of these data is a bottleneck and a source of inconsistent and subjective annotation.
We propose a novel algorithm for the identification of spurious curves and curves representing different unfolding pathways. Our algorithm proceeds in three stages: first, we reduce noise in the curves by applying dimension reduction; second, we align the curves with dynamic programming and compute pairwise distances; and third, we cluster the curves based on these distances. We apply our method to a hand-curated dataset of 135 force curves of the bacteriorhodopsin mutant P50A. Our algorithm achieves a success rate of 81% in distinguishing spurious from good curves and a success rate of 76% in classifying unfolding pathways. As a result, we discuss five different unfolding pathways of bacteriorhodopsin, including three main unfolding events and several minor ones. Finally, we link folding barriers to the degree of conservation of residues. Overall, the algorithm tackles the force spectroscopy bottleneck and leads to more consistent and reproducible results, paving the way for high-throughput analysis of structural features of membrane proteins.
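The three stages can be illustrated with a minimal sketch. The abstract does not name the specific techniques, so everything below is an assumption made for illustration: piecewise averaging stands in for the dimension reduction, dynamic time warping stands in for the dynamic-programming alignment, and average-linkage agglomerative clustering with a distance cutoff stands in for the clustering step. The function names, the `window` size and the `threshold` value are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def reduce_curve(forces, window=4):
    """Stage 1 (assumed stand-in): average over fixed windows, which
    shortens the curve and damps point-to-point noise."""
    return [
        sum(forces[i:i + window]) / len(forces[i:i + window])
        for i in range(0, len(forces), window)
    ]


def dtw_distance(a, b):
    """Stage 2 (assumed stand-in): dynamic time warping by dynamic
    programming; cost[i][j] is the best alignment cost of a[:i] and b[:j]."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[len(a)][len(b)]


def cluster(distances, n_items, threshold):
    """Stage 3 (assumed stand-in): average-linkage agglomerative clustering
    that stops once the closest pair of clusters is farther than threshold."""
    clusters = [[i] for i in range(n_items)]

    def linkage(c1, c2):
        total = sum(distances[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


def classify_curves(curves, window=4, threshold=50.0):
    """Run the three stages and return clusters of curve indices."""
    reduced = [reduce_curve(c, window) for c in curves]
    n = len(reduced)
    distances = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        distances[i][j] = distances[j][i] = dtw_distance(reduced[i], reduced[j])
    return cluster(distances, n, threshold)


# Toy data (synthetic, not from the paper): two curves with the same
# force peak at shifted positions, plus one flat curve without a peak.
peak_early = [0] * 8 + [100] * 4 + [0] * 8
peak_late = [0] * 12 + [100] * 4 + [0] * 4
flat = [5] * 20
groups = classify_curves([peak_early, peak_late, flat])
```

On this toy input the two peaked curves align with zero warping cost and fall into one cluster, while the flat curve stays on its own, which mirrors the idea of separating spurious curves from curves that share an unfolding pathway. In a sketch like this, the cutoff controls how many pathway classes emerge and would need to be tuned against curated annotations.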