Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste 34136, Italy.
Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université, CNRS, Marseille 13005, France.
Bioinformatics. 2020 Dec 22;36(20):5014-5020. doi: 10.1093/bioinformatics/btaa626.
Single-molecule force spectroscopy (SMFS) experiments pose the challenge of analysing protein unfolding data (traces) from preparations with heterogeneous composition (e.g. samples in which several different proteins are present). An automatic procedure that can distinguish the unfolding patterns of the individual proteins is therefore needed. Here, we introduce a data analysis pipeline that identifies, within such datasets, groups of traces sharing recurrent patterns (clusters).
We illustrate the performance of our method on two prototypical datasets: ∼50 000 traces from a sample containing tandem GB1 and ∼400 000 traces from a native rod membrane. Despite a daunting signal-to-noise ratio in the data, we are able to identify several unfolding clusters. This work demonstrates how automatic pattern classification can extract relevant information from SMFS traces of heterogeneous samples without prior knowledge of the sample composition.
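To make the idea of "traces with recurrent patterns" concrete, the following is a minimal toy sketch and not the authors' pipeline, which is available in the repository linked in this record. It assumes that each trace is a force curve sampled on a common extension grid, that an idealised sawtooth is an adequate stand-in for an unfolding pattern, and that traces closer than a Euclidean distance cutoff belong to the same group. The function names (`sawtooth_trace`, `cluster_traces`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sawtooth_trace(peaks, rng, n_points=100, noise=0.01):
    """Toy force-extension trace (assumption: idealised sawtooth).

    Force rises linearly after each rupture position in `peaks`
    and stays at zero after the last one (molecule detached).
    """
    trace = []
    for i in range(n_points):
        x = i / n_points
        previous = max((p for p in peaks if p <= x), default=0.0)
        attached = any(p > x for p in peaks)
        force = (x - previous) if attached else 0.0
        trace.append(force + rng.gauss(0.0, noise))
    return trace


def distance(a, b):
    """Euclidean distance between two traces on the same grid."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def cluster_traces(traces, threshold, min_size=3):
    """Group traces linked by chains of neighbours closer than `threshold`.

    Groups smaller than `min_size` are treated as noise and dropped.
    """
    unvisited = set(range(len(traces)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if distance(traces[i], traces[j]) < threshold]
            for j in near:
                unvisited.remove(j)
                component.append(j)
                stack.append(j)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return sorted(clusters)


rng = random.Random(0)
# Two hypothetical recurrent patterns, five noisy copies of each.
traces = [sawtooth_trace([0.2, 0.4, 0.6, 0.8], rng) for _ in range(5)]
traces += [sawtooth_trace([0.3, 0.7], rng) for _ in range(5)]
# Five featureless traces standing in for non-specific events.
traces += [[rng.gauss(0.0, 0.3) for _ in range(100)] for _ in range(5)]

clusters = cluster_traces(traces, threshold=0.5)
for members in clusters:
    print(len(members), "traces:", members)
```

In this toy setting the two synthetic patterns come out as separate groups and the featureless traces are left unassigned, without telling the procedure how many patterns to expect. Real SMFS traces differ in length, offset and noise level, so a practical pipeline needs a more robust similarity measure and clustering scheme than this fixed cutoff.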
The code is available at https://github.com/ninailieva/SMFS_clustering.
Supplementary data are available at Bioinformatics online.