
Classification of helical polymers with deep-learning language models.

Affiliations

Department of Biological Sciences, Purdue University.

Publication information

J Struct Biol. 2023 Dec;215(4):108041. doi: 10.1016/j.jsb.2023.108041. Epub 2023 Nov 7.

Abstract

Many macromolecules in biological systems exist in the form of helical polymers. However, the inherent polymorphism and heterogeneity of samples complicate the reconstruction of helical polymers from cryo-EM images. Currently available 2D classification methods are effective at separating particles of interest from contaminants, but they do not effectively differentiate between polymorphs, resulting in heterogeneity in the 2D classes. As such, it is crucial to develop a method that can computationally divide a dataset of polymorphic helical structures into homogeneous subsets. In this work, we utilized deep-learning language models to embed the filaments as vectors in hyperspace and group them into clusters. Tests with both simulated and experimental datasets have demonstrated that our method, HLM (Helical classification with Language Model), can effectively distinguish different types of filaments in the presence of many contaminants and low signal-to-noise ratios. We also demonstrate that HLM can isolate homogeneous subsets of particles from a publicly available dataset, resulting in the discovery of a previously unreported filament variant with an extra density around the tau filaments.
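
The embed-then-cluster idea in the abstract can be illustrated with a toy sketch. This is not the HLM implementation: the abstract does not specify how filaments are tokenized or which language model is used. The sketch assumes, hypothetically, that each filament is represented as a sequence of 2D-class IDs assigned to its segments, and it swaps in a plain class-frequency embedding and a small k-means routine in place of a deep-learning language model. The names `embed`, `kmeans`, and the toy data are illustrative only.

```python
from collections import Counter


def embed(filament, vocab_size):
    """Map a non-empty filament (a sequence of 2D-class IDs, an assumed
    representation) to a normalized class-frequency vector."""
    counts = Counter(filament)
    total = len(filament)
    return [counts.get(c, 0) / total for c in range(vocab_size)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        # Pick the vector farthest from its nearest existing center.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(farthest)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center for every vector.
        labels = [min(range(k), key=lambda j: sq_dist(v, centers[j])) for v in vectors]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (fabricated for illustration): two filament types whose segments
# fall mostly into 2D classes {0, 1} and {2, 3}, respectively.
filaments = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [2, 3, 2, 3],
    [3, 2, 3, 3, 2],
    [2, 2, 3, 2],
]
vectors = [embed(f, vocab_size=4) for f in filaments]
labels = kmeans(vectors, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

A frequency embedding ignores the order of segments along a filament, which is one reason a sequence-aware language model may be preferable on real data; the sketch only shows how whole filaments, rather than individual segments, become the unit that is clustered.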
