FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography.

Authors

Gao Shan, Zeng Xiangrui, Xu Min, Zhang Fa

Affiliations

High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

University of Chinese Academy of Sciences, Beijing, China.

Publication information

Front Mol Biosci. 2022 Jul 5;9:931949. doi: 10.3389/fmolb.2022.931949. eCollection 2022.

Abstract

Cryo-electron tomography (Cryo-ET) is an emerging technology for three-dimensional (3D) visualization of macromolecular structures in a near-native state. To recover these structures, the millions of diverse macromolecules captured in tomograms must be accurately classified into structurally homogeneous subsets. Although existing supervised deep learning methods have improved classification accuracy, the trained models have limited ability to classify novel macromolecules that were unseen during training. Adapting a trained model to a novel class requires a large number of labeled macromolecules of that class, and data labeling is time-consuming and labor-intensive. In this work, we propose FSCC, a few-shot learning method for classifying novel macromolecules. FSCC uses a two-stage training strategy to improve the model's generalization to novel macromolecules. First, FSCC uses contrastive learning to pre-train the model on a sufficient number of labeled macromolecules. Second, FSCC uses distribution calibration to re-train the classifier, enabling the model to classify macromolecules of novel classes (classes unseen during pre-training). Distribution calibration transfers knowledge learned in the pre-training stage to novel classes for which only a few labeled macromolecules are available. Experiments were performed on both synthetic and real datasets. On the synthetic datasets, FSCC achieves performance competitive with the state-of-the-art (SOTA) supervised deep learning method while needing only five labeled macromolecules per novel class, whereas the SOTA method needs 1,100 to 1,500 labeled macromolecules per novel class. On the real datasets, FSCC improves accuracy by 5% to 16% over the baseline model. These results demonstrate that contrastive learning and distribution calibration generalize well to novel macromolecules when very few labeled macromolecules are available.
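
The distribution calibration step is described only at a high level in the abstract, so the following is a minimal sketch of the general idea and should not be read as FSCC's actual implementation. It assumes that each base (pre-training) class is summarized by a feature mean and covariance, that a novel class borrows statistics from its `k` nearest base classes, and that extra features are then sampled from the resulting Gaussian to re-train a classifier. The function names, the parameters `k` and `alpha`, and the `alpha * I` covariance spread are all illustrative assumptions.

```python
import numpy as np


def calibrate(support_feature, base_means, base_covs, k=2, alpha=0.2):
    """Estimate a Gaussian for a novel class from one support feature.

    Assumed rule (illustrative, not taken from the paper): borrow statistics
    from the k base classes whose means are closest to the support feature.
    """
    # Euclidean distance from the support feature to every base-class mean.
    dists = np.linalg.norm(base_means - support_feature, axis=1)
    nearest = np.argsort(dists)[:k]
    # Calibrated mean: average of the k nearest base means and the feature itself.
    mean = (base_means[nearest].sum(axis=0) + support_feature) / (k + 1)
    # Calibrated covariance: average base covariance plus a small spread term.
    dim = support_feature.shape[0]
    cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(dim)
    return mean, cov


def sample_calibrated(support_features, base_means, base_covs,
                      n_samples=100, k=2, alpha=0.2, seed=0):
    """Draw n_samples synthetic features per support feature."""
    rng = np.random.default_rng(seed)
    samples = []
    for x in support_features:
        mean, cov = calibrate(x, base_means, base_covs, k, alpha)
        samples.append(rng.multivariate_normal(mean, cov, size=n_samples))
    return np.concatenate(samples, axis=0)
```

In a five-shot setting, the five labeled features of a novel class would be passed as `support_features`, and the sampled features, together with the originals, would serve as training data for a simple classifier on top of the frozen pre-trained encoder.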

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a02f/9294403/117cbbddea30/fmolb-09-931949-g001.jpg
