Towards molecular structure discovery from cryo-ET density volumes via modelling auxiliary semantic prototypes.

Authors

Nair Ashwin, Li Xingjian, Solanki Bhupendra, Mukhopadhyay Souradeep, Jha Ankit, Rafid Uddin Mostofa, Singha Mainak, Banerjee Biplab, Xu Min

Affiliations

Department of Data Science, Indian Institute of Science Education and Research, Vithura, 695551, Kerala, India.

Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae570.

Abstract

Cryo-electron tomography (cryo-ET) is confronted with the intricate task of unveiling novel structures. General class discovery (GCD) seeks to identify new classes by learning a model that can pseudo-label unannotated (novel) instances solely using supervision from labeled (base) classes. While 2D GCD for image data has made strides, its 3D counterpart remains unexplored. Traditional methods encounter challenges due to model bias and limited feature transferability when clustering unlabeled 2D images into known and potentially novel categories based on labeled data. To address this limitation and extend GCD to 3D structures, we propose an innovative approach that harnesses a pretrained 2D transformer, enriched by an effective weight inflation strategy tailored for 3D adaptation, followed by a decoupled prototypical network. Incorporating the power of pretrained weight-inflated Transformers, we further integrate CLIP, a vision-language model, to incorporate textual information. Our method synergizes a graph convolutional network with CLIP's frozen text encoder, preserving class neighborhood structure. In order to effectively represent unlabeled samples, we devise semantic distance distributions by formulating a bipartite matching problem for category prototypes using a decoupled prototypical network. Empirical results unequivocally highlight our method's potential in unveiling hitherto unknown structures in cryo-ET. By bridging the gap between 2D GCD and the distinctive challenges of 3D cryo-ET data, our approach paves novel avenues for exploration and discovery in this domain.
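The abstract mentions adapting a pretrained 2D transformer to 3D through weight inflation but does not spell out the scheme. As a rough illustration only, the sketch below shows one common form of inflation: copying a 2D kernel along a new depth axis and rescaling by 1/depth. This is an assumption, not necessarily the authors' strategy, and the names `inflate_kernel`, `response_2d`, and `response_3d` are hypothetical helpers introduced for this example.

```python
def inflate_kernel(kernel_2d, depth):
    """Inflate a 2D kernel (H x W nested lists) into a 3D kernel (depth x H x W).

    Assumed scheme: every depth slice is a copy of the 2D kernel scaled by
    1/depth, so a volume that is constant along depth produces the same
    response as the original 2D kernel on a single slice.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(kernel_2d, patch_2d):
    """Dot product between a 2D kernel and a 2D patch of the same shape."""
    return sum(
        k * p
        for k_row, p_row in zip(kernel_2d, patch_2d)
        for k, p in zip(k_row, p_row)
    )


def response_3d(kernel_3d, patch_3d):
    """Dot product between a 3D kernel and a 3D patch of the same shape."""
    return sum(response_2d(k, p) for k, p in zip(kernel_3d, patch_3d))


# Toy values chosen for illustration; they do not come from the paper.
kernel = [[1.0, -2.0], [0.5, 4.0]]
patch = [[3.0, 1.0], [2.0, 0.5]]
depth = 4

kernel_3d = inflate_kernel(kernel, depth)
volume = [patch for _ in range(depth)]  # the same slice repeated along depth

out_2d = response_2d(kernel, patch)
out_3d = response_3d(kernel_3d, volume)
```

Under this assumed scheme, `out_3d` matches `out_2d` when the volume is constant along depth, which is why inflated weights can serve as a reasonable starting point before fine-tuning on 3D data.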


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d88b/11790060/2dfc4abe0e17/bbae570f1.jpg
