

CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach.

Affiliations

Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States.

Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States.

Publication Information

ACS Biomater Sci Eng. 2022 Oct 10;8(10):4301-4310. doi: 10.1021/acsbiomaterials.2c00737. Epub 2022 Sep 23.

DOI: 10.1021/acsbiomaterials.2c00737
PMID: 36149671
Abstract

Collagen is one of the most important structural proteins in biology, and its structural hierarchy plays a crucial role in many mechanically important biomaterials. Here, we demonstrate how transformer models can be used to predict, directly from the primary amino acid sequence, the thermal stability of collagen triple helices, measured via the melting temperature T_m. We report two distinct transformer architectures to compare performance. First, we train a small transformer model from scratch, using our collagen data set featuring only 633 sequence-to-T_m pairings. Second, we use a large pretrained transformer model, ProtBERT, and fine-tune it for this particular downstream task by utilizing the sequence-to-T_m pairings, using a deep convolutional network to translate natural language processing BERT embeddings into the required features. Both the small transformer model and the fine-tuned ProtBERT model have similar R² values on the test data (R² = 0.84 vs 0.79, respectively), but ProtBERT is a much larger pretrained model that may not always be applicable to other biological or biomaterials questions. Specifically, we show that the small transformer model requires only 0.026% of the number of parameters of the much larger model but reaches almost the same accuracy on the test set. We compare the performance of both models against 71 newly published sequences for which T_m has been obtained, used as a validation set, and find reasonable agreement, with ProtBERT outperforming the small transformer model. The results presented here are, to the best of our knowledge, the first demonstration of the use of transformer models for relatively small data sets and for the prediction of specific biophysical properties of interest. We anticipate that the work presented here serves as a starting point for transformer models to be applied to other biophysical problems.
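The abstract frames T_m prediction as a sequence-to-value NLP task: each amino acid sequence is tokenized per residue before being fed to a transformer. As a minimal illustrative sketch (not the authors' code — the vocabulary, reserved ids, padding length, and toy sequence below are assumptions), the input-encoding step might look like this:

```python
# Minimal sketch: map a collagen-like amino acid sequence to integer
# token ids, the input format a sequence-to-Tm transformer consumes.
# Vocabulary layout and example sequence are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, UNK = 0, 1  # reserved ids for padding and unknown residues
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(seq: str, max_len: int = 30) -> list[int]:
    """Map each residue to an id, then right-pad to a fixed length."""
    ids = [VOCAB.get(aa, UNK) for aa in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

# (Gly-X-Y)-style collagen triplet repeat; hydroxyproline is written
# as P here for simplicity.
tokens = tokenize("GPPGPPGPP")
print(tokens[:9])  # → [7, 14, 14, 7, 14, 14, 7, 14, 14]
```

The fixed-length padded encoding is what allows both the small from-scratch transformer and a pretrained model like ProtBERT to batch variable-length peptides; a regression head on the pooled embedding then predicts a single T_m value per sequence.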


Similar Articles

1
CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach.
ACS Biomater Sci Eng. 2022 Oct 10;8(10):4301-4310. doi: 10.1021/acsbiomaterials.2c00737. Epub 2022 Sep 23.
2
Pretrained Transformer Language Models Versus Pretrained Word Embeddings for the Detection of Accurate Health Information on Arabic Social Media: Comparative Study.
JMIR Form Res. 2022 Jun 29;6(6):e34834. doi: 10.2196/34834.
3
ColGen: An end-to-end deep learning model to predict thermal stability of de novo collagen sequences.
J Mech Behav Biomed Mater. 2022 Jan;125:104921. doi: 10.1016/j.jmbbm.2021.104921. Epub 2021 Oct 31.
4
Discovering design principles of collagen molecular stability using a genetic algorithm, deep learning, and experimental validation.
Proc Natl Acad Sci U S A. 2022 Oct 4;119(40):e2209524119. doi: 10.1073/pnas.2209524119. Epub 2022 Sep 26.
5
Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.
BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):90. doi: 10.1186/s12911-021-01459-0.
6
Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks.
J Integr Bioinform. 2023 Jul 28;20(2). doi: 10.1515/jib-2022-0055. eCollection 2023 Jun 1.
7
End-to-End Protein Normal Mode Frequency Predictions Using Language and Graph Models and Application to Sonification.
ACS Nano. 2022 Dec 27;16(12):20656-20670. doi: 10.1021/acsnano.2c07681. Epub 2022 Nov 23.
8
Clinical concept extraction using transformers.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1935-1942. doi: 10.1093/jamia/ocaa189.
9
AMMU: A survey of transformer-based biomedical pretrained language models.
J Biomed Inform. 2022 Feb;126:103982. doi: 10.1016/j.jbi.2021.103982. Epub 2021 Dec 31.
10
Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis.
JMIR Med Inform. 2021 May 26;9(5):e23099. doi: 10.2196/23099.

Cited By

1
Diffusion model assisted designing self-assembling collagen mimetic peptides as biocompatible materials.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae622.
2
Machine Learning-Based Process Optimization in Biopolymer Manufacturing: A Review.
Polymers (Basel). 2024 Nov 29;16(23):3368. doi: 10.3390/polym16233368.
3
GRACE: Generative Redesign in Artificial Computational Enzymology.
ACS Synth Biol. 2024 Dec 20;13(12):4154-4164. doi: 10.1021/acssynbio.4c00624. Epub 2024 Nov 8.
4
Application of Transformers in Cheminformatics.
J Chem Inf Model. 2024 Jun 10;64(11):4392-4409. doi: 10.1021/acs.jcim.3c02070. Epub 2024 May 30.
5
Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design.
ACS Eng Au. 2024 Jan 12;4(2):241-277. doi: 10.1021/acsengineeringau.3c00058. eCollection 2024 Apr 17.
6
ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a language diffusion model.
Sci Adv. 2024 Feb 9;10(6):eadl4000. doi: 10.1126/sciadv.adl4000. Epub 2024 Feb 7.
7
Survey of transformers and towards ensemble learning using transformers for natural language processing.
J Big Data. 2024;11(1):25. doi: 10.1186/s40537-023-00842-0. Epub 2024 Feb 4.
8
Generative design of proteins based on secondary structure constraints using an attention-based diffusion model.
Chem. 2023 Jul 13;9(7):1828-1849. doi: 10.1016/j.chempr.2023.03.020. Epub 2023 Apr 20.
9
Stability of collagen heterotrimer with same charge pattern and different charged residue identities.
Biophys J. 2023 Jul 11;122(13):2686-2695. doi: 10.1016/j.bpj.2023.05.023. Epub 2023 May 23.
10
Predicting Collagen Triple Helix Stability through Additive Effects of Terminal Residues and Caps.
Angew Chem Int Ed Engl. 2023 Jan 16;62(3):e202214728. doi: 10.1002/anie.202214728. Epub 2022 Dec 14.