

Deep Learning Proteins using a Triplet-BERT network.

Publication Info

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:4341-4347. doi: 10.1109/EMBC46164.2021.9630387.

DOI: 10.1109/EMBC46164.2021.9630387
PMID: 34892182
Abstract

Modern sequencing technology has produced a vast quantity of proteomic data, which has been key to the development of various deep learning models within the field. However, there are still challenges to overcome with regards to modelling the properties of a protein, especially when labelled resources are scarce. Developing interpretable deep learning models is an essential criterion, as proteomics research requires methods to understand the functional properties of proteins. The ability to derive quality information from both the model and the data will play a vital role in the advancement of proteomics research. In this paper, we seek to leverage a BERT model that has been pre-trained on a vast quantity of proteomic data, to model a collection of regression tasks using only a minimal amount of data. We adopt a triplet network structure to fine-tune the BERT model for each dataset and evaluate its performance on a set of downstream task predictions: plasma membrane localisation, thermostability, peak absorption wavelength, and enantioselectivity. Our results significantly improve upon the original BERT baseline as well as the previous state-of-the-art models for each task, demonstrating the benefits of using a triplet network for refining such a large pre-trained model on a limited dataset. As a form of white-box deep learning, we also visualise how the model attends to specific parts of the protein and how the model detects critical modifications that change its overall function.
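The triplet fine-tuning described in the abstract hinges on a triplet margin loss: embeddings of related proteins are pulled together while embeddings of unrelated ones are pushed apart. The paper's exact formulation, margin, and distance metric are not given on this page, so the following is a generic sketch (the function name, the Euclidean distance, and the margin of 1.0 are illustrative assumptions, with plain vectors standing in for pooled BERT embeddings):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss over embedding vectors.

    anchor/positive/negative: 1-D arrays standing in for pooled
    BERT embeddings of an anchor protein, a similar protein, and
    a dissimilar protein. The loss is zero once the negative is
    at least `margin` farther from the anchor than the positive.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to similar protein
    d_neg = np.linalg.norm(anchor - negative)  # distance to dissimilar protein
    return max(d_pos - d_neg + margin, 0.0)

# Toy example: a well-separated triplet incurs no loss,
# a swapped one is penalised.
a = np.array([0.0, 0.0])
p = np.array([0.0, 0.0])
n = np.array([3.0, 4.0])
print(triplet_margin_loss(a, p, n))  # 0.0: negative is 5 away, positive is 0 away
print(triplet_margin_loss(a, n, p))  # 6.0: positive and negative roles reversed
```

Minimising this loss over labelled triplets is what lets the pre-trained model be adapted with only a small amount of task data, since each training example constrains relative rather than absolute positions in embedding space.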


Similar Articles

1. Deep Learning Proteins using a Triplet-BERT network.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:4341-4347. doi: 10.1109/EMBC46164.2021.9630387.
2. Modelling Drug-Target Binding Affinity using a BERT based Graph Neural network.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:4348-4353. doi: 10.1109/EMBC46164.2021.9629695.
3. Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
4. Expanding the Vocabulary of a Protein: Application of Subword Algorithms to Protein Sequence Modelling.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:2361-2367. doi: 10.1109/EMBC44109.2020.9176380.
5. Drug knowledge discovery via multi-task learning and pre-trained models.
BMC Med Inform Decis Mak. 2021 Nov 16;21(Suppl 9):251. doi: 10.1186/s12911-021-01614-7.
6. Relation classification via BERT with piecewise convolution and focal loss.
PLoS One. 2021 Sep 10;16(9):e0257092. doi: 10.1371/journal.pone.0257092. eCollection 2021.
7. A general approach for improving deep learning-based medical relation extraction using a pre-trained model and fine-tuning.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz116.
8. Knowledge-based BERT: a method to extract molecular features like computational chemists.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac131.
9. Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports.
Bioinformatics. 2021 Jan 29;36(21):5255-5261. doi: 10.1093/bioinformatics/btaa668.
10. Hate speech detection and racial bias mitigation in social media based on BERT model.
PLoS One. 2020 Aug 27;15(8):e0237861. doi: 10.1371/journal.pone.0237861. eCollection 2020.

Cited By

1. Self-supervised learning of T cell receptor sequences exposes core properties for T cell membership.
Sci Adv. 2024 Apr 26;10(17):eadk4670. doi: 10.1126/sciadv.adk4670.