Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework.

Author information

Department of Computer Science, University College London, London WC1E 6BT, UK.

Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK.

Publication information

Bioinformatics. 2021 Nov 5;37(21):3744-3751. doi: 10.1093/bioinformatics/btab491.

Abstract

MOTIVATION

Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved.

RESULTS

By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences.
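
For readers unfamiliar with the metric quoted above, Q3 is the percentage of residues whose three-state secondary structure label (helix H, strand E, coil C) is predicted correctly. The following is a minimal sketch of that calculation, not code from S4PRED; the function name `q3_score` and the example strings are illustrative assumptions.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose 3-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. Hypothetical helper for
    illustration only.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up example: 15 residues, 2 of which are mispredicted.
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEHCC"
score = q3_score(predicted, observed)
```

A benchmark figure such as the one reported on CB513 is typically obtained by applying this kind of per-residue comparison across all test proteins; how residues are pooled or averaged is not specified in the abstract.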

AVAILABILITY AND IMPLEMENTATION

The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
