Daisy: An integrated repeat protein curation service.

Affiliation

Department of Engineering, Pontifical Catholic University of Peru, Lima 32, Peru.

Publication

J Struct Biol. 2023 Dec;215(4):108033. doi: 10.1016/j.jsb.2023.108033. Epub 2023 Oct 3.

Abstract

The identification, classification and curation of tandem repeats in proteins is a complex process that requires manual work by experts, computing power and time. Recent advances in applying machine learning to protein structure prediction and repeat classification are useful for this process. However, no existing service integrates the databases and software required to support research on repeat proteins. In this publication we present Daisy, an integrated repeat protein curation web service. The service can process Protein Data Bank (PDB) and AlphaFold Database entries to identify tandem repeats. In addition, it uses an algorithm to search a sequence against a library of Pfam hidden Markov models (HMMs). Repeat classifications are associated with the identified families through RepeatsDB. This prediction is used to improve the execution of the ReUPred algorithm and to speed up the identification of repeat units. The service can also process every PDB and AlphaFold structure associated with a UniProt proteome entry. Availability: The Daisy web service is freely accessible at daisy.bioinformatica.org.
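The abstract states that Pfam HMM hits are linked to RepeatsDB classifications and that this prediction is used to speed up ReUPred. The Python sketch below illustrates one plausible way such a step could work: significant hits are ranked by E-value and mapped to candidate repeat classes that a structure-based search could try first. It is not Daisy's implementation. The `PfamHit` record, the `candidate_classes` function, the E-value cutoff of 1e-5 and the three-entry mapping table are all hypothetical; a real service would load the family-to-class mapping from RepeatsDB.

```python
from dataclasses import dataclass

# Hypothetical, illustrative mapping from Pfam accessions to RepeatsDB-style
# class labels. A real service would load this from RepeatsDB, not hard-code it.
PFAM_TO_REPEAT_CLASS = {
    "PF00023": "III elongated / alpha-solenoid",  # Ank repeat
    "PF00515": "III elongated / alpha-solenoid",  # TPR repeat
    "PF00400": "IV closed / beta-propeller",      # WD40 repeat
}


@dataclass
class PfamHit:
    accession: str  # Pfam family accession, e.g. "PF00400"
    evalue: float   # E-value reported by the HMM search
    start: int      # first residue of the hit (1-based)
    end: int        # last residue of the hit (inclusive)


def candidate_classes(hits, max_evalue=1e-5):
    """Return the repeat classes suggested by significant Pfam hits,
    ordered from the most to the least significant hit (hypothetical helper)."""
    # Keep only hits at or below the (assumed) significance cutoff,
    # then sort so the strongest hit comes first.
    significant = sorted(
        (h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue
    )
    classes = []
    for hit in significant:
        cls = PFAM_TO_REPEAT_CLASS.get(hit.accession)
        # Skip families with no known repeat class and avoid duplicates.
        if cls is not None and cls not in classes:
            classes.append(cls)
    return classes


if __name__ == "__main__":
    example_hits = [
        PfamHit("PF00400", 2e-12, 10, 48),
        PfamHit("PF00400", 5e-9, 55, 93),
        PfamHit("PF00023", 0.3, 120, 152),      # not significant, ignored
        PfamHit("PFXXXXX", 1e-20, 200, 240),    # placeholder family with no class, ignored
    ]
    print(candidate_classes(example_hits))  # ['IV closed / beta-propeller']
```

Ranking by E-value means the class suggested by the strongest hit is tried first, and families without a known repeat class are skipped rather than guessed, so the structure-based step can fall back to a full search when the list is empty.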
