Aqil Sadaf, Cadavid Isabel C, Rodrigues Nureyev F, Balbinott Natalia, Zanatta Geancarlo, Margis Rogerio
Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre CEP 91501-970, Brazil.
Departamento de Biofísica, Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, prédio 43422, sala 206, Porto Alegre CEP 91501-970, Brazil.
J Chem Inf Model. 2025 Sep 22;65(18):9425-9434. doi: 10.1021/acs.jcim.5c00916. Epub 2025 Sep 4.
Phytocystatins are proteinaceous inhibitors found in plants that competitively target various classes of cysteine proteinases, including papain-like enzymes, cathepsins, and legumains. Based on structural characteristics and gene organization, phytocystatins can be classified into four subtypes: two intronless subtypes (I1 and I2), an intron-containing subtype (IwI), and multidomain cystatins containing more than one inhibitory region (II). This work presents PhyCysID, a dedicated web server for the rapid classification of phytocystatin subtypes. PhyCysID uses a set of 21 features derived from amino acid composition, in combination with 15 distinct machine learning algorithms, to classify phytocystatin sequences into one of the four subtypes. First, the input sequence is analyzed to verify whether it is a true phytocystatin. If so, it is passed to a specialized classification pipeline, PhyCysID 12M, which integrates 12 machine learning models to assign it to one of the four defined phytocystatin classes. As a case study, a curated dataset of phytocystatin sequences from the UniProt database was used to evaluate the algorithm's performance. The PhyCysID web server classifies both individual and batch-submitted sequences in less than 15 s, providing high-throughput analysis for accurate identification of phytocystatin class and function. PhyCysID is freely available at https://www.ufrgs.br/labec/phycysid.
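The two-stage workflow described in the abstract (composition-derived features, a phytocystatin check, then subtype assignment by several models) can be illustrated with a minimal Python sketch. This is not the PhyCysID implementation. The abstract does not list the 21 features or say how the 12 models are combined, so the sketch assumes 20 residue fractions plus sequence length as stand-in features and a simple majority vote. The names `composition_features`, `majority_vote`, and `classify`, as well as the toy models in the usage lines, are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(sequence):
    """Return 21 illustrative features: 20 residue fractions plus length.

    Assumption: the real PhyCysID feature set is not given in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(seq)
    # Fractions are computed over standard residues only.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    fractions = [counts[aa] / total for aa in AMINO_ACIDS]
    return fractions + [float(len(seq))]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def classify(sequence, is_phytocystatin, subtype_models):
    """Two-stage sketch: gate on phytocystatin identity, then vote on subtype.

    `is_phytocystatin` and each entry of `subtype_models` are callables that
    take the feature list; majority voting is an assumed combination rule.
    """
    features = composition_features(sequence)
    if not is_phytocystatin(features):
        return None  # rejected at stage one
    return majority_vote([model(features) for model in subtype_models])


# Toy usage with placeholder models (not trained classifiers).
toy_models = [lambda f: "II", lambda f: "II", lambda f: "I1"]
result = classify("ACAC", lambda f: True, toy_models)
```

In practice each placeholder callable would be replaced by a trained classifier's prediction function, and the gate in stage one would itself be a trained model rather than a constant.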