Cátedras CONACYT - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México.
Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), 22860 Ensenada, Baja California, México.
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac428.
Antimicrobial peptides (AMPs) have received a great deal of attention given their potential as an option for fighting multidrug-resistant bacteria and other pathogens. Quantitative sequence-activity models (QSAMs) have helped in the discovery of new AMPs because they allow a large universe of peptide sequences to be explored and reduce the number of wet-lab experiments required. A key step in building QSAMs based on shallow learning is determining an optimal set of protein descriptors (features) for discriminating between sequences with different antimicrobial activities. These features are generally handcrafted from peptide sequence datasets that are labeled with specific antimicrobial activities. However, recent developments have shown that unsupervised approaches can be used to determine features that outperform human-engineered (handcrafted) features. Thus, knowing which of these two approaches contributes more to the classification of AMPs is a fundamental question for designing more accurate models. Here, we present a systematic and rigorous study comparing both types of features. Experimental outcomes show that non-handcrafted features lead to better performance than handcrafted features. However, the experiments also show that performance improves further when both types of features are merged. A relevance analysis reveals that non-handcrafted features have higher information content than handcrafted features, while an interaction-based importance analysis reveals that handcrafted features are more important. These findings suggest that the two types of features are complementary. Comparisons with state-of-the-art deep models show that shallow models yield better performance both when fed with non-handcrafted features alone and when fed with non-handcrafted and handcrafted features together.
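As a rough illustration of what merging the two feature types can look like, the sketch below computes one classic handcrafted descriptor (amino acid composition) and concatenates it with a learned embedding. This is a minimal sketch under stated assumptions, not the study's pipeline: the choice of descriptor, the concatenation strategy, the function names, the toy peptide, and the four-dimensional embedding values are all hypothetical, and a real embedding would come from an unsupervised model trained on unlabeled sequences.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Handcrafted descriptor: fraction of each standard residue in the peptide."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty peptide sequence")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def merge_features(handcrafted: List[float], learned: List[float]) -> List[float]:
    """Merge both feature types by plain concatenation (an assumed strategy)."""
    return list(handcrafted) + list(learned)


# Hypothetical inputs: a toy peptide and a made-up 4-dimensional embedding
# standing in for the output of an unsupervised sequence model.
toy_peptide = "KLAKLAK"
toy_embedding = [0.1, 0.2, 0.3, 0.4]

merged = merge_features(amino_acid_composition(toy_peptide), toy_embedding)
print(len(merged))  # 20 composition values plus 4 embedding values
```

The merged vector could then be passed to any shallow classifier; which classifier and which descriptors work best is exactly the kind of question the study addresses.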