Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z 4S6, Canada.
Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
BMC Res Notes. 2023 Feb 2;16(1):11. doi: 10.1186/s13104-023-06279-1.
Antibiotic resistance is a rising global threat to human health and is prompting researchers to seek effective alternatives to conventional antibiotics, including antimicrobial peptides (AMPs). We recently reported AMPlify, an attentive deep learning model for predicting AMPs in databases of peptide sequences. In our tests, AMPlify outperformed state-of-the-art methods. We illustrated its use on data describing the American bullfrog (Rana [Lithobates] catesbeiana) genome. Here we present the model files and training/test data sets used in that study. The original model (the balanced model) was trained on a balanced set of AMP and non-AMP sequences curated from public databases. In this data note, we additionally provide a model trained on an imbalanced set, in which non-AMP sequences far outnumber AMP sequences. The balanced and imbalanced models serve different use cases, and both should help the research community discover and develop novel AMPs.
This data note provides two sets of models, together with two AMP and four non-AMP sequence sets for training and testing the balanced and imbalanced models. Each model set comprises five sub-models that together form an ensemble model. The first model set corresponds to the original model, trained on the balanced training set described in the original AMPlify manuscript; the second model set was trained on an imbalanced training set.
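As a rough illustration of how five sub-models might be combined into one ensemble prediction, the sketch below averages per-sub-model AMP probabilities and applies a cutoff. The averaging rule, the 0.5 threshold, and every name here (`SubModel`, `ensemble_score`, `is_amp`, `make_stub`) are assumptions made for illustration; this text does not specify the combination rule, and the stub scorers are toy stand-ins, not the released AMPlify models or their API.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical interface: a sub-model maps a peptide sequence to P(AMP) in [0, 1].
SubModel = Callable[[str], float]


def ensemble_score(sequence: str, sub_models: Sequence[SubModel]) -> float:
    """Average the sub-model probabilities (assumed combination rule)."""
    if not sub_models:
        raise ValueError("at least one sub-model is required")
    return fmean([model(sequence) for model in sub_models])


def is_amp(sequence: str, sub_models: Sequence[SubModel], threshold: float = 0.5) -> bool:
    """Call a sequence an AMP when the averaged score exceeds the (assumed) threshold."""
    return ensemble_score(sequence, sub_models) > threshold


def make_stub(bias: float) -> SubModel:
    """Toy stand-in for a trained sub-model: fraction of K/R residues plus a bias."""
    def stub(sequence: str) -> float:
        if not sequence:
            return 0.0
        cationic = sum(sequence.count(residue) for residue in "KR")
        return min(1.0, max(0.0, cationic / len(sequence) + bias))
    return stub


# Five stand-in sub-models, mirroring the five-member ensemble in each model set.
stub_models = [make_stub(bias) for bias in (0.0, 0.1, 0.2, 0.3, 0.4)]

if __name__ == "__main__":
    for peptide in ("KKRR", "AAAA"):
        print(peptide, ensemble_score(peptide, stub_models), is_amp(peptide, stub_models))
```

Under this sketch, using the balanced or the imbalanced model set would only change which five scorers are passed in as `sub_models`; any cutoff suited to each model set would need to be taken from the released materials rather than from this example.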