Macrel: antimicrobial peptide screening in genomes and metagenomes
Authors
Santos-Júnior Célio Dias, Pan Shaojun, Zhao Xing-Ming, Coelho Luis Pedro
Affiliations
Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
Ministry of Education, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Shanghai, China.
Publication
PeerJ. 2020 Dec 18;8:e10555. doi: 10.7717/peerj.10555. eCollection 2020.
MOTIVATION
Antimicrobial peptides (AMPs) have the potential to tackle multidrug-resistant pathogens in both clinical and non-clinical contexts. The recent growth in the availability of genomes and metagenomes provides an opportunity for in silico prediction of novel AMP molecules. However, due to the small size of these peptides, standard gene prospection methods cannot be applied in this domain and alternative approaches are necessary. In particular, standard gene prediction methods have low precision for short peptides, and functional classification by homology results in low recall.
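The two failure modes named above differ in which error they suffer from. Precision is TP / (TP + FP), so a gene predictor that reports many spurious short ORFs has low precision. Recall is TP / (TP + FN), so a homology search that misses true AMPs without close known relatives has low recall. The sketch below uses made-up counts purely for illustration. They are not results from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are real (TP / (TP + FP))."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real positives that were found (TP / (TP + FN))."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical counts: a predictor that calls 100 short peptides,
    # of which only 10 are real, has poor precision.
    print(precision(tp=10, fp=90))  # 0.1
    # Hypothetical counts: a homology search that finds 10 of 50 real
    # AMPs is precise but has poor recall.
    print(recall(tp=10, fn=40))  # 0.2
```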
RESULTS
Here, we present Macrel (for metagenomic AMP classification and retrieval), which is an end-to-end pipeline for the prospection of high-quality AMP candidates from (meta)genomes. For this, we introduce a novel set of 22 peptide features. These were used to build classifiers which perform similarly to the state of the art in the prediction of both antimicrobial and hemolytic activity of peptides, but with enhanced precision (using standard benchmarks as well as a stricter testing regime). We demonstrate that Macrel recovers high-quality AMP candidates using realistic simulations and real data.
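To make the idea of "peptide features" concrete, the sketch below turns an amino acid sequence into a small numeric vector of the kind a classifier consumes. It is an assumption-laden illustration only: the three descriptors (length, a crude net charge, hydrophobic fraction), the residue groupings, and the function names `compute_features` and `to_vector` are hypothetical and are NOT Macrel's 22 features or its API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical residue groupings chosen for illustration only;
# these are not the groupings used by Macrel.
CATIONIC = set("KR")
ANIONIC = set("DE")
HYDROPHOBIC = set("AILMFVW")


@dataclass
class PeptideFeatures:
    length: int
    net_charge: int  # crude count (K + R) - (D + E); ignores His and termini
    hydrophobic_fraction: float


def compute_features(seq: str) -> PeptideFeatures:
    """Compute a few simple composition-based descriptors of a peptide."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    pos = sum(aa in CATIONIC for aa in seq)
    neg = sum(aa in ANIONIC for aa in seq)
    hyd = sum(aa in HYDROPHOBIC for aa in seq)
    return PeptideFeatures(len(seq), pos - neg, hyd / len(seq))


def to_vector(f: PeptideFeatures) -> List[float]:
    """Flatten the features into a numeric vector for a classifier."""
    return [float(f.length), float(f.net_charge), f.hydrophobic_fraction]


if __name__ == "__main__":
    # Toy cationic sequence, not taken from the paper.
    print(to_vector(compute_features("KWKLFKKIEK")))  # [10.0, 4.0, 0.4]
```

Composition-based descriptors like these are cheap to compute for millions of candidate peptides, which is why feature-vector classifiers suit large (meta)genomic screens; the actual feature set and model are described in the paper.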
AVAILABILITY
Macrel is implemented in Python 3. It is available as open source at https://github.com/BigDataBiology/macrel and through bioconda. Classification of peptides or prediction of AMPs in contigs can also be performed on the webserver: https://big-data-biology.org/software/macrel.