Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil; Porto Reports, Brasília, DF, Brazil.
Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil.
Biotechnol Adv. 2017 May-Jun;35(3):337-349. doi: 10.1016/j.biotechadv.2017.02.001. Epub 2017 Feb 12.
Data mining has become an active research topic across many fields. In the post-genomic era, the growing number of sequences deposited in databases has turned these databases into a resource for novel biological information. In recent years, the identification of antimicrobial peptides (AMPs) in databases has gained attention. Identifying unannotated AMPs has shed light on the distribution and evolution of AMPs and, in some cases, has pointed to suitable candidates for developing novel antimicrobial agents. The mining has been performed mainly with local alignments and/or regular expressions. For identifying distantly homologous sequences, however, other techniques such as antimicrobial activity prediction and molecular modelling are required. In this context, this review addresses the tools and techniques for mining AMPs from databases, together with their limitations. These methods could be helpful not only for the development of novel AMPs but also for other kinds of proteins, at a higher level of structural genomics. Moreover, solving the problem of unannotated proteins could bring immeasurable benefits to society, especially in the case of AMPs, which could help in developing novel antimicrobial agents and combating resistant bacteria.
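The abstract names regular expressions as one of the two main ways AMPs have been mined from sequence databases. The following is a minimal sketch of that idea, not the method of any tool covered by the review. The motif, the length cutoff, the function names and the example sequences are all assumptions made up for illustration.

```python
import re

# Hypothetical motif, for illustration only: Gly, any residue, Cys,
# then 3 to 9 residues of any kind, then Cys.
# This is NOT a validated AMP signature.
MOTIF = re.compile(r"G.C.{3,9}C")


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def find_candidates(records, motif=MOTIF, max_length=100):
    """Return (header, start, matched_text) for short sequences containing the motif.

    max_length is an assumed cutoff that keeps only peptide-sized sequences.
    """
    hits = []
    for header, seq in records.items():
        if len(seq) > max_length:
            continue
        match = motif.search(seq)
        if match:
            hits.append((header, match.start(), match.group()))
    return hits


# Made-up sequences; they do not come from any real database.
EXAMPLE_FASTA = """>seq1 made-up sequence with the motif
MKTLLAGKCAAAAKCRR
>seq2 made-up sequence without the motif
MKTLLAAAAAAAKRR
"""

hits = find_candidates(parse_fasta(EXAMPLE_FASTA))
for header, start, matched in hits:
    print(f"{header}: motif at position {start}: {matched}")
```

A pattern search of this kind only finds sequences that match the stated motif exactly, which is consistent with the abstract's point that distant homologues call for additional techniques such as activity prediction and molecular modelling.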