Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil ; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil.
PLoS One. 2014 Feb 19;9(2):e89162. doi: 10.1371/journal.pone.0089162. eCollection 2014.
The volume and diversity of biological data are increasing at very high rates. Vast amounts of data on protein sequences and structures, protein and genetic interactions, and phenotype studies have been produced. Most data generated by high-throughput devices are annotated automatically because annotating them manually is not possible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and the associated annotations. We propose ENZYMatic Annotation Predictor (ENZYMAP), a technique that uses a supervised learning approach to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that EC changes can be predicted from selected types of annotation. Finally, we compared the predictions of ENZYMAP and DETECT and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended as an automatic complementary method, usable together with other techniques such as those based on protein sequence and structure, that helps improve the quality and reliability of enzyme annotations over time by suggesting possible corrections, anticipating annotation changes, and propagating the implicit knowledge across the whole dataset.
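As a rough illustration of predicting EC numbers from annotations with supervised learning, the sketch below uses a k-nearest-neighbour vote over sets of annotation terms. It is not ENZYMAP's actual algorithm, which the abstract does not specify. The classifier choice, the function names (`predict_ec`, `suggest_changes`, `accuracy`), the accessions and the toy annotation terms are all assumptions made for illustration only.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two sets of annotation terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def predict_ec(query_terms, training, k=3):
    """Majority vote over the k training entries most similar to the query.

    training: list of (annotation_term_set, ec_number) pairs (hypothetical format).
    """
    ranked = sorted(training, key=lambda item: jaccard(query_terms, item[0]),
                    reverse=True)
    votes = Counter(ec for _, ec in ranked[:k])
    return votes.most_common(1)[0][0]

def suggest_changes(current, predictions):
    """Entries whose predicted EC number differs from the current annotation."""
    return {acc: (current.get(acc), ec)
            for acc, ec in predictions.items() if current.get(acc) != ec}

def accuracy(predictions, reference):
    """Fraction of predictions that match a reference annotation."""
    if not predictions:
        return 0.0
    hits = sum(1 for acc, ec in predictions.items() if reference.get(acc) == ec)
    return hits / len(predictions)

# Toy, made-up training data: annotation terms paired with an EC number.
training = [
    (frozenset({"PF00069", "kinase", "ATP-binding"}), "2.7.11.1"),
    (frozenset({"PF00069", "kinase", "transferase"}), "2.7.11.1"),
    (frozenset({"PF00082", "protease", "hydrolase"}), "3.4.21.62"),
    (frozenset({"PF00082", "serine protease"}), "3.4.21.62"),
]

# Hypothetical entries to annotate, with their current (possibly stale) EC numbers.
queries = {"Q_TOY1": {"PF00069", "kinase"}, "Q_TOY2": {"PF00082", "hydrolase"}}
current = {"Q_TOY1": "2.7.11.1", "Q_TOY2": "3.4.21.-"}

predictions = {acc: predict_ec(terms, training) for acc, terms in queries.items()}
changes = suggest_changes(current, predictions)

# Stand-in for a later curated release used as the reference.
reference = {"Q_TOY1": "2.7.11.1", "Q_TOY2": "3.4.21.62"}
score = accuracy(predictions, reference)
```

In this toy setting, `changes` flags only the entry whose predicted EC number differs from its current annotation, and `score` plays the role of the check against curated annotations described above. A real evaluation would draw its training and test entries from successive database releases rather than from hand-written examples.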