College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China.
Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China.
Biochim Biophys Acta Proteins Proteom. 2020 Jun;1868(6):140406. doi: 10.1016/j.bbapap.2020.140406. Epub 2020 Mar 2.
Phage virion protein (PVP) identification plays a key role in elucidating the relationships between phages and their hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown their potential. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarize the common framework of these PVP identifiers and construct our own novel identifiers based on that framework. Furthermore, we compare the performance of all PVP identifiers using a training dataset and an independent dataset. An analysis of the pros and cons of these identifiers shows that g-gap DPC (dipeptide composition) features can represent the characteristics of PVPs. Moreover, SVM (support vector machine) proves to be the more effective classifier for distinguishing PVPs from non-PVPs.
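The abstract does not spell out how g-gap DPC features are computed. As a minimal sketch, assuming the commonly used definition (the frequency of each ordered amino acid pair separated by g intervening residues, giving a 400-dimensional vector per sequence), the feature extraction might look like the following. The function name `g_gap_dpc`, the 20-letter alphabet, and the choice to skip pairs containing non-standard residues are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}  # dipeptide -> vector position


def g_gap_dpc(sequence, g=0):
    """Return the 400-dimensional g-gap dipeptide composition of a protein.

    For gap g, every pair (sequence[i], sequence[i + g + 1]) is counted, and
    the counts are divided by the number of counted pairs. With g = 0 this
    reduces to the ordinary (adjacent) dipeptide composition.
    """
    if g < 0:
        raise ValueError("g must be non-negative")
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        idx = INDEX.get(pair)
        if idx is not None:  # illustrative choice: skip pairs with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:  # sequence too short (or no valid pairs): all-zero vector
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]
```

Calling `g_gap_dpc(seq, g=1)` on each protein yields fixed-length vectors that could then be passed to any SVM implementation (for example, scikit-learn's `SVC`) to separate PVPs from non-PVPs; the gap value and classifier settings used by the reviewed identifiers are not given in the abstract.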