Identification of Phage Viral Proteins With Hybrid Sequence Features.

Author information

Ru Xiaoqing, Li Lihong, Wang Chunyu

Affiliations

School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China.

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.

Publication information

Front Microbiol. 2019 Mar 26;10:507. doi: 10.3389/fmicb.2019.00507. eCollection 2019.

Abstract

The uniqueness of bacteriophages makes them an important subject of bioinformatics research. In practical applications, the main interest lies in the function of bacteriophage virion proteins, so it is important to distinguish phage virion proteins from non-virion proteins accurately. Extracting comprehensive and effective sequence features plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on both sequence and structure information. After feature extraction, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set, that is, the features most strongly correlated with the class labels and least redundant with one another. Because the random forest algorithm introduces randomness both in the samples drawn for each tree and in the features considered at each node, a random forest classifier is used, and its performance on bacteriophage protein classification is evaluated with 10-fold cross-validation. The model reaches an accuracy of 93.5% in classifying phage proteins in this study. The study also found that, among the eight physicochemical properties considered, charge has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool for bacteriophage protein research.
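The abstract reports that charge is the most influential of the eight physicochemical properties. To illustrate how a single physicochemical property can be turned into numeric sequence features, the minimal sketch below computes composition and transition values over a three-class charge grouping. It is a stand-in in the style of common composition/transition (CTD) descriptors, not the paper's CCPA algorithm; the grouping (K, R positive; D, E negative; all other standard residues neutral) and the names `charge_features` and `to_groups` are assumptions for illustration.

```python
# Minimal sketch, assuming a CTD-style three-class charge grouping.
# This is NOT the paper's CCPA encoding; grouping and function names are hypothetical.
CHARGE_GROUPS = {
    "positive": set("KR"),
    "neutral": set("ANCQGHILMFPSTWYV"),
    "negative": set("DE"),
}
GROUP_ORDER = ["positive", "neutral", "negative"]


def to_groups(seq: str) -> list:
    """Map each residue to its charge group, skipping non-standard letters."""
    groups = []
    for aa in seq.upper():
        for name in GROUP_ORDER:
            if aa in CHARGE_GROUPS[name]:
                groups.append(name)
                break
    return groups


def charge_features(seq: str) -> list:
    """Return 3 composition values followed by 3 transition values."""
    groups = to_groups(seq)
    n = len(groups)
    if n == 0:
        return [0.0] * 6
    # Composition: fraction of residues that fall in each charge group.
    composition = [groups.count(g) / n for g in GROUP_ORDER]
    # Transition: fraction of adjacent residue pairs that switch between two
    # different groups (positive-neutral, positive-negative, neutral-negative).
    pairs = [(a, b) for i, a in enumerate(GROUP_ORDER) for b in GROUP_ORDER[i + 1:]]
    transitions = []
    for a, b in pairs:
        count = sum(1 for x, y in zip(groups, groups[1:]) if {x, y} == {a, b})
        transitions.append(count / (n - 1) if n > 1 else 0.0)
    return composition + transitions


print(charge_features("MKRDE"))  # → [0.4, 0.2, 0.4, 0.25, 0.25, 0.0]
```

Each property-based descriptor of this kind yields a short fixed-length vector regardless of sequence length, which is what lets features from different encoders be concatenated into one hybrid feature vector.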

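The MRMD step in the abstract selects features that are strongly related to the class label while being far from (that is, not redundant with) one another. The sketch below is a simplified reading of that "max-relevance, max-distance" idea, not the published MRMD implementation: relevance is taken as the absolute Pearson correlation with the label, redundancy is measured as the mean Euclidean distance to the other min-max-normalised features, and the two are summed with a tunable `weight`. The names `mrmd_rank` and `weight` and the exact combination rule are assumptions.

```python
import math

# Simplified MRMD-style ranking (assumption): relevance = |Pearson r| with the
# label, distance = mean Euclidean distance to the other normalised features.
# Not the published MRMD implementation; names and weighting are hypothetical.


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def mrmd_rank(X, y, weight=1.0):
    """Return feature indices ordered from best to worst score.

    X is a list of samples (each a list of feature values), y a list of 0/1 labels.
    """
    p = len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    # Min-max normalise each feature so distances between features are comparable.
    norm = []
    for col in columns:
        lo, hi = min(col), max(col)
        norm.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    relevance = [abs(pearson(col, y)) for col in columns]
    distance = []
    for j in range(p):
        d = [math.dist(norm[j], norm[k]) for k in range(p) if k != j]
        distance.append(sum(d) / len(d) if d else 0.0)
    max_d = max(distance) or 1.0  # avoid dividing by zero
    scores = [r + weight * d / max_d for r, d in zip(relevance, distance)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)
```

A feature subset is then obtained by keeping the top of the ranking, for example `mrmd_rank(X, y)[:k]` for a chosen `k`; setting `weight=0.0` reduces the ranking to relevance alone.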
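The abstract motivates the random forest by its two sources of randomness: each tree sees a bootstrap sample of the training set, and each split considers only a random subset of the features. The deliberately tiny sketch below shows both mechanisms in an ensemble of depth-one trees (decision stumps) combined by majority vote. It is an illustration under stated assumptions, not the classifier used in the paper: real random forests grow deep trees and redraw the feature subset at every node, and the subset size of about the square root of the feature count, `n_trees=25`, and all function names are hypothetical choices.

```python
import math
import random

# Toy random-forest-style ensemble of decision stumps (assumption: depth-one
# trees instead of full trees). Function names and defaults are hypothetical.


def fit_stump(X, y, features):
    """Choose the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for j in features:
        for t in sorted(set(row[j] for row in X)):
            for polarity in (0, 1):
                # Predict `polarity` when x[j] <= t and the other class otherwise.
                errors = sum(
                    (polarity if row[j] <= t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return j, t, polarity


def predict_stump(stump, row):
    j, t, polarity = stump
    return polarity if row[j] <= t else 1 - polarity


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(math.sqrt(p)))  # size of the random feature subset per split
    forest = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample of the training set (with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        # Randomness 2: random subset of candidate features for the split.
        features = rng.sample(range(p), m)
        forest.append(fit_stump(Xb, yb, features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps (binary labels 0/1)."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if votes * 2 > len(forest) else 0
```

Because every stump is trained on different samples and features, individual errors tend to be uncorrelated, and the majority vote is more stable than any single stump.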
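In 10-fold cross-validation, as used in the abstract, the data are split into 10 folds; each fold is held out once for testing while the model is trained on the other nine, and the accuracies are averaged. The sketch below implements that loop for any classifier supplied as a pair of callables. Stratifying the folds by class, the fixed `seed`, and the `fit`/`predict` interface are assumptions for illustration; the 93.5% accuracy reported in the paper comes from the authors' own data and is not reproduced here.

```python
import random

# Minimal k-fold cross-validation sketch (assumption: stratified folds and a
# fit/predict callable interface; names are hypothetical).


def stratified_folds(y, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    counter = 0
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for i in idx:
            # Deal indices round-robin so each fold gets a share of each class.
            folds[counter % k].append(i)
            counter += 1
    return folds


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean accuracy over k folds.

    fit(X_train, y_train) returns a model; predict(model, row) returns a label.
    """
    accuracies = []
    for test_idx in stratified_folds(y, k, seed):
        if not test_idx:
            continue  # more folds than samples: skip empty folds
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Usage with a trivial threshold rule that ignores the training data.
X = [[i] for i in range(20)]
y = [0] * 10 + [1] * 10
acc = cross_val_accuracy(X, y, fit=lambda Xt, yt: None,
                         predict=lambda model, row: int(row[0] >= 10))
print(acc)  # → 1.0
```

Keeping each class spread evenly across folds matters when positive and negative sets differ in size, since an unlucky split could otherwise leave a fold with few or no examples of one class.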