Computational analysis and prediction of PE_PGRS proteins using machine learning.

Authors

Li Fuyi, Guo Xudong, Xiang Dongxu, Pitt Miranda E, Bainomugisa Arnold, Coin Lachlan J M

Affiliations

Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC 3000, Australia.

School of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China.

Publication information

Comput Struct Biotechnol J. 2022 Jan 22;20:662-674. doi: 10.1016/j.csbj.2022.01.019. eCollection 2022.

DOI:10.1016/j.csbj.2022.01.019
PMID:35140886
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8804200/
Abstract

Mycobacterium tuberculosis genome comprises approximately 10% of two families of poorly characterised genes due to their high GC content and highly repetitive nature. The largest sub-group, the proline-glutamic acid polymorphic guanine-cytosine-rich sequence (PE_PGRS) family, is thought to be involved in host response and disease pathogenicity. Due to their high genetic variability and complexity of analysis, they are typically disregarded for further research in genomic studies. There are currently limited online resources and homology computational tools that can identify and analyse PE_PGRS proteins. In addition, they are computational-intensive and time-consuming, and lack sensitivity. Therefore, computational methods that can rapidly and accurately identify PE_PGRS proteins are valuable to facilitate the functional elucidation of the PE_PGRS family proteins. In this study, we developed the first machine learning-based bioinformatics approach, termed PEPPER, to allow users to identify PE_PGRS proteins rapidly and accurately. PEPPER was built upon a comprehensive evaluation of 13 popular machine learning algorithms with various sequence and physicochemical features. Empirical studies demonstrated that PEPPER achieved significantly better performance than alignment-based approaches, BLASTP and PHMMER, in both prediction accuracy and speed. PEPPER is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE_PGRS proteins.
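
The abstract mentions "sequence and physicochemical features" without listing them. As a rough illustration of what a sequence-derived feature can look like, the Python sketch below computes amino acid composition, the fraction of each of the 20 standard residues in a protein. The function name `amino_acid_composition` and the toy sequence are assumptions made for this example. This is not PEPPER's code, and the abstract does not say whether PEPPER uses this particular feature.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration; non-standard characters are ignored.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence invented for this example (not a real PE_PGRS protein).
toy = "MSFVIAAGGAGGNGGAGGTGGLLYGNGGAGG"
features = amino_acid_composition(toy)
glycine_fraction = features[AMINO_ACIDS.index("G")]
print(len(features), round(glycine_fraction, 3))
```

A fixed-length vector like this could be passed to any standard classifier. Which features and algorithms work best for PE_PGRS proteins is what the paper's evaluation addresses.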

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d35b/8804200/117361d03c89/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d35b/8804200/a80ab3fa73e2/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d35b/8804200/ef4bb8414eab/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d35b/8804200/46780e849fd5/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d35b/8804200/a30945fe5584/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d35b/8804200/c56cc83439fe/gr5.jpg

Similar articles

1. Computational analysis and prediction of PE_PGRS proteins using machine learning.
   Comput Struct Biotechnol J. 2022 Jan 22;20:662-674. doi: 10.1016/j.csbj.2022.01.019. eCollection 2022.
2. Digerati - A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins.
   Comput Biol Med. 2023 Sep;163:107155. doi: 10.1016/j.compbiomed.2023.107155. Epub 2023 Jun 21.
3. PE_PGRS antigens of Mycobacterium tuberculosis induce maturation and activation of human dendritic cells.
   J Immunol. 2010 Apr 1;184(7):3495-504. doi: 10.4049/jimmunol.0903299. Epub 2010 Feb 22.
4. The PE-PGRS glycine-rich proteins of Mycobacterium tuberculosis: a new family of fibronectin-binding proteins?
   Microbiology (Reading). 1999 Dec;145(Pt 12):3487-3495. doi: 10.1099/00221287-145-12-3487.
5. PE_PGRS proteins of Mycobacterium tuberculosis: A specialized molecular task force at the forefront of host-pathogen interaction.
   Virulence. 2020 Dec;11(1):898-915. doi: 10.1080/21505594.2020.1785815.
6. Sequence diversity in the pe_pgrs genes of Mycobacterium tuberculosis is independent of human T cell recognition.
   mBio. 2014 Jan 14;5(1):e00960-13. doi: 10.1128/mBio.00960-13.
7. PE_PGRS: Vital proteins in promoting mycobacterial survival and modulating host immunity and metabolism.
   Cell Microbiol. 2021 Mar;23(3):e13290. doi: 10.1111/cmi.13290. Epub 2020 Dec 1.
8. Evolution of smooth tubercle Bacilli PE and PE_PGRS genes: evidence for a prominent role of recombination and imprint of positive selection.
   PLoS One. 2013 May 21;8(5):e64718. doi: 10.1371/journal.pone.0064718. Print 2013.
9. An overview to understand the role of PE_PGRS family proteins in Mycobacterium tuberculosis H37 Rv and their potential as new drug targets.
   Biotechnol Appl Biochem. 2015 Mar-Apr;62(2):145-53. doi: 10.1002/bab.1266. Epub 2014 Nov 11.
10. Evidence that mycobacterial PE_PGRS proteins are cell surface constituents that influence interactions with other cells.
    Infect Immun. 2001 Dec;69(12):7326-33. doi: 10.1128/IAI.69.12.7326-7333.2001.

Cited by

1. An efficient machine-learning framework for predicting protein post-translational modification sites.
   Sci Rep. 2025 Aug 25;15(1):31179. doi: 10.1038/s41598-025-13178-x.
2. AAGP integrates physicochemical and compositional features for machine learning-based prediction of anti-aging peptides.
   Sci Rep. 2025 Aug 8;15(1):29036. doi: 10.1038/s41598-025-12759-0.
3. SaGP: identifying plant saline-alkali tolerance genes based on machine learning techniques.
   Front Plant Sci. 2025 Jul 16;16:1629794. doi: 10.3389/fpls.2025.1629794. eCollection 2025.
4. Machine learning-based prediction of antibiotic resistance in Mycobacterium tuberculosis clinical isolates from Uganda.
   BMC Infect Dis. 2024 Dec 5;24(1):1391. doi: 10.1186/s12879-024-10282-7.
5. Deciphering the role of VapBC13 and VapBC26 toxin antitoxin systems in the pathophysiology of Mycobacterium tuberculosis.
   Commun Biol. 2024 Oct 30;7(1):1417. doi: 10.1038/s42003-024-06998-6.
6. GP-HTNLoc: A graph prototype head-tail network-based model for multi-label subcellular localization prediction of ncRNAs.
   Comput Struct Biotechnol J. 2024 May 3;23:2034-2048. doi: 10.1016/j.csbj.2024.04.052. eCollection 2024 Dec.
7. MERITS: a web-based integrated PE/PPE protein database.
   Bioinform Adv. 2024 Mar 2;4(1):vbae035. doi: 10.1093/bioadv/vbae035. eCollection 2024.
8. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model.
   BMC Bioinformatics. 2024 Mar 7;25(1):102. doi: 10.1186/s12859-024-05726-5.
9. Design of a Multi-Epitope Vaccine against Tuberculosis from PE_PGRS49 and PE_PGRS56 Proteins by Reverse Vaccinology.
   Microorganisms. 2023 Jun 24;11(7):1647. doi: 10.3390/microorganisms11071647.
10. Clarion is a multi-label problem transformation method for identifying mRNA subcellular localizations.
    Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac467.

References cited in this article

1. Positive-unlabeled learning in bioinformatics and computational biology: a brief review.
   Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab461.
2. Learning embedding features based on multisense-scaled attention architecture to improve the predictive performance of anticancer peptides.
   Bioinformatics. 2021 Dec 11;37(24):4684-4693. doi: 10.1093/bioinformatics/btab560.
3. Porpoise: a new approach for accurate prediction of RNA pseudouridine sites.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab245.
4. Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications.
   Nat Commun. 2021 Jun 29;12(1):4011. doi: 10.1038/s41467-021-24313-3.
5. DeepIPs: comprehensive assessment and computational identification of phosphorylation sites of SARS-CoV-2 infection using a deep learning-based approach.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab244.
6. mRNALocater: Enhance the prediction accuracy of eukaryotic mRNA subcellular localization by using model fusion strategy.
   Mol Ther. 2021 Aug 4;29(8):2617-2623. doi: 10.1016/j.ymthe.2021.04.004. Epub 2021 Apr 3.
7. iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization.
   Nucleic Acids Res. 2021 Jun 4;49(10):e60. doi: 10.1093/nar/gkab122.
8. Protein subcellular localization based on deep image features and criterion learning strategy.
   Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa313.
9. DeepYY1: a deep learning approach to identify YY1-mediated chromatin loops.
   Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa356.
10. PE_PGRS: Vital proteins in promoting mycobacterial survival and modulating host immunity and metabolism.
    Cell Microbiol. 2021 Mar;23(3):e13290. doi: 10.1111/cmi.13290. Epub 2020 Dec 1.