
SProtFP: a machine learning-based method for functional classification of small ORFs in prokaryotes.

Authors

Khanduja Akshay, Mohanty Debasisa

Affiliation

National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi 110067, India.

Publication

NAR Genom Bioinform. 2025 Jan 7;7(1):lqae186. doi: 10.1093/nargab/lqae186. eCollection 2025 Mar.

DOI:10.1093/nargab/lqae186
PMID:39781515
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11704790/
Abstract

Small proteins (≤100 amino acids) play important roles across all life forms, from unicellular bacteria to higher organisms. In this study, we developed SProtFP, a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs), trained on a combination of physicochemical descriptors, to classify small proteins as type 2 antitoxin, bacteriocin, DNA-binding, metal-binding, ribosomal, RNA-binding, type 1 toxin or type 2 toxin proteins. We also trained a model to identify small open reading frame (smORF)-encoded antimicrobial peptides (AMPs). Comprehensive benchmarking of SProtFP gave an average area under the receiver operating characteristic curve (ROC-AUC) of 0.92 in 10-fold cross-validation, and ROC-AUCs of 0.94 and 0.93 on held-out balanced and imbalanced test sets, respectively. Using our method to annotate bacterial isolates from the human gut microbiome, we identified thousands of remote homologs of known small protein families and assigned putative functions to uncharacterized proteins. This highlights the utility of SProtFP for large-scale functional annotation of microbiome datasets, especially where sequence homology is low. SProtFP is freely available at http://www.nii.ac.in/sprotfp.html and can be combined with genome annotation tools such as ProsmORF-pred to uncover the functional repertoire of novel small proteins in bacteria.
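The two ingredients named in the abstract, physicochemical descriptors as model input and ROC-AUC as the benchmark metric, can be illustrated with a minimal Python sketch. This is not the SProtFP feature set or code: the descriptors chosen here (length, mean Kyte-Doolittle hydropathy, a crude net-charge fraction, amino acid composition) and the function names `descriptors` and `roc_auc` are assumptions made for illustration only.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def descriptors(seq):
    """Return an illustrative feature vector for one protein sequence.

    Layout: [length, mean hydropathy, net-charge fraction, 20 composition
    fractions]. Hypothetical features, not those used by SProtFP.
    """
    seq = seq.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    n = len(seq)
    counts = Counter(seq)
    hydropathy = sum(KD[aa] for aa in seq) / n
    # Crude charge estimate: (K + R) minus (D + E), per residue.
    net_charge = (counts["K"] + counts["R"] - counts["D"] - counts["E"]) / n
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    return [n, hydropathy, net_charge] + composition


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))
```

Because this pairwise form of ROC-AUC compares every positive with every negative, it does not depend on the class ratio, which is one reason the metric is convenient for comparing balanced and imbalanced test sets. In a setup with independent per-category classifiers, as the abstract describes, such a score would be computed separately for each binary model.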


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/71b246e88241/lqae186fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/b2a272b93fe4/lqae186fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/11995cba0582/lqae186fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/37689bb74bbb/lqae186fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/ca33e176a18a/lqae186fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/28eda70a972d/lqae186fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f11/11704790/ab15e86bdce0/lqae186fig7.jpg

