deep-Sep: a deep learning-based method for fast and accurate prediction of selenoprotein genes in bacteria.

Authors

Xiao Yao, Zhang Yan

Affiliations

Shenzhen Key Laboratory of Marine Bioresources and Ecology, Brain Disease and Big Data Research Institute, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, Guangdong, China.

Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen, Guangdong, China.

Publication information

mSystems. 2025 Apr 22;10(4):e0125824. doi: 10.1128/msystems.01258-24. Epub 2025 Mar 10.

DOI:10.1128/msystems.01258-24
PMID:40062874
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12013277/
Abstract

Selenoproteins are a special group of proteins with major roles in cellular antioxidant defense. They contain the 21st amino acid selenocysteine (Sec) in the active sites, which is encoded by an in-frame UGA codon. Compared to eukaryotes, identification of selenoprotein genes in bacteria remains challenging due to the absence of an effective strategy for distinguishing the Sec-encoding UGA codon from a normal stop signal. In this study, we have developed a deep learning-based algorithm, deep-Sep, for quickly and precisely identifying selenoprotein genes in bacterial genomic sequences. This algorithm uses a Transformer-based neural network architecture to construct an optimal model for detecting Sec-encoding UGA codons and a homology search-based strategy to remove additional false positives. During the training and testing stages, deep-Sep has demonstrated commendable performance, including an F1 score of 0.939 and an area under the receiver operating characteristic curve of 0.987. Furthermore, when applied to 20 bacterial genomes as independent test data sets, deep-Sep exhibited remarkable capability in identifying both known and new selenoprotein genes, significantly outperforming the existing state-of-the-art method. Our algorithm has proved to be a powerful tool for comprehensively characterizing selenoprotein genes in bacterial genomes, which should not only assist in accurate annotation of selenoprotein genes in genome sequencing projects but also provide new insights for a deeper understanding of the roles of selenium in bacteria.

IMPORTANCE

Selenium is an essential micronutrient present in selenoproteins in the form of Sec, which is a rare amino acid encoded by the opal stop codon UGA. Identification of all selenoproteins is of vital importance for investigating the functions of selenium in nature. Previous strategies for predicting selenoprotein genes mainly relied on the identification of a special cis-acting Sec insertion sequence (SECIS) element within mRNAs. However, due to the complexity and variability of SECIS elements, recognition of all selenoprotein genes in bacteria is still a major challenge in the annotation of bacterial genomes. We have developed a deep learning-based algorithm to predict selenoprotein genes in bacterial genomic sequences, which demonstrates superior performance compared to currently available methods. This algorithm can be utilized in either web-based or local (standalone) modes, serving as a promising tool for identifying the complete set of selenoprotein genes in bacteria.
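
The abstract does not describe how deep-Sep prepares its inputs, so the following Python sketch is only an illustration of the two ideas it mentions: enumerating in-frame UGA codons (TGA at the DNA level) as candidate sites for a classifier, and computing an F1 score from prediction counts. The function names, the fixed flank size, and the choice to skip windows that run off the sequence ends are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple


def inframe_tga_windows(seq: str, flank: int = 30) -> List[Tuple[int, str]]:
    """Return (position, window) for every in-frame TGA codon in `seq`.

    `seq` is a DNA string read in frame 0 (codons start at 0, 3, 6, ...).
    Each window is the TGA codon plus `flank` nucleotides on both sides.
    Hypothetical preprocessing: candidates whose window would extend past
    either end of the sequence are skipped.
    """
    seq = seq.upper()
    hits = []
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] == "TGA":
            start, end = i - flank, i + 3 + flank
            if start >= 0 and end <= len(seq):
                hits.append((i, seq[start:end]))
    return hits


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy sequence: one in-frame TGA at position 12, followed by a TAA stop.
    toy = "ATG" + "GCT" * 3 + "TGA" + "GCT" * 3 + "TAA"
    for pos, window in inframe_tga_windows(toy, flank=6):
        print(pos, window)
    print(round(f1_score(tp=8, fp=1, fn=1), 3))
```

In a real pipeline each extracted window would be passed to a trained model that scores whether the UGA encodes Sec or terminates translation; that model is not reproduced here.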

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aea/12013277/c0e235506523/msystems.01258-24.f001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aea/12013277/a83255e4f6af/msystems.01258-24.f002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aea/12013277/d848cd9cfed7/msystems.01258-24.f003.jpg

Similar articles

1
deep-Sep: a deep learning-based method for fast and accurate prediction of selenoprotein genes in bacteria.
mSystems. 2025 Apr 22;10(4):e0125824. doi: 10.1128/msystems.01258-24. Epub 2025 Mar 10.
2
Selenoprofiles: A Computational Pipeline for Annotation of Selenoproteins.
Methods Mol Biol. 2018;1661:17-28. doi: 10.1007/978-1-4939-7258-6_2.
3
High-level expression in Escherichia coli of selenocysteine-containing rat thioredoxin reductase utilizing gene fusions with engineered bacterial-type SECIS elements and co-expression with the selA, selB and selC genes.
J Mol Biol. 1999 Oct 8;292(5):1003-16. doi: 10.1006/jmbi.1999.3085.
4
An algorithm for identification of bacterial selenocysteine insertion sequence elements and selenoprotein genes.
Bioinformatics. 2005 Jun 1;21(11):2580-9. doi: 10.1093/bioinformatics/bti400. Epub 2005 Mar 29.
5
Selenium. Role of the essential metalloid in health.
Met Ions Life Sci. 2013;13:499-534. doi: 10.1007/978-94-007-7500-8_16.
6
Dynamic evolution of selenocysteine utilization in bacteria: a balance between selenoprotein loss and evolution of selenocysteine from redox active cysteine residues.
Genome Biol. 2006;7(10):R94. doi: 10.1186/gb-2006-7-10-r94. Epub 2006 Oct 20.
7
Identification of the Selenoprotein S Positive UGA Recoding (SPUR) element and its position-dependent activity.
RNA Biol. 2019 Dec;16(12):1682-1696. doi: 10.1080/15476286.2019.1653681. Epub 2019 Aug 21.
8
Factors impacting the aminoglycoside-induced UGA stop codon readthrough in selenoprotein translation.
J Trace Elem Med Biol. 2016 Sep;37:104-110. doi: 10.1016/j.jtemb.2016.04.010. Epub 2016 Apr 26.
9
SECISearch3 and Seblastian: new tools for prediction of SECIS elements and selenoproteins.
Nucleic Acids Res. 2013 Aug;41(15):e149. doi: 10.1093/nar/gkt550. Epub 2013 Jun 19.
10
A regulatory role for Sec tRNA[Ser]Sec in selenoprotein synthesis.
RNA. 2004 Jul;10(7):1142-52. doi: 10.1261/rna.7370104.
