Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes.

Affiliations

Department of Genetics, Stanford University, Stanford, CA 94305, USA; Department of Medicine (Hematology, Blood and Marrow Transplantation), Stanford University, Stanford, CA 94305, USA.

Publication information

Cell Host Microbe. 2021 Jan 13;29(1):121-131.e4. doi: 10.1016/j.chom.2020.11.002. Epub 2020 Dec 7.

Abstract

Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.
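The abstract does not describe how candidate smORFs are located before they are scored. As a rough illustration of two sequence signals it mentions (short ORFs and an upstream Shine-Dalgarno sequence), the following is a minimal Python sketch that scans the forward strand for short start-to-stop ORFs and flags a purine-rich SD-like motif upstream of each start codon. It is not SmORFinder's code: the start codons, the 5-50 codon length bounds, the motif set, and the upstream window are illustrative assumptions, and the names `find_small_orfs`, `has_shine_dalgarno`, and `SmallORF` are hypothetical.

```python
# Illustrative smORF candidate scanner. NOT SmORFinder's implementation, which
# scores candidates with profile HMMs and deep learning models. All constants
# below are assumptions chosen for illustration only.
from dataclasses import dataclass

START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons (assumption)
STOP_CODONS = {"TAA", "TAG", "TGA"}
SD_MOTIFS = ("GGAGG", "AGGAG")       # hypothetical Shine-Dalgarno-like cores
MIN_AA, MAX_AA = 5, 50               # hypothetical small-ORF length bounds
SD_WINDOW = (4, 15)                  # hypothetical: search 4-15 nt upstream of start


@dataclass
class SmallORF:
    start: int        # 0-based index of the first base of the start codon
    end: int          # 0-based exclusive index just past the stop codon
    protein_len: int  # encoded amino acids, excluding the stop codon
    has_sd: bool      # True if an SD-like motif lies in the upstream window


def has_shine_dalgarno(seq: str, start: int) -> bool:
    """Return True if any SD-like motif occurs in the window upstream of `start`."""
    lo = max(0, start - SD_WINDOW[1])
    hi = max(0, start - SD_WINDOW[0])
    upstream = seq[lo:hi]
    return any(motif in upstream for motif in SD_MOTIFS)


def find_small_orfs(seq: str) -> list[SmallORF]:
    """Scan the three forward-strand frames for short start-to-stop ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in START_CODONS:
                # Extend codon by codon to the first in-frame stop codon.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop before the sequence ends
                aa = (j - i) // 3
                if MIN_AA <= aa <= MAX_AA:
                    orfs.append(SmallORF(i, j + 3, aa, has_shine_dalgarno(seq, i)))
                i = j + 3  # keep only the longest ORF per stop; resume after it
                continue
            i += 3
    return orfs


if __name__ == "__main__":
    demo = "AAGGAGGTTAACC" + "ATG" + "AAA" * 5 + "TAA" + "CC"
    for orf in find_small_orfs(demo):
        print(orf)
```

Here a 6-codon ORF starting at index 13, preceded by `AAGGAGG`, is reported with `has_sd=True`. A practical pipeline would also scan the reverse complement and would score candidates with learned models, as SmORFinder does with profile HMMs and deep learning, rather than relying on a fixed motif list.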

Similar articles

Accurate annotation of human protein-coding small open reading frames.
Nat Chem Biol. 2020 Apr;16(4):458-468. doi: 10.1038/s41589-019-0425-0. Epub 2019 Dec 9.

Cited by

Cutting-edge deep-learning based tools for metagenomic research.
Natl Sci Rev. 2025 Feb 19;12(6):nwaf056. doi: 10.1093/nsr/nwaf056. eCollection 2025 Jun.

Eukaryotic Microproteins.
Annu Rev Biochem. 2025 Jun;94(1):1-28. doi: 10.1146/annurev-biochem-080124-012840. Epub 2025 Apr 17.

The hidden bacterial microproteome.
Mol Cell. 2025 Mar 6;85(5):1024-1041.e6. doi: 10.1016/j.molcel.2025.01.025. Epub 2025 Feb 19.

Origins of Life: The Protein Folding Problem all over again?
Proc Natl Acad Sci U S A. 2024 Aug 20;121(34):e2315000121. doi: 10.1073/pnas.2315000121. Epub 2024 Aug 12.
