Small open reading frames: current prediction techniques and future prospect.

Affiliation

Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.

Publication information

Curr Protein Pept Sci. 2011 Sep;12(6):503-7. doi: 10.2174/138920311796957667.

DOI:10.2174/138920311796957667

PMID:21787300

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3203329/

Abstract

Evidence is accumulating that small open reading frames (sORFs, <100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation, even though they are far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform poorly for small genes relative to larger genes, while a method dedicated to sORF discovery achieves a level of accuracy similar to that of homology search. This result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies to further advance the accuracy of sORF prediction.
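
To make the "<100 codons" definition concrete, the following is a minimal Python sketch that enumerates candidate ORFs in a DNA string and flags those short enough to count as sORFs. It is an illustration only, not the homology-search, codon-index, or dedicated sORF method evaluated in the paper. The names `find_orfs`, `is_sorf`, and `SORF_MAX_CODONS` are hypothetical, and the sketch assumes the standard genetic code, ATG-only starts, forward-strand scanning, and a codon count that includes the start codon but excludes the stop codon.

```python
# Assumptions (not from the paper): standard stop codons, ATG-only start,
# forward strand only, codon count includes the start but not the stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
SORF_MAX_CODONS = 100  # threshold taken from the abstract: sORF = fewer than 100 codons


def find_orfs(seq):
    """Yield (start, end, n_codons) for each ATG...stop ORF on the forward strand.

    start is 0-based; end is exclusive and includes the stop codon.
    Within a frame, the first ATG after the previous stop opens the ORF.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a new ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                yield start, i + 3, (i - start) // 3
                start = None  # close the ORF and look for the next ATG


def is_sorf(n_codons):
    """Return True if the ORF falls under the <100-codon sORF threshold."""
    return n_codons < SORF_MAX_CODONS
```

Calling `list(find_orfs(sequence))` and filtering with `is_sorf` gives the pool of candidate sORFs. Length alone says nothing about coding potential, which is why the abstract stresses the need for better prediction methods and more experimentally verified sORFs.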

Similar articles

Discovery and annotation of small proteins using genomics, proteomics, and computational approaches.

Genome Res. 2011 Apr;21(4):634-41. doi: 10.1101/gr.109280.110. Epub 2011 Mar 2.

Computational discovery and annotation of conserved small open reading frames in fungal genomes.

BMC Bioinformatics. 2019 Feb 4;19(Suppl 13):551. doi: 10.1186/s12859-018-2550-2.

Re-annotation of protein-coding genes in the genome of Saccharomyces cerevisiae based on support vector machines.

PLoS One. 2013 Jul 10;8(7):e64477. doi: 10.1371/journal.pone.0064477. Print 2013.

Strategies and Challenges in Identifying Function for Thousands of sORF-Encoded Peptides in Meiosis.

Proteomics. 2018 May;18(10):e1700274. doi: 10.1002/pmic.201700274. Epub 2017 Oct 26.

uORF-seqr: A Machine Learning-Based Approach to the Identification of Upstream Open Reading Frames in Yeast.

Methods Mol Biol. 2021;2252:313-329. doi: 10.1007/978-1-0716-1150-0_15.

Non-AUG start codons: Expanding and regulating the small and alternative ORFeome.

Exp Cell Res. 2020 Jun 1;391(1):111973. doi: 10.1016/j.yexcr.2020.111973. Epub 2020 Mar 21.

Parallel identification of new genes in Saccharomyces cerevisiae.

Genome Res. 2002 Aug;12(8):1210-20. doi: 10.1101/gr.226802.

Reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction.

PLoS One. 2013 May 10;8(5):e63523. doi: 10.1371/journal.pone.0063523. Print 2013.

Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas.

Genome Biol. 2006;7(11):R106. doi: 10.1186/gb-2006-7-11-r106.

Cited by

Mutational constraint analysis workflow for overlapping short open reading frames and genomic neighbors.

BMC Genomics. 2025 Mar 14;26(1):254. doi: 10.1186/s12864-025-11444-w.

Discovering the hidden function in fungal genomes.

Nat Commun. 2024 Sep 19;15(1):8219. doi: 10.1038/s41467-024-52568-z.

LncRNA-encoded peptides in cancer.

J Hematol Oncol. 2024 Aug 12;17(1):66. doi: 10.1186/s13045-024-01591-0.

Integrated sequence and -omic features reveal novel small proteome of [organism name missing in the source].

Front Microbiol. 2024 May 15;15:1335310. doi: 10.3389/fmicb.2024.1335310. eCollection 2024.

No country for old methods: New tools for studying microproteins.

iScience. 2024 Jan 20;27(2):108972. doi: 10.1016/j.isci.2024.108972. eCollection 2024 Feb 16.

Re-evaluating the impact of alternative RNA splicing on proteomic diversity.

Front Genet. 2023 Feb 9;14:1089053. doi: 10.3389/fgene.2023.1089053. eCollection 2023.

Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures.

J Biomed Sci. 2022 Mar 17;29(1):19. doi: 10.1186/s12929-022-00802-5.

Small open reading frames in plant research: from prediction to functional characterization.

3 Biotech. 2022 Mar;12(3):76. doi: 10.1007/s13205-022-03147-w. Epub 2022 Feb 24.

At elevated temperatures, heat shock protein genes show altered ratios of different RNAs and expression of new RNAs, including several novel HSPB1 mRNAs encoding HSP27 protein isoforms.

Exp Ther Med. 2021 Aug;22(2):900. doi: 10.3892/etm.2021.10332. Epub 2021 Jun 24.

The cardiac translational landscape reveals that micropeptides are new players involved in cardiomyocyte hypertrophy.

Mol Ther. 2021 Jul 7;29(7):2253-2267. doi: 10.1016/j.ymthe.2021.03.004. Epub 2021 Mar 5.
