
D-sORF: Accurate Ab Initio Classification of Experimentally Detected Small Open Reading Frames (sORFs) Associated with Translational Machinery.

Authors

Perdikopanis Nikos, Giannakakis Antonis, Kavakiotis Ioannis, Hatzigeorgiou Artemis G

Affiliations

Department of Electrical and Computer Engineering, University of Thessaly, 38221 Volos, Greece.

Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece.

Publication information

Biology (Basel). 2024 Jul 26;13(8):563. doi: 10.3390/biology13080563.

DOI: 10.3390/biology13080563
PMID: 39194501
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11351124/

Abstract

Small open reading frames (sORFs; <300 nucleotides or <100 amino acids) are widespread across all genomes, and an increasing variety of them appear to be translating from non-genic regions. Over the past few decades, peptides produced from sORFs have been identified as functional in various organisms, from bacteria to humans. Despite recent advances in next-generation sequencing and proteomics, accurate annotation and classification of sORFs remain a rate-limiting step toward reliable and high-throughput detection of small proteins from non-genic regions. Additionally, the cost of computational methods utilizing machine learning is lower than that of biological experiments, and they can be employed to detect sORFs, laying the groundwork for biological experiments. We present D-sORF, a machine-learning framework that integrates the statistical nucleotide context and motif information around the start codon to predict coding sORFs. D-sORF scores directly for coding identity and requires only the underlying genomic sequence, without incorporating parameters such as the conservation, which, in the case of sORFs, may increase the dispersion of scores within the significantly less conserved non-genic regions. D-sORF achieves 94.74% precision and 92.37% accuracy for small ORFs (using the 99 nt medium length window). When D-sORF is applied to sORFs associated with ribosomes, the identification of transcripts producing peptides (annotated by the Ensembl IDs) is similar to or superior to experimental methodologies based on ribosome-sequencing (Ribo-Seq) profiling. In parallel, the recognition of putative negative data, such as the intron-containing transcripts that associate with ribosomes, remains remarkably low, indicating that D-sORF could be efficiently applied to filter out false-positive sORFs from Ribo-Seq data because of the non-productive ribosomal binding or noise inherent in these protocols.
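
The length criterion stated in the abstract (sORFs are shorter than 300 nucleotides) and the idea of looking at the sequence around the start codon can be illustrated with a minimal Python sketch. This is not D-sORF's implementation: the function name `find_sorfs`, the `flank` parameter, the forward-strand-only ATG scan, and counting the stop codon toward the length are all assumptions made for illustration. The sketch only enumerates candidate sORFs and extracts the flanking window that a classifier could score.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
MAX_SORF_NT = 300  # threshold from the abstract: sORFs are < 300 nt


class SmallOrf(NamedTuple):
    start: int    # 0-based index of the A in the ATG start codon
    end: int      # 0-based index one past the stop codon
    context: str  # start codon plus `flank` nt on each side (hypothetical feature window)


def find_sorfs(seq: str, flank: int = 6) -> List[SmallOrf]:
    """Enumerate forward-strand ATG..stop ORFs shorter than 300 nt.

    Assumptions: only ATG starts are considered, the length includes the
    stop codon, and nested in-frame ATGs are reported as separate candidates.
    """
    seq = seq.upper()
    hits: List[SmallOrf] = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if end - start < MAX_SORF_NT:
                    left = max(0, start - flank)
                    hits.append(SmallOrf(start, end, seq[left:start + 3 + flank]))
                break  # the ORF ends at the first stop; ORFs with no stop are skipped
    return hits
```

For example, `find_sorfs("CCGCCACCATGGCTTGATT")` yields a single 9 nt candidate starting at index 8, with the context window `GCCACCATGGCTTGA`. A real pipeline would also scan the reverse strand and pass each window to a trained model.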

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d2/11351124/cba81df0ada5/biology-13-00563-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d2/11351124/c28ff3fd19c2/biology-13-00563-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d2/11351124/0a13f6b9b1e4/biology-13-00563-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d2/11351124/10910088f692/biology-13-00563-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d2/11351124/2c671a1c85e3/biology-13-00563-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52d2/11351124/7ee79b9a43e9/biology-13-00563-g006.jpg

Similar articles

1
D-sORF: Accurate Ab Initio Classification of Experimentally Detected Small Open Reading Frames (sORFs) Associated with Translational Machinery.
Biology (Basel). 2024 Jul 26;13(8):563. doi: 10.3390/biology13080563.
2
Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures.
J Biomed Sci. 2022 Mar 17;29(1):19. doi: 10.1186/s12929-022-00802-5.
3
Discovery and annotation of small proteins using genomics, proteomics, and computational approaches.
Genome Res. 2011 Apr;21(4):634-41. doi: 10.1101/gr.109280.110. Epub 2011 Mar 2.
4
An update on sORFs.org: a repository of small ORFs identified by ribosome profiling.
Nucleic Acids Res. 2018 Jan 4;46(D1):D497-D502. doi: 10.1093/nar/gkx1130.
5
Discovery of Unannotated Small Open Reading Frames in Streptococcus pneumoniae D39 Involved in Quorum Sensing and Virulence Using Ribosome Profiling.
mBio. 2022 Aug 30;13(4):e0124722. doi: 10.1128/mbio.01247-22. Epub 2022 Jul 19.
6
RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab549.
7
Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs.
BMC Genomics. 2013 Sep 23;14:648. doi: 10.1186/1471-2164-14-648.
8
Small Open Reading Frames, How to Find Them and Determine Their Function.
Front Genet. 2022 Jan 28;12:796060. doi: 10.3389/fgene.2021.796060. eCollection 2021.
9
Three-nucleotide periodicity of nucleotide diversity in a population enables the identification of open reading frames.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac210.
10
Role of a short open reading frame in ribosome shunt on the cauliflower mosaic virus RNA leader.
J Biol Chem. 2000 Jun 9;275(23):17288-96. doi: 10.1074/jbc.M001143200.
