sOCP: a framework predicting smORF coding potential based on TIS and in-frame features and effectively applied in the human genome.

Affiliations

School of Life Sciences, and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan 430079, Hubei, People's Republic of China.

School of Computer Science, and Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, Hubei, People's Republic of China.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae147.

Abstract

Small open reading frames (smORFs) are recognized to play various roles in essential biological pathways and to affect human health in conditions ranging from diabetes to tumorigenesis. Predicting smORFs in silico is an essential prerequisite for processing omics data. Here, we propose sOCP, a smORF coding-potential prediction framework that provides functions for constructing a model to predict novel smORFs in a given species. The human sOCP model is based on in-frame features and the nucleotide bias around the start codon (the translation initiation site, TIS); this small feature subset proved sufficient while avoiding the overfitting problems of more complex models. The model outperformed previous methods on prediction metrics and correlated closely with experimental evidence in a heterogeneous dataset. When applied to Rattus norvegicus, it also performed satisfactorily. We then scanned the human genome for smORFs with ATG and non-ATG start codons and generated a database of about one million novel smORFs with coding potential, around 72 000 of which are located in lncRNA regions. The smORF-encoded peptides may be involved in biological pathways rarely associated with canonical proteins, including the glucocorticoid catabolic process and the prokaryotic defense system. Our work provides a model and a database for investigating human smORFs, as well as a convenient tool for predicting smORFs in other species.
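The abstract describes two computational steps: scanning a genome for smORFs with ATG and non-ATG start codons, and scoring each candidate from features around the start codon (TIS) and within the reading frame. The Python sketch below illustrates both steps under assumptions that are not taken from the paper: the near-cognate start codons (CTG, GTG, TTG), the 10 to 100 codon length bounds, a TIS window of 6 nt upstream and 5 nt downstream, and one-hot TIS encoding plus in-frame codon frequencies as the feature set. All names are hypothetical. This is not the sOCP implementation or its actual feature set, and it scans only the forward strand.

```python
from collections import Counter
from itertools import product

# Hypothetical parameters (assumptions, not taken from sOCP).
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}  # ATG plus common near-cognate starts
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS, MAX_CODONS = 10, 100  # codons from the start codon up to (not including) the stop
TIS_UPSTREAM, TIS_DOWNSTREAM = 6, 5  # flanking nucleotides kept around the start codon
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 codons, fixed order


def scan_smorfs(seq):
    """Yield (start, end, start_codon) for forward-strand smORF candidates.

    `end` is exclusive and includes the stop codon. Each start codon is
    scanned independently, so nested ORFs sharing a stop codon are all
    reported if they meet the length bounds.
    """
    seq = seq.upper()
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon not in START_CODONS:
            continue
        # Walk downstream in frame until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                n_codons = (j - i) // 3
                if MIN_CODONS <= n_codons <= MAX_CODONS:
                    yield i, j + 3, codon
                break


def tis_context(seq, start):
    """Return the window around the start codon, padded with 'N' at sequence edges."""
    left = start - TIS_UPSTREAM
    right = start + 3 + TIS_DOWNSTREAM
    window = seq[max(left, 0):right]
    return "N" * max(-left, 0) + window + "N" * max(right - len(seq), 0)


def in_frame_codon_freq(orf_seq):
    """Relative frequency of each of the 64 codons read in frame 0 (stop codon included)."""
    counts = Counter(orf_seq[k:k + 3] for k in range(0, len(orf_seq) - 2, 3))
    total = sum(counts.values())
    return [counts[c] / total for c in CODONS]


def features(seq, start, end):
    """Concatenate one-hot TIS features with in-frame codon frequencies."""
    seq = seq.upper()
    tis = tis_context(seq, start)
    # Four values per window position; 'N' padding encodes as all zeros.
    tis_onehot = [1.0 if base == b else 0.0 for base in tis for b in "ACGT"]
    return tis_onehot + in_frame_codon_freq(seq[start:end])


if __name__ == "__main__":
    demo = "GCCACC" + "ATG" + "GCC" * 10 + "TAA" + "GG"
    for start, end, codon in scan_smorfs(demo):
        print(start, end, codon, len(features(demo, start, end)))
    # prints: 6 42 ATG 120
```

In a full pipeline, a trained classifier would score each feature vector; the abstract does not say which model sOCP uses, so none is shown here. Reporting every start codon keeps non-ATG and nested candidates visible, which a real pipeline might then deduplicate or rank.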
