Identification of prokaryotic small proteins using a comparative genomic approach.

Affiliation

Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA.

Publication information

Bioinformatics. 2011 Jul 1;27(13):1765-71. doi: 10.1093/bioinformatics/btr275. Epub 2011 May 5.

DOI:10.1093/bioinformatics/btr275
PMID:21551138
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3117347/
Abstract

MOTIVATION

Accurate prediction of genes encoding small proteins (on the order of 50 amino acids or less) remains an elusive open problem in bioinformatics. Some of the best methods for gene prediction use either sequence composition analysis or sequence similarity to a known protein-coding sequence. These methods often fail for small proteins, however, either because few small protein-coding genes have been experimentally verified or because statistics computed on such short sequences have limited significance. Our approach is based on the hypothesis that true small proteins will be under selective pressure for encoding the particular amino acid sequence, for ease of translation by the ribosome, and for structural stability. This stability can be achieved either independently or as part of a larger protein complex. Given this assumption, it follows that small proteins should display conserved local protein structure properties much like larger proteins. Our method incorporates neural-network predictions for three local structure alphabets within a comparative genomic approach, using a genomic alignment of 22 closely related bacterial genomes to predict whether or not a given open reading frame (ORF) encodes a small protein.
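To make the notion of a candidate small ORF concrete, the following Python sketch enumerates start-to-stop ORFs on both strands and keeps those within a size window. It is only an illustration: the abstract does not say how the candidate ORFs were defined, so the start codon set, the `min_aa` and `max_aa` limits, and the function names are all assumptions, and the structure-based scoring step of the method is not shown.

```python
# Hypothetical sketch of candidate small-ORF enumeration.
# Assumptions (not from the abstract): start codons ATG/GTG/TTG,
# a 10-50 amino acid window, and the longest ORF per stop codon.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def small_orfs(seq, min_aa=10, max_aa=50):
    """Yield (strand, frame, start, end, n_aa) for each ORF in the size window.

    Coordinates are 0-based on the scanned strand; `end` is the index of
    the stop codon, so seq[start:end] is the coding region without the stop.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first in-frame start after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if min_aa <= n_aa <= max_aa:
                        yield strand, frame, start, i, n_aa
                    start = None


# Toy sequence: a start codon, 12 alanine codons, and a stop codon.
toy = "ATG" + "GCT" * 12 + "TAA"
for orf in small_orfs(toy):
    print(orf)
```

In a real pipeline each candidate would then be scored, for example with the conserved local structure features the abstract describes; that step depends on details the abstract does not give.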

RESULTS

We have applied this method to the complete genome for Escherichia coli strain K12 and looked at how well our method performed on a set of 60 experimentally verified small proteins from this organism. Out of a total of 11 407 possible ORFs, we found that 6 of the top 10 and 27 of the top 100 predictions belonged to the set of 60 experimentally verified small proteins. We found 35 of all the true small proteins within the top 200 predictions. We compared our method to Glimmer, using a default Glimmer protocol and a modified small ORF Glimmer protocol with a lower minimum size cutoff. The default Glimmer protocol identified 16 of the true small proteins (all in the top 200 predictions), but failed to predict on 34 due to size cutoffs. The small ORF Glimmer protocol made predictions for all the experimentally verified small proteins but only contained 9 of the 60 true small proteins within the top 200 predictions.
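The counts above can be restated as precision and recall at each rank cutoff. The short Python sketch below does that arithmetic using only the numbers reported in the abstract. The random-ranking baseline is an added assumption for context: it supposes that all 60 verified small proteins are among the 11 407 candidate ORFs and that a random ranking places them uniformly. The function names are illustrative.

```python
# Precision/recall at rank k from the counts reported in the abstract.
TOTAL_TRUE = 60      # experimentally verified small proteins
TOTAL_ORFS = 11407   # candidate ORFs that were ranked

# Verified small proteins found among the top-k predictions of the method.
hits_at_k = {10: 6, 100: 27, 200: 35}

# Verified small proteins within the top 200 for the two Glimmer protocols.
glimmer_hits_at_200 = {"default": 16, "small ORF": 9}


def precision_recall(hits, k, total_true=TOTAL_TRUE):
    """Return (precision, recall) for `hits` true positives in the top k."""
    return hits / k, hits / total_true


def random_expected_hits(k, total_true=TOTAL_TRUE, total_orfs=TOTAL_ORFS):
    """Expected hits in the top k under a uniformly random ranking.

    Assumes every verified protein is among the ranked ORFs.
    """
    return k * total_true / total_orfs


for k, hits in hits_at_k.items():
    precision, recall = precision_recall(hits, k)
    baseline = random_expected_hits(k)
    print(f"top {k:>3}: precision={precision:.3f} recall={recall:.3f} "
          f"random-baseline hits={baseline:.2f}")

for name, hits in glimmer_hits_at_200.items():
    precision, recall = precision_recall(hits, 200)
    print(f"Glimmer ({name}) top 200: precision={precision:.3f} "
          f"recall={recall:.3f}")
```

Under these assumptions, a random ranking would be expected to place about one verified small protein in the top 200, which puts the reported 35 in context.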

CONTACT

jsamayoa@jhu.edu
