Translation initiation site prediction on a genomic scale: beauty in simplicity.

Author information

Saeys Yvan, Abeel Thomas, Degroeve Sven, Van de Peer Yves

Affiliation

Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Ghent, Belgium.

Publication information

Bioinformatics. 2007 Jul 1;23(13):i418-23. doi: 10.1093/bioinformatics/btm177.

DOI:10.1093/bioinformatics/btm177

PMID:17646326

Abstract

MOTIVATION

Correctly identifying translation initiation sites (TIS) remains a challenging problem for automated computational methods. Furthermore, the lion's share of these techniques focus on identifying TIS in transcript data. In the gene prediction context, however, TIS must be identified at the genomic level, which is harder still: many more pseudo-TIS occur at the genome level, so models produce a higher number of false positive predictions.
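To make the pseudo-TIS problem concrete, the following is a minimal sketch that lists every candidate start codon on both strands of a genomic sequence. It assumes only the canonical ATG codon is considered and uses 0-based forward-strand coordinates. The function name `candidate_tis_positions` is hypothetical and is not taken from the StartScan program.

```python
def candidate_tis_positions(seq):
    """List candidate TIS as (strand, position) pairs.

    Assumption: a candidate is any occurrence of the canonical ATG codon.
    The position is the 0-based forward-strand index of the 'A' in ATG.
    """
    seq = seq.upper()
    hits = []

    # Forward strand: ATG appears literally.
    start = seq.find("ATG")
    while start != -1:
        hits.append(("+", start))
        start = seq.find("ATG", start + 1)

    # Reverse strand: ATG appears as its reverse complement, CAT,
    # on the forward strand. The 'A' of the reverse-strand ATG pairs
    # with the 'T' of CAT, which sits two bases after the 'C'.
    start = seq.find("CAT")
    while start != -1:
        hits.append(("-", start + 2))
        start = seq.find("CAT", start + 1)

    return hits


if __name__ == "__main__":
    for strand, pos in candidate_tis_positions("CCATGGCATAA"):
        print(strand, pos)
```

Every hit is a candidate that a recognition model has to accept or reject. At most one candidate per annotated gene start is a true TIS, which is why even a small false positive rate yields many false predictions at the genome level.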

RESULTS

In this article, we evaluate the performance of several 'simple' TIS recognition methods at the genomic level, and compare them to state-of-the-art models for TIS prediction in transcript data. We conclude that the simple methods largely outperform the complex ones at the genomic scale, and we propose a new model for TIS recognition at the genome level that combines the strengths of these simple models. The new model obtains a false positive rate of 0.125 at a sensitivity of 0.80 on a well annotated human chromosome (chromosome 21). Detailed analyses show that the model is useful, both on its own and in a simple gene prediction setting.
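A figure such as "a false positive rate of 0.125 at a sensitivity of 0.80" can be read off a ranked list of scored candidates. The sketch below is an illustration under stated assumptions, not the authors' evaluation code. It assumes higher scores mean "more likely a true TIS", labels are 1 for true TIS and 0 for pseudo-TIS, and the false positive rate is defined as FP / (FP + TN). The function name `fpr_at_sensitivity` and the toy data are hypothetical.

```python
def fpr_at_sensitivity(scores, labels, target_sensitivity=0.80):
    """Return (threshold, sensitivity, fpr) at the strictest score
    threshold whose sensitivity reaches target_sensitivity.

    Assumptions: higher score = more TIS-like; labels are 1 (true TIS)
    or 0 (pseudo-TIS); FPR = FP / (FP + TN).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")

    # Rank candidates from most to least TIS-like.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every candidate that shares this score (handles ties).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        sensitivity = tp / positives
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, fp / negatives

    raise ValueError("target sensitivity cannot be reached")


if __name__ == "__main__":
    # Toy data for illustration only: 5 true TIS and 5 pseudo-TIS.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
    print(fpr_at_sensitivity(scores, labels, 0.80))
```

Fixing the sensitivity and reporting the resulting false positive rate lets models be compared at the same operating point, which matters when pseudo-TIS vastly outnumber true TIS.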

AVAILABILITY

Datafiles and a web interface for the StartScan program are available at http://bioinformatics.psb.ugent.be/supplementary_data/.
