A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs.

Author information

Li Haifeng, Jiang Tao

Affiliation

Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA.

Publication information

J Comput Biol. 2005 Jul-Aug;12(6):702-18. doi: 10.1089/cmb.2005.12.702.

Abstract

The prediction of translation initiation sites (TISs) in eukaryotic mRNAs has been a challenging problem in computational molecular biology. In this paper, we present a new algorithm that recognizes TISs with very high accuracy. Our algorithm includes two novel ideas. First, we introduce a class of new sequence-similarity kernels based on string editing, called edit kernels, for use with support vector machines (SVMs) in a discriminative approach to predicting TISs. The edit kernels are simple and have significant biological and probabilistic interpretations. Although the edit kernels are not positive definite, it is easy to make the kernel matrix positive definite by adjusting the parameters. Second, we convert the region of an input mRNA sequence downstream of a putative TIS into an amino acid sequence before applying SVMs, to avoid the high redundancy in the genetic code. The algorithm has been implemented and tested on previously published data. Our experimental results on real mRNA data show that both ideas improve the prediction accuracy greatly and that our method performs significantly better than those based on neural networks and SVMs with polynomial kernels or Salzberg kernels.
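The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the kernel's exact form, so this sketch assumes a kernel of the shape exp(-gamma * d), where d is the unit-cost Levenshtein distance; the paper's actual edit costs and parameters may differ. The names `translate_downstream`, `edit_distance`, `edit_kernel`, and `gamma` are hypothetical, and starting the translation at the putative start codon itself is likewise an assumption made to keep the reading frame.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_downstream(mrna, tis_pos, n_codons):
    """Translate n_codons in-frame codons starting at the putative TIS.

    Assumption: translation starts at the start codon itself; unknown
    codons (e.g. containing 'N') are mapped to 'X'.
    """
    seq = mrna.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(tis_pos, tis_pos + 3 * n_codons, 3)]
    return "".join(CODON_TABLE.get(c, "X") for c in codons if len(c) == 3)


def edit_distance(a, b):
    """Unit-cost Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def edit_kernel(a, b, gamma=0.5):
    """Assumed kernel form: similarity decays exponentially with edit distance."""
    return math.exp(-gamma * edit_distance(a, b))


# Synonymous codons (GCU and GCC both encode alanine) differ at the
# nucleotide level but collapse to the same amino acid sequence.
x = translate_downstream("AUGGCU", 0, 2)
y = translate_downstream("AUGGCC", 0, 2)
```

In this sketch the two mRNA fragments are one nucleotide edit apart, yet both translate to the same amino acid string, so their amino-acid-level kernel value is 1. This is one way to see how translating first removes redundancy from the genetic code. A Gram matrix built from `edit_kernel` is not guaranteed to be positive definite; raising `gamma` shrinks the off-diagonal entries toward zero, which is one plausible reading of the parameter adjustment the abstract mentions.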
