An adaptive window length strategy for eukaryotic CDS prediction.

Author information

Shakya Devendra Kumar, Saxena Rajiv, Sharma Sanjeev Narayan

Affiliations

Samrat Ashok Technological Institute, Vidisha.

Jaypee University of Engineering and Technology, Raghogarh, Guna.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 Sep-Oct;10(5):1241-52. doi: 10.1109/TCBB.2013.76.

Abstract

Signal processing-based algorithms for identifying coding sequences (CDS) in eukaryotes are non-data-driven and detect these regions by exploiting their three-base periodicity. Three-base periodicity is commonly detected with the short-time Fourier transform (STFT), which uses a window function of fixed length. Because the lengths of protein-coding and noncoding regions vary widely, the identification accuracy of STFT-based algorithms is poor. In this paper, a novel signal processing-based algorithm is developed that adapts the window length in the STFT of DNA sequences to improve the identification of three-base periodicity. The window length is made adaptive in coding regions to maximize the magnitude of the period-3 measure, whereas in noncoding regions it is tailored to minimize this measure. Simulation results on benchmark data sets demonstrate the advantage of this algorithm over other non-data-driven methods for CDS prediction.
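The idea of a window-dependent period-3 measure can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the adaptation rule, the window shape, or the normalization. The sketch assumes binary (Voss) indicator sequences, a rectangular window, a score divided by the window length, and a fixed list of candidate lengths; the function names `period3_measure` and `best_window` and the toy sequences are hypothetical.

```python
import cmath

BASES = "ACGT"
# DFT kernel values at frequency 1/3; they repeat every three positions.
PHASES = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]


def period3_measure(seq, center, window_len):
    """Period-3 spectral measure in a rectangular window around `center`.

    Sums |DFT at frequency 1/3|^2 over the four binary indicator sequences
    (one per base) and divides by the number of bases in the window.
    The division by window length is an assumption of this sketch.
    """
    half = window_len // 2
    start = max(0, center - half)
    stop = min(len(seq), center + half + 1)
    n = stop - start
    if n <= 0:
        return 0.0
    total = 0.0
    for base in BASES:
        acc = 0j
        for k in range(start, stop):
            if seq[k] == base:
                acc += PHASES[(k - start) % 3]
        total += abs(acc) ** 2
    return total / n


def best_window(seq, center, candidates=(51, 99, 201, 301, 451), maximize=True):
    """Pick the candidate length with the largest (or smallest) measure."""
    scores = {length: period3_measure(seq, center, length) for length in candidates}
    pick = max if maximize else min
    best = pick(scores, key=scores.get)
    return best, scores[best]


# Toy data (hypothetical): a period-4 "noncoding" flank has no period-3
# component, while the "coding" block keeps G at the first codon position.
noncoding = "ACGT" * 75
coding = "GCTGATGGCGAA" * 25
seq = noncoding + coding + noncoding

print("coding, fixed window:   ", period3_measure(seq, 450, 99))
print("noncoding, fixed window:", period3_measure(seq, 150, 99))
print("adapted window at 450:  ", best_window(seq, 450))
```

With this normalization the score at the middle of the toy coding block keeps growing until the window spans the whole block and drops once the window spills into the flanks, which is one simple way a window length could be matched to a region. Passing `maximize=False` selects the length that minimizes the measure, mirroring the noncoding case described in the abstract.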
