Hassani Saadi Hamed, Sameni Reza, Zollanvari Amin
School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran.
Department of Electrical and Electronic Engineering, Nazarbayev University, Astana, Kazakhstan.
BMC Bioinformatics. 2017 Mar 22;18(Suppl 4):154. doi: 10.1186/s12859-017-1524-0.
Time-frequency (TF) analysis has been extensively used for the analysis of non-stationary numeric signals in the past decade. At the same time, recent studies have statistically confirmed the non-stationarity of non-numeric genomic sequences and suggested the use of non-stationary analysis for these sequences. The conventional approach to analyzing non-numeric genomic sequences with techniques designed for numerical data is to map the symbols to numerical values in some way and then apply time-domain or transform-domain signal processing algorithms. Nevertheless, this approach raises questions about the relative magnitudes that such numeric mappings impose on the symbols, which can potentially lead to spurious patterns or misinterpretation of results.
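The concern about relative magnitudes can be illustrated with a small sketch. It is only an illustration under stated assumptions: the two integer mappings `map_a` and `map_b`, the toy sequence, and the helper names are hypothetical and are not taken from the paper. The same periodic sequence is encoded under two arbitrary symbol-to-number assignments, and the dominant harmonic of its discrete Fourier transform changes with the assignment.

```python
import cmath

def dft_magnitudes(values):
    """Magnitude of the discrete Fourier transform of a real-valued list."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values)))
        for k in range(n)
    ]

def encode(seq, mapping):
    """Replace each nucleotide symbol with the number assigned by `mapping`."""
    return [mapping[base] for base in seq]

# Toy sequence with period 4 (hypothetical example data).
seq = "ACGT" * 3

# Two arbitrary numeric assignments (assumptions for illustration only).
map_a = {"A": 1, "C": 2, "G": 3, "T": 4}
map_b = {"A": 1, "C": 4, "G": 2, "T": 3}

spec_a = dft_magnitudes(encode(seq, map_a))
spec_b = dft_magnitudes(encode(seq, map_b))

# Bins 3 and 6 correspond to periods 4 and 2 for a length-12 sequence.
print("map_a:", round(spec_a[3], 3), round(spec_a[6], 3))
print("map_b:", round(spec_b[3], 3), round(spec_b[6], 3))
```

Under `map_a` the period-4 component is the larger of the two, while under `map_b` the period-2 component dominates, although the underlying symbolic sequence is identical in both cases.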
In this paper, using the notion of interpretive signal processing (ISP) and by redefining correlation functions for non-numeric sequences, a general class of TF transforms is extended and applied to non-numerical genomic sequences. The technique has been successfully evaluated on synthetic and real DNA sequences.
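One way to picture a correlation function that needs no numeric mapping is sketched below. This is a minimal sketch and not the paper's ISP definition: it assumes a simple match-counting autocorrelation (the fraction of positions at which a symbol equals the symbol `k` places later) and a rectangular sliding window, and the function names, window size, and toy sequence are all hypothetical. Taking the Fourier transform of the windowed autocorrelation over lag gives a local spectrum that depends only on symbol equality.

```python
import cmath

def match_autocorr(seq, max_lag):
    """r[k] = fraction of positions n with seq[n] == seq[n + k] (assumed definition)."""
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    r = []
    for k in range(max_lag + 1):
        pairs = len(seq) - k
        matches = sum(1 for n in range(pairs) if seq[n] == seq[n + k])
        r.append(matches / pairs)
    return r

def local_spectrum(seq, center, half_width, max_lag):
    """DFT magnitude (over lag) of the match autocorrelation in a window."""
    window = seq[max(0, center - half_width): center + half_width + 1]
    r = match_autocorr(window, max_lag)
    mean = sum(r) / len(r)  # remove the mean so bin 0 does not dominate
    m = len(r)
    return [
        abs(sum((r[k] - mean) * cmath.exp(-2j * cmath.pi * f * k / m)
                for k in range(m)))
        for f in range(m)
    ]

# Toy non-stationary sequence: period 3 followed by period 4 (hypothetical data).
seq = "ATG" * 20 + "ACGT" * 15

# Lags 0..11 give 12 bins, so bin f corresponds to period 12 / f.
spec_atg = local_spectrum(seq, center=30, half_width=20, max_lag=11)
spec_acgt = local_spectrum(seq, center=90, half_width=20, max_lag=11)

print("window in ATG region:  bin 4 =", round(spec_atg[4], 3),
      " bin 3 =", round(spec_atg[3], 3))
print("window in ACGT region: bin 4 =", round(spec_acgt[4], 3),
      " bin 3 =", round(spec_acgt[3], 3))
```

In this toy case the window centered in the first half has its energy at bin 4 (period 3), and the window centered in the second half has energy at bin 3 (period 4), so the change in periodicity shows up without assigning any numeric value to A, C, G, or T.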
The proposed framework is fairly generic and is believed to be useful for extracting quantitative and visual information regarding the local and global periodicity, symmetry, (non-)stationarity, and spectral color of genomic sequences. The notion of interpretive time-frequency analysis introduced in this work can be considered a first step towards the development of a rigorous mathematical construct for genomic signal processing.