Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA.
Department of Mathematical Sciences, Tsinghua University, Beijing 100084, P.R. China.
Genomics. 2020 Mar;112(2):1847-1852. doi: 10.1016/j.ygeno.2019.10.018. Epub 2019 Nov 5.
We propose a novel method for detecting acceptor and donor splice sites using chaos game representation and an artificial neural network. To achieve high accuracy, the inputs to the neural network (the feature vector) should reflect the true nature of the DNA segments. It is therefore important to use a one-to-one numerical representation, i.e., a feature vector that can represent the original data. Chaos game representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a position in the plane in a one-to-one manner. Using CGR, a DNA sequence can be mapped to a numerical sequence that reflects the true nature of the original sequence. In this work, we use CGR as the feature input to a neural network to detect splice sites on the NN269 dataset. Computational experiments indicate that this approach achieves good accuracy while being simpler than other methods in the literature, with only one neural network component. The code and data for our method are available at https://github.com/thoang3/portfolio/tree/SpliceSites_ANN_CGR.
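To illustrate the iterative mapping described above, the following is a minimal sketch of a standard CGR construction, not the authors' implementation. It assumes a common convention that the abstract does not specify: the four nucleotides sit at the corners of the unit square, the walk starts at the centre (0.5, 0.5), and each step moves halfway towards the corner of the current nucleotide. The names `CORNERS`, `cgr_points`, `cgr_features`, and `cgr_decode` are hypothetical.

```python
# Assumed corner assignment (a common convention, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return one (x, y) point per nucleotide of seq."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_features(seq):
    """Flatten the CGR points into a vector [x1, y1, x2, y2, ...]."""
    return [coord for point in cgr_points(seq) for coord in point]


def cgr_decode(points):
    """Recover the sequence: each point lies in the quadrant of its corner."""
    bases = []
    for x, y in points:
        for base, (cx, cy) in CORNERS.items():
            if (x > 0.5) == (cx > 0.5) and (y > 0.5) == (cy > 0.5):
                bases.append(base)
    return "".join(bases)


if __name__ == "__main__":
    pts = cgr_points("ACGT")
    print(pts)
    print(cgr_decode(pts))
```

Because every point falls in the quadrant of the nucleotide that produced it, the sequence can be read back from the points, which is the one-to-one property the abstract relies on. A flattened vector such as the output of `cgr_features` is one plausible way to feed the coordinates to a neural network; the exact feature layout used in the paper is not stated here.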