Chowdhury Biswanath, Garai Arnav, Garai Gautam
Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, 700009, WB, India.
Unit of Energy, Utilities, Communications and Services, Infosys Technologies Ltd., Bhubaneswar, 751024, Odisha, India.
BMC Bioinformatics. 2017 Oct 24;18(1):460. doi: 10.1186/s12859-017-1874-7.
Detecting important functional and/or structural elements in a large eukaryotic genomic sequence, and identifying their positions, is an active research area. The gene is an important functional and structural unit of DNA. Computational gene prediction is therefore essential for detailed genome annotation.
In this paper, we propose a new gene prediction technique based on a Genetic Algorithm (GA) that determines the optimal positions of a gene's exons in a chromosome or genome. Correctly identifying coding and non-coding regions is difficult and computationally demanding. The proposed method, named Gene Prediction with Genetic Algorithm (GPGA), eases this problem by searching for only one exon at a time instead of all exons together with their introns. This representation breaks the entire gene-finding problem into a number of smaller sub-problems, thereby reducing the computational complexity. We tested GPGA on existing benchmark datasets and compared the results with well-known, relevant techniques; GPGA performed better than or comparably to them. We also used GPGA to annotate human chromosome 21 (HS21) using cross-species comparisons with the mouse orthologs.
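The idea of searching for one exon at a time can be illustrated with a minimal sketch. This is not the GPGA implementation: the abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration. The hypothetical `fitness` function simply rewards long in-frame intervals that contain no stop codon, selection is plain truncation with elitism, and the names `find_one_exon` and `predict_exons` are invented here.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def fitness(seq, start, end):
    """Toy score (an assumption, not GPGA's): length of an in-frame,
    stop-free interval [start, end); invalid intervals score 0."""
    length = end - start
    if start < 0 or end > len(seq) or length < 3 or length % 3 != 0:
        return 0
    codons = (seq[i:i + 3] for i in range(start, end, 3))
    if any(codon in STOP_CODONS for codon in codons):
        return 0
    return length


def find_one_exon(seq, blocked, rng, pop_size=60, generations=80):
    """Evolve a single (start, end) candidate that avoids blocked intervals."""

    def score(ind):
        s, e = ind
        # Candidates overlapping an already accepted interval score 0.
        if any(s < b_end and b_start < e for b_start, b_end in blocked):
            return 0
        return fitness(seq, s, e)

    def random_individual():
        s = rng.randrange(0, len(seq) - 3)
        e = min(s + 3 * rng.randint(1, 10), len(seq))
        return (s, e)

    def mutate(ind):
        # Shift one boundary by a small random offset.
        s, e = ind
        step = rng.choice((-3, -1, 1, 3))
        return (s + step, e) if rng.random() < 0.5 else (s, e + step)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 5]  # truncation selection, elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = (a[0], b[1])  # crossover: start from a, end from b
            if rng.random() < 0.7:
                child = mutate(child)
            children.append(child)
        population = elite + children
    best = max(population, key=score)
    return best, score(best)


def predict_exons(seq, max_exons=3, seed=0):
    """Run the single-exon search repeatedly, one sub-problem per exon."""
    rng = random.Random(seed)
    found = []
    for _ in range(max_exons):
        best, best_score = find_one_exon(seq, found, rng)
        if best_score == 0:
            break  # no further valid candidate was found
        found.append(best)
    return sorted(found)


if __name__ == "__main__":
    toy = "ATGGCCGCTGCAGGTTAAATGCCCGGGAAATTTCCCTGAGGCATGC"
    print(predict_exons(toy))
```

Each call to `find_one_exon` explores only pairs of boundaries for one interval, so the search space per run is much smaller than a joint search over every exon and intron boundary at once, which is the decomposition the abstract describes. A realistic fitness function would draw on splice-site signals, coding statistics, or cross-species similarity rather than the toy stop-codon rule used here.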
In these tests, GPGA predicted true genes more accurately than other well-known approaches.