Goel Neelam, Singh Shailendra, Aseri Trilok Chand
Department of Information Technology, University Institute of Engineering and Technology, Sector-25, Panjab University, Chandigarh 160014, India.
Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector-12, Chandigarh 160012, India.
Heliyon. 2020 Sep 14;6(9):e04825. doi: 10.1016/j.heliyon.2020.e04825. eCollection 2020 Sep.
Gene prediction has become increasingly important in genome annotation due to advances in sequencing technology. Genome annotation in turn helps determine the structure and function of the predicted genes. Prediction of translation initiation sites (TIS) in human genomic sequences is one of the fundamental steps in gene prediction, so accurate TIS prediction in these sequences is highly desirable. Although many computational methods have been developed for this problem, none of them focused on finding these sites in human genomic sequences. In this paper, a new TIS prediction method is proposed that incorporates global sequence-based features. A support vector machine is used to assess the predictive power of these features. The proposed method achieved an accuracy above 90% when tested on genomic as well as cDNA sequences. The experimental results indicate that the method works well for both sequence types. The method can be integrated into a gene prediction system in the future.
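As a rough illustration of what "global sequence-based features" for a candidate TIS could look like, the following Python sketch scans a sequence for ATG codons and computes a few whole-sequence descriptors for each one. The abstract does not list the actual features used, so the three shown here (position, number of upstream ATGs, and distance to the first in-frame stop codon) are assumptions chosen for illustration, and the function name `candidate_features` is hypothetical. In a pipeline like the one described, vectors of this kind would be passed to a support vector machine for classification.

```python
# Illustrative only: these features are assumptions, not the paper's feature set.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def candidate_features(seq):
    """Return one feature dict per ATG found in seq."""
    seq = seq.upper()
    atg_positions = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    features = []
    for rank, pos in enumerate(atg_positions):
        # Walk downstream codon by codon, in frame with this ATG,
        # until the first stop codon is reached.
        orf_codons = None
        for j in range(pos + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                orf_codons = (j - pos) // 3  # codons before the stop, ATG included
                break
        features.append({
            "position": pos,             # 0-based index of the A in ATG
            "upstream_atg_count": rank,  # ATGs that occur before this one
            "orf_codons": orf_codons,    # None when no in-frame stop is found
        })
    return features

if __name__ == "__main__":
    for row in candidate_features("CCATGGCATGAAATAGCC"):
        print(row)
```

Each candidate ATG is described relative to the whole sequence rather than only its immediate neighbourhood, which is the general idea behind a global feature; the specific choices above are a sketch under the stated assumptions.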