Department of Computer Science and Engineering, United International University, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh.
Genomics. 2019 Jul;111(4):966-972. doi: 10.1016/j.ygeno.2018.06.003. Epub 2018 Jun 20.
Recombination hotspots in a genome are unevenly distributed. Hotspots are genomic regions that show higher rates of meiotic recombination. Computational methods for recombination hotspot prediction often use sophisticated features derived from the physicochemical or structural properties of nucleotides. In this paper, we propose iRSpot-SF, which uses sequence-based features that are computationally cheap to generate. Our method uses four feature groups: k-mer composition, gapped k-mer composition, TF-IDF of k-mers and reverse complement k-mer composition. We used recursive feature elimination to select the top 17 features for hotspot prediction. Our analysis shows that the gapped k-mer composition and reverse complement k-mer composition features outperform the other feature groups. We used an SVM with an RBF kernel as the classification algorithm and tested our method on standard benchmark datasets. Compared with other methods, iRSpot-SF achieves significantly better accuracy, Matthews correlation coefficient and sensitivity, at 84.58%, 0.6941 and 84.57%, respectively. We have made our method readily available as a Python-based tool, and the datasets and source code are available at: https://github.com/abdlmaruf/iRSpot-SF. A web application based on iRSpot-SF is freely available at: http://irspot.pythonanywhere.com/server.html.
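As a rough illustration of the four sequence-based feature groups and the three reported metrics, the following Python sketch computes each of them using only the standard library. The abstract does not give the exact feature definitions used by iRSpot-SF, so the following are assumptions. Frequencies are normalized over overlapping windows. The gapped variant counts nucleotide pairs separated by a fixed gap, which is one simple form of gapped k-mer. Reverse complement merging keys each k-mer by the lexicographically smaller of itself and its reverse complement. TF-IDF uses idf = log(N / df). Sequences are assumed to be uppercase ACGT only. All function names here are hypothetical and are not taken from the iRSpot-SF code.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_kmers(k):
    """All 4**k DNA words of length k, in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_composition(seq, k):
    """Frequency of each k-mer over all overlapping windows of `seq`."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows) or 1  # avoid division by zero for short sequences
    return {kmer: counts[kmer] / total for kmer in all_kmers(k)}


def gapped_pair_composition(seq, gap):
    """Assumed gapped variant: frequency of pairs (x, y) with `gap` bases between them."""
    pairs = [seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1)]
    counts = Counter(pairs)
    total = len(pairs) or 1
    return {pair: counts[pair] / total for pair in all_kmers(2)}


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def revcomp_kmer_composition(seq, k):
    """k-mer frequencies with each k-mer merged with its reverse complement."""
    merged = {}
    for kmer, freq in kmer_composition(seq, k).items():
        # Canonical key: the lexicographically smaller of the pair.
        key = min(kmer, reverse_complement(kmer))
        merged[key] = merged.get(key, 0.0) + freq
    return merged


def kmer_tfidf(seqs, k):
    """TF-IDF per sequence, treating each sequence as a document of k-mers."""
    tfs = [kmer_composition(s, k) for s in seqs]
    n = len(seqs)
    idf = {}
    for kmer in all_kmers(k):
        df = sum(1 for tf in tfs if tf[kmer] > 0)  # sequences containing kmer
        idf[kmer] = math.log(n / df) if df else 0.0
    return [{kmer: tf[kmer] * idf[kmer] for kmer in tf} for tf in tfs]


def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, mcc
```

For example, `kmer_composition("ACGT", 1)` assigns each nucleotide a frequency of 0.25. Under these assumptions, the per-group dictionaries could be concatenated into one feature vector per sequence before feature selection and SVM classification.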