Yang Cheng-Hong, Chiang Yi-Cheng, Chuang Li-Yeh, Lin Yu-Da
1 Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan.
2 Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.
J Comput Biol. 2018 Feb;25(2):158-169. doi: 10.1089/cmb.2016.0178. Epub 2017 Oct 19.
Many CpG island detection methods based on sliding windows and clustering have been proposed, but their accuracy tends to scale with the processing time required. Accurate and rapid identification of CpG islands across the complete human genome therefore remains an important challenge. We propose a hybrid method, CpGTLBO, to detect CpG islands in the human genome. The method combines a clustering approach with the teaching-learning-based optimization (TLBO) algorithm. The clustering approach detects CpG island candidates and effectively discards the large volume of unnecessary DNA fragments. TLBO then accurately predicts CpG islands among the promising candidates. A comparison on six contig data sets and a whole human genome analysis showed that CpGTLBO identified CpG islands more stably than eight existing methods in terms of sensitivity (SN), specificity (SP), accuracy (ACC), performance coefficient (PC), correlation coefficient (CC), and processing time. The results indicate that ClusterTLBO can effectively overcome the drawbacks of both CpGcluster and TLBO while retaining their advantages.
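The abstract names five evaluation measures without defining them. As a rough sketch, the Python below computes them from nucleotide-level confusion counts using the definitions commonly seen in CpG island prediction studies; these formulas are an assumption here, not taken from the abstract, and the function name `evaluation_metrics` and the example counts are hypothetical.

```python
import math


def evaluation_metrics(tp, tn, fp, fn):
    """Return SN, SP, ACC, PC and CC from confusion counts.

    Assumed definitions (not stated in the abstract):
      SN  = TP / (TP + FN)
      SP  = TN / (TN + FP)
      ACC = (TP + TN) / (TP + TN + FP + FN)
      PC  = TP / (TP + FN + FP)
      CC  = Matthews correlation coefficient
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    cc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SN": ratio(tp, tp + fn),
        "SP": ratio(tn, tn + fp),
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "PC": ratio(tp, tp + fn + fp),
        "CC": ratio(tp * tn - fp * fn, cc_den),
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = evaluation_metrics(tp=80, tn=900, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because true negatives usually dominate genome-scale data, SP and ACC can look high even for a weak predictor; PC ignores true negatives entirely, which may be why such studies report it alongside CC.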