Fang Fang, Fan Shicai, Zhang Xuegong, Zhang Michael Q
Bioinformatics Division, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China.
Bioinformatics. 2006 Sep 15;22(18):2204-9. doi: 10.1093/bioinformatics/btl377. Epub 2006 Jul 12.
Over 50% of human genes contain CpG islands in their 5'-regions. Methylation patterns of CpG islands are involved in tissue-specific gene expression and regulation. Erroneous epigenetic silencing associated with aberrant CpG island methylation is one mechanism leading to the loss of tumor suppressor function in cancer cells. Large-scale experimental detection of DNA methylation remains both labor-intensive and time-consuming. It is therefore necessary to develop in silico approaches for predicting the methylation status of CpG islands.
Based on a recent genome-scale dataset of DNA methylation in human brain tissues, we developed a classifier called MethCGI for predicting the methylation status of CpG islands using a support vector machine (SVM). Nucleotide sequence content and transcription factor binding sites (TFBSs) are used as features for the classification. The method achieves a specificity of 84.65% and a sensitivity of 84.32% on the brain data, and it also correctly predicts about two-thirds of the data from other tissues reported in the MethDB database.
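As an illustration of the kind of sequence-content features and evaluation measures described above, the following is a minimal Python sketch, not the MethCGI implementation. The abstract does not list the exact features, so GC content and the observed/expected CpG ratio are assumed here as typical examples. The function names `sequence_features` and `sensitivity_specificity` are hypothetical, and treating "methylated" as the positive class is also an assumption.

```python
from collections import Counter


def sequence_features(seq):
    """Compute two simple composition features for one CpG island sequence.

    Assumed example features (not taken from the paper):
      gc_content: fraction of bases that are C or G
      cpg_oe:     observed/expected CpG ratio, (#CpG * N) / (#C * #G)
    """
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    c, g = counts["C"], counts["G"]
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    cpg_oe = (cpg * n) / (c * g) if c and g else 0.0
    return {"gc_content": gc_content, "cpg_oe": cpg_oe}


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed label convention: 1 = methylated (positive), 0 = unmethylated.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

In a pipeline of this shape, feature dictionaries like these would be converted to numeric vectors and passed to an SVM trainer, and the sensitivity and specificity would be computed on held-out predictions.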
An online predictor based on MethCGI is available at http://166.111.201.7/MethCGI.html.
Supplementary data are available at Bioinformatics online and at http://166.111.201.7/help.html.