Meng Hailong, Murrelle Edward L, Li Guoya
Scientific Division, ClearPoint Resources Inc., Richmond, VA 23219, USA.
BMC Bioinformatics. 2008 Oct 27;9:457. doi: 10.1186/1471-2105-9-457.
DNA methylation patterns have been shown to significantly correlate with different tissue types and disease states. High-throughput methylation arrays enable large-scale DNA methylation analysis to identify informative DNA methylation biomarkers. The identification of disease-specific methylation signatures is of fundamental and practical interest for risk assessment, diagnosis, and prognosis of diseases.
Using published high-throughput DNA methylation data, a two-stage feature selection method was developed to select a small optimal subset of DNA methylation features to precisely classify two sample groups. With this approach, a small number of CpG sites were highly sensitive and specific in distinguishing lung cancer tissue samples from normal lung tissue samples.
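The abstract does not say what the two stages are, so the following is only a minimal sketch of one common way to build a two-stage selector, not the authors' method. It assumes a filter stage that ranks CpG sites by a Welch t statistic, followed by a greedy forward-selection stage scored by the leave-one-out accuracy of a nearest-centroid classifier. All function names, the parameters n_filter and max_feats, and the synthetic data are hypothetical and exist only for illustration.

```python
import random
from statistics import mean, stdev


def welch_t(a, b):
    """Absolute Welch t statistic between two lists of methylation values."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def predict(train_X, train_y, x, feats):
    """Nearest-centroid label (0 or 1) for sample x, using only columns in feats."""
    dist = {}
    for label in (0, 1):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist[label] = sum((x[j] - mean(r[j] for r in rows)) ** 2 for j in feats)
    return min(dist, key=dist.get)


def loo_predictions(X, y, feats):
    """Leave-one-out predictions: each sample is classified by all the others."""
    return [predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], feats)
            for i in range(len(X))]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def two_stage_select(X, y, n_filter=20, max_feats=5):
    """Assumed two-stage scheme: t-statistic filter, then greedy forward selection."""
    # Stage 1 (assumed): keep the n_filter sites that best separate the groups.
    def score(j):
        pos = [r[j] for r, lab in zip(X, y) if lab == 1]
        neg = [r[j] for r, lab in zip(X, y) if lab == 0]
        return welch_t(pos, neg)

    candidates = sorted(range(len(X[0])), key=score, reverse=True)[:n_filter]

    # Stage 2 (assumed): add one site at a time while accuracy keeps improving.
    chosen, best = [], 0.0
    while len(chosen) < max_feats:
        trials = []
        for j in candidates:
            if j not in chosen:
                preds = loo_predictions(X, y, chosen + [j])
                acc = sum(p == t for p, t in zip(preds, y)) / len(y)
                trials.append((acc, j))
        if not trials:
            break
        acc, j = max(trials, key=lambda t: t[0])  # ties go to the higher-ranked site
        if acc <= best:
            break
        chosen.append(j)
        best = acc
    return chosen, best


def make_toy_data(seed=0, n_per_group=20, n_sites=50):
    """Synthetic values in [0, 1]; sites 3 and 7 differ between groups (made up)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in [1] * n_per_group + [0] * n_per_group:
        row = [rng.random() for _ in range(n_sites)]
        for j in (3, 7):
            center = 0.8 if label == 1 else 0.2
            row[j] = min(1.0, max(0.0, rng.gauss(center, 0.05)))
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    chosen, acc = two_stage_select(X, y)
    sens, spec = sensitivity_specificity(y, loo_predictions(X, y, chosen))
    print("selected sites:", chosen, "accuracy:", acc)
    print("sensitivity:", sens, "specificity:", spec)
```

In this sketch the filter stage sees every sample before cross-validation, so the reported accuracy is optimistically biased. An unbiased estimate would repeat both stages inside each cross-validation fold or use a held-out test set.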
This study shows that it is feasible to identify DNA methylation biomarkers from high-throughput DNA methylation profiles and that a small number of signature CpG sites can suffice to classify two groups of samples. The computational method developed in this study efficiently identifies signature CpG sites in disease samples with complex methylation patterns.