Genomic sweeping for hypermethylated genes.

Author information

Goh Liang, Murphy Susan K, Mukherjee Sayan, Furey Terrence S

Affiliation

Institute for Genome Sciences & Policy, Duke University, USA.

Publication information

Bioinformatics. 2007 Feb 1;23(3):281-8. doi: 10.1093/bioinformatics/btl620. Epub 2006 Dec 5.

Abstract

MOTIVATION

Genes silenced by the aberrant methylation of nearby CpG islands can contribute to the onset or progression of cancer and represent potential biomarkers for diagnosis and prognosis. Among more than 14,000 candidate genes with promoter-region CpG islands, relatively few have so far been validated as hypermethylated in cancer. A descriptive set of genes known to be unmethylated in cancer does not exist. The lack of a negative set, together with the large number of candidates, necessitated the development of a new approach to identify novel genes hypermethylated in cancer.

RESULTS

We developed a general method, cluster_boost, that in an imbalanced data setting predicts new minority class members given limited known samples and a large set of unlabeled samples. Synthetic datasets modeled after the hypermethylated genes data show that cluster_boost can successfully identify minority samples within unlabeled data. Using genome sequence features, cluster_boost predicted candidate hypermethylated genes among 14,000 genes of unknown status. In primary ovarian cancers, we determined the methylation status of 15 genes with different levels of support for being hypermethylated. The results indicate that cluster_boost can accurately identify novel genes hypermethylated in cancer.

AVAILABILITY

Software and datasets are freely available at http://labs.genome.duke.edu/FureyLab/cluster_boost.php.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

