Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles.

Author information

Mallik Saurav, Zhao Zhongming

Affiliations

Computer Science & Engineering, Aliah University, Newtown, Newtown 700156, India.

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Quant Biol. 2017 Dec;5(4):302-327. doi: 10.1007/s40484-017-0119-0. Epub 2017 Nov 23.

Abstract

BACKGROUND

Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles.
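The sketch below is a minimal Python illustration of MI-based feature ranking. It makes three assumptions. First, each gene's values are binarized at that gene's median. Second, each gene is scored by its empirical mutual information with a binary class label (for example, normal vs. tumor). Third, the function names and toy gene names are hypothetical. This plain relevance ranking is not necessarily the feature-extraction technique used in the study.

```python
import math
from collections import Counter


def binarize_at_median(values):
    """Map each value to 1 if it is above the median of `values`, else 0 (assumed binning)."""
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def mutual_information(x, y):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def rank_genes_by_mi(data, labels):
    """Score each gene by MI with the class labels; return (ranking, scores)."""
    scores = {gene: mutual_information(binarize_at_median(v), labels) for gene, v in data.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


labels = [0, 0, 0, 1, 1, 1]  # hypothetical labels: 0 = normal, 1 = tumor
data = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # separates the two classes
    "GENE_B": [2.0, 1.0, 3.0, 2.5, 1.5, 2.2],  # only weakly related to the classes
}
ranking, scores = rank_genes_by_mi(data, labels)  # ranking[0] == "GENE_A"
```

A gene whose binarized values match the labels exactly reaches the maximum of 1 bit for balanced binary classes. An unrelated gene scores close to 0.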

METHODS

We first collect the genes that have both expression and methylation values in AML as well as in PC. Next, we run the Jarque-Bera normality test on the expression and methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. This yields four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. A feature-extraction technique, "", is then applied to each part, producing a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) to the expression/methylation data of the top-ranked normally distributed genes and the Shrink t-test (non-parametric test) to those of the top-ranked non-normally distributed genes. We then use a recent weighted ARM method, "RANWAR", to combine all or specific resultant genes and generate top oncogenic rules along with the respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the Enrichr database. These analyses validate the prioritized oncogenes as markers and label each marker as existing or novel.
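The normality split and the parametric test described above can be sketched in Python as follows. This is a minimal sketch under several assumptions:
- Each gene is tested on all of its samples pooled.
- A 0.05 cutoff separates the normal part from the non-normal part.
- The Jarque-Bera p-value uses the asymptotic chi-square (2 df) approximation.
- Only Welch's t statistic and its degrees of freedom are computed.

The Shrink t-test for the non-normal part is not shown. All function and gene names are hypothetical.

```python
import math
from statistics import fmean, variance


def jarque_bera(values):
    """Jarque-Bera statistic and its asymptotic p-value for one gene's values."""
    n = len(values)
    mean = fmean(values)
    dev = [v - mean for v in values]
    m2 = sum(d ** 2 for d in dev) / n  # population (biased) central moments
    if m2 == 0:
        raise ValueError("constant values: skewness and kurtosis are undefined")
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 df,
    # whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def split_by_normality(data, alpha=0.05):
    """Split {gene: values} into (normal, non_normal) dicts using the JB test."""
    normal, non_normal = {}, {}
    for gene, values in data.items():
        _, p_value = jarque_bera(values)
        target = normal if p_value >= alpha else non_normal
        target[gene] = values
    return normal, non_normal


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy usage with hypothetical genes (values pooled over all samples).
data = {
    "GENE_SYM": [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2],  # symmetric, light-tailed
    "GENE_SKEW": [0] * 19 + [100],                  # one extreme outlier
}
normal, non_normal = split_by_normality(data)
t_stat, dof = welch_t([1.0, 1.2, 0.9, 1.1], [3.0, 3.2, 2.9, 3.1])
```

In practice, `scipy.stats.jarque_bera` and `scipy.stats.ttest_ind(a, b, equal_var=False)` should give the same statistics along with p-values. The standard-library version above only makes the formulas explicit. With small sample sizes, the asymptotic Jarque-Bera p-value is only a rough guide.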

RESULTS

The novel markers of AML are {ABCB11↑∪KRT17↓} (i.e., ABCB11 up-regulated and KRT17 down-regulated) and {AP1S1-∪KRT17↓∪NEIL2-∪DYDC1↓} (i.e., AP1S1 and NEIL2 both hypo-methylated, and KRT17 and DYDC1 both down-regulated). The novel marker of PC is {UBIAD1¶∪APBA2‡∪C4orf31‡} (i.e., UBIAD1 up-regulated and hypo-methylated, and APBA2 and C4orf31 both down-regulated and hyper-methylated).
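Each marker above is an itemset of gene states, which is the form that ARM operates on. As a hedged illustration, the sketch below computes plain (unweighted) support and confidence of a rule over samples encoded as sets of such items. It is not RANWAR, whose rank-based weighting is not reproduced here. The "GENE:state" item encoding and the gene names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of samples (transactions) that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: supp(A ∪ C) / supp(A)."""
    supp_a = support(antecedent, transactions)
    if supp_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / supp_a


# Each sample is the set of gene states observed in it (hypothetical data).
samples = [
    {"GENE_X:up", "GENE_Y:down"},
    {"GENE_X:up", "GENE_Y:down"},
    {"GENE_X:up"},
    {"GENE_Y:down"},
]
rule_support = support({"GENE_X:up", "GENE_Y:down"}, samples)  # 2 of 4 samples
rule_confidence = confidence({"GENE_X:up"}, {"GENE_Y:down"}, samples)  # 0.5 / 0.75
```

A weighted ARM method would modify these measures with per-item or per-sample weights. The unweighted versions shown here are only the common baseline.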

CONCLUSION

The identified novel markers might play critical roles in AML and PC. The approach can be applied to other complex diseases.