Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles.

Author information

Mallik Saurav, Zhao Zhongming

Affiliations

Computer Science & Engineering, Aliah University, Newtown, Newtown 700156, India.

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Quant Biol. 2017 Dec;5(4):302-327. doi: 10.1007/s40484-017-0119-0. Epub 2017 Nov 23.

Abstract

BACKGROUND

Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles.

METHODS

We first collect the genes that have both expression and methylation values in AML as well as in PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. This yields four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. An MI-based feature-extraction technique is then applied to each part, producing a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) and the Shrink t-test (non-parametric test) to the expression/methylation data of the top selected normally distributed genes and non-normally distributed genes, respectively. We then use a recent weighted ARM method, "RANWAR", to combine all or specific resultant genes and generate the top oncogenic rules along with their respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the Enrichr database to validate the prioritized oncogenes as markers and to label the markers as existing or novel.
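The normality split and the parametric test described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the Jarque-Bera test is run per gene across samples, that a significance level of 0.05 is used, and that each gene has non-constant values. The function names (`jarque_bera`, `split_by_normality`, `welch_t`) and the toy profiles are hypothetical. The Shrink t-test and the feature-extraction step are not reproduced here.

```python
import math
from statistics import mean

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for one gene's values."""
    n = len(values)
    mu = mean(values)
    # Central moments with the 1/n convention used by the JB statistic.
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

def split_by_normality(profiles, alpha=0.05):
    """Split {gene: values} into (normal, non_normal) dictionaries.

    alpha is an assumed threshold; the source does not state one.
    """
    normal, non_normal = {}, {}
    for gene, values in profiles.items():
        _, p_value = jarque_bera(values)
        target = normal if p_value >= alpha else non_normal
        target[gene] = values
    return normal, non_normal

def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = mean(group_a), mean(group_b)
    # Unbiased sample variances; the two groups may have unequal variance.
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    se2 = va / na + vb / nb
    t_stat = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t_stat, df

# Toy profiles (hypothetical gene names and values, for illustration only).
toy_profiles = {
    "GENE_A": [-2, -1, -1, 0, 0, 0, 1, 1, 2],   # symmetric, light tails
    "GENE_B": [0] * 19 + [10],                  # one extreme outlier
}
normal_part, non_normal_part = split_by_normality(toy_profiles)
print(sorted(normal_part), sorted(non_normal_part))
```

In this sketch the normally distributed genes would then go to `welch_t` with their tumor and control values as the two groups, while the non-normally distributed genes would go to a separate test.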

RESULTS

The novel markers of AML are {ABCB11↑∪KRT17↓} (i.e., ABCB11 up-regulated and KRT17 down-regulated) and {AP1S1-∪KRT17↓∪NEIL2-∪DYDC1↓} (i.e., AP1S1 and NEIL2 both hypo-methylated, and KRT17 and DYDC1 both down-regulated). The novel marker of PC is {UBIAD1¶∪APBA2‡∪C4orf31‡} (i.e., UBIAD1 up-regulated and hypo-methylated, and APBA2 and C4orf31 both down-regulated and hyper-methylated).
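Each marker above is an itemset over discretized gene states, so it can be scored with the usual association-rule measures. The sketch below computes plain, unweighted support and confidence on made-up samples. It does not reproduce RANWAR's rank-based weighting, and the item labels (`ABCB11_up`, `KRT17_down`), class labels, and sample contents are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of samples that contain every item in itemset."""
    items = frozenset(itemset)
    hits = sum(1 for sample in transactions if items <= sample)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base

# Toy discretized samples: each set lists the gene states observed in one
# sample plus its class label (all values invented for illustration).
toy_samples = [
    {"ABCB11_up", "KRT17_down", "tumor"},
    {"ABCB11_up", "KRT17_down", "tumor"},
    {"ABCB11_up", "tumor"},
    {"KRT17_down", "normal"},
    {"normal"},
]

marker = {"ABCB11_up", "KRT17_down"}
print(support(toy_samples, marker))                # share of samples with the marker
print(confidence(toy_samples, marker, {"tumor"}))  # how often the marker implies tumor
```

A weighted method would replace the simple counts with per-gene weights, so the rule ranking it produces can differ from this unweighted one.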

CONCLUSION

The identified novel markers might have critical roles in AML as well as PC. The approach can be applied to other complex diseases.
