
Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles.

Authors

Mallik Saurav, Zhao Zhongming

Affiliations

Computer Science & Engineering, Aliah University, Newtown, Newtown 700156, India.

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Quant Biol. 2017 Dec;5(4):302-327. doi: 10.1007/s40484-017-0119-0. Epub 2017 Nov 23.

DOI: 10.1007/s40484-017-0119-0
PMID: 30221015
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6135253/

Abstract

BACKGROUND

Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles.

METHODS

We first collect the genes that have both expression and methylation values in AML as well as in PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. This yields four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. An MI-based feature-extraction technique is then applied to each part, producing a list of top-ranked genes. Next, we apply the Welch t-test (parametric test) and the Shrink t-test (non-parametric test) to the expression/methylation data for the top selected normally distributed and non-normally distributed genes, respectively. We then use a recent weighted ARM method, "RANWAR", to combine all/specific resultant genes and generate the top oncogenic rules along with their integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the Enrichr database to validate the prioritized oncogenes as markers and to label the markers as existing or novel.
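The normality split and the parametric test described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the `alpha=0.05` threshold, and the use of the asymptotic chi-square p-value are all assumptions made for illustration. The feature-extraction, Shrink t-test, and RANWAR steps are not covered.

```python
import math
from statistics import mean, variance


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for one gene's values.

    Assumes the values are not all identical (non-zero variance).
    """
    n = len(values)
    mu = mean(values)
    # Central moments with the biased (1/n) normalisation used by the JB test.
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def split_by_normality(profiles, alpha=0.05):
    """Split {gene: values} into (normal, non_normal) dicts.

    alpha is a hypothetical significance threshold, not taken from the paper.
    """
    normal, non_normal = {}, {}
    for gene, values in profiles.items():
        _, p_value = jarque_bera(values)
        target = normal if p_value >= alpha else non_normal
        target[gene] = values
    return normal, non_normal


def welch_t(group_a, group_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se_a = variance(group_a) / n_a  # sample variance (n - 1) over group size
    se_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof
```

In this sketch, `split_by_normality` would be called once on the expression profiles and once on the methylation profiles, giving the four parts; `welch_t` would then compare diseased and control samples for each gene in the normally distributed parts. Converting the t statistic to a p-value requires a Student t distribution, which the standard library does not provide.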

RESULTS

The novel markers of AML are {ABCB11↑∪KRT17↓} (i.e., ABCB11 up-regulated and KRT17 down-regulated) and {AP1S1-∪KRT17↓∪NEIL2-∪DYDC1↓} (i.e., AP1S1 and NEIL2 both hypo-methylated, and KRT17 and DYDC1 both down-regulated). The novel marker of PC is {UBIAD1¶∪APBA2‡∪C4orf31‡} (i.e., UBIAD1 up-regulated and hypo-methylated, and APBA2 and C4orf31 both down-regulated and hyper-methylated).
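Markers of this form are itemsets of discretized gene states, which is what association rule mining operates on. The sketch below computes plain, unweighted support and confidence for such an itemset. It is not RANWAR, which is a rank-based weighted method. The item labels (`ABCB11_up`, `KRT17_down`) are a hypothetical encoding, and the four samples are invented toy data, not results from the study.

```python
def support(transactions, itemset):
    """Fraction of samples whose state set contains every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for states in transactions if itemset <= states)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Hypothetical discretized samples: each set lists the gene states observed
# in one sample. These values are illustrative only.
samples = [
    frozenset({"ABCB11_up", "KRT17_down"}),
    frozenset({"ABCB11_up", "KRT17_down"}),
    frozenset({"ABCB11_up"}),
    frozenset({"KRT17_down"}),
]

marker_support = support(samples, {"ABCB11_up", "KRT17_down"})
rule_confidence = confidence(samples, {"ABCB11_up"}, {"KRT17_down"})
```

A weighted method would additionally attach a per-gene weight to each item before ranking rules; that step is omitted here because the abstract does not describe it.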

CONCLUSION

The identified novel markers might have critical roles in AML as well as in PC. The approach can be applied to other complex diseases.



