ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

Author information

Mallik Saurav, Zhao Zhongming

Affiliations

Department of Computer Science & Engineering, Aliah University, Newtown, WB-700156, India.

Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Genes (Basel). 2017 Dec 28;9(1):7. doi: 10.3390/genes9010007.

DOI:10.3390/genes9010007

PMID:29283433

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5793160/

Abstract

For transcriptomic analysis, numerous microarray-based genomic datasets are available, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample group and a matched control group for each transcript or gene. Association rule mining discovers interesting item sets through a rule-based methodology, which makes it well suited to finding cause-effect relationships between transcripts. In this work, we introduce two new rule-based similarity measures (weighted rank-based Jaccard and Cosine measures) and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through an association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers, which comprises both singular and complex markers, depends on the corresponding condensed gene sets in either the antecedent or the consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we first identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then used to determine the association rules from these genes. We then computed the integrated similarity scores of the rule-based similarity measures between each rule pair, and used the resultant scores for clustering to identify the co-expressed rule modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
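The rule-pair similarity and clustering steps described in the abstract can be illustrated with a small sketch. The abstract does not give the exact definitions of the weighted rank-based Jaccard and Cosine measures, so the code below assumes a generic weight-summed Jaccard and a weighted cosine over the genes in each rule, averages them with equal weight, and groups rules by a simple threshold-based single linkage. The rules, gene weights, threshold, and all function names are hypothetical and are not taken from the paper or from RANWAR.

```python
import math
from itertools import combinations

# Hypothetical association rules: name -> (antecedent genes, consequent genes).
rules = {
    "r1": ({"TP53", "EGFR"}, {"KRAS"}),
    "r2": ({"TP53"}, {"KRAS"}),
    "r3": ({"BRCA1", "MYC"}, {"CDK4"}),
    "r4": ({"MYC"}, {"CDK4"}),
}

# Hypothetical per-gene weights, e.g. derived from a differential-expression rank.
weights = {"TP53": 1.0, "EGFR": 0.8, "KRAS": 0.6,
           "BRCA1": 0.9, "MYC": 0.7, "CDK4": 0.5}


def rule_items(rule):
    """All genes in a rule, antecedent and consequent combined."""
    antecedent, consequent = rule
    return antecedent | consequent


def weighted_jaccard(a, b, w):
    """Sum of weights of shared genes over sum of weights of all genes."""
    union = sum(w[g] for g in a | b)
    return sum(w[g] for g in a & b) / union if union else 0.0


def weighted_cosine(a, b, w):
    """Cosine between vectors whose entry for gene g is w[g] if present, else 0."""
    dot = sum(w[g] ** 2 for g in a & b)
    norm_a = math.sqrt(sum(w[g] ** 2 for g in a))
    norm_b = math.sqrt(sum(w[g] ** 2 for g in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def integrated_similarity(rule_a, rule_b, w):
    """Equal-weight average of the two measures (an assumption of this sketch)."""
    a, b = rule_items(rule_a), rule_items(rule_b)
    return 0.5 * (weighted_jaccard(a, b, w) + weighted_cosine(a, b, w))


def cluster_rules(rule_set, w, threshold=0.5):
    """Single-linkage grouping: merge rules whose similarity reaches the threshold."""
    parent = {name: name for name in rule_set}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(rule_set, 2):
        if integrated_similarity(rule_set[a], rule_set[b], w) >= threshold:
            parent[find(a)] = find(b)

    modules = {}
    for name in rule_set:
        modules.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in modules.values())


modules = cluster_rules(rules, weights)
print(modules)
```

With these toy inputs, rules that share most of their weighted genes fall into the same module, while rules with no genes in common stay apart. A real analysis would replace the toy rules with mined association rules and would likely use a proper hierarchical clustering routine rather than a fixed threshold.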

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa17/5793160/d131e78cadf4/genes-09-00007-g001.jpg

Similar articles

RANWAR: rank-based weighted association rule mining from gene expression and methylation data.

IEEE Trans Nanobioscience. 2015 Jan;14(1):59-66. doi: 10.1109/TNB.2014.2359494. Epub 2014 Sep 23.

DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

IEEE Trans Nanobioscience. 2018 Apr;17(2):117-125. doi: 10.1109/TNB.2018.2803021.

Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles.

Quant Biol. 2017 Dec;5(4):302-327. doi: 10.1007/s40484-017-0119-0. Epub 2017 Nov 23.

Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

PLoS One. 2015 Apr 1;10(4):e0119448. doi: 10.1371/journal.pone.0119448. eCollection 2015.

Association rule based similarity measures for the clustering of gene expression data.

Open Med Inform J. 2010;4:63-73. doi: 10.2174/1874431101004010063. Epub 2010 May 28.

TPSC: a module detection method based on topology potential and spectral clustering in weighted networks and its application in gene co-expression module discovery.

BMC Bioinformatics. 2021 Oct 25;22(Suppl 4):111. doi: 10.1186/s12859-021-03964-5.

A functional gene module identification algorithm in gene expression data based on genetic algorithm and gene ontology.

BMC Genomics. 2023 Feb 17;24(1):76. doi: 10.1186/s12864-023-09157-z.

Analysis of the autophagy gene expression profile of pancreatic cancer based on autophagy-related protein microtubule-associated protein 1A/1B-light chain 3.

World J Gastroenterol. 2019 May 7;25(17):2086-2098. doi: 10.3748/wjg.v25.i17.2086.

Dynamic association rules for gene expression data analysis.

BMC Genomics. 2015 Oct 14;16:786. doi: 10.1186/s12864-015-1970-x.

Cited by

Utility of Machine Learning Models to Predict Lymph Node Metastasis of Japanese Localized Prostate Cancer.

Cancers (Basel). 2024 Dec 5;16(23):4073. doi: 10.3390/cancers16234073.

PPIGCF: A Protein-Protein Interaction-Based Gene Correlation Filter for Optimal Gene Selection.

Genes (Basel). 2023 May 10;14(5):1063. doi: 10.3390/genes14051063.

Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.

BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.

In silico ranking of phenolics for therapeutic effectiveness on cancer stem cells.

BMC Bioinformatics. 2020 Dec 28;21(Suppl 21):499. doi: 10.1186/s12859-020-03849-z.

Detecting methylation signatures in neurodegenerative disease by density-based clustering of applications with reducing noise.

Sci Rep. 2020 Dec 17;10(1):22164. doi: 10.1038/s41598-020-78463-3.

Multi-Objective Optimized Fuzzy Clustering for Detecting Cell Clusters from Single-Cell Expression Profiles.

Genes (Basel). 2019 Aug 13;10(8):611. doi: 10.3390/genes10080611.

Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data.

Brief Bioinform. 2020 Mar 23;21(2):368-394. doi: 10.1093/bib/bby120.

Identification of gene signatures from RNA-seq data using Pareto-optimal cluster algorithm.

BMC Syst Biol. 2018 Dec 21;12(Suppl 8):126. doi: 10.1186/s12918-018-0650-2.

An Introduction to Integrative Genomics and Systems Medicine in Cancer.

Genes (Basel). 2018 Jan 12;9(1):37. doi: 10.3390/genes9010037.
