Integrating Multiple Data Sources for Combinatorial Marker Discovery: A Study in Tumorigenesis.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2018 Mar-Apr;15(2):673-687. doi: 10.1109/TCBB.2016.2636207. Epub 2016 Dec 6.

Abstract

Identifying combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous dataset, as well as a (post-discretized) boolean dataset, based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data is introduced; it is computed on the integrated continuous data to identify an initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified by applying a well-known biclustering algorithm to the integrated boolean data of that non-redundant set of genes. A novel sample-based weighted support is then proposed and subsequently calculated on the same integrated boolean data in order to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential combinatorial markers. Because the proposed method generates fewer significant non-redundant genesets than other popular methods, it is much faster than they are. Applying the proposed technique to expression and methylation data for uterine tumor or prostate carcinoma produces a set of significant marker combinations. We expect that such combinations of markers will produce fewer false positives than individual markers.
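The pipeline outlined in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The abstract gives no formula for the combined score or the weighted support, so neither is reproduced here. The mean-based thresholds, the brute-force enumeration (standing in for the biclustering algorithm, which the abstract does not name), the plain unweighted support, and all function names and data values are assumptions made purely for illustration.

```python
from itertools import combinations
from statistics import mean


def binarize(expression, methylation):
    """Assumed discretization: a gene is 1 in a sample when its expression is
    above the gene's mean AND its methylation is below the gene's mean
    (one possible reading of the inverse relationship)."""
    boolean = {}
    for gene in expression:
        e_cut = mean(expression[gene])
        m_cut = mean(methylation[gene])
        boolean[gene] = [
            int(e > e_cut and m < m_cut)
            for e, m in zip(expression[gene], methylation[gene])
        ]
    return boolean


def support(boolean, geneset):
    """Plain (unweighted) support: fraction of samples where every gene is 1."""
    n_samples = len(next(iter(boolean.values())))
    hits = sum(all(boolean[g][s] for g in geneset) for s in range(n_samples))
    return hits / n_samples


def frequent_genesets(boolean, min_support, max_size=3):
    """Brute-force enumeration of frequent genesets up to max_size genes."""
    found = {}
    for k in range(1, max_size + 1):
        for geneset in combinations(sorted(boolean), k):
            s = support(boolean, geneset)
            if s >= min_support:
                found[geneset] = s
    return found


def closed_only(found):
    """Keep genesets with no enumerated frequent superset of equal support."""
    return {
        gs: s
        for gs, s in found.items()
        if not any(set(gs) < set(other) and s == s_other
                   for other, s_other in found.items())
    }


# Hypothetical toy data: 3 genes measured in 4 samples.
expression = {"G1": [5, 6, 1, 1], "G2": [4, 5, 1, 2], "G3": [1, 1, 5, 6]}
methylation = {
    "G1": [0.1, 0.2, 0.8, 0.9],
    "G2": [0.2, 0.1, 0.9, 0.8],
    "G3": [0.9, 0.8, 0.9, 0.8],
}

boolean = binarize(expression, methylation)
closed = closed_only(frequent_genesets(boolean, min_support=0.5))
print(closed)
```

In this toy data, G1 and G2 are both highly expressed and lowly methylated in the same two samples, so the pair is the only closed frequent geneset at a support threshold of 0.5. Exhaustive enumeration grows exponentially with the number of genes, which is one reason a prior filtering step to a small non-redundant gene set matters in practice.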

Similar articles

Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

PLoS One. 2015 Apr 1;10(4):e0119448. doi: 10.1371/journal.pone.0119448. eCollection 2015.

DTFP-Growth: Dynamic Threshold-Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression, Methylation, and Protein-Protein Interaction Profiles.

IEEE Trans Nanobioscience. 2018 Apr;17(2):117-125. doi: 10.1109/TNB.2018.2803021.

Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

IEEE Trans Nanobioscience. 2017 Jan;16(1):3-10. doi: 10.1109/TNB.2017.2650217. Epub 2017 Jan 9.

Identifying Non-Redundant Gene Markers from Microarray Data: A Multiobjective Variable Length PSO-Based Approach.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Nov-Dec;11(6):1170-83. doi: 10.1109/TCBB.2014.2323065.

Integrative analysis of DNA copy number, DNA methylation and gene expression in multiple myeloma reveals alterations related to relapse.

Oncotarget. 2016 Dec 6;7(49):80664-80679. doi: 10.18632/oncotarget.13025.

Integrated analysis of genome-wide DNA methylation and gene expression profiles identifies potential novel biomarkers of rectal cancer.

Oncotarget. 2016 Sep 20;7(38):62547-62558. doi: 10.18632/oncotarget.11534.

Methylomics analysis identifies epigenetically silenced genes and implies an activation of β-catenin signaling in cervical cancer.

Int J Cancer. 2014 Jul 1;135(1):117-27. doi: 10.1002/ijc.28658. Epub 2013 Dec 17.

High-specificity bioinformatics framework for epigenomic profiling of discordant twins reveals specific and shared markers for ACPA and ACPA-positive rheumatoid arthritis.

Genome Med. 2016 Nov 22;8(1):124. doi: 10.1186/s13073-016-0374-0.

Identification of a 5-microRNA signature and hub miRNA-mRNA interactions associated with pancreatic cancer.

Oncol Rep. 2019 Jan;41(1):292-300. doi: 10.3892/or.2018.6820. Epub 2018 Oct 24.

Cited by

A novel FCTF evaluation and prediction model for food efficacy based on association rule mining.

Front Nutr. 2023 Aug 28;10:1170084. doi: 10.3389/fnut.2023.1170084. eCollection 2023.

Optimal ranking and directional signature classification using the integral strategy of multi-objective optimization-based association rule mining of multi-omics data.

Front Bioinform. 2023 Jul 27;3:1182176. doi: 10.3389/fbinf.2023.1182176. eCollection 2023.

3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection.

Front Genet. 2023 Feb 14;14:1095330. doi: 10.3389/fgene.2023.1095330. eCollection 2023.

Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.

BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.

In silico ranking of phenolics for therapeutic effectiveness on cancer stem cells.

BMC Bioinformatics. 2020 Dec 28;21(Suppl 21):499. doi: 10.1186/s12859-020-03849-z.

Detecting methylation signatures in neurodegenerative disease by density-based clustering of applications with reducing noise.

Sci Rep. 2020 Dec 17;10(1):22164. doi: 10.1038/s41598-020-78463-3.

Molecular signatures identified by integrating gene expression and methylation in non-seminoma and seminoma of testicular germ cell tumours.

Epigenetics. 2021 Jan-Feb;16(2):162-176. doi: 10.1080/15592294.2020.1790108. Epub 2020 Jul 13.

Multi-Objective Optimized Fuzzy Clustering for Detecting Cell Clusters from Single-Cell Expression Profiles.

Genes (Basel). 2019 Aug 13;10(8):611. doi: 10.3390/genes10080611.

Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data.

Brief Bioinform. 2020 Mar 23;21(2):368-394. doi: 10.1093/bib/bby120.

Identification of gene signatures from RNA-seq data using Pareto-optimal cluster algorithm.

BMC Syst Biol. 2018 Dec 21;12(Suppl 8):126. doi: 10.1186/s12918-018-0650-2.
