A graph theoretical approach to data fusion.

Author information

Žurauskienė Justina, Kirk Paul D W, Stumpf Michael P H

Publication information

Stat Appl Genet Mol Biol. 2016 Apr;15(2):107-22. doi: 10.1515/sagmb-2016-0016.

DOI: 10.1515/sagmb-2016-0016

PMID: 26992203

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5217788/

Abstract

The rapid development of high-throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. It is increasingly recognized that we can gain a deeper understanding of the underlying biology by combining the insights obtained from multiple, diverse datasets. We therefore propose a novel, scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or, for fast, approximate, massive-scale data fusion, we can switch naturally to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before a fast post-processing step is applied to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun the entire analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.
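The workflow described in the abstract (model each dataset independently, represent each result as a network, then integrate in a fast post-processing step) can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes that each dataset has already been clustered by some method, that the network for a dataset links two items whenever they share a cluster, and that fusion keeps only the links present in every network. The function names, the gene labels, and the intersection rule are all hypothetical choices made for illustration.

```python
from itertools import combinations


def coclustering_edges(labels):
    """Return the item pairs that share a cluster label in one dataset.

    `labels` maps item name -> cluster id (a hypothetical input format).
    """
    return {
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def fuse(edge_sets):
    """Assumed fusion rule: keep only edges found in every dataset's graph."""
    return set.intersection(*edge_sets)


def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Two toy datasets, each clustered independently (could be done in parallel).
expression = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
binding = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1}

fused_edges = fuse([coclustering_edges(expression), coclustering_edges(binding)])
fused_clusters = connected_components(sorted(expression), fused_edges)
print(fused_clusters)  # [['g1', 'g2'], ['g3'], ['g4', 'g5']]

# A later dataset only needs its own graph, intersected with the fused edges.
new_data = {"g1": 0, "g2": 1, "g3": 1, "g4": 2, "g5": 2}
updated_edges = fuse([fused_edges, coclustering_edges(new_data)])
print(connected_components(sorted(expression), updated_edges))
```

Under these assumptions, adding a dataset costs one graph construction and one set intersection, which mirrors the abstract's point that new data can be incorporated without rerunning the earlier analyses.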
