A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data.

Affiliations

Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France.

Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, 75005, Paris, France.

Publication information

BMC Bioinformatics. 2021 Aug 4;22(1):392. doi: 10.1186/s12859-021-04303-4.

DOI:10.1186/s12859-021-04303-4
PMID:34348641
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8336092/
Abstract

BACKGROUND

Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is that of the integration of such representations.

RESULTS

To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation form and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis to the multiple tables of the new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question.
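
The two-step procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the toy networks, the use of shortest-path length as the network distance, the classical (Torgerson) variant of multidimensional scaling, the simplified multiple factor analysis (each centred table divided by its first singular value, then a PCA of the concatenation) and the RV coefficient used to compare tables are all choices made here for illustration.

```python
import numpy as np


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on n nodes."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def shortest_path_distances(adj):
    """All-pairs shortest-path lengths (Floyd-Warshall); assumes a connected graph."""
    n = adj.shape[0]
    dist = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


def classical_mds(dist, n_components):
    """Projection step: coordinates recovered from a pairwise distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def multiple_factor_analysis(tables, n_components):
    """Integration step (simplified MFA): weight each table, then run a global PCA."""
    weighted = []
    for table in tables:
        centred = table - table.mean(axis=0)
        first_sv = np.linalg.svd(centred, compute_uv=False)[0]
        weighted.append(centred / first_sv)  # balances the tables' influence
    stacked = np.hstack(weighted)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, weighted


def rv_coefficient(x, y):
    """RV coefficient between two centred tables sharing the same rows."""
    sx, sy = x @ x.T, y @ y.T
    return np.trace(sx @ sy) / np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))


# Toy example: three hypothetical networks defined on the same six entities.
n = 6
networks = {
    "path": adjacency(n, [(i, i + 1) for i in range(n - 1)]),
    "cycle": adjacency(n, [(i, (i + 1) % n) for i in range(n)]),
    "star": adjacency(n, [(0, i) for i in range(1, n)]),
}
tables = [classical_mds(shortest_path_distances(a), 2) for a in networks.values()]
scores, weighted = multiple_factor_analysis(tables, 2)

print("Consensus coordinates of the six entities:")
print(scores.round(3))
names = list(networks)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        rv = rv_coefficient(weighted[i], weighted[j])
        print(f"RV({names[i]}, {names[j]}) = {rv:.3f}")
```

A hierarchical clustering would enter the same pipeline through a different distance matrix (for example its cophenetic distances); only the first function would change, since the projection and integration steps operate on distance matrices and coordinate tables regardless of where they came from.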

CONCLUSION

Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a single-cell transcriptomics data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/1117a2c63053/12859_2021_4303_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/ff0af889e5b4/12859_2021_4303_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/d91d89e1f62a/12859_2021_4303_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/e13474f0823a/12859_2021_4303_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/bc3893349a55/12859_2021_4303_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/3091ae64a602/12859_2021_4303_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/e2ee6c66cd08/12859_2021_4303_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/a87609d2718e/12859_2021_4303_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/e85362762bdb/12859_2021_4303_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/ad8dc4fd2273/12859_2021_4303_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/b5289e250cdd/12859_2021_4303_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9bc/8336092/f2e03225540f/12859_2021_4303_Fig12_HTML.jpg
