MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration.

Authors

Hernandez-Ferrer Carles, Ruiz-Arenas Carlos, Beltran-Gomila Alba, González Juan R

Affiliations

Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003, Barcelona, Spain.

Universitat Pompeu Fabra (UPF), Barcelona, Spain.

Publication information

BMC Bioinformatics. 2017 Jan 17;18(1):36. doi: 10.1186/s12859-016-1455-1.

DOI:10.1186/s12859-016-1455-1
PMID:28095799
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5240259/
Abstract

BACKGROUND

The reduction in the cost of genomic assays has generated large amounts of biomedical data. As a result, current studies perform multiple experiments on the same subjects. While Bioconductor's methods and classes, implemented in different packages, manage individual experiments, there is no standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages designed to integrate and visualize biological data use basic data structures that lack clear general methods, such as subsetting or selecting samples.

RESULTS

To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and incomplete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new, not yet implemented data types from any biological experiment.
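The idea of one container that holds several tables over partly overlapping sample sets can be illustrated with a small sketch. This is a conceptual illustration in Python only, not the MultiDataSet R class or its API: the class name `MultiTableSketch` and the methods `add_table`, `common_samples`, `select_samples`, and `subset_features` are hypothetical names chosen for this example, and the sample and feature values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class MultiTableSketch:
    """Hypothetical container: one table per omic layer, keyed by sample ID.

    Each table maps sample ID -> {feature name -> value}. This illustrates
    the concept only; it is not the MultiDataSet class or its API.
    """
    tables: dict = field(default_factory=dict)

    def add_table(self, name, table):
        # Tables may cover different sample sets (incomplete data).
        if name in self.tables:
            raise ValueError(f"table {name!r} already present")
        self.tables[name] = {s: dict(feats) for s, feats in table.items()}

    def common_samples(self):
        # Samples present in every table, returned in sorted order.
        sample_sets = [set(t) for t in self.tables.values()]
        if not sample_sets:
            return []
        return sorted(set.intersection(*sample_sets))

    def select_samples(self, samples):
        # Keep only the requested samples in every table.
        wanted = set(samples)
        out = MultiTableSketch()
        for name, table in self.tables.items():
            out.tables[name] = {
                s: dict(f) for s, f in table.items() if s in wanted
            }
        return out

    def subset_features(self, name, features):
        # Restrict one table to the named features; others are copied as-is.
        wanted = set(features)
        out = MultiTableSketch()
        for tname, table in self.tables.items():
            out.tables[tname] = {
                s: {k: v for k, v in f.items()
                    if tname != name or k in wanted}
                for s, f in table.items()
            }
        return out


# Made-up example data: two layers that share only sample "s2".
mds = MultiTableSketch()
mds.add_table("expression", {
    "s1": {"geneA": 2.1, "geneB": 0.4},
    "s2": {"geneA": 1.7, "geneB": 0.9},
})
mds.add_table("methylation", {
    "s2": {"cg01": 0.8},
    "s3": {"cg01": 0.3},
})

shared = mds.common_samples()
complete = mds.select_samples(shared)
only_gene_a = complete.subset_features("expression", ["geneA"])
```

Each operation returns a new container rather than modifying the original, so the full data remain available after restricting to complete cases; whether the R class behaves the same way is not stated in the abstract.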

CONCLUSIONS

MultiDataSet is a suitable class for data integration within the R and Bioconductor framework.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a37/5240259/77979cc585c5/12859_2016_1455_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a37/5240259/b28dff20ea0c/12859_2016_1455_Fig2_HTML.jpg
