Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab).

Author information

Varsos Constantinos, Patkos Theodore, Oulas Anastasis, Pavloudi Christina, Gougousis Alexandros, Ijaz Umer Zeeshan, Filiopoulou Irene, Pattakos Nikolaos, Vanden Berghe Edward, Fernández-Guerra Antonio, Faulwetter Sarah, Chatzinikolaou Eva, Pafilis Evangelos, Bekiari Chryssoula, Doerr Martin, Arvanitidis Christos

Affiliations

Institute of Computer Science, Foundation of Research and Technology Hellas, Heraklion, Greece.

Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece.

Publication information

Biodivers Data J. 2016 Nov 1;(4):e8357. doi: 10.3897/BDJ.4.e8357. eCollection 2016.

DOI:10.3897/BDJ.4.e8357

PMID:27932907

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5136650/

Abstract

BACKGROUND

Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to computational biologists, often have difficulty finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. The implementations described here can also be of significant interest to expert bioinformaticians and R developers. Our goals can therefore be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining the capabilities of diverse R packages, and (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end user.

NEW INFORMATION

The novelty of this paper stems from implementations of parallel methodologies that process data at different levels of abstraction, and from the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data - Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the [package name missing in source] package. The [package names missing in source] R packages are further integrated, offering connections to data-frame-like objects (databases) as secondary storage whenever memory demands exceed available RAM. The RvLab runs on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface that enables users to analyze ecological and microbial communities with optimized functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/.
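The SPMD idea mentioned above can be illustrated with a minimal sketch. This is not the authors' code and does not use MPI or R: it is a Python stand-in in which worker threads play the role of ranks, every rank runs the same function on its own block of rows, and the partial results are combined by an element-wise sum (the role a reduce operation would play in an MPI setting). The function names (`rank_rows`, `spmd_program`, `allreduce_sum`, `species_totals`) and the small abundance table are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_rows(n_rows, rank, size):
    """Row indices owned by `rank` under a contiguous block split."""
    base, extra = divmod(n_rows, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def spmd_program(rank, size, community):
    """The single program every rank runs: per-species totals of its own rows."""
    n_species = len(community[0])
    partial = [0] * n_species
    for i in rank_rows(len(community), rank, size):
        for j, count in enumerate(community[i]):
            partial[j] += count
    return partial


def allreduce_sum(partials):
    """Element-wise sum of the partial vectors (stand-in for an MPI reduce)."""
    return [sum(col) for col in zip(*partials)]


def species_totals(community, size=3):
    """Run the same program on `size` simulated ranks and combine the results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(
            pool.map(lambda r: spmd_program(r, size, community), range(size))
        )
    return allreduce_sum(partials)


# Hypothetical sites x species abundance table (made-up numbers).
community = [
    [4, 0, 1],
    [2, 3, 0],
    [0, 5, 2],
    [1, 1, 1],
]
totals = species_totals(community, size=3)
print(totals)  # [7, 9, 4]
```

The result does not depend on the number of ranks, which is the property that makes this decomposition safe: each row is owned by exactly one rank, and summation is associative. In a real cluster deployment each rank would be a separate process with its own memory, so only the small partial vectors, not the full table, would need to be communicated.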

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0192/5136650/6f7a23b2ec2a/biodiversity_data_journal-4-e8357-g001.jpg
