PhytoOracle: Scalable, modular phenomics data processing pipelines.

Authors

Gonzalez Emmanuel M, Zarei Ariyan, Hendler Nathanial, Simmons Travis, Zarei Arman, Demieville Jeffrey, Strand Robert, Rozzi Bruno, Calleja Sebastian, Ellingson Holly, Cosi Michele, Davey Sean, Lavelle Dean O, Truco Maria José, Swetnam Tyson L, Merchant Nirav, Michelmore Richard W, Lyons Eric, Pauli Duke

Affiliations

School of Plant Sciences, University of Arizona, Tucson, AZ, United States.

Department of Computer Science, University of Arizona, Tucson, AZ, United States.

Publication information

Front Plant Sci. 2023 Mar 6;14:1112973. doi: 10.3389/fpls.2023.1112973. eCollection 2023.

Abstract

As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (1) improve data processing efficiency; (2) provide an extensible, reproducible computing framework; and (3) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area).
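The reported processing times can be turned into rough throughput figures for easier comparison across sensor types. The sketch below is illustrative only: it uses just the image counts, data sizes, and run times quoted in the abstract for the 1,024-core runs, and the names `RUNS`, `throughput`, and `RATES` are hypothetical helpers, not part of PhytoOracle.

```python
# Figures copied from the abstract (1,024-core runs):
# (number of images, data size in GB, end-to-end time in minutes).
RUNS = {
    "RGB": (9270, 140.7, 235),
    "thermal": (9270, 5.4, 235),
    "PSII": (39678, 86.2, 13),
}


def throughput(images, gigabytes, minutes):
    """Return (images per minute, GB per minute) for one end-to-end run."""
    return images / minutes, gigabytes / minutes


# Hypothetical summary table: sensor name -> (images/min, GB/min).
RATES = {name: throughput(*run) for name, run in RUNS.items()}

for name, (img_rate, gb_rate) in RATES.items():
    print(f"{name}: {img_rate:.1f} images/min, {gb_rate:.3f} GB/min")
```

These rates are simple averages over each run. They say nothing about how time is split between pipeline stages, which the abstract does not report.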

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e33/10025408/7a30889627a4/fpls-14-1112973-g001.jpg
