Hussain Waseem, Anumalla Mahender, Catolos Margaret, Khanna Apurva, Sta Cruz Ma Teresa, Ramos Joie, Bhosale Sankalp
Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Banos, Laguna, Philippines.
Plant Methods. 2022 Feb 5;18(1):14. doi: 10.1186/s13007-022-00845-7.
Developing a systematic phenotypic data analysis pipeline, creating enhanced visualizations, and interpreting the results are crucial for extracting meaningful insights from data and making better breeding decisions. Here, we provide an overview of how the Rainfed Rice Breeding (RRB) program at IRRI has leveraged the computational power of R together with open-source tools such as R Markdown, plotly, LaTeX, and HTML to develop an open-source, end-to-end data analysis workflow and pipeline, and has redesigned it as a reproducible document for better interpretation, visualization, and easy sharing with collaborators.
We report a state-of-the-art implementation of the phenotypic data analysis pipeline and workflow, embedded in a thoroughly annotated document. The pipeline is open-source and demonstrates, with step-by-step instructions, how to analyze phenotypic data in crop breeding programs. It shows how to pre-process phenotypic data and check its quality, perform robust data analysis using modern statistical tools and approaches, and convert the analysis into a reproducible document. Explanatory text, R code, outputs (as text, tables, or graphics), and interpretation of results are integrated into a single document. The analysis is highly reproducible and can be regenerated at any time. The pipeline source code and demo data are available at https://github.com/whussain2/Analysis-pipeline.
The analysis workflow and document presented here are not limited to IRRI's RRB program; they are applicable to any organization or institute with a full-fledged breeding program. We believe this is an important step toward modernizing the data analysis of IRRI's RRB program. Further, the pipeline can be easily implemented by plant breeders and researchers, guiding them in analyzing breeding trial data in the best possible way.
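To make the pre-processing and quality-check step mentioned above more concrete, the following is a minimal sketch of how missing values and outliers might be flagged for one trait before analysis. It is not the authors' code: the published pipeline is written in R, whereas this sketch uses plain Python. The 1.5 × IQR outlier rule, the function names `flag_outliers` and `quality_report`, the field names `plot`, `genotype`, and `yield_kg_ha`, and the demo values are all assumptions made for illustration.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Missing values (None) are never flagged.
    The IQR rule is an illustrative choice, not the paper's method."""
    observed = [v for v in values if v is not None]
    if len(observed) < 4:
        # Too few observations to estimate quartiles meaningfully.
        return [False] * len(values)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v is not None and (v < lower or v > upper) for v in values]


def quality_report(records, trait):
    """Summarise missing values and flagged outliers for one trait."""
    values = [r.get(trait) for r in records]
    flags = flag_outliers(values)
    return {
        "n_plots": len(values),
        "n_missing": sum(v is None for v in values),
        "outlier_plots": [r["plot"] for r, f in zip(records, flags) if f],
    }


# Hypothetical demo records (made-up values, not data from the paper).
trial = [
    {"plot": 1, "genotype": "G1", "yield_kg_ha": 4200.0},
    {"plot": 2, "genotype": "G2", "yield_kg_ha": 4350.0},
    {"plot": 3, "genotype": "G3", "yield_kg_ha": None},
    {"plot": 4, "genotype": "G1", "yield_kg_ha": 4100.0},
    {"plot": 5, "genotype": "G2", "yield_kg_ha": 4500.0},
    {"plot": 6, "genotype": "G3", "yield_kg_ha": 12000.0},
    {"plot": 7, "genotype": "G1", "yield_kg_ha": 4250.0},
    {"plot": 8, "genotype": "G2", "yield_kg_ha": 4400.0},
]

report = quality_report(trial, "yield_kg_ha")
print(report)
```

In this made-up trial, the report counts one missing plot and flags plot 6 as a candidate outlier. A flagged plot is only a prompt for manual review, not grounds for automatic removal.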