A PostgreSQL Tripal solution for large-scale genotypic and phenotypic data.

Affiliation

Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada.

Publication information

Database (Oxford). 2021 Aug 14;2021. doi: 10.1093/database/baab051.

DOI:10.1093/database/baab051

PMID:34389844

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8363843/

Abstract

Researchers are seeking cost-effective solutions for management and analysis of large-scale genotypic and phenotypic data. Open-source software is uniquely positioned to fill this need through user-focused, crowd-sourced development. Tripal, an open-source toolkit for developing biological data web portals, uses the GMOD Chado database schema to achieve flexible, ontology-driven storage in PostgreSQL. Tripal also aids research-focused web portals in providing data according to findable, accessible, interoperable, reusable (FAIR) principles. We describe here a fully relational PostgreSQL solution to handle large-scale genotypic and phenotypic data that is implemented as a collection of freely available, open-source modules. These Tripal extension modules provide a holistic approach for importing, storage, display and analysis within a relational database schema. Furthermore, they embody the Tripal approach to FAIR data by providing multiple search tools and ensuring metadata is fully described and interoperable. Our solution focuses on data integrity, as well as optimizing performance to provide a fully functional system that is currently being used in the production of Tripal portals for crop species. We fully describe the implementation of our solution and discuss why a PostgreSQL-powered web portal provides an efficient environment for researcher-driven genotypic and phenotypic data analysis.
