Machado: Open source genomics data integration framework.

Affiliation

Embrapa Informática Agropecuária, Campinas, São Paulo, Post Code 13083-886, PO Box 6041, Brazil.

Publication information

Gigascience. 2020 Sep 14;9(9). doi: 10.1093/gigascience/giaa097.

DOI:10.1093/gigascience/giaa097

PMID:32930331

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7490629/

Abstract

BACKGROUND

Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have been implementing software and databases to meet this challenge. Chado, the biological relational database schema of GMOD (Generic Model Organism Database), is one of the few successful open source initiatives; it is widely adopted, and many software packages are able to connect to it.

FINDINGS

We have been developing Machado, an open source genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and should therefore be intuitive for current developers to adopt or to run on top of existing databases. It provides several data-loading tools for genomics and transcriptomics data, as well as for annotation results from tools such as BLAST, InterProScan, OrthoMCL, and LSTrAP. An API connects to JBrowse, and a web visualization tool is implemented using Django views and templates. The Haystack library, integrated with the Elasticsearch engine, was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and filters.
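To make the role of the Chado schema more concrete, the following is a minimal sketch of how features and their genomic locations relate, and of the kind of region query a genome browser might issue. It is not Machado's code: Machado uses Django's ORM on top of a full Chado database in PostgreSQL, whereas this sketch assumes a heavily simplified subset of the Chado `feature` and `featureloc` tables in an in-memory SQLite database. The `type` text column, the demo records, and the function names `build_demo_db` and `features_in_region` are hypothetical and exist only for illustration.

```python
import sqlite3

# Simplified, hypothetical subset of two Chado tables (feature, featureloc).
# Real Chado stores feature types as foreign keys into cvterm; a plain text
# "type" column stands in for that here (an assumption of this sketch).
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin INTEGER NOT NULL,  -- interbase start coordinate
    fmax INTEGER NOT NULL,  -- interbase end coordinate
    strand INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database holding one chromosome and two genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?)",
        [(1, "chr1", "chromosome"), (2, "geneA", "gene"), (3, "geneB", "gene")],
    )
    conn.executemany(
        "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand) "
        "VALUES (?, ?, ?, ?, ?)",
        [(2, 1, 100, 500, 1), (3, 1, 900, 1500, -1)],
    )
    return conn


def features_in_region(conn, src_name, start, end):
    """Return (uniquename, fmin, fmax) for features overlapping [start, end)."""
    rows = conn.execute(
        """
        SELECT f.uniquename, loc.fmin, loc.fmax
        FROM featureloc AS loc
        JOIN feature AS f ON f.feature_id = loc.feature_id
        JOIN feature AS src ON src.feature_id = loc.srcfeature_id
        WHERE src.uniquename = ? AND loc.fmin < ? AND loc.fmax > ?
        ORDER BY loc.fmin
        """,
        (src_name, end, start),
    )
    return rows.fetchall()


conn = build_demo_db()
print(features_in_region(conn, "chr1", 0, 600))
```

The point of the sketch is that located features (genes) and the sequences they sit on (chromosomes) share the `feature` table, and `featureloc` links them. This is why tools that understand Chado can work against the same database without a separate schema for each data type.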

CONCLUSION

Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.
