HAMAP as SPARQL rules: A portable annotation pipeline for genomes and proteomes.

Affiliations

Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland.

Centre Hospitalier Universitaire Vaudois/Ludwig Institute for Cancer Research, Agora Centre, CH-1005 Lausanne, Switzerland.

Publication information

Gigascience. 2020 Feb 1;9(2). doi: 10.1093/gigascience/giaa003.

DOI:10.1093/gigascience/giaa003

PMID:32034905

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7007698/

Abstract

BACKGROUND

Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.

RESULTS

Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline.
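The idea of applying a rule to protein data expressed as triples can be sketched in a few lines. The following is a toy illustration only, not the HAMAP rule format and not a SPARQL engine: it uses a hand-rolled in-memory set of triples and a single rule whose conditions play the role of a SPARQL graph pattern and whose annotations play the role of the triples a query would produce. Every identifier (`protein:P1`, `sig:FAM_0001`, `ex:matchesSignature`, `ex:recommendedName`, and so on) is hypothetical and chosen for illustration.

```python
# Toy illustration only: a hand-rolled triple store and one hypothetical rule.
# None of the identifiers below come from HAMAP or UniProt.

# RDF-like data: a set of (subject, predicate, object) triples.
triples = {
    ("protein:P1", "ex:matchesSignature", "sig:FAM_0001"),
    ("protein:P1", "ex:organism", "taxon:Bacteria"),
    ("protein:P2", "ex:matchesSignature", "sig:FAM_0001"),
    ("protein:P2", "ex:organism", "taxon:Eukaryota"),
    ("protein:P3", "ex:matchesSignature", "sig:FAM_0002"),
    ("protein:P3", "ex:organism", "taxon:Bacteria"),
}

# A rule has conditions (patterns a protein must satisfy) and annotations
# (triples to add for each protein that satisfies all conditions).
# "?p" stands for the protein variable, as it would in a SPARQL pattern.
rule = {
    "conditions": [
        ("?p", "ex:matchesSignature", "sig:FAM_0001"),
        ("?p", "ex:organism", "taxon:Bacteria"),
    ],
    "annotations": [
        ("?p", "ex:recommendedName", "Hypothetical enzyme A"),
    ],
}


def apply_rule(rule, triples):
    """Return the new triples the rule produces for matching subjects."""
    subjects = {s for s, _, _ in triples}
    produced = set()
    for subject in subjects:
        # Bind ?p to the candidate subject and test every condition.
        if all((subject, p, o) in triples for _, p, o in rule["conditions"]):
            for _, p, o in rule["annotations"]:
                produced.add((subject, p, o))
    return produced


new_annotations = apply_rule(rule, triples)
for triple in sorted(new_annotations):
    print(triple)
```

Only `protein:P1` satisfies both conditions, so it is the only protein that receives the annotation. In the approach described above, the matching and the generation of annotations are delegated to a standard SPARQL 1.1 engine rather than to custom code like `apply_rule`, which is what makes the rules portable.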

CONCLUSIONS

HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a98a/7007698/55e9d1d2fd22/giaa003fig1.jpg
