Flexible protein database based on amino acid k-mers.

Author information

Déraspe Maxime, Boisvert Sébastien, Laviolette François, Roy Paul H, Corbeil Jacques

Affiliations

Department of Molecular Medicine, Université Laval, Quebec, Canada.

Big Data Research Center, Université Laval, Quebec, Canada.

Publication information

Sci Rep. 2022 Jun 1;12(1):9101. doi: 10.1038/s41598-022-12843-9.

DOI:10.1038/s41598-022-12843-9

PMID:35650262

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9160020/

Abstract

Identification of proteins is one of the most computationally intensive steps in genomics studies. It usually relies on aligners that do not accommodate rich information on proteins and require additional pipelining steps for protein identification. We introduce kAAmer, a protein database engine based on amino-acid k-mers that provides efficient identification of proteins while supporting the incorporation of flexible annotations on these proteins. Moreover, the database is built to be used as a microservice, to be hosted and queried remotely.
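The general idea of identifying proteins through amino-acid k-mers can be illustrated with a minimal sketch. This is not kAAmer's implementation: the k-mer length `K`, the function names (`kmers`, `build_index`, `identify`), the toy sequences, and the shared-k-mer count used as a score are all assumptions made for illustration. The sketch maps each k-mer to the proteins that contain it, ranks proteins by how many k-mers they share with a query, and looks up free-form annotations stored alongside each protein.

```python
from collections import Counter, defaultdict

K = 7  # hypothetical k-mer length, chosen only for this illustration


def kmers(seq, k=K):
    """Yield every overlapping k-mer of an amino-acid sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(proteins, k=K):
    """Map each k-mer to the set of protein IDs that contain it."""
    index = defaultdict(set)
    for pid, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(pid)
    return index


def identify(query, index, k=K):
    """Rank proteins by the number of distinct query k-mers they share."""
    hits = Counter()
    for km in set(kmers(query, k)):
        # .get() avoids creating empty entries in the defaultdict
        for pid in index.get(km, ()):
            hits[pid] += 1
    return hits.most_common()


# Toy database: sequences and annotations are invented for the example.
proteins = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MSDNELLKQAGITPEELAKLAG",
}
annotations = {
    "protA": {"function": "toy example A"},
    "protB": {"function": "toy example B"},
}

index = build_index(proteins)
best = identify("TAYIAKQRQISF", index)
top_annotation = annotations[best[0][0]] if best else None
```

Because each lookup is an exact key match, a query costs one index access per k-mer rather than an alignment against every database sequence, and the annotations can hold whatever fields the database builder chooses.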
