Selenoprofiles: A Computational Pipeline for Annotation of Selenoproteins

Author information

Santesmasses Didac, Mariotti Marco, Guigó Roderic

Affiliations

Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain.

Universitat Pompeu Fabra (UPF), Barcelona, Spain.

Publication information

Methods Mol Biol. 2018;1661:17-28. doi: 10.1007/978-1-4939-7258-6_2.

DOI:10.1007/978-1-4939-7258-6_2

PMID:28917034

Abstract

Selenoproteins contain selenocysteine (Sec or U), the 21st amino acid, which is inserted in response to an in-frame UGA codon. UGA normally terminates translation, but in selenoprotein mRNAs it is recoded to specify Sec insertion. For this reason, standard gene prediction programs fail to predict Sec codons, and selenoproteins are usually misannotated in protein databases and genome projects. Selenoprofiles is a computational pipeline able to correctly annotate selenoprotein genes in genomic sequences. The program uses a homology-based approach that does not depend on SECIS (selenocysteine insertion sequence) elements, and it employs curated built-in profile alignments for all known selenoprotein families. Selenoprofiles constitutes the most accurate method for predicting selenoprotein genes belonging to known families.
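
The misannotation problem described in the abstract can be illustrated with a minimal Python sketch. This is not Selenoprofiles code and does not reproduce its homology-based method; the function `translate`, its `recode_tga` flag, and the toy input sequence are hypothetical names chosen for illustration. It only shows how reading an in-frame TGA (UGA in mRNA) as a stop truncates the predicted protein, whereas a recoding-aware reading yields Sec (U).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG codon order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, recode_tga=False):
    """Translate a DNA coding sequence up to the first stop codon.

    With recode_tga=True, an in-frame TGA that is not the final codon is
    read as selenocysteine (U). This rule is a simplifying assumption for
    illustration only: it ignores SECIS elements and homology evidence.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TABLE[codon]
        if recode_tga and codon == "TGA" and i + 3 < len(cds):
            aa = "U"  # toy recoding: internal TGA read as Sec
        if aa == "*":
            break  # stop codon ends translation
        protein.append(aa)
    return "".join(protein)


# Toy sequence (hypothetical): ATG GCT TGA GGT TAA
toy_cds = "ATGGCTTGAGGTTAA"
standard = translate(toy_cds)                   # TGA treated as stop
recoded = translate(toy_cds, recode_tga=True)   # TGA treated as Sec
print(standard, recoded)
```

In this toy case the standard reading stops at the internal TGA, which mirrors how a conventional gene predictor would report a truncated protein for a selenoprotein gene.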
