Bioinformatics and genomics group, Center for Genomic Regulation and Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
Bioinformatics. 2010 Nov 1;26(21):2656-63. doi: 10.1093/bioinformatics/btq516. Epub 2010 Sep 21.
Selenoproteins are a group of proteins that contain selenocysteine (Sec), a rare amino acid inserted co-translationally into the protein chain. The Sec codon is UGA, which is normally a stop codon. In selenoproteins, UGA is recoded to Sec in the presence of specific features on selenoprotein gene transcripts. Because of this dual role of the UGA codon, selenoprotein prediction and annotation are difficult tasks, and even known selenoproteins are often misannotated in genome databases.
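The dual role of UGA can be illustrated with a minimal Python sketch. It is not part of selenoprofiles; the function name `translate_with_sec` and its `recode_uga` parameter are hypothetical, chosen only for illustration. The sketch translates an mRNA in frame 0 and either stops at UGA, as a standard annotation pipeline would, or reads it as Sec (one-letter code `U`). It deliberately ignores the transcript features that decide whether recoding actually occurs, so it only shows why a naive translation truncates a selenoprotein.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_sec(mrna, recode_uga=True):
    """Translate frame 0 of an mRNA (hypothetical helper, for illustration).

    With recode_uga=True every in-frame UGA is read as Sec ('U');
    with recode_uga=False UGA terminates translation like UAA and UAG.
    Returns the protein string and the 0-based positions of Sec residues.
    """
    mrna = mrna.upper().replace("T", "U")
    protein = []
    sec_positions = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and recode_uga:
            sec_positions.append(len(protein))
            protein.append("U")
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein), sec_positions


if __name__ == "__main__":
    toy_mrna = "AUGGCUUGAUGUGGAUAA"  # made-up sequence, not a real gene
    print(translate_with_sec(toy_mrna, recode_uga=True))
    print(translate_with_sec(toy_mrna, recode_uga=False))
```

On the toy sequence, the recoding translation continues through the UGA while the standard translation ends just before it, which is the kind of truncation that leads to misannotated selenoproteins.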
We present selenoprofiles, a homology-based in silico method to scan genomes for members of the known eukaryotic selenoprotein families. The core of the method is a set of manually curated, highly reliable multiple sequence alignments of selenoprotein families, which are used as queries to scan genomic sequences. The results of the scan are processed through a number of steps to produce highly accurate predictions of selenoprotein genes with little or no human intervention. Selenoprofiles is a valuable tool for the bioinformatic characterization of eukaryotic selenoproteomes, and can complement genome annotation pipelines.
Selenoprofiles is a Python-built pipeline that internally runs psitblastn, exonerate, genewise, SECISearch and a number of custom-made scripts and programs. The program is available at http://big.crg.cat/services/selenoprofiles. The predictions presented in this article are available through DAS at http://genome.crg.cat:9000/das/Selenoprofiles_ensembl.