Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, P.O. Box 260, Roskilde, DK-4000, Denmark.
BMC Bioinformatics. 2013 Apr 4;14:118. doi: 10.1186/1471-2105-14-118.
Pyrrolysine (the 22nd amino acid) is, in certain organisms and under certain circumstances, encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream of UAG has been suggested, but the structure does not seem to be present in all pyrrolysine-incorporating genes.
We propose a strategy to predict pyrrolysine-encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. We assess the ranking effect of each feature and propose a weighted combination of these features that best explains the currently known pyrrolysine-incorporating genes. We devote special attention to the effect of structural conservation and provide further evidence that structural conservation may be influential but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine-incorporating genes.
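As an illustration of the first step, collecting open reading frames interrupted by the amber codon, the following is a minimal sketch and not the authors' pipeline. It assumes a DNA sequence on the forward strand, ATG as the only start codon, TAA and TGA as terminating stops, and exactly one in-frame TAG per candidate. The function name `amber_interrupted_orfs` and the `min_codons` threshold are hypothetical.

```python
# Sketch only: scan the three forward frames for ORFs that contain
# exactly one in-frame amber codon (TAG) before a TAA/TGA stop.
# Assumptions (not from the paper): ATG start, forward strand only,
# TAG is read through, and a hypothetical minimum length filter.

TERMINATING_STOPS = {"TAA", "TGA"}
AMBER = "TAG"


def amber_interrupted_orfs(seq, min_codons=10):
    """Return a list of (start, end, amber_pos) tuples.

    start and end are 0-based, end is exclusive and includes the stop
    codon; amber_pos is the 0-based index of the in-frame TAG.
    """
    seq = seq.upper()
    results = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            ambers = []
            end = None
            j = i + 3
            while j + 3 <= len(seq):
                codon = seq[j:j + 3]
                if codon == AMBER:
                    ambers.append(j)  # read through the amber codon
                elif codon in TERMINATING_STOPS:
                    end = j + 3
                    break
                j += 3
            if end is None:
                break  # no terminating stop left in this frame
            if len(ambers) == 1 and (end - i) // 3 >= min_codons:
                results.append((i, end, ambers[0]))
            i = end  # continue scanning after this ORF
    return results


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 3 + "TAG" + "GCT" * 3 + "TAA"
    print(amber_interrupted_orfs(toy, min_codons=5))
```

Candidates found this way would then be clustered by sequence similarity and ranked, as described above; a real analysis would also need the reverse strand and alternative start codons.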
We propose a method for predicting pyrrolysine-incorporating genes in genomes of bacteria and archaea, leading to insights into the factors driving pyrrolysine translation and to the identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline, which is available on request.