Biological factors in the synthetic construction of overlapping genes.

Author affiliations

Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany.

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

Publication information

BMC Genomics. 2021 Dec 11;22(1):888. doi: 10.1186/s12864-021-08181-1.

Abstract

BACKGROUND

Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, except in viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from cellular organisms. In this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps.

RESULTS

After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models (HMMs), we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences with regard to identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids.
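The two ingredients named above, reading-frame overlap and genetic-code redundancy, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which scores constructed sequences against Pfam HMM profiles. It only shows how one nucleotide sequence yields different peptides in two same-strand reading frames, and how many synonymous codons the standard genetic code offers per amino acid. The function names `translate` and `degeneracy` and the toy sequence are hypothetical, and antisense frames are not handled.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate the same-strand reading frame that starts at `offset`.

    Trailing bases that do not fill a complete codon are ignored.
    """
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def degeneracy() -> Counter:
    """Count how many codons encode each amino acid (stops excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")


if __name__ == "__main__":
    toy = "ATGGCCAAAT"  # hypothetical toy sequence, not from the study
    print("frame +1:", translate(toy, 0))
    print("frame +2:", translate(toy, 1))
    print("codons per amino acid:", dict(degeneracy()))
```

Because most amino acids are encoded by several codons, a designer can often change the codon used in one frame to accommodate the other frame without altering the first protein. Where a substitution is unavoidable, an evolutionarily exchangeable amino acid may still be tolerated by the family profile.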

CONCLUSIONS

Synthetic overlapping genes that closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered here, such as intragenic interactions that affect protein folding. While the analysis here is not sufficient to guarantee functional, correctly folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and open up exciting possibilities for synthetic biology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9168/8665593/23fb9533906c/12864_2021_8181_Fig1_HTML.jpg
