Lu Tzu-Chiao, Leu Jun-Yi, Lin Wen-Chang
Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan.
Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan.
Mol Biol Evol. 2017 Nov 1;34(11):2823-2838. doi: 10.1093/molbev/msx210.
Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 Saccharomyces sensu stricto-specific de novo genes. We further combined information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen identified de novo genes with high transcript levels were chosen to verify their protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals, and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the Saccharomyces sensu stricto complex and have the potential to be quickly integrated into ancient cellular networks.
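The lineage assignment described in the abstract (S. cerevisiae-specific versus Saccharomyces sensu stricto-specific) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `classify_cds`, the species set, and the "ancient" label for everything else are assumptions made for illustration, and the ortholog presence calls are assumed to come from upstream phylogeny, synteny, and alignment steps that are not shown here.

```python
from typing import Iterable

# Illustrative sensu stricto species set (an assumption; the study's
# 20-species panel is not listed in the abstract).
SENSU_STRICTO = {
    "S. cerevisiae",
    "S. paradoxus",
    "S. mikatae",
    "S. kudriavzevii",
    "S. bayanus",
}


def classify_cds(species_with_ortholog: Iterable[str],
                 focal: str = "S. cerevisiae") -> str:
    """Assign a CDS to a lineage class from ortholog presence calls.

    species_with_ortholog: species in which a possible ortholog was
    detected (hypothetical input from upstream analysis).
    """
    # The focal species always carries the CDS itself.
    hits = set(species_with_ortholog) | {focal}
    if hits == {focal}:
        # No ortholog detected anywhere else.
        return f"{focal}-specific"
    if hits <= SENSU_STRICTO:
        # Orthologs detected only inside the sensu stricto group.
        return "sensu stricto-specific"
    # At least one ortholog outside the group.
    return "ancient"
```

For example, a CDS with a detected ortholog only in S. paradoxus would fall into the sensu stricto-specific class, whereas one that also has an ortholog in a more distant yeast would be treated as ancient under these assumptions.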