Department of Biochemistry, University College Cork, Cork, Ireland.
BMC Evol Biol. 2011 Feb 18;11:47. doi: 10.1186/1471-2148-11-47.
All sequenced genomes contain a proportion of lineage-specific genes, which exhibit no sequence similarity to any genes outside the lineage. Despite their prevalence, the origins and functions of most lineage-specific genes remain largely unknown. As more genomes are sequenced, opportunities for understanding the evolutionary origins and functions of lineage-specific genes are increasing.
This study provides a comprehensive analysis of the origins of lineage-specific genes (LSGs) in Arabidopsis thaliana that are restricted to the Brassicaceae family. Lineage-specific genes within the nuclear (1761 genes) and mitochondrial (28 genes) genomes are identified. The evolutionary origins of two-thirds of the lineage-specific genes within the Arabidopsis thaliana genome are also identified. Almost a quarter of lineage-specific genes originate from non-lineage-specific paralogs, while ~10% of lineage-specific genes are partly derived from DNA exapted from transposable elements (twice the proportion observed for non-lineage-specific genes). Lineage-specific genes are also enriched in genes that have overlapping CDS, which is consistent with such novel genes arising from overprinting. Of the 958 lineage-specific genes found only in Arabidopsis thaliana, over half have alignments to intergenic regions in Arabidopsis lyrata. This is consistent with either de novo origination or differential gene loss and retention, and either scenario would explain the lineage-specific status of these genes. A smaller number of lineage-specific genes with an incomplete open reading frame across different Arabidopsis thaliana accessions are further identified as accession-specific genes, most likely of recent origin in Arabidopsis thaliana. Putative de novo origination for two of the Arabidopsis thaliana-only genes is identified via additional sequencing across accessions of Arabidopsis thaliana and closely related sister species lineages. We demonstrate that lineage-specific genes have high tissue specificity and low expression levels across multiple tissues and developmental stages. Finally, stress responsiveness is identified as a distinct feature of Brassicaceae-specific genes: these LSGs are enriched for genes responsive to a wide range of abiotic stresses.
Improving our understanding of the origins of lineage-specific genes is key to gaining insights into how novel genes can arise and acquire functionality in different lineages. This study comprehensively identifies all Brassicaceae-specific genes in Arabidopsis thaliana and identifies how the majority of such lineage-specific genes have arisen. The analysis allows the relative importance (and prevalence) of different evolutionary routes to the genesis of novel ORFs within lineages to be assessed. Insights into the functional roles of lineage-specific genes are further advanced through identification of enrichment for stress responsiveness in lineage-specific genes, highlighting their likely importance for environmental adaptation strategies.