Fonseca Luiz H M, Lohmann Lúcia G
Laboratório de Sistemática Vegetal, Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil.
Front Plant Sci. 2017 Nov 1;8:1875. doi: 10.3389/fpls.2017.01875. eCollection 2017.
The chloroplast is one of the most important organelles of plants. This organelle has a circular DNA with approximately 130 genes. The use of plastid genomic data in phylogenetic and evolutionary studies became possible with high-throughput sequencing methods, which allow complete genomes to be obtained rapidly and at a reasonable cost. Here, we use high-throughput sequencing to study the "-" clade (Bignonieae, Bignoniaceae). More specifically, we use Illumina HiSeq technology to sequence 10 complete plastid genomes. Plastomes were assembled from selected plastid reads using a de novo approach with SPAdes. The 10 assembled genomes were analyzed in a phylogenetic context using five different partition schemes: (1) 91 protein-coding genes ("coding"); (2) 76 introns and spacers with alignments manually edited ("non-coding edited"); (3) 76 non-coding regions with poorly aligned regions removed using T-Coffee ("non-coding filtered"); (4) 91 protein-coding regions plus the 76 edited non-coding regions ("coding + non-coding edited"); and (5) 91 protein-coding regions plus the 76 filtered non-coding regions ("coding + non-coding filtered"). Fragmented regions were aligned using MAFFT. Phylogenetic analyses were conducted using Maximum Likelihood (ML) and Bayesian Criteria (BC). The analyses of the individual plastomes consistently recovered an expansion of the Inverted Repeat (IR) regions and a compression of the Small Single Copy (SSC) region. Major genomic translocations were observed in the Large Single Copy (LSC) region and the IRs. ML phylogenetic analyses of the individual datasets led to the same topology, with the exception of the analysis of the "non-coding filtered" dataset. Overall, relationships were strongly supported, with the highest support values obtained through the analysis of the "coding + non-coding edited" dataset. Four regions in the LSC, SSC, and IR were selected for primer development.
The "-" clade shows an unusual pattern of plastid structure variation, including four major genomic translocations. These rearrangements challenge the current view that plastid genome architecture is conserved in terms of gene order. They also complicate both genome assembly against reference genomes and sequence alignment of whole plastomes. Therefore, strategies that employ de novo assemblies and manual evaluation of sequence alignments are required to prevent assembly and alignment errors.
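The combined partition schemes described above ("coding + non-coding edited" and "coding + non-coding filtered") amount to concatenating per-region alignments while recording where each region starts and ends. The following is a minimal Python sketch of that bookkeeping, not the authors' pipeline. The function names, the taxon labels (`sp1`, `sp2`), the region names (`rbcL`, `trnH-psbA`), and the toy sequences are all hypothetical, chosen only for illustration. The partition lines follow the common RAxML-style `DNA, name = start-end` convention, which is an assumption about the downstream tool.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence (equal lengths)
Partition = Tuple[str, int, int]  # (region name, start, end), 1-based inclusive


def concatenate(regions: Dict[str, Alignment]) -> Tuple[Alignment, List[Partition]]:
    """Join per-region alignments into one supermatrix and record coordinates."""
    taxa = sorted({taxon for aln in regions.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Partition] = []
    offset = 0
    for name, aln in regions.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"region {name!r} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from a region is padded with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((name, offset + 1, offset + width))
        offset += width
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


def partition_lines(partitions: List[Partition]) -> List[str]:
    """Format coordinates as RAxML-style partition lines (assumed convention)."""
    return [f"DNA, {name} = {start}-{end}" for name, start, end in partitions]


# Hypothetical toy data: one coding region and one edited non-coding region.
coding = {"rbcL": {"sp1": "ATGGCA", "sp2": "ATGGCT"}}
noncoding_edited = {"trnH-psbA": {"sp1": "TT-A", "sp2": "TTGA"}}

# The "coding + non-coding edited" scheme is the union of both region sets.
combined = {**coding, **noncoding_edited}
supermatrix, parts = concatenate(combined)

for line in partition_lines(parts):
    print(line)
```

Swapping `noncoding_edited` for a filtered set of the same regions would give the "coding + non-coding filtered" scheme, so the two combined datasets differ only in which non-coding alignments are passed in.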