Kinjo Yukihiro, Saitoh Seikoh, Tokuda Gaku
Tropical Biosphere Research Center, University of the Ryukyus.
Microbes Environ. 2015;30(3):208-20. doi: 10.1264/jsme2.ME14153. Epub 2015 Jul 4.
Whole-genome sequencing has emerged as one of the most effective means to elucidate the biological roles and molecular features of obligate intracellular symbionts (endosymbionts). However, the de novo assembly of an endosymbiont genome remains a challenge when host nuclear and/or mitochondrial DNA sequences are present in a dataset and hinder the assembly. By focusing on the traits of genome evolution in endosymbionts, we developed and evaluated a genome-assembly strategy consisting of two consecutive procedures. The first, named TBLASTX Contig Selection and Filtering (TCSF), selects endosymbiont contigs from the output of a de novo assembly by means of a TBLASTX search against a reference genome. The second, named Iterative Mapping and ReAssembling (IMRA), iteratively reassembles the genome from the reads that map to the selected contigs in order to merge those contigs. To validate this approach, we sequenced two strains of the cockroach endosymbiont Blattabacterium cuenoti and applied the strategy to the resulting datasets. TCSF was highly accurate and sensitive in contig selection, even when the genome of a distantly related free-living bacterium was used as the reference. Furthermore, IMRA markedly improved the sequence assemblies: the genomic sequence of an endosymbiont was almost completed from a dataset in which only 3% of the sequences were derived from the endosymbiont's genome. The efficiency of our strategy may facilitate further studies on endosymbionts.
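The two procedures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the TBLASTX hits are available in standard BLAST tabular form (12 tab-separated columns, with the query contig ID in column 1 and the e-value in column 11). The e-value cutoff, the stopping rule (stop when the contig count no longer decreases), and the names `select_contigs`, `iterative_reassembly`, `map_reads` and `assemble` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Callable, Iterable, List, Set


def select_contigs(tblastx_rows: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """TCSF-like step: keep contigs with at least one hit to the reference.

    Assumes BLAST tabular rows; the default cutoff is an illustrative guess.
    """
    selected: Set[str] = set()
    for row in tblastx_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column hit line
        if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
            selected.add(fields[0])          # column 1 holds the contig ID
    return selected


def iterative_reassembly(
    contigs: List[str],
    reads: List[str],
    map_reads: Callable[[List[str], List[str]], List[str]],
    assemble: Callable[[List[str]], List[str]],
    max_rounds: int = 10,
) -> List[str]:
    """IMRA-like loop: map reads to contigs, reassemble, repeat.

    `map_reads` and `assemble` are placeholders for an external read mapper
    and assembler. The loop stops when a round fails to reduce the number
    of contigs (an assumed criterion) or after `max_rounds` rounds.
    """
    for _ in range(max_rounds):
        mapped = map_reads(reads, contigs)
        new_contigs = assemble(mapped)
        if not new_contigs or len(new_contigs) >= len(contigs):
            break  # no further merging in this round
        contigs = new_contigs
    return contigs
```

Passing the mapper and assembler in as callables keeps the sketch independent of any particular tool, so the loop logic can be exercised with toy functions before real programs are wired in.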