Humphreys Christopher M, McLean Samantha, Schatschneider Sarah, Millat Thomas, Henstra Anne M, Annan Florence J, Breitkopf Ronja, Pander Bart, Piatek Pawel, Rowe Peter, Wichlacz Alexander T, Woods Craig, Norman Rupert, Blom Jochen, Goesmann Alexander, Hodgman Charlie, Barrett David, Thomas Neil R, Winzer Klaus, Minton Nigel P
BBSRC/EPSRC Synthetic Biology Research Centre, School of Life Sciences, University of Nottingham, Nottingham, NG7 2RD, UK.
School of Pharmacy, University of Nottingham, Nottingham, NG7 2RD, UK.
BMC Genomics. 2015 Dec 21;16:1085. doi: 10.1186/s12864-015-2287-5.
Clostridium autoethanogenum is an acetogenic bacterium capable of producing high-value commodity chemicals and biofuels from the C1 gases present in synthesis gas. This common industrial waste gas can serve as the sole energy and carbon source for the bacterium, which converts the low-value gaseous components into cellular building blocks and industrially relevant products via the reductive acetyl-CoA (Wood-Ljungdahl) pathway. Current research efforts focus on enhancing and extending product formation in this organism through synthetic biology approaches. However, metabolic modelling and directed pathway engineering both depend crucially on a reliable, comprehensively annotated genome sequence.
We performed next-generation sequencing of Clostridium autoethanogenum strain DSM10061 using Illumina MiSeq technology and observed 243 single nucleotide discrepancies relative to the published finished sequence (NCBI: GCA_000484505.1), with 59.1 % present in coding regions. These variations were confirmed by Sanger sequencing, and subsequent analysis suggested that the discrepancies were sequencing errors in the published genome rather than true single nucleotide polymorphisms. This was corroborated by the observation that over 90 % of them occurred within homopolymer regions longer than 4 nucleotides. We also observed that many genes containing these sequencing errors were annotated in the published closed genome either as encoding proteins with frameshift mutations (18 instances) or despite containing stop codons within the coding frame; if genuine, such defects would severely hinder the organism's ability to survive. Furthermore, we completed a comprehensive manual curation to reduce the annotation errors that arise from serial use of automated annotation pipelines across related species. As a result, on various occasions gene products were assigned different functions, or previous functional annotations were rejected for lack of supporting evidence.
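To make the homopolymer and stop-codon observations concrete, here is a minimal Python sketch; it is not the authors' analysis pipeline. The helper names (`homopolymer_length`, `in_homopolymer`, `internal_stops`) are hypothetical, the default threshold of five bases mirrors the "greater than 4 nucleotides" criterion in the preceding paragraph, and the toy sequences are invented for illustration. It assumes a plain uppercase DNA string and the standard bacterial stop codons (TAA, TAG, TGA), and shows how a single-base deletion inside a homopolymer run shifts the reading frame and exposes an in-frame stop codon.

```python
# Minimal sketch (hypothetical helpers, not the published pipeline):
# classify a reference position by homopolymer context and flag
# in-frame stop codons that would appear inside a coding sequence.

# Stop codons of the bacterial genetic code (NCBI translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def homopolymer_length(seq: str, pos: int) -> int:
    """Length of the single-base run in `seq` covering 0-based `pos`."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def in_homopolymer(seq: str, pos: int, min_len: int = 5) -> bool:
    """True if `pos` lies in a run longer than 4 nt (>= min_len, default 5)."""
    return homopolymer_length(seq, pos) >= min_len


def internal_stops(cds: str) -> list[int]:
    """0-based codon indices of in-frame stops before the final codon.

    Trailing bases that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, c in enumerate(codons[:-1]) if c.upper() in STOP_CODONS]


if __name__ == "__main__":
    correct = "ATGAAAAAAATGACCTAA"  # toy CDS with a 7-nt poly-A run
    shifted = "ATGAAAAAATGACCTAA"   # same CDS with one A deleted from the run
    print(homopolymer_length(correct, 5))  # 7
    print(in_homopolymer(correct, 5))      # True
    print(internal_stops(correct))         # []
    print(internal_stops(shifted))         # [3]
```

In the toy example the one-base deletion turns the downstream `ATG ACC` into an in-frame `TGA`, the same kind of artefact that, in an uncorrected assembly, an annotation pipeline may report as a frameshift or an internal stop.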
We present a revised, manually curated full genome sequence for Clostridium autoethanogenum DSM10061. It provides reliable information for genome-scale models, which rely heavily on accurate annotation, and represents an important step towards the manipulation and metabolic modelling of this industrially relevant acetogen.