Hernandez Sarah I, Berezin Casey-Tyler, Miller Katie M, Peccoud Samuel J, Peccoud Jean
Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado 80523, United States of America.
ACS Synth Biol. 2024 Dec 20;13(12):4099-4109. doi: 10.1021/acssynbio.4c00539. Epub 2024 Nov 7.
Despite the wide use of plasmids in research and clinical production, the need to verify plasmid sequences is a bottleneck that is too often underestimated in the manufacturing process. Although sequencing platforms continue to improve, the method and assembly pipeline chosen still influence the final plasmid assembly sequence. Furthermore, few dedicated tools exist for plasmid assembly, especially for de novo assembly. Here, we evaluated short-read, long-read, and hybrid (both short and long reads) assembly pipelines across three replicates of a 24-plasmid library. Consistent with previous characterizations of each sequencing technology, short-read assemblies had issues resolving GC-rich regions, and long-read assemblies commonly had small insertions and deletions, especially in repetitive regions. The hybrid approach facilitated the most accurate, consistent assembly generation and identified mutations relative to the reference sequence. Although Sanger sequencing can be used to verify specific regions, some GC-rich and repetitive regions were difficult to resolve using any method, suggesting that easily sequenced genetic parts should be prioritized in the design of new genetic constructs.