Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.
Nat Protoc. 2012 Jun 7;7(7):1260-84. doi: 10.1038/nprot.2012.068.
Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and genome quality is generally much lower. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as those of pathogenic bacteria, malaria parasites and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.
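The abstract reports the improvement as a change in average contig size. As a rough illustration of how such a figure can be measured, the sketch below splits scaffold sequences into contigs at runs of N (the usual gap placeholder) and reports the contig count, mean contig length and N50. It is not part of PAGIT; the function names `split_contigs` and `contig_stats`, the `min_gap` parameter and the toy sequences are assumptions made for this example only.

```python
import re
from statistics import mean


def split_contigs(scaffold, min_gap=1):
    """Split one scaffold into contigs at runs of at least `min_gap` Ns.

    Assumption: gaps are encoded as N or n characters in the sequence.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(scaffold) if piece]


def contig_stats(scaffolds, min_gap=1):
    """Return contig count, mean contig length and N50 for a list of scaffolds."""
    lengths = sorted(
        (len(c) for s in scaffolds for c in split_contigs(s, min_gap)),
        reverse=True,
    )
    if not lengths:
        return {"count": 0, "mean": 0.0, "n50": 0}
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of the
    # length-sorted contigs first reaches half of the total length.
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"count": len(lengths), "mean": mean(lengths), "n50": n50}


# Toy data (hypothetical): the second assembly has one gap closed.
draft = ["ACGTNNNNACGTAC", "GGGNNCC"]
improved = ["ACGTACGTACGTAC", "GGGNNCC"]

if __name__ == "__main__":
    print("draft:   ", contig_stats(draft))
    print("improved:", contig_stats(improved))
```

Closing a single gap in the toy data reduces the contig count from four to three and raises the mean contig length, which is the kind of comparison behind the "doubles the average contig size" figure. For real assemblies the sequences would be read from a FASTA file rather than written inline.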