Smith Miriam, Campino Susana, Gu Yong, Clark Taane G, Otto Thomas D, Maslen Gareth, Manske Magnus, Imwong Mallika, Dondorp Arjen M, Kwiatkowski Dominic P, Quail Michael A, Swerdlow Harold
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK.
Open Genomics J. 2012;5. doi: 10.2174/1875693X01205010018.
Studies of DNA from pathogenic organisms in clinical samples are often complicated by the presence of large amounts of host DNA (e.g., human DNA). Isolating pathogen DNA from these samples would improve the efficiency of next-generation sequencing (NGS) and pathogen identification. Here we describe a solution-based hybridisation method for isolating pathogen DNA from a mixed population. This straightforward and inexpensive technique uses probes made from whole-genome DNA and off-the-shelf reagents. In this study, DNA was successfully enriched from a mixture of and human DNA. After enrichment, genome coverage following NGS was significantly higher, and the evenness of coverage and GC content were unaffected. This technique was also applied to samples containing a mixture of human and DNA. The genome is particularly difficult to sequence because of its high AT content (80.6%) and repetitive nature. After enrichment, a bias in the recovered DNA was observed, with poorer representation of the AT-rich non-coding regions. This uneven coverage was also observed in pre-enrichment samples, but to a lesser degree. Despite the coverage bias in enriched samples, SNP (single-nucleotide polymorphism) calling in coding regions was unaffected, and the majority of samples had over 90% of their coding region covered at 5× depth. This technique shows significant promise as an effective method for enriching pathogen DNA from samples with heavy human contamination, particularly when applied to GC-neutral genomes.
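The abstract reports two sequence-level metrics: genome AT content (80.6%) and the fraction of the coding region covered at 5× depth. The following is a minimal Python sketch of how such figures can be computed. It assumes per-base read depths are already available as a list (for example, exported from a pileup) and that coding regions are given as non-overlapping, 0-based, half-open intervals. The function names `at_content` and `breadth_at_depth` are hypothetical, and this is not the authors' analysis pipeline.

```python
from typing import Sequence


def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (A, C, G, T).

    Other symbols (e.g. N) are ignored. Returns 0.0 if no A/C/G/T is present.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    total = at + gc
    return at / total if total else 0.0


def breadth_at_depth(
    depths: Sequence[int],
    regions: Sequence[tuple[int, int]],
    min_depth: int = 5,
) -> float:
    """Fraction of positions inside the given regions with depth >= min_depth.

    Assumption (hypothetical input layout): regions are 0-based, half-open
    [start, end) intervals that do not overlap, and depths[i] is the read
    depth at position i.
    """
    covered = 0
    total = 0
    for start, end in regions:
        for pos in range(start, end):
            total += 1
            if depths[pos] >= min_depth:
                covered += 1
    return covered / total if total else 0.0


# Toy example with made-up numbers, not data from the study.
toy_seq = "ATATATATGC"
print(f"AT content: {at_content(toy_seq):.1%}")

toy_depths = [0, 3, 5, 6, 7, 2, 9, 10, 5, 4]
toy_cds = [(1, 5), (6, 10)]  # two hypothetical coding regions
print(f"Coding bases at >=5x: {breadth_at_depth(toy_depths, toy_cds):.1%}")
```

In the toy input, 8 of the 10 bases are A or T, and 6 of the 8 coding positions reach a depth of at least 5. Counting only unambiguous bases keeps runs of N from lowering the AT estimate.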