Feng Yan-Jie, Liu Qing-Feng, Chen Meng-Yun, Liang Dan, Zhang Peng
State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, China.
Mol Ecol Resour. 2016 Jan;16(1):91-102. doi: 10.1111/1755-0998.12429. Epub 2015 May 21.
In phylogenetics and population genetics, a large number of loci are often needed to accurately resolve species relationships. Normally, loci are enriched by PCR and sequenced by Sanger sequencing, which is expensive when the number of amplicons is large. Next-generation sequencing (NGS) techniques are increasingly used for parallel amplicon sequencing, which reduces sequencing costs tremendously but has done little to reduce preparation costs. Moreover, for most current NGS methods, amplicons need to be purified and quantified before sequencing, and their lengths are restricted (normally <700 bp). Here, we describe an approach to sequence pooled amplicons of any length using the Illumina platform. Using this method, amplicons are pooled at equal volume rather than at equal concentration, thus eliminating the laborious purification and quantification steps. We then shear the pooled amplicons, repair the ends, add sample-identifying linkers and pool multiple samples prior to Illumina library preparation. Data are then assembled using the transcriptome assembly program Trinity, which is optimized to deal with templates of highly varying quantities. We demonstrated the utility of our approach by recovering 93.5% of the target amplicons (size up to 1650 bp) in full length for a 16 taxa × 101 loci project, using ~2.0 GB of Illumina HiSeq paired-end 90-bp data. Overall, we validate a rapid, cost-effective and scalable approach to sequence a large number of targeted loci from a large number of samples, one that is particularly suitable for phylogenetics and population genetics studies that require a modest scale of data.
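Because multiple samples are pooled after sample-identifying linkers are added, reads must be sorted back to their sample of origin before assembly. The following is a minimal Python sketch of that sorting step, not the authors' pipeline. It assumes that each read begins with its linker barcode; the barcode sequences, the 6-bp barcode length, the sample names and the `demultiplex` function are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical 6-bp sample-identifying linker barcodes. The real linker
# sequences are not given in the abstract; these are placeholders.
BARCODES = {
    "ACGTAC": "sample_01",
    "TGCATG": "sample_02",
}
BARCODE_LEN = 6  # assumed barcode length


def demultiplex(reads, barcodes=BARCODES, barcode_len=BARCODE_LEN):
    """Group reads by their leading barcode and trim the barcode off.

    reads: iterable of (read_id, sequence) tuples.
    Returns a dict mapping sample name to a list of
    (read_id, trimmed_sequence). Reads whose prefix matches no known
    barcode are kept untrimmed under the key "unassigned".
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        sample = barcodes.get(seq[:barcode_len].upper())
        if sample is None:
            bins["unassigned"].append((read_id, seq))
        else:
            bins[sample].append((read_id, seq[barcode_len:]))
    return dict(bins)


if __name__ == "__main__":
    example_reads = [
        ("r1", "ACGTACGGGTTTAAACCC"),
        ("r2", "TGCATGCCCAAATTTGGG"),
        ("r3", "NNNNNNGGGTTTAAACCC"),
    ]
    for sample, assigned in sorted(demultiplex(example_reads).items()):
        print(sample, assigned)
```

This sketch uses exact prefix matching for simplicity. A practical workflow would probably tolerate sequencing errors in the barcode and handle paired-end reads together, and each sample's reads would then be assembled separately.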