Elbrecht Vasco, Vamos Ecaterina Edith, Steinke Dirk, Leese Florian
Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany.
Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada.
PeerJ. 2018 Apr 9;6:e4644. doi: 10.7717/peerj.4644. eCollection 2018.
DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors on high-throughput sequencing instruments are fairly common, so reads usually have to be clustered into operational taxonomic units (OTUs), and information on intraspecific diversity is lost in the process. Although cytochrome c oxidase subunit I (COI) haplotype information is limited in resolving intraspecific diversity, it is nevertheless often useful, e.g., in a phylogeographic context, where it helps to formulate hypotheses on taxon distribution and dispersal.
This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotype information from freshwater macroinvertebrate metabarcoding datasets. This novel approach was added to the R package "JAMP" and can be applied to COI amplicon datasets. We tested our haplotyping method by sequencing (i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and (ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates.
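As an illustration of what abundance-based filtering after denoising can look like, the following minimal Python sketch keeps only sequence variants that exceed a relative-abundance threshold within a PCR replicate and that pass this threshold in a minimum number of replicates. It is not the JAMP implementation; the function name `filter_haplotypes`, the input layout, and the default thresholds are assumptions chosen for illustration only.

```python
def filter_haplotypes(counts, min_rel_abundance=0.001, min_replicates=2):
    """Abundance-based filtering of denoised sequence variants.

    counts: dict mapping replicate name -> {sequence: read count}
            (hypothetical input layout, not a JAMP data structure).
    min_rel_abundance: minimum share of a replicate's reads a variant
            must reach to be kept (illustrative default).
    min_replicates: number of replicates in which a variant must pass
            the abundance threshold (illustrative default).
    """
    kept = {rep: {} for rep in counts}
    support = {}  # sequence -> number of replicates passing the threshold

    for rep, seqs in counts.items():
        total = sum(seqs.values())
        if total == 0:
            continue  # empty replicate: nothing to keep
        for seq, n in seqs.items():
            if n / total >= min_rel_abundance:
                kept[rep][seq] = n
                support[seq] = support.get(seq, 0) + 1

    # Drop variants that are not supported by enough replicates.
    return {
        rep: {s: n for s, n in seqs.items() if support[s] >= min_replicates}
        for rep, seqs in kept.items()
    }


if __name__ == "__main__":
    example = {
        "replicate_A": {"AAA": 900, "AAT": 99, "AAG": 1},
        "replicate_B": {"AAA": 800, "AAT": 200, "CCC": 50},
    }
    print(filter_haplotypes(example, min_rel_abundance=0.01))
```

In this toy input, the variant `AAG` falls below the abundance threshold and `CCC` occurs in only one replicate, so both are discarded. Stricter thresholds remove more spurious variants but, as with small specimens in a mock community, can also discard genuine low-abundance haplotypes.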
We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removed most unexpected haplotypes, but could also discard expected haplotypes, mainly those from the small specimens. In the monitoring samples, the different primer sets detected 177-200 OTUs, each containing an average of 2.40-3.30 haplotypes. The derived intraspecific diversity data showed population structures that were consistent between replicates and similar between primer pairs, but resolution depended on the primer length. A closer look at abundant taxa in the dataset revealed various population genetic patterns: e.g., the stonefly and the caddisfly showed a distinct north-south cline in haplotype distribution, while the beetle and the isopod displayed no clear population pattern but differed in genetic diversity.
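The average number of haplotypes per OTU reported above is a simple ratio of retained haplotypes to OTUs. A minimal Python sketch, assuming a hypothetical mapping from haplotype identifiers to the OTU each was assigned to (the identifiers and the function name are illustrative, not taken from the study's data):

```python
from collections import Counter


def mean_haplotypes_per_otu(haplotype_to_otu):
    """Return the mean number of haplotypes per OTU.

    haplotype_to_otu: dict mapping haplotype ID -> OTU ID
                      (hypothetical layout for illustration).
    """
    per_otu = Counter(haplotype_to_otu.values())  # OTU ID -> haplotype count
    if not per_otu:
        return 0.0
    return sum(per_otu.values()) / len(per_otu)


if __name__ == "__main__":
    assignment = {
        "hap_1": "OTU_1",
        "hap_2": "OTU_1",
        "hap_3": "OTU_1",
        "hap_4": "OTU_2",
        "hap_5": "OTU_3",
    }
    # Five haplotypes spread over three OTUs.
    print(mean_haplotypes_per_otu(assignment))
```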
We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate metabarcoding data. It should be stressed that metabarcoding-informed haplotyping cannot yet capture the full diversity present in such samples, owing to variation in specimen size, primer bias, and the loss of low-abundance sequence variants. Nevertheless, intraspecific diversity was recovered for a large number of species, which identified potentially isolated populations and taxa for further, more detailed phylogeographic investigation. Although large-scale metabarcoding datasets that would take full advantage of our approach are currently lacking, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that seek information not only about species diversity but also about the underlying genetic diversity.