Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
Science for Life Laboratory, Solna, Sweden.
Mol Ecol Resour. 2020 Sep;20(5):1171-1181. doi: 10.1111/1755-0998.13009. Epub 2019 May 5.
The high-throughput capacities of the Illumina sequencing platforms and the ability to label samples individually have encouraged wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for Illumina sequencing machines with exclusion amplification chemistry. This may make use of these platforms prohibitive, particularly in studies that rely on low-quantity and low-quality samples, such as historical and archaeological specimens. Here, we use barcodes, short sequences that are ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100-year-old museum-preserved gorilla (Gorilla beringei) samples. Correcting for multiple sources of noise, we identify on average 0.470% of reads containing a hopped index. We show that the sample-specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. This particularly affects ancient DNA samples, as these frequently differ in their DNA quantity and endogenous content. Through simulations we show that even low rates of index hopping, as reported here, can lead to biases in ancient DNA studies when multiplexing samples with vastly different quantities of endogenous material.
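The dependence of misassignment on a sample's share of the sequencing pool can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the authors' simulation: it assumes, purely for illustration, that every read hops with the same probability and that a hopped read is reassigned uniformly at random to one of the other samples on the lane. The function name `misassigned_fraction` and the read counts are hypothetical; only the 0.470% rate comes from the abstract.

```python
def misassigned_fraction(read_counts, hop_rate):
    """Expected fraction of each sample's observed reads that came from other samples.

    Assumptions (illustrative, not taken from the paper):
      - every read hops with the same probability `hop_rate`;
      - a hopped read is reassigned uniformly to one of the other samples.
    """
    k = len(read_counts)
    if k < 2:
        raise ValueError("need at least two multiplexed samples")
    total = sum(read_counts)
    fractions = []
    for n in read_counts:
        retained = n * (1.0 - hop_rate)               # own reads that keep their index
        incoming = (total - n) * hop_rate / (k - 1)   # reads hopping in from other samples
        observed = retained + incoming
        fractions.append(incoming / observed if observed > 0 else 0.0)
    return fractions


if __name__ == "__main__":
    HOP_RATE = 0.0047  # 0.470% of reads, the average reported in the abstract
    # Hypothetical read counts for three samples sharing one lane.
    counts = [50_000_000, 5_000_000, 500_000]
    for n, frac in zip(counts, misassigned_fraction(counts, HOP_RATE)):
        print(f"{n:>12,d} reads contributed: {frac:.3%} of observed reads expected to be misassigned")
```

Under these assumptions, samples that contribute equal numbers of reads each end up with a misassigned fraction equal to the hop rate, while a sample that contributes far fewer reads than its lane-mates receives a disproportionately large share of foreign reads.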