Margraf Rebecca L, Durtschi Jacob D, Dames Shale, Pattison David C, Stephens Jack E, Voelkerding Karl V
ARUP Institute for Clinical & Experimental Pathology®, Salt Lake City, Utah, USA.
J Biomol Tech. 2011 Jul;22(2):74-84.
Multi-sample pooling combined with Illumina Genome Analyzer (GA) sequencing allows high-throughput sequencing of multiple samples to determine population sequence variation. A preliminary experiment, using the RET proto-oncogene as a model, predicted that ≤30 samples could be pooled to reliably detect singleton variants without requiring additional confirmation testing. This report used 30- and 50-sample pools to test the hypothesized pooling limit and also to test recent protocol improvements, Illumina GAIIx upgrades, and longer read chemistry. The SequalPrep™ method was used to normalize amplicons before pooling. For comparison, a single 'control' sample was run in a different flow cell lane. Data were evaluated by variant read percentages and by the subtractive correction method, which uses the control sample. In total, 59 variants were detected within the pooled samples, including all 47 known true variants. The 15 singleton variants known from Sanger sequencing had an average of 1.62 ± 0.26% variant reads for the 30-sample pool (expected 1.67% for a singleton variant, i.e., a variant unique within the pool) and 1.01 ± 0.19% for the 50-sample pool (expected 1%). The 76-base reads had higher error rates than the shorter reads (33 and 50 bases), which made true singleton variants indistinguishable from background error. This report demonstrated pooling limits of 30 to 50 samples (depending on error rates and coverage) for reliable singleton variant detection. The presented pooling protocols and analysis methods can be used for variant discovery in other genes, facilitating molecular diagnostic test design and interpretation.
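The expected singleton percentages quoted above (1.67% for 30 samples, 1% for 50) are consistent with one heterozygous carrier contributing one variant allele out of 2N alleles in a pool of N diploid samples. The following is a minimal sketch of that arithmetic, assuming equal representation of every sample in the pool. The function names are hypothetical, and `subtractive_correction` is only one plausible reading of the control-based correction (pool percentage minus control percentage at the same position, floored at zero); the abstract does not specify the exact procedure.

```python
def expected_singleton_percent(n_samples: int, ploidy: int = 2) -> float:
    """Expected percentage of variant reads for a single heterozygous
    carrier in a pool, assuming every sample contributes equally."""
    if n_samples < 1 or ploidy < 1:
        raise ValueError("n_samples and ploidy must be positive")
    # One variant allele out of (ploidy * n_samples) alleles in the pool.
    return 100.0 / (ploidy * n_samples)


def subtractive_correction(pool_percent: float, control_percent: float) -> float:
    """ASSUMPTION: subtract the control lane's variant read percentage
    (background error) from the pool's percentage at the same position.
    The floor at zero is an illustrative choice, not taken from the source."""
    return max(pool_percent - control_percent, 0.0)


if __name__ == "__main__":
    for n in (30, 50):
        print(f"{n}-sample pool: {expected_singleton_percent(n):.2f}% expected")
```

Under these assumptions, the observed averages of 1.62% and 1.01% sit close to the expected values, which is the comparison the abstract draws.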