Erlich Yaniv, Chang Kenneth, Gordon Assaf, Ronen Roy, Navon Oron, Rooks Michelle, Hannon Gregory J
Watson School of Biological Sciences, Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
Genome Res. 2009 Jul;19(7):1243-53. doi: 10.1101/gr.092957.109. Epub 2009 May 15.
Next-generation sequencers have sufficient power to simultaneously analyze DNA from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozens of samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on combinatorial pooling strategies in which pools, rather than individual specimens, are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals.
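The idea of encoding a specimen's identity in its pooling pattern can be illustrated with a small sketch. The abstract does not specify the pooling design, so the modular scheme below is an assumption chosen for illustration: each specimen index is placed in pool `index % w` for every window size `w` in a set of pairwise coprime window sizes, and only the pools carry barcodes. The names `build_pools` and `decode`, the window sizes, and the toy sequences are all hypothetical.

```python
from itertools import combinations
from math import gcd, prod

def build_pools(specimens, windows):
    """Map (window number, pool number) -> set of sequences in that pool.

    Assumed design: specimen i goes to pool i % w in each window of size w.
    """
    pools = {}
    for idx, seq in enumerate(specimens):
        for w_i, w in enumerate(windows):
            pools.setdefault((w_i, idx % w), set()).add(seq)
    return pools

def decode(seq, pools, windows, n_specimens):
    """Return the specimen indices consistent with the pools containing seq."""
    candidates = set(range(n_specimens))
    for w_i, w in enumerate(windows):
        # Pools of this window in which the sequence was observed.
        hits = {p for p in range(w) if seq in pools.get((w_i, p), set())}
        candidates = {s for s in candidates if s % w in hits}
    return sorted(candidates)

# Hypothetical parameters: 7 + 11 + 13 = 31 barcoded pools.
WINDOWS = (7, 11, 13)
N_SPECIMENS = 1000

# Pairwise coprime windows give every index below their product a unique pattern.
assert all(gcd(a, b) == 1 for a, b in combinations(WINDOWS, 2))
assert N_SPECIMENS <= prod(WINDOWS)

# Toy data: every specimen carries a distinct sequence.
specimens = [f"seq{i}" for i in range(N_SPECIMENS)]
pools = build_pools(specimens, WINDOWS)

print(tuple(123 % w for w in WINDOWS))                 # (4, 2, 6)
print(decode("seq123", pools, WINDOWS, N_SPECIMENS))   # [123]
```

In this toy setting, 31 barcodes are enough to identify 1,000 specimens, compared with 1,000 barcodes if each specimen were tagged individually. The sketch assumes error-free reads and one distinct sequence per specimen. When several specimens share a sequence, the candidate list can contain false positives, which this sketch does not attempt to resolve.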