INRA EFPA, UMR CBGP (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, CS 30016, F-34988 Montferrier-sur-Lez cedex, France.
BMC Genomics. 2010 May 11;11:296. doi: 10.1186/1471-2164-11-296.
High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising advances now concern the application of these technologies to large-scale studies of genetic variation. Such studies require genotyping large numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, two challenges arise: first, attributing each read to its original sample, and second, distinguishing true from artifactual sequence variation in the bioinformatic analyses. This pilot study proposes a new application for the 454 GS FLX platform that allows the individual genotyping of thousands of samples in one run. A probabilistic model was developed to demonstrate the reliability of this method.
DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in the forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to exon 2 of DRB, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were ultimately assigned to their original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results: DRB exon 2 sequences were genotyped for 1,407 samples from 24 different rodent species, and 392 variants were sequenced, in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%.
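The read-assignment step described above, in which each sample is identified by a combination of forward and reverse primer tags, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes exact tag matches, tags of equal length, and reads given in the forward orientation, so the reverse tag appears reverse-complemented at the 3' end. The tag sequences, sample names and function names are all hypothetical.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def assign_read(read, fwd_tags, rev_tags, sample_by_pair):
    """Return the sample matching the read's tag pair, or None.

    Assumption: the forward tag starts the read and the reverse tag
    ends it in reverse-complemented form; only exact matches count.
    """
    for fwd_name, fwd_tag in fwd_tags.items():
        if read.startswith(fwd_tag):
            break
    else:
        return None  # no forward tag recognised
    for rev_name, rev_tag in rev_tags.items():
        if read.endswith(revcomp(rev_tag)):
            # An unused tag combination yields None as well.
            return sample_by_pair.get((fwd_name, rev_name))
    return None  # no reverse tag recognised


def demultiplex(reads, fwd_tags, rev_tags, sample_by_pair):
    """Group reads by sample and count the reads left unassigned."""
    by_sample = defaultdict(list)
    unassigned = 0
    for read in reads:
        sample = assign_read(read, fwd_tags, rev_tags, sample_by_pair)
        if sample is None:
            unassigned += 1
        else:
            by_sample[sample].append(read)
    return dict(by_sample), unassigned


# Hypothetical tags and sample map, for illustration only.
FWD_TAGS = {"F1": "ACGTAC", "F2": "TGCATG"}
REV_TAGS = {"R1": "GATCGA", "R2": "CTAGCT"}
SAMPLE_BY_PAIR = {("F1", "R1"): "sample_A", ("F2", "R1"): "sample_B"}

if __name__ == "__main__":
    example_reads = [
        "ACGTAC" + "GGGTTTCCC" + revcomp("GATCGA"),
        "TGCATG" + "AAACCC" + revcomp("GATCGA"),
        "NNNNNNGGG",
    ]
    grouped, n_unassigned = demultiplex(
        example_reads, FWD_TAGS, REV_TAGS, SAMPLE_BY_PAIR
    )
    for name, sample_reads in grouped.items():
        print(name, len(sample_reads))
    print("unassigned:", n_unassigned)
```

Combining the two tag sets means that the number of distinguishable samples grows with the product of the numbers of forward and reverse tags rather than their sum. A real pipeline would probably also need to tolerate sequencing errors within the tags and handle reads in either orientation, both of which this sketch ignores.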
This new approach is a promising alternative to classical methods, which rely on electrophoresis-based techniques for variant separation and on cloning and sequencing for sequence determination. The 454 system is less costly and less time-consuming, and may enhance the reliability of the genotypes obtained when large numbers of samples are studied. It opens up new perspectives for the study of the evolutionary and functional genetics of highly polymorphic genes, such as major histocompatibility complex genes in vertebrates or loci regulating self-compatibility in plants. Important applications in biomedical research will include the detection of individual variation in disease susceptibility. Similarly, agronomy will benefit from this approach through the study of genes implicated in productivity or disease susceptibility traits.