INRA, UMR 1313 Génétique Animale et Biologie Intégrative, 78350, Jouy-en-Josas, France.
BMC Genomics. 2012 Jun 13;13:238. doi: 10.1186/1471-2164-13-238.
There is considerable interest in developing high-throughput genotyping with single nucleotide polymorphisms (SNPs) to identify genes affecting important ecological or economic traits. SNPs are evenly distributed throughout the genome and are likely to be functionally relevant. In rainbow trout, in silico screening of expressed sequence tag (EST) databases is an attractive approach for de novo SNP identification. However, EST sequencing errors and the assembly of paralogous EST sequences can produce false positive SNPs, which makes EST-derived SNPs relatively unreliable. Further validation of EST-derived SNPs is therefore required. The objective of this work was to assess the quality of, and to validate, a large number of rainbow trout EST-derived SNPs.
A panel of 1,152 EST-derived SNPs was selected from the INRA Sigenae SNP database and genotyped in standard and doubled haploid individuals from several populations using the Illumina GoldenGate BeadXpress assay. High-quality genotyping data were obtained for 958 SNPs, a genotyping success rate of 83.2%. Of these, 350 SNPs (36.5%) were polymorphic in at least one population and were designated as true SNPs. These SNPs also proved to be a potential tool for investigating the genetic diversity of the species: analysis with the STRUCTURE software sorted individuals into three main groups. Functional annotation revealed 28 non-synonymous SNPs, four of which were predicted to affect protein function. A subset of 223 true SNPs was polymorphic in the two INRA mapping reference families and was integrated into the INRA microsatellite-based linkage map.
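The percentages reported above follow directly from the SNP counts. As a minimal sketch for checking them, the snippet below recomputes each rate; the variable names and the `rate` helper are illustrative assumptions, and the last two figures (the mapped share and the monomorphic share) are derived here from the stated counts rather than reported in the abstract.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts taken from the abstract.
selected = 1152     # EST-derived SNPs placed on the genotyping panel
genotyped = 958     # SNPs with high-quality genotyping data
polymorphic = 350   # SNPs polymorphic in at least one population ("true SNPs")
mapped = 223        # true SNPs polymorphic in the two mapping families

success_rate = rate(genotyped, selected)        # reported as 83.2%
true_snp_rate = rate(polymorphic, genotyped)    # reported as 36.5%

# Derived figures (not stated in the abstract).
mapped_share = rate(mapped, polymorphic)
# Assumption: successfully genotyped SNPs that were not polymorphic in any
# sampled population are counted here as monomorphic; some of them could
# still be true SNPs that simply did not segregate in these samples.
monomorphic_share = rate(genotyped - polymorphic, genotyped)

print(f"Genotyping success rate: {success_rate}%")
print(f"True SNPs among genotyped: {true_snp_rate}%")
print(f"True SNPs placed on the linkage map: {mapped_share}%")
print(f"Monomorphic among genotyped: {monomorphic_share}%")
```

Under these assumptions, roughly two thirds of the successfully genotyped SNPs showed no polymorphism, which is the order of magnitude behind the concern about false positives.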
This is the first validation study of EST-derived SNPs in rainbow trout, a species whose genome sequence is not yet available. We designed several specific filters to improve the genotyping yield. Nevertheless, our selection criteria should be further improved to reduce the high observed rate of false positive SNPs, which results from whole genome duplications.