Fondazione Parco Tecnologico Padano, Via Einstein, Loc. Cascina Codazza, Lodi 26900, Italy.
BMC Genomics. 2014 Feb 11;15:123. doi: 10.1186/1471-2164-15-123.
Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Several technical issues must be addressed before data originating from different platforms, or from different versions of the same array released by a manufacturer, can be combined. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing SNP positions or even removing sequences that contain SNPs; iii) not all commercial SNP IDs are searchable in public databases; iv) SNPs can be coded using different formats and can reference different strands (e.g. A/B or A/C/T/G alleles, referencing the forward/reverse, top/bottom or plus/minus strand); v) because new information is discovered over time, higher-density chips do not necessarily include all the SNPs present on lower-density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real time and thus require tools to standardise data in a user-friendly manner.
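Issue iv) is easiest to see with a small example. The sketch below assumes alleles are already written as nucleotide letters (A/C/G/T) and only shows how a SNP reported on the reverse strand can be re-expressed on the forward strand by complementing each allele. It also flags A/T and C/G SNPs, whose strand cannot be inferred from the alleles alone. The helper names are hypothetical, and Illumina A/B coding and top/bottom strand assignment are deliberately not modelled.

```python
# Minimal sketch, assuming nucleotide allele letters (A/C/G/T).
# A/B allele coding and top/bottom strand rules are NOT handled here.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles: str) -> str:
    """Complement each allele of an 'X/Y' string, e.g. 'A/G' -> 'T/C'."""
    return "/".join(COMPLEMENT[a] for a in alleles.split("/"))


def is_strand_ambiguous(alleles: str) -> bool:
    """A/T and C/G SNPs read the same on both strands after complementing,
    so the reporting strand cannot be deduced from the alleles alone."""
    a, b = alleles.split("/")
    return COMPLEMENT[a] == b


def to_forward(alleles: str, on_reverse: bool) -> str:
    """Hypothetical helper: express alleles on the forward strand, given a
    flag saying whether the source chip reports them on the reverse strand."""
    return flip_strand(alleles) if on_reverse else alleles


if __name__ == "__main__":
    print(flip_strand("A/G"))          # T/C
    print(is_strand_ambiguous("A/T"))  # True
    print(to_forward("A/C", on_reverse=True))  # T/G
```

Ambiguous A/T and C/G SNPs are usually the ones that need external information (for example, flanking sequence) to resolve, which is one reason a curated reference is useful.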
Here we present SNPchiMp, a MySQL database linked to an open-access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing SNP mapping information to the latest genome assembly; 2) extracting the information contained in dbSNP for SNPs present on all commercially available bovine chips; and 3) identifying SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information for subsets of SNPs, selected either by physical position on a supported assembly or by a list of SNP IDs, rs or ss identifiers.
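The idea behind finding SNPs in common between chips, and retrieving subsets by physical position, can be illustrated with a short sketch. This is not SNPchiMp's implementation or schema: it assumes hypothetical in-memory manifests mapping each chip's SNP name to a (chromosome, position) pair on one shared assembly. Matching on position rather than on SNP name avoids the inconsistent IDs described in issue vi), but is only meaningful once all chips are mapped to the same assembly (issues i and ii).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical manifest format: SNP name -> (chromosome, position),
# with all chips assumed to be mapped to the SAME genome assembly.
Position = Tuple[str, int]
Manifest = Dict[str, Position]


def common_positions(*chips: Manifest) -> Set[Position]:
    """Return the (chrom, pos) pairs present on every chip given."""
    sets = [set(chip.values()) for chip in chips]
    return set.intersection(*sets) if sets else set()


def snps_in_region(chip: Manifest, chrom: str, start: int, end: int) -> List[str]:
    """Return SNP names on `chrom` with start <= pos <= end, ordered by position."""
    hits = [(pos, name) for name, (c, pos) in chip.items()
            if c == chrom and start <= pos <= end]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    low = {"snpA": ("1", 100), "snpB": ("1", 250), "snpC": ("2", 50)}
    high = {"rsX": ("1", 100), "rsY": ("2", 50), "rsZ": ("2", 900)}
    print(sorted(common_positions(low, high)))  # [('1', 100), ('2', 50)]
    print(snps_in_region(high, "2", 1, 1000))   # ['rsY', 'rsZ']
```

Matching purely on coordinates is a simplification: two distinct variants at the same position would be treated as one, so a real tool would also compare alleles or database identifiers.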
This tool combines many different sources of information that would otherwise be time-consuming to obtain and difficult to integrate. SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few mouse clicks, significantly reducing the time needed to manage SNP data.