Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37232, USA.
Bioinformatics. 2017 Aug 1;33(15):2399-2401. doi: 10.1093/bioinformatics/btx186.
Even after the introduction of high-throughput sequencing, genotyping arrays remain a viable resource for large-scale genetic studies. Illumina is currently one of the largest genotyping array manufacturers. One technical issue that has long plagued the post-processing of Illumina genotyping array data is strand definition. Against convention, Illumina uses its own definition of strand, which is inconsistent with the standard reference forward and reverse definition. This inconsistency has been a major obstacle to consistent reporting, meta-analysis and correct interpretation of phenotype association results. To date, the strand issue has not been adequately addressed, which prompted us to develop StrandScript, a tool that can convert all genotyping data generated from Illumina genotyping arrays to the reference forward strand. StrandScript works independently of the Illumina array version and is future-proof for newer Illumina array designs. Furthermore, StrandScript can examine an Illumina genotyping array manifest file and detect all problematic SNPs, including SNPs with an incorrect RS ID and SNPs with mismatched probe sequences. Here, we introduce StrandScript's design and development, and demonstrate its effectiveness using real genotyping data.
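To make the idea of converting genotypes to the reference forward strand concrete, the following is a minimal sketch and not StrandScript's actual code or interface. It assumes that each probe sequence can be compared against a short stretch of forward-strand reference sequence, and the function names (`reverse_complement`, `probe_strand`, `to_forward_strand`) are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration of strand normalization; not StrandScript's code.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_strand(probe: str, ref_forward: str) -> str:
    """Classify a probe against a forward-strand reference snippet.

    Assumption: a simple exact substring match stands in for a real
    alignment step.
    """
    probe = probe.upper()
    ref_forward = ref_forward.upper()
    if probe in ref_forward:
        return "forward"
    if reverse_complement(probe) in ref_forward:
        return "reverse"
    return "mismatch"  # candidate problematic SNP (probe does not match)


def to_forward_strand(alleles, strand: str):
    """Report single-base alleles on the reference forward strand."""
    if strand == "forward":
        return tuple(alleles)
    if strand == "reverse":
        # Single bases only need complementing, not reversing.
        return tuple(a.translate(COMPLEMENT) for a in alleles)
    raise ValueError("probe does not match the reference on either strand")


if __name__ == "__main__":
    ref = "ACCTGAAG"  # made-up forward-strand reference snippet
    for probe, alleles in [("CTGA", ("A", "G")), ("CTTC", ("A", "G"))]:
        strand = probe_strand(probe, ref)
        print(probe, strand, to_forward_strand(alleles, strand))
```

In this sketch the strand is decided from the probe sequence rather than from the alleles themselves. That matters for A/T and C/G SNPs, whose allele pairs look identical after complementing and therefore cannot be resolved from the alleles alone.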
StrandScript is available at https://github.com/seasky002002/Strandscript.
Supplementary data are available at Bioinformatics online.