Pettersson Fredrik, Morris Andrew P, Barnes Michael R, Cardon Lon R
Dept Bioinformatics, Wellcome Trust Centre, Oxford, UK.
BMC Bioinformatics. 2008 Mar 4;9:138. doi: 10.1186/1471-2105-9-138.
Genome-wide association (GWA) studies are now widely undertaken to find links between genetic variation and common diseases. Ideally, a well-powered GWA study measures hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of individuals. The sheer volume of data generated by these experiments places very high demands on analysis. The analysis involves a number of important steps, many of which can become severe bottlenecks. The data must be imported and reviewed for initial quality control (QC) before proceeding to association testing. Evaluating the results may involve further statistical analysis, such as permutation testing, or further QC of associated markers, for example by reviewing raw genotyping intensities. Finally, significant associations need to be prioritised using functional and biological interpretation methods: browsing available biological annotation, pathway information and patterns of linkage disequilibrium (LD).
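The QC, association-testing and permutation steps named above can be illustrated for a single SNP with a minimal sketch. This is not the application's own code. It assumes genotypes are coded as the number of copies of one allele (0, 1 or 2) with `None` for a failed call, and it uses a basic allelic chi-square test as a stand-in for whatever test a given study applies. All function names are hypothetical.

```python
import random
from math import erfc, sqrt


def call_rate(genotypes):
    """QC metric: fraction of samples with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """QC metric: frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def allelic_chi_square(case_genotypes, control_genotypes):
    """1-df chi-square test on the 2x2 table of allele counts.

    Returns (statistic, p_value). Assumed coding: 0/1/2 copies, None = missing.
    """
    def allele_counts(genotypes):
        called = [g for g in genotypes if g is not None]
        alt = sum(called)
        return 2 * len(called) - alt, alt

    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # monomorphic SNP or an empty group: no evidence either way
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


def permutation_p_value(case_genotypes, control_genotypes,
                        n_permutations=1000, seed=0):
    """Empirical p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed, _ = allelic_chi_square(case_genotypes, control_genotypes)
    pooled = list(case_genotypes) + list(control_genotypes)
    n_cases = len(case_genotypes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat, _ = allelic_chi_square(pooled[:n_cases], pooled[n_cases:])
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

In a real study these functions would be applied to every marker, with SNPs failing call-rate or allele-frequency thresholds dropped before testing. The thresholds themselves are study-specific and are not given in this abstract.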
We have developed an interactive, user-friendly graphical application for use in every step of a GWA project, from initial data QC and analysis to biological evaluation and validation of results. The program is implemented in Java and can be used on all platforms.
Very large data sets (e.g. 500,000 markers and 5,000 samples) can be assessed for quality, rapidly analysed and integrated with genomic sequence information. Candidate SNPs can then be selected and functionally evaluated.
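To give a sense of the scale quoted above, the following sketch estimates the memory needed to hold 500,000 markers for 5,000 samples when each genotype is packed into 2 bits. The abstract does not describe how the application stores genotypes, so the packing scheme and every name here are assumptions made purely for illustration.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0-3) four to a byte, lowest bits first."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        packed[i // 4] |= (g & 0b11) << (2 * (i % 4))
    return bytes(packed)


def unpack_genotype(packed, i):
    """Read back the 2-bit genotype code stored at sample index i."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


def packed_size_bytes(n_markers, n_samples):
    """Total bytes when each marker's samples are packed separately."""
    bytes_per_marker = (n_samples + 3) // 4
    return n_markers * bytes_per_marker


# 500,000 markers x 5,000 samples = 2.5 billion genotypes.
size = packed_size_bytes(500_000, 5_000)
print(f"{size:,} bytes (~{size / 2**20:.0f} MiB)")
```

Under these assumptions the full matrix occupies 625,000,000 bytes, compared with 2.5 billion bytes at one byte per genotype.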