Lancaster Alex, Nelson Mark P, Meyer Diogo, Single Richard M, Thomson Glenys
Department of Integrative Biology, University of California, Berkeley, 3060 Valley Life Sciences, Berkeley, CA 94720, USA.
Pac Symp Biocomput. 2003:514-25.
Software for analyzing multi-locus genotype data from entire populations is useful for estimating haplotype frequencies, deviation from Hardy-Weinberg equilibrium, and patterns of linkage disequilibrium. These statistics are important both to researchers studying human genome variation and disease predisposition and to those working in evolutionary genetics. As part of the 13th International Histocompatibility and Immunogenetics Working Group (IHWG), we have developed a software framework, PyPop. The primary novelty of this package is that it allows integration of statistics across large numbers of data sets by making heavy use of the XML file format and the R statistical package to view graphical output, while retaining the ability to interoperate with existing software. Although developed largely for human population data, it can be used for population-based data from any organism. We tested our software on the data from the 13th IHWG, which comprised data sets from at least 50 laboratories, each of up to 1000 individuals with 9 MHC loci (both class I and class II), and found that it scales well to large numbers of data sets.
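To make "deviation from Hardy-Weinberg equilibrium" concrete, the following is a minimal sketch of a textbook chi-square goodness-of-fit test for a single locus with any number of alleles. It is an illustration only and is not taken from PyPop: the function name `hardy_weinberg_chi_square` and the input format (one pair of allele labels per individual) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hardy_weinberg_chi_square(genotypes):
    """Return (chi_square, degrees_of_freedom) for one locus.

    genotypes: list of (allele_a, allele_b) pairs, one per individual.
    Hypothetical helper for illustration; not part of PyPop.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")

    # Observed counts of unordered genotypes, so ("A", "B") == ("B", "A").
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    # Allele frequencies by direct counting over the 2n allele copies.
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}

    chi_square = 0.0
    alleles = sorted(freq)
    for a, b in combinations_with_replacement(alleles, 2):
        # Expected count: n*p^2 for homozygotes, 2*n*p*q for heterozygotes.
        p = freq[a] * freq[b]
        expected = n * p if a == b else 2 * n * p
        chi_square += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(alleles)
    # k(k+1)/2 genotype classes, minus 1, minus (k - 1) estimated frequencies.
    dof = k * (k - 1) // 2
    return chi_square, dof
```

For example, a sample of 25 `A/A`, 50 `A/B`, and 25 `B/B` individuals matches the Hardy-Weinberg proportions exactly and yields a statistic of zero with one degree of freedom. For highly polymorphic loci many expected genotype counts will be small, so the chi-square approximation may be poor and an exact test may be preferable.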