Lancaster A K, Single R M, Solberg O D, Nelson M P, Thomson G
Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA.
Tissue Antigens. 2007 Apr;69 Suppl 1:192-7. doi: 10.1111/j.1399-0039.2006.00769.x.
Population genetic statistics from multilocus genotype data inform our understanding of patterns of genetic variation and their implications for evolutionary studies in general and for human disease studies in particular. In any given population one can estimate haplotype frequencies, identify deviations from Hardy-Weinberg equilibrium, test for balancing or directional selection, and investigate patterns of linkage disequilibrium. Existing software packages are oriented primarily toward computing such statistics on a population-by-population basis rather than toward comparisons among populations and across different statistics. We developed PyPop (Python for Population Genomics) to facilitate the analysis of population genetic statistics across populations and of the relationships among different statistics within and across populations. PyPop is an open-source framework for performing large-scale population genetic analyses on multilocus genotype data. It computes the statistics described above, among others. PyPop uses a standard Extensible Markup Language (XML) output format and can integrate the results of multiple analyses, performed on various populations at different times, into a common format that can be read into a spreadsheet. The XML output format also allows PyPop to be embedded in a larger analysis pipeline. Although originally developed to analyze the highly polymorphic genetic data of the human leukocyte antigen (HLA) region of the human genome, PyPop is applicable to any kind of multilocus genetic data. It is the primary analysis platform for data collected for the Anthropological component of the 13th and 14th International Histocompatibility Workshops. PyPop has also been used successfully in studies by our group and our collaborators, and in publications by several independent research teams.
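As a concrete illustration of one statistic mentioned above, the following is a minimal sketch of a chi-square goodness-of-fit test for Hardy-Weinberg proportions at a single multiallelic locus. It is a generic textbook formulation, not PyPop's implementation or API; the function name `hwe_chisq` and the list-of-tuples genotype input are hypothetical choices made for illustration. With k alleles there are k(k+1)/2 genotype classes and k-1 estimated allele frequencies, giving k(k-1)/2 degrees of freedom.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chisq(genotypes):
    """Chi-square statistic for Hardy-Weinberg proportions at one locus.

    Hypothetical helper for illustration; not part of PyPop's API.
    genotypes: list of (allele1, allele2) tuples, one per individual.
    Returns (chi2, df, allele_freqs).
    """
    n = len(genotypes)
    # Each individual contributes two alleles to the allele counts.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed genotype counts; sorting makes ("B", "A") equal to ("A", "B").
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    chi2 = 0.0
    # Enumerate every possible genotype class (a <= b in sorted order).
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p, q = freqs[a], freqs[b]
        # Expected count: n*p^2 for homozygotes, 2*n*p*q for heterozygotes.
        expected = n * p * p if a == b else 2 * n * p * q
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(freqs)
    df = k * (k - 1) // 2  # genotype classes - 1 - (k - 1) estimated freqs
    return chi2, df, freqs


# Usage: a biallelic locus with a heterozygote deficit.
genos = [("A", "A")] * 30 + [("A", "B")] * 40 + [("B", "B")] * 30
chi2, df, freqs = hwe_chisq(genos)
print(chi2, df)  # 4.0 1
```

For highly polymorphic loci such as HLA, many expected genotype counts are small and the chi-square approximation becomes unreliable, so an exact test is generally preferable. The sketch returns only the statistic and degrees of freedom; a p-value can be obtained with, for example, `scipy.stats.chi2.sf(chi2, df)`.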