Pappas Derek James, Tomich Alannah, Garnier Federico, Marry Evelyne, Gourraud Pierre-Antoine
Children's Hospital Research Institute, Oakland, CA 94609, USA.
University of California, Berkeley, Berkeley, CA, USA.
Hum Immunol. 2015 May;76(5):374-80. doi: 10.1016/j.humimm.2015.01.029. Epub 2015 Jan 28.
High-resolution haplotype frequency estimations and descriptive metrics are becoming increasingly popular for accurately describing human leukocyte antigen diversity. In this study, we compared sample sets of publicly available haplotype frequencies from different populations to characterize the consequences of unequal sample sizes on haplotype frequency estimation. We found that for small sample sizes (a few thousand), haplotype frequencies were overestimated, affecting all descriptive metrics of the underlying distribution, such as the most frequent haplotype, the number of haplotypes, and the mean/median frequency. This overestimation was a result of random sample fluctuation and truncation of the tail end of the frequency distribution, which comprises the least frequent haplotypes. Finally, we simulated balanced datasets through resampling and contrasted the disparities of descriptive metrics among equal and unequal datasets. This simulation resulted in the global description of the most frequent human leukocyte antigen haplotypes worldwide.
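The tail-truncation effect described in the abstract can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' procedure: the Zipf-like "true" distribution, the sample sizes, and every function name (`make_population`, `sample_frequencies`, `downsample`, `describe`) are hypothetical and chosen only for illustration. It shows how a small sample misses rare haplotypes, which raises the mean and median of the observed frequencies, and how resampling a large dataset down to the same size puts the two on an equal footing.

```python
import random
from collections import Counter
from statistics import mean, median


def make_population(n_haplotypes=5000, exponent=1.0):
    """Hypothetical Zipf-like 'true' frequencies (an assumption, not real HLA data)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_haplotypes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_frequencies(true_freqs, n_chromosomes, rng):
    """Draw n_chromosomes haplotypes and return the observed frequencies."""
    draws = rng.choices(range(len(true_freqs)), weights=true_freqs, k=n_chromosomes)
    counts = Counter(draws)
    return {h: c / n_chromosomes for h, c in counts.items()}


def downsample(freqs, n_chromosomes, rng):
    """Resample an estimated distribution down to a common sample size."""
    haplotypes = list(freqs)
    weights = [freqs[h] for h in haplotypes]
    draws = rng.choices(haplotypes, weights=weights, k=n_chromosomes)
    counts = Counter(draws)
    return {h: c / n_chromosomes for h, c in counts.items()}


def describe(freqs):
    """Descriptive metrics of an observed frequency distribution."""
    values = list(freqs.values())
    return {
        "n_haplotypes": len(values),
        "max_freq": max(values),
        "mean_freq": mean(values),
        "median_freq": median(values),
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_freqs = make_population()

small = sample_frequencies(true_freqs, 2_000, rng)    # "a few thousand"
large = sample_frequencies(true_freqs, 200_000, rng)  # much larger sample
balanced = downsample(large, 2_000, rng)              # large set cut to equal size

for name, freqs in [("small", small), ("large", large), ("balanced", balanced)]:
    print(name, describe(freqs))
```

Because observed frequencies always sum to one, the mean frequency is simply the reciprocal of the number of distinct haplotypes seen, so any truncation of the rare tail raises it directly. The sketch covers only that tail effect; it does not attempt to reproduce the other metrics or the population data used in the study.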