Power laws for heavy-tailed distributions: modeling allele and haplotype diversity for the national marrow donor program.

Author information

Slater Noa, Louzoun Yoram, Gragert Loren, Maiers Martin, Chatterjee Ansu, Albrecht Mark

Affiliations

Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel.

Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel; Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel.

Publication information

PLoS Comput Biol. 2015 Apr 22;11(4):e1004204. doi: 10.1371/journal.pcbi.1004204. eCollection 2015 Apr.

DOI:10.1371/journal.pcbi.1004204

PMID:25901749

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4406525/

Abstract

Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplantation (HSCT). Donor/recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching, as well as diversity measures in broader fields such as species richness. We therefore developed a power-law based estimator of allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy-tailed distribution across all ethnicities: for example, 44.65% of European American haplotypes are represented by only one individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4%, given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse, suggesting that the current registry represents most alleles in the population. Thus, for HSCT registries, new haplotypes will continue to be discovered at a high rate even with continued recruitment to a very deep level of sampling, but population coverage will not increase substantially. Finally, we compared the convergence of our power-law estimator with that of classical diversity estimators such as the capture-recapture, Chao, ACE and jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates).
This suggests that power-law based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics.
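
The contrast drawn above between a low haplotype discovery rate and a high population coverage can be illustrated with a toy simulation. This is a minimal sketch, not the authors' power-law estimator (which is built on regular variation and occupancy distributions): it assumes a hypothetical Zipf-like population in which type k has frequency proportional to k^-alpha, with the helper names (`zipf_population`, `discovery_and_coverage`, `singleton_fraction`), the `hap{k}` labels, and the parameters `n_types` and `alpha` all invented for illustration. It also computes the classical Chao1 richness estimate, one of the "Chao" family of estimators mentioned in the comparison.

```python
"""Toy illustration only (NOT the authors' estimator): discovery rate vs.
population coverage under an assumed heavy-tailed (Zipf-like) distribution,
plus the classical Chao1 richness estimate."""
import random
from collections import Counter


def zipf_population(n_types, alpha):
    """Hypothetical population: type k (1-based) has frequency proportional to k**-alpha."""
    weights = [k ** -alpha for k in range(1, n_types + 1)]
    total = sum(weights)
    return {f"hap{k}": w / total for k, w in enumerate(weights, start=1)}


def discovery_and_coverage(true_freqs, sample):
    """Discovery = fraction of distinct population types seen in the sample.
    Coverage = total population frequency carried by the types that were seen."""
    observed = set(sample)
    discovery = len(observed) / len(true_freqs)
    coverage = sum(p for t, p in true_freqs.items() if t in observed)
    return discovery, coverage


def singleton_fraction(sample):
    """Share of observed types that occur exactly once in the sample."""
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def chao1(sample):
    """Classical Chao1 lower-bound richness estimate from singletons (f1) and doubletons (f2)."""
    counts = Counter(sample)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form used when there are no doubletons


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative parameters only; they are not fitted to registry data.
    pop = zipf_population(n_types=20_000, alpha=1.1)
    types = list(pop)
    sample = rng.choices(types, weights=[pop[t] for t in types], k=5_000)
    d, c = discovery_and_coverage(pop, sample)
    print(f"discovery={d:.3f} coverage={c:.3f} singletons={singleton_fraction(sample):.3f}")
    print(f"observed={len(set(sample))} chao1={chao1(sample):.0f} true={len(pop)}")
```

Because frequent types are almost always drawn first, coverage in such a simulation sits far above the discovery rate, mirroring the qualitative pattern reported for the registry. Chao1 uses only f1 and f2 and is a lower bound, so under a heavy tail its value tends to keep rising as the sample deepens; this is the kind of sampling-depth dependence a convergence comparison is meant to expose.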

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1671/4406525/661f4f9c3a24/pcbi.1004204.g001.jpg

Similar articles

Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection.

Immunogenetics. 2018 May;70(5):279-292. doi: 10.1007/s00251-017-1040-4. Epub 2017 Nov 9.

Availability of HLA-allele-matched unrelated donors and registry size: Estimation from haplotype frequency in the Italian population.

Hum Immunol. 2021 Oct;82(10):758-766. doi: 10.1016/j.humimm.2021.07.012. Epub 2021 Aug 2.

A European HLA Isolate and Its Implications for Hematopoietic Stem Cell Transplant Donor Procurement.

Biol Blood Marrow Transplant. 2018 Mar;24(3):587-593. doi: 10.1016/j.bbmt.2017.10.010. Epub 2017 Oct 13.

Next-generation sequencing reveals new information about HLA allele and haplotype diversity in a large European American population.

Hum Immunol. 2019 Oct;80(10):807-822. doi: 10.1016/j.humimm.2019.07.275. Epub 2019 Jul 22.

Significance of regional population HLA immunogenetic datasets in the efficacy of umbilical cord blood banks and marrow donor registries: a study of Cretan HLA genetic diversity.

Cytotherapy. 2022 Feb;24(2):183-192. doi: 10.1016/j.jcyt.2021.07.010. Epub 2021 Aug 28.

The distribution of HLA haplotypes in the ethnic groups that make up the Brazilian Bone Marrow Volunteer Donor Registry (REDOME).

Immunogenetics. 2018 Aug;70(8):511-522. doi: 10.1007/s00251-018-1059-1. Epub 2018 Apr 26.

HLA-A, -B, -C, and -DRB1 allele and haplotype frequencies distinguish Eastern European Americans from the general European American population.

Tissue Antigens. 2009 Jan;73(1):17-32. doi: 10.1111/j.1399-0039.2008.01151.x. Epub 2008 Oct 24.

Validation of statistical imputation of allele-level multilocus phased genotypes from ambiguous HLA assignments.

Tissue Antigens. 2014 Sep;84(3):285-92. doi: 10.1111/tan.12390. Epub 2014 Jul 11.

The heterogeneous HLA genetic composition of the Brazilian population and its relevance to the optimization of hematopoietic stem cell donor recruitment.

Tissue Antigens. 2014 Aug;84(2):187-97. doi: 10.1111/tan.12352. Epub 2014 Apr 12.

Cited by

HLA EPLET Frequencies Are Similar in Six Population Groups and Are Expressed by the Most Common HLA Alleles.

HLA. 2024 Dec;104(6):e70000. doi: 10.1111/tan.70000.

Bw4 ligand and direct T-cell receptor binding induced selection on HLA A and B alleles.

Front Immunol. 2023 Nov 21;14:1236080. doi: 10.3389/fimmu.2023.1236080. eCollection 2023.

Molecular HLA mismatching for prediction of primary humoral alloimmunity and graft function deterioration in paediatric kidney transplantation.

Front Immunol. 2023 Mar 15;14:1092335. doi: 10.3389/fimmu.2023.1092335. eCollection 2023.

Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database.

BMC Bioinformatics. 2021 May 26;22(1):273. doi: 10.1186/s12859-021-04169-6.

Single haplotype admixture models using large scale HLA genotype frequencies to reproduce human admixture.

Immunogenetics. 2019 Nov;71(10):589-604. doi: 10.1007/s00251-019-01144-7. Epub 2019 Nov 18.

HLA alleles and haplotypes observed in 263 US families.

Hum Immunol. 2019 Sep;80(9):644-660. doi: 10.1016/j.humimm.2019.05.018. Epub 2019 Jun 27.

Multiplicative fitness, rapid haplotype discovery, and fitness decay explain evolution of human MHC.

Proc Natl Acad Sci U S A. 2019 Jul 9;116(28):14098-14104. doi: 10.1073/pnas.1714436116. Epub 2019 Jun 21.

Meta-populational demes constitute a reservoir for large MHC allele diversity in wild house mice (Mus musculus).

Front Zool. 2018 Apr 20;15:15. doi: 10.1186/s12983-018-0266-9. eCollection 2018.

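Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection.

Immunogenetics. 2018 May;70(5):279-292. doi: 10.1007/s00251-017-1040-4. Epub 2017 Nov 9.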

HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes.

PLoS Comput Biol. 2017 Aug 28;13(8):e1005693. doi: 10.1371/journal.pcbi.1005693. eCollection 2017 Aug.
