Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Division of Medical Sciences and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Division of Human Genetics, Perelman School of Medicine, Philadelphia, PA 19104, USA.
Cell. 2022 Aug 4;185(16):3041-3055.e25. doi: 10.1016/j.cell.2022.06.036. Epub 2022 Aug 1.
Rare copy-number variants (rCNVs) include deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. In this study, we aimed to quantify the properties of haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. We harmonized and meta-analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage-sensitive segments associated with at least one disorder. These segments were typically gene-dense and often harbored dominant dosage-sensitive driver genes, which we were able to prioritize using statistical fine-mapping. Finally, we designed an ensemble machine-learning model to predict probabilities of dosage sensitivity (pHaplo and pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive. This dosage sensitivity resource will provide broad utility for human disease research and clinical genetics.
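As a rough illustration of how per-gene probabilities such as pHaplo and pTriplo could be turned into the categories mentioned in the abstract, the following minimal sketch assigns each gene a label by comparing its two scores against cutoffs. The cutoff values, the gene names, the scores, and the function and class names are all hypothetical placeholders chosen for illustration; none of them come from the abstract.

```python
from typing import Dict, NamedTuple


class DosageScores(NamedTuple):
    """Per-gene dosage sensitivity probabilities (toy container, not the study's format)."""
    p_haplo: float   # predicted probability of haploinsufficiency
    p_triplo: float  # predicted probability of triplosensitivity


def classify_genes(
    scores: Dict[str, DosageScores],
    haplo_cutoff: float = 0.9,   # assumed placeholder, not the study's threshold
    triplo_cutoff: float = 0.9,  # assumed placeholder, not the study's threshold
) -> Dict[str, str]:
    """Label each gene by which cutoffs its scores meet or exceed."""
    labels: Dict[str, str] = {}
    for gene, s in scores.items():
        is_hi = s.p_haplo >= haplo_cutoff
        is_ts = s.p_triplo >= triplo_cutoff
        if is_hi and is_ts:
            labels[gene] = "both"
        elif is_hi:
            labels[gene] = "haploinsufficient_only"
        elif is_ts:
            labels[gene] = "triplosensitive_only"
        else:
            labels[gene] = "neither"
    return labels


# Toy input with made-up gene names and scores, for illustration only.
toy_scores = {
    "GENE_A": DosageScores(p_haplo=0.97, p_triplo=0.95),
    "GENE_B": DosageScores(p_haplo=0.95, p_triplo=0.20),
    "GENE_C": DosageScores(p_haplo=0.30, p_triplo=0.96),
    "GENE_D": DosageScores(p_haplo=0.10, p_triplo=0.15),
}

toy_labels = classify_genes(toy_scores)
for gene, label in toy_labels.items():
    print(gene, label)
```

If "uniquely triplosensitive" is read as triplosensitive but not haploinsufficient, the counts in the abstract would imply 1,559 - 648 = 911 genes predicted to be sensitive to both deletion and duplication, which corresponds to the "both" label in this sketch.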