Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, TN, USA.
Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA.
Nat Commun. 2022 May 11;13(1):2592. doi: 10.1038/s41467-022-30248-0.
Sequencing cases without matched healthy controls hinders prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not controlled. We propose a framework, consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV implements consistent variant quality control and filtering, ethnicity-stratified rare variant association test, accurate estimation of inflation factors, powerful FDR control, and detection of rare variant pairs in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution to prioritizing disease-risk genes enriched with rare pathogenic variants.
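To make the idea of a summary-count burden test concrete, the following minimal sketch compares per-gene rare-variant carrier counts in cases against carrier counts from a public control set, then applies FDR control. It is an illustration under simplifying assumptions, not the CoCoRV implementation: it uses a single-stratum one-sided Fisher exact test rather than an ethnicity-stratified test, it uses the standard Benjamini-Hochberg procedure rather than the FDR method developed in the paper, and it omits quality-control filtering, inflation-factor estimation, and linkage-disequilibrium detection. The function names, gene labels, and all counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, ctrl_carriers, ctrl_total):
    """One-sided Fisher exact p-value for an excess of carriers among cases.

    Computes P(X >= case_carriers) under the hypergeometric null, where X is
    the number of carriers falling in the case group.
    """
    carriers = case_carriers + ctrl_carriers
    total = case_total + ctrl_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical summary counts: gene -> (case carriers, control carriers).
# These numbers are invented for illustration only.
N_CASES, N_CONTROLS = 500, 50000
counts = {
    "GENE_A": (12, 150),
    "GENE_B": (2, 210),
    "GENE_C": (0, 95),
}

genes = list(counts)
pvals = [
    fisher_exact_greater(counts[g][0], N_CASES, counts[g][1], N_CONTROLS)
    for g in genes
]
qvals = benjamini_hochberg(pvals)

for gene, p, q in zip(genes, pvals, qvals):
    print(f"{gene}\tp={p:.3g}\tq={q:.3g}")
```

In practice, the case and control counts are only comparable when both sides have passed the same variant filters and are matched on ancestry, which is the concern the framework described above is designed to address.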