Center for Precision Genetics and Genomics, Department of Medicine, Columbia University, New York, NY, USA.
Division of Nephrology, Department of Medicine, Columbia University, New York, NY, USA.
Funct Integr Genomics. 2024 May 20;24(3):104. doi: 10.1007/s10142-024-01358-3.
Accurate estimation of population allele frequency (AF) is crucial for gene discovery and genetic diagnostics. However, determining AF for frameshift-inducing small insertions and deletions (indels) is challenging because of discrepancies in mapping and variant calling methods. Here, we propose an innovative approach to assess indel AF. We developed CRAFTS-indels (Calculating Regional Allele Frequency Targeting Small indels), an algorithm that combines the AF of distinct indels within a given region and provides a "regional AF" (rAF). We tested and validated CRAFTS-indels using three independent datasets: gnomAD v2 (n=125,748 samples), an internal dataset (IGM; n=39,367), and the UK Biobank (UKBB; n=469,835). By comparing rAF against standard AF (sAF), we identified rare indels with rAF exceeding standard AF (sAF≤10 and rAF>10) as "rAF-hi" indels. Notably, a high percentage of rare indels were "rAF-hi", with a higher proportion in gnomAD v2 (11-20%) and IGM (11-22%) than in the UKBB (5-9%, depending on the CRAFTS-indels parameters). Analysis of the overlap of regions, grouped by rAF, with low-complexity regions and with ClinVar classifications supported the relevance of rAF. Using the internal dataset, we illustrated the utility of CRAFTS-indels in the analysis of de novo variants and the potential negative impact of rAF-hi indels on gene discovery. In summary, annotating indels with cohort-specific rAF can address some of the limitations of current annotation pipelines and facilitate the detection of novel gene-disease associations. CRAFTS-indels offers a user-friendly approach to providing rAF annotation. It can be integrated into public databases such as gnomAD and UKBB, and used by ClinVar to revise indel classifications.
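The idea of a regional AF can be illustrated with a minimal Python sketch. This is not the CRAFTS-indels implementation: the abstract states only that the AFs of distinct indels within a region are combined, so the fixed base-pair window, the plain summation capped at 1.0, the `Indel` record, and the function names `regional_af` and `is_raf_hi` are all assumptions made for illustration. The rAF-hi cutoff is left as a caller-supplied parameter.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Indel:
    """Hypothetical minimal indel record (not the CRAFTS-indels data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    af: float  # standard (per-variant) allele frequency, sAF


def regional_af(target: Indel, indels: Iterable[Indel], window: int = 10) -> float:
    """Sum the AF of every indel on the same chromosome within +/- `window` bp.

    Assumptions: `indels` includes `target` itself; AFs are combined by
    simple addition and capped at 1.0; `window` is an illustrative default.
    """
    total = sum(
        v.af
        for v in indels
        if v.chrom == target.chrom and abs(v.pos - target.pos) <= window
    )
    return min(total, 1.0)


def is_raf_hi(saf: float, raf: float, threshold: float) -> bool:
    """Flag an indel that is rare by sAF but not rare by rAF."""
    return saf <= threshold and raf > threshold


# Toy cohort: two nearby indels, one distant indel, one on another chromosome.
cohort = [
    Indel("1", 100, "AT", "A", 0.00005),
    Indel("1", 103, "TG", "T", 0.0002),
    Indel("1", 500, "C", "CA", 0.01),
    Indel("2", 100, "AT", "A", 0.5),
]
target = cohort[0]
raf = regional_af(target, cohort, window=10)
flagged = is_raf_hi(target.af, raf, threshold=1e-4)
```

In this toy cohort, the target indel is rare on its own, but the neighbouring indel three bases away raises its regional frequency above the chosen cutoff, so it would be flagged. Under these assumptions, a wider `window` can only keep or increase the rAF, which is one way the reported proportions could depend on the parameters.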