Biagini Simone Andrea, Becelaere Sara, Aerden Mio, Jatsenko Tatjana, Hannes Laurens, Van Damme Philip, Breckpot Jeroen, Devriendt Koenraad, Thienpont Bernard, Vermeesch Joris Robert, Cleynen Isabelle, Kivisild Toomas
Department of Human Genetics, KU Leuven, Leuven 3000, Belgium.
Department of Archaeology and Museology, Masaryk University, 662 43 Brno, Czech Republic.
Genome Res. 2025 Sep 2;35(9):1929-1941. doi: 10.1101/gr.280175.124.
Genotype imputation from low-pass sequencing data presents unique opportunities for genomic analyses but comes with specific challenges. In this study, we explore the impact of quality filters on genetic ancestry and polygenic score (PGS) estimation after imputing 32,769 low-pass genome-wide sequences (LPS) from noninvasive prenatal screening (NIPS) with an average autosomal sequence depth of ∼0.15×. In studies involving ultra-low coverage sequences, conventional approaches to secure genotype accuracy may fail, especially when multiple samples are pooled. To enhance the proportion of high-quality genotypes in large data sets, we introduce a filtering approach called GDI that combines genotype probability (GP), alternate allele dosage (DS), and INFO score filters. We demonstrate that, when applied without filters, the imputation tools QUILT and GLIMPSE2 achieve similar accuracy, which is high enough for broad-scale ancestry mapping but insufficient for high-resolution principal component analysis (PCA). With the GDI approach, we can achieve quality that is adequate for such purposes. Furthermore, we explored the impact of imputation errors, choice of variants, and filtering methods on PGS prediction for height in 1,911 subjects with height data. We show that polygenic scores predict 23.7% of the variance in height in our imputed data and that, contrary to the effect on PCA, the GDI filter does not improve the performance of PGS in height prediction. These results highlight that imputed LPS data can be leveraged for further biomedical and population genetic use, but each downstream analysis tool needs to be considered individually for its imputation quality thresholds and filtering requirements.
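As a rough illustration of how a combined filter on GP, DS, and INFO might operate, the sketch below masks imputed genotypes that fail any of three checks. The abstract does not state the thresholds or the exact rule by which GDI combines the three quantities, so everything here is an assumption made for illustration: the threshold values, the DS rule (dosage must lie close to the hard call), and the names `ImputedGenotype`, `passes_genotype_filters`, and `gdi_like_filter` are hypothetical and should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical thresholds for illustration only; not taken from the paper.
MIN_GP = 0.9       # minimum probability of the most likely genotype
MAX_DS_GAP = 0.1   # maximum distance between dosage and the hard call
MIN_INFO = 0.8     # minimum per-site INFO score


@dataclass
class ImputedGenotype:
    gp: Tuple[float, float, float]  # P(0/0), P(0/1), P(1/1)
    ds: float                       # expected alternate allele count, 0..2


def passes_genotype_filters(g: ImputedGenotype,
                            min_gp: float = MIN_GP,
                            max_ds_gap: float = MAX_DS_GAP) -> bool:
    """Per-genotype checks: confident GP and a dosage near the hard call."""
    best = max(range(3), key=lambda k: g.gp[k])  # most likely genotype
    return g.gp[best] >= min_gp and abs(g.ds - best) <= max_ds_gap


def gdi_like_filter(genotypes: Sequence[ImputedGenotype],
                    info: float,
                    min_info: float = MIN_INFO) -> List[Optional[int]]:
    """Return hard calls (0, 1, 2) for one site, or None where masked.

    The whole site is masked if its INFO score is too low; otherwise
    each genotype is kept only if it passes the GP and DS checks.
    """
    if info < min_info:
        return [None] * len(genotypes)
    calls: List[Optional[int]] = []
    for g in genotypes:
        if passes_genotype_filters(g):
            calls.append(max(range(3), key=lambda k: g.gp[k]))
        else:
            calls.append(None)
    return calls


# One site, three samples: confident 0/0, ambiguous, confident 1/1.
site = [
    ImputedGenotype(gp=(0.98, 0.02, 0.00), ds=0.02),
    ImputedGenotype(gp=(0.50, 0.45, 0.05), ds=0.55),
    ImputedGenotype(gp=(0.01, 0.03, 0.96), ds=1.95),
]
filtered = gdi_like_filter(site, info=0.95)
```

Under these assumed rules, only the ambiguous middle genotype is masked at a well-imputed site, while a site with a low INFO score is dropped for all samples. Masking genotypes rather than discarding whole samples is one possible design choice for keeping a large cohort intact while raising the share of high-quality genotypes.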