Xu Jiayi, Liu Dongjing, Hassan Arsalan, Genovese Giulio, Cote Alanna C, Fennessy Brian, Cheng Esther, Charney Alexander W, Knowles James A, Ayub Muhammad, Peterson Roseann E, Bigdeli Tim B, Huckins Laura M
Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA.
Icahn School of Medicine at Mount Sinai, New York, NY, USA.
medRxiv. 2023 Dec 26:2023.12.22.23300448. doi: 10.1101/2023.12.22.23300448.
Genotype imputation is crucial for genome-wide association studies (GWAS), but reference panels and existing benchmarking studies prioritize European individuals. Consequently, it is unclear which publicly available reference panel should be used for Pakistani individuals, and whether a panel's ancestry composition or its sample size matters more for imputation accuracy. We compared reference panels for imputing genotype data in 1814 Pakistani individuals and found that meta-imputation combining TOPMed and the expanded 1000 Genomes (ex1KG) reference gave the best balance of accuracy and coverage. ex1KG imputed more accurately than TOPMed despite a 30-fold smaller sample size, supporting efforts to build future panels that include diverse populations.