Ardiansyah Edwin, Riza Anca-Lelia, Dian Sofiati, Ganiem Ahmad Rizal, Alisjahbana Bachti, Setiabudiawan Todia P, van Laarhoven Arjan, van Crevel Reinout, Kumar Vinod
Research Center for Care and Control of Infectious Diseases, Universitas Padjadjaran, Bandung, Indonesia.
Laboratory of Human Genomics, University of Medicine and Pharmacy of Craiova, 200638 Craiova, Romania.
bioRxiv. 2024 Jun 14:2024.06.14.598981. doi: 10.1101/2024.06.14.598981.
Existing genotype imputation reference panels are mainly derived from European populations, which limits their accuracy in non-European populations. To improve imputation accuracy for Indonesia, the world's fourth most populous country, we combined whole genome sequencing (WGS) data from 227 West Javanese individuals with East Asian data from the 1000 Genomes Project. From these data we built three reference panels: EAS 1KGP3 (EASp), Indonesian (INDp), and a combined panel (EASp+INDp). Ten West Javanese samples with both WGS and SNP-typing data were used for benchmarking. We identified 1.8 million novel single nucleotide variants (SNVs) in the West Javanese population, which is similar to East Asian populations but distinct from the Central Indonesian Flores population. Adding INDp to the EASp reference panel improved imputation accuracy (R²) from 0.85 to 0.90 and concordance from 87.88% to 91.13%. These findings underscore the importance of including Indonesian genetic data in reference panels and support broader WGS of diverse Indonesian populations to enhance genomic studies.
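The abstract reports two benchmarking metrics, imputation accuracy (R²) and concordance, without stating how they were computed. The sketch below shows one common way to define them, assuming R² is the squared Pearson correlation between sequenced genotypes and imputed allele dosages, and concordance is the percentage of best-guess imputed genotypes that match the sequenced genotype. The function names and the toy arrays are hypothetical and are not taken from the study.

```python
import numpy as np


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed allele dosages (continuous values in [0, 2])."""
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    # The correlation is undefined when either vector is constant,
    # e.g. a variant that is monomorphic in the benchmark samples.
    if t.std() == 0 or d.std() == 0:
        return float("nan")
    r = np.corrcoef(t, d)[0, 1]
    return float(r ** 2)


def genotype_concordance(true_genotypes, imputed_dosages):
    """Percentage of samples whose best-guess imputed genotype
    (dosage rounded to the nearest integer) equals the true genotype."""
    t = np.asarray(true_genotypes)
    # np.rint rounds exact halves to the nearest even integer.
    called = np.rint(np.asarray(imputed_dosages, dtype=float)).astype(int)
    return float(np.mean(called == t) * 100.0)


if __name__ == "__main__":
    # Hypothetical data for one variant across eight samples.
    truth = [0, 1, 2, 1, 0, 2, 1, 0]
    dosage = [0.1, 0.9, 1.8, 1.2, 0.0, 1.4, 1.1, 0.6]
    print("R2:", imputation_r2(truth, dosage))
    print("Concordance (%):", genotype_concordance(truth, dosage))
```

In a benchmark like the one described, these per-variant values would typically be averaged over all variants that are present in the WGS truth set but absent from the SNP-typing data; that aggregation step is also an assumption here.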