Liu Ting-Yuan, Lin Chih-Fan, Wu Hsing-Tsung, Wu Ya-Lun, Chen Yu-Chia, Liao Chi-Chou, Chou Yu-Pao, Chao Dysan, Chang Ya-Sian, Lu Hsing-Fang, Chang Jan-Gowth, Hsu Kai-Cheng, Tsai Fuu-Jen
Center for Precision Medicine, China Medical University Hospital, Taichung, 40447, Taiwan.
Artificial Intelligence Center for Medical Diagnosis, China Medical University Hospital, Taichung, 40447, Taiwan.
Biomedicine (Taipei). 2021 Dec 1;11(4):57-65. doi: 10.37796/2211-8039.1302. eCollection 2021.
A genome-wide association study (GWAS) can be conducted to systematically analyze the contributions of genetic factors to a wide variety of complex diseases. However, the data provided by existing GWASs are highly ethnicity-specific. Accordingly, to provide data specific to Taiwan, we established a large-scale genetic database at a single medical institution, China Medical University Hospital. Owing to current technological limitations, microarray analysis can detect only a limited number of single-nucleotide polymorphisms (SNPs) with a minor allele frequency of >1%. Imputation is a useful way to expand such data. In this study, we compared four imputation algorithms in terms of various metrics. Among the compared algorithms, Beagle5.2 achieved the fastest calculation speed, the smallest storage space, the highest specificity, and the largest number of high-quality variants. Using Beagle5.2, we obtained 15,277,414 high-quality variants in 175,871 people. In our internal verification process, Beagle5.2 exhibited an accuracy rate of up to 98.75%. We also conducted external verification, in which our imputed variants had a 79.91% mapping rate and 90.41% accuracy. These results will be combined with clinical data in future research. We have made the results available for researchers to use in formulating imputation algorithms, and we have established a complete SNP database for GWAS and polygenic risk score (PRS) researchers. We believe that these data can help improve overall medical capabilities, particularly precision medicine, in Taiwan.
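The abstract reports a mapping rate and an accuracy for external verification but does not define either metric. The minimal Python sketch below shows one plausible reading, under stated assumptions: the mapping rate is taken as the share of imputed variants that are also present in an external call set, and the accuracy as the share of those shared variants whose genotypes agree. The function name `external_check`, the variant identifiers, and the 0/1/2 genotype coding are hypothetical and are not taken from the study.

```python
def external_check(imputed, external):
    """Compare imputed genotypes with an external call set for one sample.

    Both arguments map a variant identifier to a genotype coded as the
    number of alternate alleles (0, 1, or 2). These definitions are
    assumptions made for illustration, not the study's own method.
    Returns (mapping_rate, accuracy).
    """
    if not imputed:
        return 0.0, 0.0
    # Variants that appear in both the imputed and the external set.
    shared = [variant for variant in imputed if variant in external]
    mapping_rate = len(shared) / len(imputed)
    if not shared:
        return mapping_rate, 0.0
    # Count shared variants whose genotype calls agree exactly.
    matches = sum(1 for variant in shared if imputed[variant] == external[variant])
    accuracy = matches / len(shared)
    return mapping_rate, accuracy


# Toy data: five imputed variants, four of which exist in the external set.
imputed_calls = {"v1": 0, "v2": 1, "v3": 2, "v4": 1, "v5": 0}
external_calls = {"v1": 0, "v2": 1, "v3": 1, "v4": 1, "v9": 2}
mapping_rate, accuracy = external_check(imputed_calls, external_calls)
print(mapping_rate, accuracy)
```

In this toy example, four of the five imputed variants map to the external set, and three of those four agree. A real comparison would also need to match variants by chromosome, position, and alleles, and to handle strand and missing-call issues, which this sketch omits.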