Khor Seik-Soon, Hirayasu Kouyuki, Kawai Yosuke, Kim Hie Lim, Nagasaki Masao, Tokunaga Katsushi
Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan.
Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.
Front Immunol. 2025 Apr 7;16:1559301. doi: 10.3389/fimmu.2025.1559301. eCollection 2025.
There are ten leukocyte immunoglobulin (Ig)-like receptor (LILR) genes, i.e., five genes encoding activating receptors (, and ) characterized by their truncated cytoplasmic tails, and five genes encoding inhibitory receptors (, and ) characterized by their extended cytoplasmic tails containing immunoreceptor tyrosine-based inhibitory motifs (ITIMs). Among these, , , and are known for harboring high frequencies of copy number variations (CNVs). However, the presence of CNVs in the leukocyte receptor complex (LRC) region complicates single nucleotide polymorphism (SNP) association analysis within commercially available SNP microarray datasets. This study introduces LILR Genotype Imputation with Attribute Bagging (LIBAG), a novel method for determining CNVs in , and from commercially available SNP genotyping array datasets. CNV imputation accuracy peaked at 98.0% for the Infinium Japanese Screening Array, followed by 97.4% for Axiom Japonica V2, 97.3% for Axiom Japonica Array NEO, and 94.3% for Axiom Japonica V1, with the lowest recorded accuracy of 93.6% for the Axiom Genome-wide ASI1 array. For the 1000 Genomes Project (1kGP) dataset, CNV imputation achieved peak accuracies of 94.5% for 1kGP-EAS (East Asian), 86.6% for 1kGP-AMR (Admixed American), 83.8% for 1kGP-EUR (European), and 75.0% for 1kGP-AFR (African), particularly after the 20 kb flanking region. Similarly, imputation accuracy for CNV progressively increased, peaking at the 80 kb flanking region. Accuracy reached 99.2% for 1kGP-AMR, 98.9% for 1kGP-AFR, 98.7% for 1kGP-EUR, and 97.5% for 1kGP-EAS. Investigating the copy number (CN) in diseases associated with HLA class I molecules will provide further insights into disease pathogenesis.
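There are ten leukocyte immunoglobulin (Ig)-like receptor (LILR) genes, i.e., five genes encoding activating receptors (LILRA1, LILRA2, and LILRA3), characterized by their truncated cytoplasmic tails, and five genes encoding inhibitory receptors (LILRB1, LILRB2, and LILRB3), characterized by their extended cytoplasmic tails containing immunoreceptor tyrosine-based inhibitory motifs (ITIMs). Among these, LILRA1, LILRA2, and LILRA3 are known for their high frequencies of copy number variations (CNVs). However, the presence of CNVs in the leukocyte receptor complex (LRC) region complicates single nucleotide polymorphism (SNP) association analysis within commercially available SNP microarray datasets. This study introduces LILR Genotype Imputation with Attribute Bagging (LIBAG), a novel method for determining CNVs in LILRA1, LILRA2, and LILRA3 from commercially available SNP genotyping array datasets. CNV imputation accuracy peaked at 98.0% for the Infinium Japanese Screening Array, followed by 97.4% for Axiom Japonica V2, 97.3% for Axiom Japonica Array NEO, and 94.3% for Axiom Japonica V1, with the lowest recorded accuracy of 93.6% for the Axiom Genome-wide ASI1 array. For the 1000 Genomes Project (1kGP) dataset, CNV imputation achieved peak accuracies of 94.5% for 1kGP-EAS (East Asian), 86.6% for 1kGP-AMR (Admixed American), and 83.8% for 1kGP-EUR (European).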
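The abstract names attribute bagging as the basis of LIBAG but does not describe the base learner, the number of SNPs drawn per learner, or the training data. As a rough illustration of the general idea only, the sketch below trains several small classifiers, each on a random subset of SNP columns, and predicts copy number by majority vote. Everything in it is an assumption made for illustration: the function names (`make_toy_data`, `train_base`, `fit_attribute_bagging`, `predict`), the naive-Bayes-style base learner, the parameter values, and the synthetic genotypes. It is not the LIBAG implementation.

```python
import random
from collections import Counter, defaultdict


def make_toy_data(n, n_snps=12, seed=1):
    """Synthetic genotypes (0/1/2); the first 6 SNPs loosely tag the copy number."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        cn = rng.choice([0, 1, 2])  # hypothetical copy number class
        row = []
        for j in range(n_snps):
            if j < 6 and rng.random() < 0.85:
                row.append(cn)  # informative SNP copies the class
            else:
                row.append(rng.choice([0, 1, 2]))  # uninformative noise
        X.append(row)
        y.append(cn)
    return X, y


def train_base(X, y, feats):
    """Count genotype frequencies per class, using only the chosen SNP columns."""
    counts = defaultdict(Counter)  # (snp index, genotype) -> class counts
    for row, label in zip(X, y):
        for f in feats:
            counts[(f, row[f])][label] += 1
    return feats, counts, Counter(y)


def predict_base(model, row):
    """Pick the class with the highest Laplace-smoothed naive-Bayes score."""
    feats, counts, prior = model
    best, best_score = None, -1.0
    for label, n in prior.items():
        score = float(n)
        for f in feats:
            score *= (counts[(f, row[f])][label] + 1) / (n + 3)
        if score > best_score:
            best, best_score = label, score
    return best


def fit_attribute_bagging(X, y, n_models=25, n_feats=4, seed=0):
    """Train each base learner on its own random subset of SNP columns."""
    rng = random.Random(seed)
    n_snps = len(X[0])
    return [train_base(X, y, rng.sample(range(n_snps), n_feats))
            for _ in range(n_models)]


def predict(ensemble, row):
    """Majority vote across the base learners."""
    votes = Counter(predict_base(m, row) for m in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X_train, y_train = make_toy_data(300, seed=1)
    X_test, y_test = make_toy_data(200, seed=2)
    ensemble = fit_attribute_bagging(X_train, y_train)
    hits = sum(predict(ensemble, r) == t for r, t in zip(X_test, y_test))
    print(f"held-out accuracy: {hits / len(y_test):.3f}")
```

Because each learner sees a different subset of SNPs, a learner that happens to draw only uninformative SNPs is outvoted by the others. The accuracies reported in the abstract come from real array and 1kGP data and cannot be reproduced from this toy example.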