National Research Centre "Kurchatov Institute", Kurchatov Sq. 2, Moscow 123182, Russia.
Int J Mol Sci. 2023 Oct 19;24(20):15355. doi: 10.3390/ijms242015355.
Ischemic stroke (IS) is one of the leading causes of death worldwide, yet its genetic architecture is complex and underexplored. The traditional approach to associative gene mapping is the genome-wide association study (GWAS), which tests individual single-nucleotide polymorphisms (SNPs) across the genomes of case and control groups. The purpose of this research was to develop an alternative approach in which groups of SNPs are examined rather than individual ones. We proposed, validated and applied to real data a new workflow consisting of three key stages: grouping SNPs into clusters, inferring the haplotypes within the clusters, and testing the haplotypes for association with the phenotype. To group SNPs, we applied the clustering algorithms DBSCAN and HDBSCAN to linkage disequilibrium (LD) matrices representing pairwise r² values between all genotyped SNPs. These clustering algorithms had not previously been applied to genotype data as part of an association study workflow. In total, 883,908 SNPs and insertion/deletion polymorphisms from people of European ancestry (4929 cases and 652 controls) were processed. Subsequent testing of the frequencies of haplotypes reconstructed in the SNP clusters revealed dozens of genes associated with IS and suggested a complex role for protocadherin molecules in IS. The workflow was validated on a simulated dataset of similar ancestry and the same sample sizes. The results of classic GWASs are also provided and discussed. Using the workflow presented in this research, the considered clustering algorithms can be applied to genotypic data to identify genomic loci associated with different qualitative traits.
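As a rough illustration of the first stage only (grouping SNPs by LD), the sketch below runs scikit-learn's DBSCAN on a precomputed distance matrix derived from pairwise r² values. It is not the authors' implementation. The distance transform 1 − r², the `eps` and `min_samples` values, the toy genotype simulator, and all function names are assumptions made for this example. Haplotype inference and association testing are not shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def ld_r2_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise r^2 between SNP columns of a (samples x SNPs) dosage matrix."""
    r = np.corrcoef(genotypes, rowvar=False)
    # Monomorphic SNPs give NaN correlations; treat them as unlinked (r^2 = 0).
    return np.nan_to_num(r) ** 2


def cluster_snps(r2: np.ndarray, eps: float = 0.3, min_samples: int = 3) -> np.ndarray:
    """Cluster SNPs with DBSCAN on the assumed distance 1 - r^2.

    Returns one label per SNP; -1 marks SNPs left unclustered (noise).
    """
    distance = np.clip(1.0 - r2, 0.0, 1.0)
    np.fill_diagonal(distance, 0.0)
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(distance)


def simulate_genotypes(n_samples=500, block_sizes=(5, 5), n_independent=4,
                       flip=0.02, seed=0) -> np.ndarray:
    """Toy data (hypothetical): blocks of SNPs in strong LD plus unlinked SNPs."""
    rng = np.random.default_rng(seed)
    columns = []
    for size in block_sizes:
        # One shared "founder" allele per haplotype drives LD within the block.
        base = rng.random((n_samples, 2)) < 0.4
        for _ in range(size):
            noise = rng.random((n_samples, 2)) < flip  # rare allele flips
            columns.append((base ^ noise).sum(axis=1))  # dosage 0, 1 or 2
    for _ in range(n_independent):
        columns.append((rng.random((n_samples, 2)) < 0.3).sum(axis=1))
    return np.column_stack(columns).astype(float)


genotypes = simulate_genotypes()
labels = cluster_snps(ld_r2_matrix(genotypes))
print(labels)
```

On real data the full pairwise matrix for hundreds of thousands of variants would not fit in memory, so a sketch like this would have to be applied per chromosome or per genomic window; how the published workflow handles this is not stated in the abstract.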