Nguyen Peter T, Coetzee Simon G, Silacheva Irina, Hazelett Dennis J
Cedars-Sinai Medical Center.
Res Sq. 2024 Oct 22:rs.3.rs-5189487. doi: 10.21203/rs.3.rs-5189487/v2.
Recent advances in single-cell technology provide high-throughput insight into disease mechanisms and, more importantly, into the cell types in which disease originates. Here, we used multi-omics data to understand how genetic variants from genome-wide association studies (GWAS) influence the development of disease. We show in principle how to use genetic algorithms with matched single-nucleus RNA-seq and ATAC-seq data from normal samples, genome annotations, and protein-protein interaction data to describe, collectively, the genes and cell types involved and their contribution to increased risk.
We used genetic algorithms to measure the fitness of gene-cell set proposals against a series of objective functions that capture data and annotations. The most informative objective function captured protein-protein interactions. We observed significantly greater fitness scores and subgraph sizes in foreground sets than in matched sets of control variants. Furthermore, our model reliably identified known targets and ligand-receptor pairs, consistent with prior studies.
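The general shape of such a search can be illustrated with a minimal sketch. It is not the authors' implementation: the loci, gene names, interaction edges, and the single edge-counting objective below are hypothetical toy values, and the cell-type dimension and the expression and accessibility objectives are omitted for brevity. Each proposal picks one candidate gene per risk locus, and fitness is the number of protein-protein interaction edges among the chosen genes.

```python
import random

# Hypothetical toy inputs (not from the paper): candidate genes per risk
# locus and a small undirected protein-protein interaction (PPI) edge set.
CANDIDATES = {
    "locus1": ["A", "B"],
    "locus2": ["C", "D"],
    "locus3": ["E", "F"],
    "locus4": ["G", "H"],
}
PPI_EDGES = {
    frozenset(edge)
    for edge in [("A", "C"), ("C", "E"), ("E", "G"), ("A", "G"), ("B", "D")]
}
LOCI = sorted(CANDIDATES)


def fitness(proposal):
    """Count PPI edges among the genes in a proposal (one gene per locus)."""
    genes = list(proposal)
    return sum(
        frozenset((genes[i], genes[j])) in PPI_EDGES
        for i in range(len(genes))
        for j in range(i + 1, len(genes))
    )


def random_proposal(rng):
    """Pick one candidate gene at random for every locus."""
    return tuple(rng.choice(CANDIDATES[locus]) for locus in LOCI)


def crossover(a, b, rng):
    """Uniform crossover: each locus inherits its gene from either parent."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(proposal, rng, rate=0.2):
    """With probability `rate`, redraw the gene at a locus from its candidates."""
    return tuple(
        rng.choice(CANDIDATES[locus]) if rng.random() < rate else gene
        for locus, gene in zip(LOCI, proposal)
    )


def run_ga(generations=30, pop_size=20, seed=0):
    """Evolve proposals with truncation selection and return the fittest one."""
    rng = random.Random(seed)
    population = [random_proposal(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = run_ga()
print(best, fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases between generations. A fuller model would score several objectives per proposal and combine them, which this sketch does not attempt.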
Our findings suggest that applying genetic algorithms to association studies can generate a coherent cellular model of risk from a set of susceptibility variants. Further, we showed, using breast cancer as an example, that genes linked to such variants have a greater number of physical interactions than expected by chance.
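One common way to ask whether a gene set has more physical interactions than expected by chance is an empirical resampling test. The sketch below is an illustration under stated assumptions, not the procedure used in the study: the gene names and edges are hypothetical, and control sets are drawn by simple random sampling from a background list, whereas the study describes matched sets of control variants.

```python
import random


def count_edges(genes, edges):
    """Number of interaction edges with both endpoints in the gene set."""
    gene_set = set(genes)
    return sum(1 for edge in edges if edge <= gene_set)


def empirical_p(foreground, background, edges, n_draws=1000, seed=0):
    """One-sided empirical p-value for the foreground edge count.

    Control sets of the same size are sampled from `background` (a list);
    the +1 terms keep the estimate away from an exact zero.
    """
    rng = random.Random(seed)
    observed = count_edges(foreground, edges)
    hits = sum(
        count_edges(rng.sample(background, len(foreground)), edges) >= observed
        for _ in range(n_draws)
    )
    return observed, (hits + 1) / (n_draws + 1)


# Hypothetical toy data: a densely connected foreground and a sparse background.
foreground = ["A", "C", "E", "G"]
background = [f"g{i}" for i in range(20)]
edges = {
    frozenset(edge)
    for edge in [
        ("A", "C"), ("C", "E"), ("E", "G"), ("A", "G"),  # foreground edges
        ("g0", "g1"), ("g2", "g3"),                      # background edges
    ]
}

observed, p_value = empirical_p(foreground, background, edges)
print(observed, p_value)
```

In this toy setup no control set can contain more than two edges, so no draw reaches the observed count of four and the p-value equals 1 / (n_draws + 1).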