Zhang Qingrun, Bhatia Muskan, Park Taesung, Ott Jurg
Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada.
Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada.
Front Genet. 2023 Aug 24;14:1222517. doi: 10.3389/fgene.2023.1222517. eCollection 2023.
To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study, testing one variant after another for genotype frequency differences between affected and unaffected individuals. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while each variant on its own may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements, so only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors, together with suitable programming to distribute the workload evenly over all processors in a machine, makes a new and relatively straightforward approach possible: evaluating all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components and demonstrate its advantages over existing implementations of well-known algorithms. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.
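The exhaustive approach described in the abstract, scoring every variant pair and every genotype pair while spreading the work over processors, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`chi_square_2x2`, `best_pattern_for_pair`, `scan_pairs`), the 0/1/2 genotype coding, and the use of a Pearson chi-square statistic on a 2x2 table are all assumptions made for the example, since the abstract does not state which test statistic the prototype uses.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from itertools import combinations, product


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_pattern_for_pair(pair, genotypes, status):
    """Score all 9 genotype patterns of one variant pair; return the best.

    genotypes: one row per individual, entries coded 0/1/2 (assumed coding).
    status:    1 for affected (case), 0 for unaffected (control).
    """
    i, j = pair
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    best = (0.0, i, j, None)
    for gi, gj in product(range(3), repeat=2):
        cases = controls = 0
        for row, affected in zip(genotypes, status):
            if row[i] == gi and row[j] == gj:
                if affected:
                    cases += 1
                else:
                    controls += 1
        # Table: pattern present/absent (columns) by case/control (rows).
        stat = chi_square_2x2(cases, n_cases - cases,
                              controls, n_controls - controls)
        if stat > best[0]:
            best = (stat, i, j, (gi, gj))
    return best


def scan_pairs(genotypes, status, workers=1):
    """Evaluate every variant pair; return results sorted by statistic."""
    pairs = list(combinations(range(len(genotypes[0])), 2))
    score = partial(best_pattern_for_pair, genotypes=genotypes, status=status)
    if workers == 1:
        results = list(map(score, pairs))  # serial fallback
    else:
        # Each worker process receives chunks of pairs to score.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(score, pairs, chunksize=64))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Calling `scan_pairs(genotypes, status, workers=8)` would distribute the pairs over eight worker processes; on platforms that spawn rather than fork processes, that call should sit under an `if __name__ == "__main__":` guard. The sketch holds all genotypes in memory and copies them to each worker, so it illustrates the workload distribution only and says nothing about the memory behavior of the published prototype.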