IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1535-1545. doi: 10.1109/TCBB.2019.2948330. Epub 2019 Oct 21.
Epistasis analysis is a progressive approach that complements the 'common disease, common variant' hypothesis by highlighting the potential for connected networks of genetic variants to collaborate in producing a phenotypic expression. It is commonly performed either pairwise (variant versus variant) or at unrestricted arity (high-order interactions). Either form greatly increases the number of tests relative to a standard approach such as a Genome-Wide Association Study (GWAS), in which the False Discovery Rate (FDR) is already an issue; multiplying the number of tests, up to a factorial rate, compounds the FDR problem. Epistasis analysis also brings its own limitations in computational complexity and intensity, which depend on the analysis performed; in the most intensive case, a multivariate analysis has a time complexity of O(n!). This paper proposes a novel methodology for detecting epistasis that uses interpretable methods and best practice to outline interactions through filtering processes. It uses Random Sampling Regularisation, a process that randomly splits the data into sample sets and applies a voting system across them to regularise the significance and reliability of biological markers, namely single nucleotide polymorphisms (SNPs). Preliminary results are promising and yield a concise set of detected interactions. In the classification of breast cancer patients, the method outlined eight candidate risk interactions arising from five variants, and a single candidate variant with a high protective association.
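The voting idea behind Random Sampling Regularisation can be illustrated with a minimal sketch. The abstract does not specify the test statistic, the number of sample sets, or the voting rule, so everything below is an assumption made for illustration: a Pearson chi-square on a 2x2 table, 50 random half-size subsets, and a requirement that a pair be significant in at least 80% of them. The function names and the toy data are hypothetical and are not the authors' implementation.

```python
import random
from collections import Counter
from itertools import combinations, product


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def pair_statistic(samples, i, j):
    """Association between 'SNPs i and j both carry the variant' and case status."""
    a = b = c = d = 0
    for genotype, is_case in samples:
        both = genotype[i] and genotype[j]
        if both and is_case:
            a += 1
        elif both:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return chi2_2x2(a, b, c, d)


def vote_on_pairs(samples, n_rounds=50, subset_fraction=0.5,
                  threshold=3.84, min_vote_share=0.8, seed=0):
    """Keep SNP pairs that pass the threshold in most random subsets.

    All defaults are illustrative assumptions; 3.84 is the 5% critical
    value of a chi-square distribution with one degree of freedom.
    """
    rng = random.Random(seed)
    n_snps = len(samples[0][0])
    k = max(1, int(len(samples) * subset_fraction))
    votes = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(samples, k)  # one random sample set
        for i, j in combinations(range(n_snps), 2):
            if pair_statistic(subset, i, j) > threshold:
                votes[(i, j)] += 1  # this sample set votes for the pair
    return {pair: count / n_rounds
            for pair, count in votes.items()
            if count / n_rounds >= min_vote_share}


# Toy data (hypothetical): 4 binary SNPs, every genotype repeated 10 times;
# a sample is a case only when SNP 0 and SNP 1 are both present.
samples = [(g, bool(g[0] and g[1])) for g in product((0, 1), repeat=4)] * 10
stable = vote_on_pairs(samples)
print(stable)
```

In this toy data the pair (0, 1) receives a vote from every sample set, while the unrelated pair (2, 3) is filtered out. Pairs that share SNP 0 or SNP 1 can also score well because of marginal association, so a sketch like this does not by itself separate marginal effects from true interaction.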