Aristodimou Aristos, Antoniades Athos, Dardiotis Efthimios, Loizidou Eleni, Spyrou George, Votsi Christina, Kyproula Christodoulou, Pantzaris Marios, Grigoriadis Nikolaos, Hadjigeorgiou Georgios, Kyriakides Theodoros, Pattichi Constantinos
Department of Computer Science, University of Cyprus, Nicosia 1678, Cyprus.
Stremble Ventures Ltd. Limassol 59 4042 Cyprus.
IEEE Open J Eng Med Biol. 2021 Jul 27;2:256-262. doi: 10.1109/OJEMB.2021.3100416. eCollection 2021.
Most common diseases are influenced by interactions among multiple genes and by interactions between genes and the environment. An exhaustive search for such interactions is computationally expensive and must address the multiple testing problem. A four-step framework is proposed for the efficient identification of n-way interactions. The framework was applied to a Multiple Sclerosis dataset with 725 subjects and 147 tagging SNPs. The first two steps of the framework are quality control and feature selection. The third step applies clustering and binary-encodes the features. The final step performs the n-way interaction testing. The feature space was reduced to 7 SNPs and, with the proposed binary encoding, more 2-SNP and 3-SNP interactions were identified than with the initial encoding. The framework selects informative features and, with the proposed binary encoding, identifies more n-way interactions by increasing the power of the statistical analysis.
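The last two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how clustering produces the binary encoding, so the `risk_levels` mapping below is a hypothetical stand-in, and all function names and the toy table are assumptions made for illustration. The sketch shows why a reduced, binary-encoded feature space helps: fewer SNPs mean far fewer n-way tests to correct for, and binary features give 2^n joint genotype patterns per test instead of 3^n.

```python
from math import comb

def binary_encode(genotypes, risk_levels):
    """Map each genotype code to 1 if it is in risk_levels, else 0.
    risk_levels is a hypothetical stand-in for the clustering-derived grouping."""
    return [1 if g in risk_levels else 0 for g in genotypes]

def interaction_table(columns, labels):
    """Count subjects per joint genotype pattern: pattern -> [controls, cases].
    columns is a list of per-SNP genotype lists; labels are 0 (control) or 1 (case)."""
    table = {}
    for pattern, y in zip(zip(*columns), labels):
        table.setdefault(pattern, [0, 0])[y] += 1
    return table

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    pattern-by-status contingency table."""
    rows = list(table.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    n = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            if expected > 0:
                stat += (r[j] - expected) ** 2 / expected
    return stat, len(rows) - 1

def n_way_tests(n_snps, n):
    """Number of distinct n-SNP combinations, i.e. tests to correct for."""
    return comb(n_snps, n)

# Multiple-testing burden before and after feature selection (147 -> 7 SNPs).
for n in (2, 3):
    print(n, n_way_tests(147, n), n_way_tests(7, n))

# Joint genotype patterns per 3-SNP test: 3-level vs binary encoding.
print(3 ** 3, 2 ** 3)

# Toy example with made-up genotypes (0/1/2 = minor-allele count).
snp_a = binary_encode([0, 1, 2, 2, 0, 1, 2, 0], risk_levels={1, 2})
snp_b = binary_encode([2, 2, 0, 1, 0, 0, 2, 1], risk_levels={2})
status = [0, 1, 0, 1, 0, 0, 1, 1]
print(chi_square(interaction_table([snp_a, snp_b], status)))
```

With 7 SNPs there are only 21 pairwise and 35 three-way combinations to test, against 10,731 and 518,665 for the full set of 147, so a Bonferroni-style correction is far less severe. Collapsing three genotype levels into two also leaves more subjects per cell of each contingency table, which is one plausible reading of the power gain the abstract reports.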