Department of Computer Science, University of Regina, Regina, SK S4S 0A2, Canada.
Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada.
Bioinformatics. 2020 Feb 1;36(3):880-889. doi: 10.1093/bioinformatics/btz673.
A digenic genetic interaction (GI) is observed when mutations in two genes within the same organism yield a phenotype that differs from the one expected given each mutation's individual effects. Although multiplicative scoring is widely applied to define GIs and thereby reveal underlying gene functions, it remains unclear whether it is the most suitable choice for scoring GIs in Escherichia coli. Here, we assess many different definitions, including the multiplicative model, for mapping functional links between genes and pathways in E. coli.
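As an illustration of the multiplicative definition mentioned above, the following minimal sketch computes the standard score, epsilon = W_ab - W_a * W_b, where W_a and W_b are single-mutant fitness values and W_ab is the observed double-mutant fitness (all relative to wild type). The function names and the cutoff of 0.1 are hypothetical choices made for this example; they are not taken from the paper.

```python
def multiplicative_gi_score(w_a, w_b, w_ab):
    """Return epsilon = W_ab - W_a * W_b (multiplicative model).

    w_a, w_b: relative fitness of each single mutant.
    w_ab:     observed relative fitness of the double mutant.
    """
    expected = w_a * w_b  # fitness expected if the two genes do not interact
    return w_ab - expected


def classify_gi(epsilon, threshold=0.1):
    """Label a score; the threshold is a hypothetical, illustrative cutoff."""
    if epsilon <= -threshold:
        return "negative"  # double mutant is sicker than expected
    if epsilon >= threshold:
        return "positive"  # double mutant is fitter than expected
    return "neutral"
```

A score near zero means the double mutant behaves as the product of the single-mutant effects predicts; a clearly negative or positive score flags a candidate GI.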
Using our published E. coli GI datasets, we show computationally that a machine learning Gaussian process (GP)-based definition identifies functional associations among genes better than a multiplicative model, a result that we confirmed experimentally on a set of gene pairs. Overall, the GP definition improves the detection of GIs, the biological reasoning about epistatic connectivity, and the quality of GI maps in E. coli and, potentially, in other microbes.
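The abstract does not spell out how the GP-based definition is constructed, and the authors built their models in WEKA. The sketch below is therefore only one plausible reading, written under stated assumptions: a GP regressor with an RBF kernel learns the expected double-mutant fitness from the two single-mutant fitness values, and the GI score is the residual between observed and predicted fitness. The kernel, the hyperparameters, and every function name here are assumptions made for illustration.

```python
import math


def rbf(x, y, length_scale):
    """Squared-exponential (RBF) kernel between two feature tuples."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_gp(X, y, length_scale=0.5, noise=1e-2):
    """Fit a GP regressor; return a function giving the posterior mean.

    X: list of (w_a, w_b) single-mutant fitness pairs (assumed features).
    y: observed double-mutant fitness for each pair.
    length_scale and noise are illustrative values, not from the paper.
    """
    mean_y = sum(y) / len(y)  # constant prior mean
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, [v - mean_y for v in y])

    def predict(x):
        return mean_y + sum(a * rbf(x, xi, length_scale)
                            for a, xi in zip(alpha, X))

    return predict


def gp_gi_score(predict, w_a, w_b, w_ab):
    """GI score as observed minus GP-predicted double-mutant fitness."""
    return w_ab - predict((w_a, w_b))
```

In this reading, the expectation is learned from the data rather than fixed in advance as the product W_a * W_b, which is the main conceptual difference from the multiplicative model.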
The source code and parameters used to generate the machine learning models in WEKA software are provided in the Supplementary information.
Supplementary data are available at Bioinformatics online.