Wang Yishu, Wang Lin, Yang Dejie, Deng Minghua
Center for Quantitative Biology, Peking University, Beijing 100871, China.
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.
Methods. 2014 Jun 1;67(3):269-77. doi: 10.1016/j.ymeth.2014.03.032. Epub 2014 Apr 6.
Epistatic Miniarray Profiles (EMAP) are an important method for studying genetic interactions and constructing large-scale genetic interaction networks. However, EMAP datasets frequently contain a high proportion of missing values, which hinder downstream analysis. Although several imputation approaches are available for EMAP data, we adopted an improved SVD modeling procedure to impute the missing values, which yielded higher accuracy than existing methods.
The improved SVD imputation method applies an effective soft threshold to the SVD approach. Compared with a number of advanced imputation methods, it proved to be the best model for imputing genetic interaction data. Imputation also improves the clustering results of EMAP datasets: after applying our imputation method to the EMAP dataset, more meaningful modules, known pathways, and protein complexes could be detected.
Although missing data unavoidably complicate EMAP analysis, our results show that the Soft-SVD approach can complete the original dataset and accurately recover genetic interactions.
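The abstract does not spell out the Soft-SVD procedure, so the following is only a minimal sketch of one common way to combine a soft threshold with SVD for matrix completion (an iterative "soft-impute" style loop), not necessarily the authors' exact algorithm. The function name `soft_svd_impute`, the threshold `lam`, the stopping rule, and the toy symmetric score matrix are all assumptions made for illustration.

```python
import numpy as np

def soft_svd_impute(X, lam=1.0, max_iter=200, tol=1e-5):
    """Fill NaN entries of X by iterating a soft-thresholded SVD.

    Hypothetical helper: `lam` is the amount subtracted from each
    singular value; observed entries are never modified.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start with missing entries set to zero.
    Z = np.where(missing, 0.0, X)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        # Soft threshold: shrink singular values toward zero, clip at zero.
        s_shrunk = np.maximum(s - lam, 0.0)
        low_rank = (U * s_shrunk) @ Vt
        # Keep observed values, take the low-rank estimate elsewhere.
        Z_new = np.where(missing, low_rank, X)
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z

# Toy symmetric "interaction score" matrix with a few missing pairs
# (synthetic data, not taken from any EMAP dataset).
rng = np.random.default_rng(0)
u = rng.normal(size=(6, 1))
scores = u @ u.T
scores[0, 3] = scores[3, 0] = np.nan
scores[2, 5] = scores[5, 2] = np.nan

completed = soft_svd_impute(scores, lam=0.1)
```

In a sketch like this, `lam` controls how aggressively small singular values are suppressed. In practice it would presumably be tuned, for example by hiding some observed entries and measuring how well they are recovered.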