Lee Kyu Min, Lee Minhyeok, Seok Junhee, Han Sung Won
1 School of Industrial Management Engineering, Korea University, Seoul, South Korea.
2 School of Electrical Engineering, Korea University, Seoul, South Korea.
J Comput Biol. 2019 Apr;26(4):336-349. doi: 10.1089/cmb.2018.0225. Epub 2019 Jan 17.
Continuous advances in genome sequencing technology have made large volumes of gene expression data easy to obtain. The resulting growth in genetic information, however, calls for a new approach to network estimation. As sequencing technology progresses, data dimensionality increases, and the resulting multicollinearity makes gene networks difficult to estimate. The same problem arises when hub nodes exist, and gene networks are known to contain regulator genes that can be interpreted as hub nodes. This study aims to develop methods that perform well on high-dimensional data with hub nodes. We propose regression-based approaches as feasible solutions: elastic-net and adaptive elastic-net penalized regressions are applied to compensate for the disadvantages of existing regression-based approaches that employ LASSO or adaptive LASSO. Experiments were performed to compare the proposed approaches with other conventional methods. We confirmed the superior performance of the proposed approaches and applied them to actual genetic data to verify their suitability for estimating gene networks. The results demonstrate the robustness of the proposed methods on high-dimensional gene expression data.
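For orientation, the kind of regression-based estimate described in the abstract can be sketched with the standard node-wise (neighborhood selection) formulation. The notation here is an assumption and is not taken from the article: $x_j$ is the expression vector of gene $j$ over $n$ samples, $X_{-j}$ holds the remaining $p-1$ genes, $\lambda_1, \lambda_2 \ge 0$ are tuning parameters, and $\tilde{\beta}^{(j)}$ is an initial estimate used to build adaptive weights with exponent $\gamma > 0$. The article's exact objective, scaling, and edge rule may differ.

```latex
% Elastic-net regression of gene j on all other genes (assumed notation)
\hat{\beta}^{(j)} = \arg\min_{\beta \in \mathbb{R}^{p-1}}
  \frac{1}{2n} \left\lVert x_j - X_{-j}\beta \right\rVert_2^2
  + \lambda_1 \lVert \beta \rVert_1
  + \lambda_2 \lVert \beta \rVert_2^2

% Adaptive variant: the L1 term is reweighted by an initial estimate
\hat{\beta}^{(j)}_{\mathrm{ada}} = \arg\min_{\beta \in \mathbb{R}^{p-1}}
  \frac{1}{2n} \left\lVert x_j - X_{-j}\beta \right\rVert_2^2
  + \lambda_1 \sum_{k \neq j} w_k \lvert \beta_k \rvert
  + \lambda_2 \lVert \beta \rVert_2^2,
  \qquad w_k = \lvert \tilde{\beta}^{(j)}_k \rvert^{-\gamma}

% One common edge rule ("OR" rule): connect genes j and k if either
% regression selects the other
(j,k) \in \hat{E} \iff \hat{\beta}^{(j)}_k \neq 0 \ \text{or}\ \hat{\beta}^{(k)}_j \neq 0
```

Setting $\lambda_2 = 0$ recovers LASSO and adaptive LASSO, respectively. The ridge term $\lambda_2 \lVert \beta \rVert_2^2$ is what tends to stabilize the fit when predictors are strongly correlated, which is the multicollinearity setting the abstract refers to.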