Yuan Lin, Guo Le-Hang, Yuan Chang-An, Zhang You-Hua, Han Kyungsook, Nandi Asoke, Honig Barry, Huang De-Shuang
IEEE/ACM Trans Comput Biol Bioinform. 2018 Aug 23. doi: 10.1109/TCBB.2018.2866836.
Underlying a cancer phenotype is a specific gene regulatory network that represents the complex regulatory relationships between genes. However, finding cancer-related gene regulatory networks remains a challenge because sample sizes are insufficient and regulatory mechanisms are complex: a gene is influenced not only by other genes but also by other biological factors. The development of high-throughput technologies and the unprecedented wealth of multi-omics data offer a new opportunity to design machine learning methods for investigating the underlying gene regulatory network. In this paper, we propose an approach for gene regulatory network inference (BMNPGRN) that uses biweight midcorrelation to measure the correlation between factors and nonconvex-penalty-based sparse regression to infer the network. BMNPGRN incorporates multi-omics data (including DNA methylation and copy number variation) and their interactions into the gene regulatory network model. Experimental results on synthetic datasets show that BMNPGRN outperforms popular and state-of-the-art methods (including DCGRN, ARACNE and CLR) under false positive control. Furthermore, we applied BMNPGRN to breast cancer (BRCA) data from The Cancer Genome Atlas database and provide the resulting gene regulatory network.
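For readers unfamiliar with biweight midcorrelation, the following is a minimal sketch of the commonly used definition, not the authors' implementation. The abstract gives no formula, so the tuning constant 9, the unscaled median absolute deviation, and the function names `biweight_normalize` and `bicor` are assumptions made for illustration. Each value is centered on the median, down-weighted by its distance from the median in MAD units, and given zero weight beyond the cutoff, which makes the measure less sensitive to outliers than Pearson correlation.

```python
import numpy as np


def biweight_normalize(x):
    """Return the biweight-weighted, unit-norm version of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - np.median(x)
    # Unscaled median absolute deviation (assumed convention).
    mad = np.median(np.abs(centered))
    if mad == 0:
        raise ValueError("MAD is zero; biweight midcorrelation is undefined")
    # Distance from the median in units of 9 * MAD (9 is the usual constant).
    u = centered / (9.0 * mad)
    # Tukey biweight: values with |u| >= 1 receive zero weight.
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    weighted = centered * w
    return weighted / np.sqrt(np.sum(weighted**2))


def bicor(x, y):
    """Biweight midcorrelation between two equal-length 1-D samples."""
    return float(np.dot(biweight_normalize(x), biweight_normalize(y)))


if __name__ == "__main__":
    x = np.arange(1, 11, dtype=float)
    y = 2 * x + 1
    y_outlier = y.copy()
    y_outlier[-1] = 1000.0  # a single extreme value
    print("bicor  :", bicor(x, y_outlier))
    print("pearson:", np.corrcoef(x, y_outlier)[0, 1])
```

On the toy data in the usage section, the single extreme value receives zero weight, so the biweight midcorrelation stays closer to 1 than the Pearson coefficient does.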