Dutta Pratik, Mishra Piyush, Saha Sriparna
Department of Computer Science and Engineering, Indian Institute of Technology, Patna, India.
Department of Computer Science and Engineering, IIIT, Bhubaneswar, India.
Comput Biol Med. 2020 Oct;125:103965. doi: 10.1016/j.compbiomed.2020.103965. Epub 2020 Sep 8.
Deciphering patterns in the structural and functional anatomy of genes helps in understanding genetic biology and genomics. The availability of multiple omics data, together with the advent of machine learning techniques, also helps medical professionals gain insight into various biological regulatory mechanisms. Gene clustering is one of many computational techniques that can help in understanding gene behavior. More comprehensive and reliable insights can be gained when different modalities (views) of biomedical data are considered together. In most multi-view cases, however, each view has some missing data, which leads to the problem of incomplete multi-view clustering. In this study, we present a deep Boltzmann machine-based incomplete multi-view clustering framework for gene clustering. We use Shape Boltzmann Machines to regenerate the missing data in the incomplete modalities of three NCBI datasets. The overall performance of the proposed multi-view clustering technique is evaluated using the Silhouette index and the Davies-Bouldin index, and a comparative analysis shows an improvement over state-of-the-art methods. Finally, to confirm that the improvement attained by the proposed incomplete multi-view clustering is statistically significant, we perform Welch's t-test. AVAILABILITY OF DATA AND MATERIALS: https://github.com/piyushmishra12/IMC.
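The evaluation protocol named in the abstract (Silhouette index, Davies-Bouldin index, Welch's t-test) can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic data from `make_blobs`, the use of k-means as a stand-in clustering step, the helper names `score_clustering` and `silhouette_over_runs`, and the cluster counts and spreads are all assumptions made for illustration. A higher Silhouette index and a lower Davies-Bouldin index both indicate better-separated clusters.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score


def score_clustering(features, n_clusters, seed):
    """Cluster with k-means (a stand-in method) and return
    (Silhouette index, Davies-Bouldin index) for the partition."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    return (silhouette_score(features, labels),
            davies_bouldin_score(features, labels))


def silhouette_over_runs(cluster_std, n_runs=10, n_clusters=3):
    """Collect Silhouette scores over several synthetic datasets.

    cluster_std is a hypothetical knob that mimics representations of
    different quality: a smaller spread gives tighter clusters.
    """
    scores = []
    for seed in range(n_runs):
        features, _ = make_blobs(n_samples=150, centers=n_clusters,
                                 cluster_std=cluster_std, random_state=seed)
        silhouette, _ = score_clustering(features, n_clusters, seed)
        scores.append(silhouette)
    return np.array(scores)


if __name__ == "__main__":
    tight = silhouette_over_runs(cluster_std=0.5)   # stands in for one method
    loose = silhouette_over_runs(cluster_std=3.0)   # stands in for a baseline
    # equal_var=False selects Welch's t-test (unequal variances allowed).
    result = ttest_ind(tight, loose, equal_var=False)
    print("mean silhouette:", tight.mean(), "vs", loose.mean())
    print("Welch t statistic:", result.statistic, "p-value:", result.pvalue)
```

In practice the two score arrays would come from repeated runs of the compared clustering methods on the same data, rather than from synthetic datasets with different spreads.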