Valle Filippo, Osella Matteo, Caselle Michele
Physics Department, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy.
Cancers (Basel). 2022 Feb 23;14(5):1150. doi: 10.3390/cancers14051150.
The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial for identifying the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of 'omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that including the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures, or "topics", that the algorithm extracts correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability.
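As an illustration of the integration step described in the abstract, the following minimal sketch shows one plausible way to assemble several 'omics layers into a single multipartite edge list, with samples on one side and each feature type (for example mRNA or microRNA) as its own node class. This is an assumption about the input representation, not the authors' implementation: the function names, the `Edge` record, and the toy values are all hypothetical, and the block-model inference itself is not shown.

```python
from collections import namedtuple

# Hypothetical record for one weighted sample-feature link in a given layer.
Edge = namedtuple("Edge", "sample feature layer weight")


def build_multipartite_edges(layers):
    """Flatten per-layer measurements into one weighted edge list.

    `layers` maps a layer name (e.g. "mRNA") to {sample: {feature: weight}}.
    Zero-weight entries are skipped, so they create no edge.
    """
    edges = []
    for layer, by_sample in layers.items():
        for sample, features in by_sample.items():
            for feature, weight in features.items():
                if weight > 0:
                    edges.append(Edge(sample, feature, layer, weight))
    return edges


def node_kinds(edges):
    """Label every node that appears in an edge: 'sample' or its layer name."""
    kinds = {}
    for edge in edges:
        kinds[edge.sample] = "sample"
        kinds[edge.feature] = edge.layer
    return kinds


# Toy, made-up values for illustration only (not TCGA data).
toy_layers = {
    "mRNA": {"s1": {"GENE_A": 3, "GENE_B": 0}, "s2": {"GENE_A": 1}},
    "miRNA": {"s1": {"MIR_X": 2}, "s2": {"MIR_X": 0, "MIR_Y": 5}},
}

toy_edges = build_multipartite_edges(toy_layers)
toy_kinds = node_kinds(toy_edges)
```

Keeping the layer name on every edge and node means a further layer, such as copy number variations, can be added as one more entry in `layers` without changing the functions.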