IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2197-2207. doi: 10.1109/TCBB.2021.3065535. Epub 2022 Aug 8.
Detecting predictive biomarkers from multi-omics data is important for precision medicine, both to improve the diagnosis of complex diseases and to enable better treatments. Doing so requires substantial experimental effort, which is made difficult by the heterogeneity of cell lines and by high cost. An effective solution is to build a computational model over the diverse omics data, including genomic, molecular, and environmental information. However, choosing informative and reliable data sources from among the different types of data is a challenging problem. We propose DIVERSE, a framework of Bayesian importance-weighted tri- and bi-matrix factorization (DIVERSE3 or DIVERSE2) to predict drug responses from data on cell lines, drugs, and gene interactions. DIVERSE integrates the data sources systematically, in a step-wise manner, examining the importance of each added data set in turn. More specifically, we sequentially integrate five different data sets, which have not all been combined in earlier bioinformatic methods for predicting drug responses. Empirical experiments show that DIVERSE clearly outperformed five other methods, including three state-of-the-art approaches, under cross-validation. The advantage was particularly clear in out-of-matrix prediction, which is closer to real use cases and more challenging than the simpler in-matrix prediction. Additionally, case studies on discovering new drugs further confirmed the performance advantage of DIVERSE.
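As a rough illustration of the two ideas named in the abstract, the sketch below shows (a) an out-of-matrix hold-out, where whole cell lines are hidden from the response matrix, and (b) an importance-weighted tri-matrix factorization in which several data matrices share one row factor. It is only a sketch under stated assumptions: it uses plain gradient descent on a regularized squared error rather than the Bayesian inference of DIVERSE, the weights are fixed by hand rather than examined step by step, and all function names, parameters, and default values are hypothetical.

```python
import numpy as np


def out_of_matrix_mask(n_rows, n_cols, test_fraction, rng):
    """Hold out entire rows (e.g. cell lines), so that the held-out rows
    have no observed responses at all. An in-matrix split would instead
    hide individual entries scattered across the matrix."""
    n_test = max(1, int(round(test_fraction * n_rows)))
    test_rows = rng.choice(n_rows, size=n_test, replace=False)
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    mask[test_rows, :] = True
    return mask


def fit_weighted_trifactor(datasets, n_rows, rank=2, lr=0.005,
                           n_iter=3000, reg=0.01, seed=0):
    """Fit R_k ~ F @ S_k @ G_k.T for every data set k, sharing F.

    datasets: list of (R, observed, weight) tuples, where R has n_rows rows,
    observed is a boolean array marking usable entries, and weight is a
    hand-picked importance weight (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    F = 0.1 * rng.standard_normal((n_rows, rank))
    S = [0.1 * rng.standard_normal((rank, rank)) for _ in datasets]
    G = [0.1 * rng.standard_normal((R.shape[1], rank)) for R, _, _ in datasets]
    for _ in range(n_iter):
        grad_F = reg * F
        for k, (R, observed, weight) in enumerate(datasets):
            # Weighted residual, restricted to the observed entries.
            E = weight * observed * (F @ S[k] @ G[k].T - R)
            grad_F = grad_F + E @ G[k] @ S[k].T
            grad_S = F.T @ E @ G[k] + reg * S[k]
            grad_G = E.T @ F @ S[k] + reg * G[k]
            S[k] = S[k] - lr * grad_S
            G[k] = G[k] - lr * grad_G
        F = F - lr * grad_F
    return F, S, G
```

With a drug-response matrix `R1` and a side-information matrix `R2` over the same cell lines, one might call `fit_weighted_trifactor([(R1, ~mask, 1.0), (R2, np.ones(R2.shape, dtype=bool), 0.5)], n_rows=R1.shape[0])` and predict the held-out rows as `F @ S[0] @ G[0].T`. The held-out cell lines then receive their row factors only from the side information, which is what makes the out-of-matrix setting harder than the in-matrix one.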