Shakyawar Sushil K, Sajja Balasrinivasa R, Patel Jai Chand, Guda Chittibabu
Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, United States.
Department of Radiology, University of Nebraska Medical Center, Omaha, NE 68198, United States.
Bioinform Adv. 2024 Jan 30;4(1):vbae015. doi: 10.1093/bioadv/vbae015. eCollection 2024.
MOTIVATION: Patient stratification is crucial for the effective treatment and management of heterogeneous diseases, including cancers. Multiomic technologies enable molecular characterization of human diseases; however, the complexity of the resulting data calls for robust machine-learning tools that integrate these data for patient stratification.

RESULTS: iCluF iteratively integrates three types of multiomic data (mRNA, miRNA, and DNA methylation) using pairwise patient similarity matrices built from each omic data type. Intermediate omic-specific neighborhood matrices drive iterative matrix fusion and message passing among the similarity matrices, producing a final integrated matrix that represents all the omics profiles of a patient and is then used to cluster patients into subtypes. iCluF outperforms other methods, with significant differences in the survival profiles of 8581 patients across 30 different cancers in TCGA. iCluF also predicted the four intrinsic subtypes of Breast Invasive Carcinoma with adjusted Rand index and Fowlkes-Mallows scores of 0.72 and 0.83, respectively. Gini importance scores showed that methylation features contributed most to identifying disease subtypes, followed by mRNA and miRNA features. iCluF can be applied to stratify patients with any disease for which multiomic datasets are available.

AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/GudaLab/iCluF_core.
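The fusion step described in RESULTS can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a generic similarity-network-fusion-style update, in which each omic's similarity matrix is diffused through its own k-nearest-neighbor matrix using the average of the other omics' matrices. The function names (`affinity`, `knn_kernel`, `fuse`), the Gaussian kernel, and the parameters `k`, `sigma`, and `n_iter` are all hypothetical choices made for illustration.

```python
import numpy as np


def affinity(X, sigma=1.0):
    """Gaussian-kernel patient-by-patient similarity from a (patients x features) array."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    mean_d2 = np.mean(d2)
    scale = sigma * mean_d2 if mean_d2 > 0 else 1.0  # guard against identical rows
    return np.exp(-d2 / scale)


def row_normalize(W):
    """Scale each row to sum to 1."""
    return W / W.sum(axis=1, keepdims=True)


def knn_kernel(W, k):
    """Omic-specific neighborhood matrix: keep each row's k largest similarities."""
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return row_normalize(S)


def fuse(omics, k=3, n_iter=10):
    """Fuse several (patients x features) arrays into one patient similarity matrix.

    Assumed update rule (illustrative, not taken from the paper):
    P_v <- S_v @ mean(P_u for u != v) @ S_v.T
    """
    W = [affinity(X) for X in omics]
    P = [row_normalize(w) for w in W]
    S = [knn_kernel(w, k) for w in W]
    for _ in range(n_iter):
        updated = []
        for v in range(len(P)):
            # Message passing: omic v receives the average of the other omics.
            others = np.mean([P[u] for u in range(len(P)) if u != v], axis=0)
            M = S[v] @ others @ S[v].T
            updated.append(row_normalize((M + M.T) / 2.0))
        P = updated
    fused = np.mean(P, axis=0)
    return (fused + fused.T) / 2.0  # symmetric matrix, ready for clustering
```

Calling `fuse([mrna, mirna, methylation])` on three arrays that share the same patient order returns one symmetric matrix, which could then be passed to any similarity-based clustering routine, for example spectral clustering.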
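The two agreement scores reported in the abstract, the adjusted Rand index and the Fowlkes-Mallows score, both compare a predicted clustering with reference labels by counting pairs of patients that are grouped together. The sketch below computes both from their standard pair-counting definitions using only the Python standard library. The function names and the toy labels are hypothetical, and the code does not reproduce the paper's evaluation.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Count patient pairs grouped together in both, in truth only, and in prediction only."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    joint = Counter(zip(true_labels, pred_labels))
    same_both = sum(comb(c, 2) for c in joint.values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    return same_both, same_true, same_pred, comb(len(true_labels), 2)


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement; 1.0 means identical partitions."""
    same_both, same_true, same_pred, total = pair_counts(true_labels, pred_labels)
    if total == 0:
        return 1.0  # fewer than two samples: partitions trivially agree
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2.0
    if max_index == expected:
        return 1.0
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall."""
    same_both, same_true, same_pred, _ = pair_counts(true_labels, pred_labels)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


# Toy example with made-up subtype labels for eight patients.
reference = ["A", "A", "B", "B", "C", "C", "D", "D"]
predicted = [0, 0, 1, 1, 2, 2, 3, 0]
ari = adjusted_rand_index(reference, predicted)
fm = fowlkes_mallows(reference, predicted)
```

Both scores depend only on which patients are grouped together, so the predicted cluster identifiers do not need to match the reference label names.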