IEEE Trans Biomed Eng. 2020 Jul;67(7):2110-2118. doi: 10.1109/TBME.2019.2954989. Epub 2019 Nov 21.
Studying pathogenic mechanisms at the genetic level with imaging genetics methods can effectively reveal the association between histopathology and genetics. However, effective and accurate tools for building association models that span the macroscopic to the microscopic level are lacking.
Multi-constrained joint non-negative matrix factorization (MCJNMF) was developed to simultaneously integrate genomic data and image data and to identify common modules related to disease. The two types of data matrices were projected onto a common feature space, in which heterogeneous variables with large coefficients in the same projected direction form a common module. In addition, correlations among the original data features were incorporated through regularization constraints to improve biological relevance. Sparsity and orthogonality constraints were imposed on the decomposition factors to minimize redundancy between different bases and to reduce algorithmic complexity.
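The core "common feature space" idea can be illustrated with plain joint NMF, in which both data matrices share one non-negative factor W and each keeps its own loading matrix. The sketch below is a simplified stand-in, not the authors' MCJNMF: it omits the correlation-regularization, sparsity, and orthogonality terms, assumes rows are samples (patients) and columns are features, assumes all inputs are already non-negative, and uses a hypothetical z-score threshold `t` to turn each component's loadings into a candidate module. The names `joint_nmf` and `extract_modules` are illustrative only, and the data in the usage example are synthetic.

```python
import numpy as np


def joint_nmf(X1, X2, k, n_iter=500, seed=0, eps=1e-10):
    """Factorize X1 ~ W @ H1 and X2 ~ W @ H2 with a shared non-negative W.

    Minimizes 0.5*||X1 - W H1||_F^2 + 0.5*||X2 - W H2||_F^2 with Lee-Seung
    multiplicative updates. Hypothetical sketch: MCJNMF's regularization,
    sparsity and orthogonality constraints are NOT included.
    """
    if X1.shape[0] != X2.shape[0]:
        raise ValueError("X1 and X2 must share the same samples (rows)")
    if (X1 < 0).any() or (X2 < 0).any():
        raise ValueError("inputs must be non-negative")
    rng = np.random.default_rng(seed)
    W = rng.random((X1.shape[0], k))
    H1 = rng.random((k, X1.shape[1]))
    H2 = rng.random((k, X2.shape[1]))
    history = []
    for _ in range(n_iter):
        # Update each feature-loading matrix with the shared W held fixed.
        H1 *= (W.T @ X1) / (W.T @ W @ H1 + eps)
        H2 *= (W.T @ X2) / (W.T @ W @ H2 + eps)
        # W is shared, so its update pools evidence from both data types.
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
        err = np.linalg.norm(X1 - W @ H1) ** 2 + np.linalg.norm(X2 - W @ H2) ** 2
        history.append(0.5 * err)
    return W, H1, H2, history


def extract_modules(H1, H2, t=2.0):
    """For each component, keep features whose loading z-score exceeds t.

    Features of both types selected by the same component (projected
    direction) form one candidate common module. t is a hypothetical choice.
    """
    modules = []
    for h1, h2 in zip(H1, H2):
        z1 = (h1 - h1.mean()) / (h1.std() + 1e-12)
        z2 = (h2 - h2.mean()) / (h2.std() + 1e-12)
        modules.append((np.flatnonzero(z1 > t), np.flatnonzero(z2 > t)))
    return modules


# Usage with synthetic stand-ins for image and methylation feature matrices.
rng = np.random.default_rng(1)
W0 = rng.random((40, 3))
X_img = W0 @ rng.random((3, 30))
X_meth = W0 @ rng.random((3, 200))
W, H_img, H_meth, hist = joint_nmf(X_img, X_meth, k=3)
modules = extract_modules(H_img, H_meth)
print(f"objective {hist[0]:.4f} -> {hist[-1]:.4f}")
for i, (img_idx, meth_idx) in enumerate(modules):
    print(i, img_idx, meth_idx)
```

Because W is shared, this unconstrained objective is equivalent to ordinary NMF on the column-concatenated matrix `[X1, X2]`; the additional constraints described above are what distinguish a method such as MCJNMF from that baseline.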
The algorithm was successfully applied to identifying modules associated with lung metastasis in soft tissue sarcomas (STSs) by integrating features from FDG-PET images and DNA methylation data. Multilevel analysis of the top-ranked extracted modules revealed that these modules were closely related to lung metastasis. In particular, several genes with diagnostic potential for lung metastasis can be found in the high-scoring modules.
This method can not only be applied to accurately identify patterns related to the pathogenic mechanisms of diseases but also has significant implications for discovering protein biomarkers.
This method opens avenues for further studies that identify complex disease association patterns from different types of biological data.