BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000, Liège, Belgium.
Post-Doctoral Fellow, ETH AI Center, Zürich, Switzerland.
Sci Rep. 2023 Nov 10;13(1):19653. doi: 10.1038/s41598-023-46392-6.
Personalized cancer screening before therapy paves the way toward improving diagnostic accuracy and treatment outcomes. Most approaches are limited to a single data type and do not consider interactions between features, leaving aside the complementary insights that multimodality and systems biology can provide. In this project, we demonstrate the use of graph theory for data integration via individual networks, in which nodes and edges are individual-specific. We showcase the consequences of early, intermediate, and late graph-based fusion of RNA-Seq data and histopathology whole-slide images for predicting cancer subtypes and severity. The methodology is as follows: (1) we create individual networks; (2) we compute the similarity between individuals from these graphs; (3) we train our model on the similarity matrices; (4) we evaluate the performance using the macro F1 score. The strengths and weaknesses of each element of the pipeline are evaluated on publicly available real-life datasets. We find that graph-based methods can improve performance over methods that ignore feature interactions. Additionally, merging multiple data sources often improves classification compared with single-source models, especially through intermediate fusion. The proposed workflow can easily be adapted to other disease contexts to accelerate and enhance personalized healthcare.
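The four-step workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the individual-network method, the similarity measure, or the classifier. The sketch assumes a LIONESS-style construction on Pearson co-expression (the edge weight for individual q is N * r_all - (N - 1) * r_without_q), cosine similarity between individuals' edge-weight vectors, and a nearest-neighbour classifier on the precomputed similarity matrix. All function names and the toy data are hypothetical.

```python
import numpy as np


def individual_networks(X: np.ndarray) -> np.ndarray:
    """Step 1 (assumed LIONESS-style): one edge-weight vector per individual.

    X has shape (n_samples, n_features). Each row of the result holds the
    upper-triangle edge weights e_q = N * r_all - (N - 1) * r_without_q,
    where r is the Pearson correlation between pairs of features.
    """
    n, p = X.shape
    iu = np.triu_indices(p, k=1)                 # one entry per undirected edge
    r_all = np.corrcoef(X, rowvar=False)[iu]     # network of the whole population
    edges = np.empty((n, iu[0].size))
    for q in range(n):
        X_minus_q = np.delete(X, q, axis=0)      # leave individual q out
        r_minus_q = np.corrcoef(X_minus_q, rowvar=False)[iu]
        edges[q] = n * r_all - (n - 1) * r_minus_q
    return edges


def cosine_similarity(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Step 2 (assumed measure): pairwise cosine similarity between rows of A and B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def nearest_neighbour_predict(S_test_train: np.ndarray, y_train: np.ndarray) -> np.ndarray:
    """Step 3 (assumed model): label of the most similar training individual."""
    return y_train[np.argmax(S_test_train, axis=1)]


def macro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Step 4: unweighted mean of the per-class F1 scores."""
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Toy data: two classes that differ in which block of features co-varies.
rng = np.random.default_rng(0)
n, p = 40, 6
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
z = rng.normal(size=n)
X[y == 0, :3] += 2 * z[y == 0, None]   # class 0: features 0-2 co-vary
X[y == 1, 3:] += 2 * z[y == 1, None]   # class 1: features 3-5 co-vary

edges = individual_networks(X)          # built without labels, over all individuals
S = cosine_similarity(edges, edges)     # individual-by-individual similarity matrix
idx = rng.permutation(n)
train, test = idx[:30], idx[30:]
pred = nearest_neighbour_predict(S[np.ix_(test, train)], y[train])
score = macro_f1(y[test], pred)
print(f"macro F1 on held-out individuals: {score:.2f}")
```

The hand-written `macro_f1` should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`. A kernel model such as `sklearn.svm.SVC(kernel="precomputed")` could replace the nearest-neighbour rule, since a cosine similarity matrix is a normalized Gram matrix and therefore positive semi-definite.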
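The early, intermediate, and late fusion strategies named in the abstract differ in where the two modalities are combined. The sketch below uses one common reading of these terms, which is an assumption: early fusion concatenates modality features before computing similarities; intermediate fusion computes one similarity matrix per modality and averages them before a single classifier; late fusion trains one classifier per modality and averages their class scores. The abstract does not specify the combination rules, so the averaging, the nearest-neighbour class scores, and all names (`rna`, `wsi`, `early_fusion`, etc.) are hypothetical. For brevity the inputs are plain feature vectors; in the workflow they would be each individual's network edge weights.

```python
import numpy as np


def cosine_similarity(A, B):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def class_scores(S_test_train, y_train, classes):
    """Nearest-neighbour score per class: highest similarity to any training member of that class."""
    return np.stack([S_test_train[:, y_train == c].max(axis=1) for c in classes], axis=1)


def early_fusion(rna, wsi, train, test, y, classes):
    # Concatenate the modalities first, then build a single similarity matrix.
    X = np.hstack([rna, wsi])
    S = cosine_similarity(X[test], X[train])
    return classes[np.argmax(class_scores(S, y[train], classes), axis=1)]


def intermediate_fusion(rna, wsi, train, test, y, classes):
    # One similarity matrix per modality, averaged and fed to a single classifier.
    S = 0.5 * (cosine_similarity(rna[test], rna[train])
               + cosine_similarity(wsi[test], wsi[train]))
    return classes[np.argmax(class_scores(S, y[train], classes), axis=1)]


def late_fusion(rna, wsi, train, test, y, classes):
    # One classifier per modality; only their class scores are averaged at the end.
    per_modality = [class_scores(cosine_similarity(M[test], M[train]), y[train], classes)
                    for M in (rna, wsi)]
    return classes[np.argmax(np.mean(per_modality, axis=0), axis=1)]


# Toy data: 30 individuals, 3 classes, two modalities with class-dependent means.
rng = np.random.default_rng(1)
n = 30
y = np.repeat([0, 1, 2], n // 3)
classes = np.unique(y)
rna = 2 * rng.normal(size=(3, 5))[y] + rng.normal(size=(n, 5))
wsi = 2 * rng.normal(size=(3, 4))[y] + rng.normal(size=(n, 4))
idx = rng.permutation(n)
train, test = idx[:21], idx[21:]

for name, fuse in [("early", early_fusion), ("intermediate", intermediate_fusion),
                   ("late", late_fusion)]:
    pred = fuse(rna, wsi, train, test, y, classes)
    print(name, "accuracy:", np.mean(pred == y[test]))
```

Accuracy is printed only to keep the block short; the workflow itself scores models with the macro F1. With a purely linear rule, such as averaging raw similarities per class, intermediate and late fusion would give identical predictions; the per-class maximum used here is nonlinear, which is what allows the two strategies to disagree.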