Medical Informatics Laboratory, School of Computing, Queen's University, Kingston, K7L 3N6, Canada
Pac Symp Biocomput. 2022;27:373-384.
Next-generation sequencing has enabled rapid collection and quantification of 'big' biological data. In particular, multi-omics approaches that integrate different molecular data, such as miRNA and mRNA, can provide important insights into disease classification and disease processes. There is a need for computational methods that can correctly model and interpret these relationships and handle the difficulties of large-scale data. In this study, we develop a novel method of representing miRNA-mRNA interactions to classify cancer. Specifically, graphs are designed to account for the interactions and biological communication between miRNAs and mRNAs, using message-passing and attention mechanisms. Patient-matched miRNA and mRNA expression data are obtained from The Cancer Genome Atlas for 12 cancers, and targeting information is incorporated from TargetScan. A Graph Transformer Network (GTN) is selected because its self-attention mechanisms make the classification highly interpretable. The GTN classifies the 12 different cancers with an accuracy of 93.56% and is compared to a Graph Convolutional Network, Random Forest, Support Vector Machine, and Multilayer Perceptron. While the GTN does not outperform all of the other classifiers in terms of accuracy, its results are highly interpretable. Multi-omics models are compared with their single-omics counterparts and generally outperform them. Extensive analysis of attention identifies important targeting pathways and molecular biomarkers based on integrated miRNA and mRNA expression.
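To make the graph representation described above more concrete, the following is a minimal sketch, not the GTN used in the study. It assumes a single patient, scalar expression values as node features, and a handful of made-up miRNA-mRNA targeting pairs standing in for TargetScan entries. All names (`build_graph`, `attention_step`, `miR-A`, `GENE1`, and so on) are hypothetical, and the attention score is a plain product of expression values with no learned weights.

```python
import math

# Hypothetical toy expression values for one patient (not data from the study).
mirna_expr = {"miR-A": 2.0, "miR-B": 0.5}
mrna_expr = {"GENE1": 1.0, "GENE2": 3.0, "GENE3": 0.2}

# Hypothetical miRNA -> mRNA targeting pairs, standing in for a targeting database.
targets = [
    ("miR-A", "GENE1"),
    ("miR-A", "GENE2"),
    ("miR-B", "GENE2"),
    ("miR-B", "GENE3"),
]


def build_graph(mirna_expr, mrna_expr, targets):
    """Return node features and undirected adjacency lists for a bipartite graph."""
    features = {**mirna_expr, **mrna_expr}
    neighbors = {node: [] for node in features}
    for mirna, mrna in targets:
        # Keep only pairs where both molecules were measured for this patient.
        if mirna in mirna_expr and mrna in mrna_expr:
            neighbors[mirna].append(mrna)
            neighbors[mrna].append(mirna)
    return features, neighbors


def attention_step(features, neighbors):
    """One round of scalar attention: softmax over neighbour scores, then a weighted sum."""
    updated, weights = {}, {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            # Isolated nodes keep their own feature.
            updated[node] = features[node]
            weights[node] = {}
            continue
        # Assumed scoring rule: product of the two expression values.
        scores = [features[node] * features[n] for n in nbrs]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights[node] = {n: e / total for n, e in zip(nbrs, exps)}
        updated[node] = sum(weights[node][n] * features[n] for n in nbrs)
    return updated, weights


features, neighbors = build_graph(mirna_expr, mrna_expr, targets)
updated, weights = attention_step(features, neighbors)

# Inspect which targeting edges receive the most attention from each node.
for node, w in weights.items():
    print(node, {n: round(a, 3) for n, a in w.items()})
```

In this toy setting, the per-edge attention weights play the role that the abstract attributes to attention analysis: edges with larger weights point to the targeting relationships that contribute most to a node's updated representation. A real model would learn query, key, and value projections and stack several such layers before classification.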