The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P.R. China.
Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore 117543, Singapore.
Nucleic Acids Res. 2022 May 6;50(8):e45. doi: 10.1093/nar/gkac010.
Omics-based biomedical learning frequently relies on high-dimensional data (up to thousands of dimensions) with low sample sizes (dozens to hundreds), which is challenging for deep learning (DL) algorithms, particularly in low-sample omics investigations. Here, a novel unsupervised feature aggregation tool, AggMap, was developed to Aggregate and Map omics features into multi-channel, spatially correlated, 2D image-like feature maps (Fmaps) based on their intrinsic correlations. AggMap exhibits strong feature reconstruction capabilities on a randomized benchmark dataset, outperforming existing methods. With AggMap multi-channel Fmaps as inputs, newly developed multi-channel DL AggMapNet models outperformed state-of-the-art machine learning models on 18 low-sample omics benchmark tasks. AggMapNet was also more robust when learning from noisy data and in disease classification. The AggMapNet explainable module, Simply-explainer, identified key metabolites and proteins for COVID-19 detection and severity prediction. The unsupervised AggMap algorithm, with its good feature restructuring abilities, combined with the supervised, explainable AggMapNet architecture, establishes a pipeline for enhanced learning and interpretability of low-sample omics data.
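The idea of arranging features into image-like maps according to their correlations can be illustrated with a small sketch. This is not the AggMap implementation: the abstract only states that features are aggregated and mapped into multi-channel 2D Fmaps based on their intrinsic correlations. Every function name below is hypothetical, and the specific steps (a correlation distance of 1 - |r|, classical multidimensional scaling for the 2D embedding, rank-based snapping to a grid, and a tiny k-means to form channels) are simple stand-ins chosen so that the example needs only NumPy. It also assumes that no feature is constant, so that all correlations are defined.

```python
import numpy as np

def correlation_distance(X):
    """Feature-by-feature distance 1 - |Pearson r| for X of shape (samples, features)."""
    r = np.corrcoef(X, rowvar=False)
    return 1.0 - np.abs(r)

def embed_2d(D):
    """Classical MDS: 2D coordinates recovered from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:2]              # two largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def assign_to_grid(coords, side):
    """Give every feature its own (row, col) cell: split into columns by x rank,
    then order each column by y rank. Requires n_features <= side * side."""
    n = coords.shape[0]
    cells = np.full((n, 2), -1)
    order_x = np.argsort(coords[:, 0], kind="stable")
    for col, chunk in enumerate(np.array_split(order_x, side)):
        ranked = chunk[np.argsort(coords[chunk, 1], kind="stable")]
        cells[ranked, 0] = np.arange(len(ranked))  # row index within the column
        cells[ranked, 1] = col
    return cells

def cluster_channels(coords, n_channels, n_iter=20):
    """Tiny k-means with farthest-point initialisation; one channel label per feature."""
    centers = [coords[0]]
    for _ in range(1, n_channels):
        d = np.min([np.sum((coords - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(coords[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_channels):
            if np.any(labels == k):
                centers[k] = coords[labels == k].mean(axis=0)
    return labels

def to_feature_maps(X, cells, labels, side, n_channels):
    """Scatter each sample's values into an array of shape (samples, side, side, channels)."""
    fmaps = np.zeros((X.shape[0], side, side, n_channels), dtype=float)
    fmaps[:, cells[:, 0], cells[:, 1], labels] = X
    return fmaps

# Synthetic data only: 30 samples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))
n_channels = 3
side = int(np.ceil(np.sqrt(X.shape[1])))          # smallest square grid that fits

coords = embed_2d(correlation_distance(X))
cells = assign_to_grid(coords, side)
labels = cluster_channels(coords, n_channels)
fmaps = to_feature_maps(X, cells, labels, side, n_channels)
print(fmaps.shape)  # (30, 8, 8, 3)
```

Each feature occupies exactly one cell in exactly one channel, so the maps are a lossless rearrangement of the input table. A convolutional model can then exploit the fact that correlated features tend to sit near each other.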