Krembil Research Institute, University Health Network, 399 Bathurst Street, Suite 4W-449, Toronto, ON, M5T 2S8, Canada.
Division of Neurosurgery, Department of Surgery, University of Toronto, 149 College Street, 5th Floor, Toronto, ON, M5T 1P5, Canada.
Sci Rep. 2022 Jun 11;12(1):9669. doi: 10.1038/s41598-022-13665-5.
Applying deep learning methods to transcriptomic data has the potential to improve the accuracy and efficiency of tissue classification and cell state identification. We developed a multitask deep learning model for tissue classification that combines publicly available whole-transcriptome (RNA-seq) datasets of non-neoplastic, neoplastic and peri-neoplastic tissue to classify disease state, tissue origin and neoplastic subclass. RNA-seq data from a total of 10,116 patient samples, processed through a common pipeline, were used for model training and validation. The model achieved 99% accuracy for disease state classification (ROC-AUC of 0.98), 97% accuracy for tissue origin (ROC-AUC of 0.99) and 92% accuracy for neoplastic subclassification (ROC-AUC of 0.95). This is the first multitask deep learning algorithm developed for tissue classification that applies a uniform analysis pipeline to transcriptomic data and uses multiple tissue classifiers. The model serves as a framework for incorporating large transcriptomic datasets across conditions to facilitate clinical diagnosis and cell-based treatment strategies.
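The multitask setup described in the abstract (one model, three classification targets) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the architecture, so the shared-encoder-plus-heads layout, the layer sizes, the class counts for tissue origin and neoplastic subclass, the equal task weighting and all names (`forward`, `multitask_loss`, `TASKS`) are assumptions made for illustration. The input is synthetic, and only a forward pass and the joint loss are shown, with no training.

```python
import math
import random

# Hypothetical sizes: the abstract does not state gene counts or class counts,
# apart from the three disease states (non-neoplastic, neoplastic, peri-neoplastic).
N_GENES = 8    # input features (expression values per sample)
N_HIDDEN = 4   # width of the shared representation
TASKS = {"disease_state": 3, "tissue_origin": 5, "neoplastic_subclass": 4}


def init_matrix(rows, cols, rng):
    """Small random weights, rows x cols."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """Shared encoder (ReLU layer) followed by one softmax head per task."""
    hidden = [max(0.0, h) for h in linear(params["shared"], x)]
    return {task: softmax(linear(params["heads"][task], hidden)) for task in TASKS}


def multitask_loss(probs, labels):
    """Sum of per-task cross-entropy losses (equal task weights assumed)."""
    return sum(-math.log(probs[task][labels[task]]) for task in TASKS)


rng = random.Random(0)
params = {
    "shared": init_matrix(N_HIDDEN, N_GENES, rng),
    "heads": {task: init_matrix(k, N_HIDDEN, rng) for task, k in TASKS.items()},
}
sample = [rng.random() for _ in range(N_GENES)]  # synthetic expression vector
labels = {"disease_state": 1, "tissue_origin": 4, "neoplastic_subclass": 0}

probs = forward(params, sample)        # one probability vector per task
loss = multitask_loss(probs, labels)   # single scalar to minimise jointly
```

Sharing the encoder means every task's error signal would update the same representation of the expression profile during training, which is the usual motivation for a multitask design over three separate classifiers.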