Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China.
Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou 310024, Zhejiang Province, China.
J Am Soc Mass Spectrom. 2020 Nov 4;31(11):2296-2304. doi: 10.1021/jasms.0c00254. Epub 2020 Oct 26.
We developed an approach that predicts phenotypes from data-independent acquisition (DIA) mass spectrometry (MS) data without first identifying peptide precursors with existing DIA software tools. The first step converts the DIA-MS data file into a new file format called DIA tensor (DIAT), which allows convenient visualization of all ions from peptide precursors and fragments. DIAT files can be fed directly into a deep neural network to predict phenotypes, much as such networks classify images of cats, dogs, or microscopy specimens. As a proof of principle, we applied this approach to 102 hepatocellular carcinoma samples and achieved an accuracy of 96.8% in distinguishing malignant from benign samples. We then applied a refined model to classify thyroid nodules: a deep-learning model trained on 492 samples achieved an accuracy of 91.7% in an independent cohort of 216 test samples. This approach outperformed a deep-learning model based on the peptide and protein matrices generated by OpenSWATH. In summary, we present a new strategy for DIA data analysis based on a novel data format, DIAT, which enables facile two-dimensional visualization of DIA proteomics data and can be used directly for deep learning in biological and clinical phenotype classification. Future research will focus on interpreting the deep-learning models that emerge from DIAT analysis.
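The conversion step described above can be illustrated with a minimal sketch. It is not the authors' DIAT format or code: it simply assumes that each isolation window is available as arrays of retention time, m/z, and intensity, and bins them into a two-dimensional image that a neural network could consume. The function names (`peaks_to_image`, `build_tensor`), the dictionary keys, the grid size, and the log scaling are all hypothetical choices made for illustration.

```python
import numpy as np


def peaks_to_image(rt, mz, intensity, rt_range, mz_range, shape=(64, 64)):
    """Bin (retention time, m/z, intensity) peaks into a 2D intensity image.

    Hypothetical helper: rows are retention-time bins, columns are m/z bins.
    """
    image, _, _ = np.histogram2d(
        rt,
        mz,
        bins=shape,
        range=[rt_range, mz_range],
        weights=intensity,  # sum intensities falling into the same bin
    )
    # Log scaling (an assumption) compresses the wide dynamic range of MS data.
    return np.log1p(image)


def build_tensor(windows, rt_range, mz_range, shape=(64, 64)):
    """Stack one image per isolation window.

    `windows` is assumed to be a list of dicts with keys "rt", "mz", "intensity".
    Returns an array of shape (n_windows, rt_bins, mz_bins).
    """
    images = [
        peaks_to_image(w["rt"], w["mz"], w["intensity"], rt_range, mz_range, shape)
        for w in windows
    ]
    return np.stack(images)


# Synthetic example: two isolation windows with a few made-up peaks each.
example_windows = [
    {
        "rt": np.array([6.0, 6.0]),
        "mz": np.array([520.0, 520.0]),
        "intensity": np.array([40.0, 60.0]),
    },
    {
        "rt": np.array([1.0]),
        "mz": np.array([410.0]),
        "intensity": np.array([10.0]),
    },
]
example_tensor = build_tensor(
    example_windows, rt_range=(0.0, 10.0), mz_range=(400.0, 600.0), shape=(4, 4)
)
```

The resulting array has one channel per isolation window, so it can be passed to an image-style convolutional network in the same way as a multi-channel image.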