Exploring the variable space of shallow machine learning models for reversed-phase retention time prediction.

Authors

Yeung Darien, Spicer Victor, Zahedi René P, Krokhin Oleg

Affiliations

Department of Biochemistry and Medical Genetics, University of Manitoba, 336 BMSB, 745 Bannatyne Avenue, Winnipeg R3E 0J9, Canada.

Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg R3E 3P4, Canada.

Publication information

Comput Struct Biotechnol J. 2023 Feb 27;21:2446-2453. doi: 10.1016/j.csbj.2023.02.047. eCollection 2023.

DOI:10.1016/j.csbj.2023.02.047

PMID:37090433

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10113922/

Abstract

Peptide retention time (RT) prediction algorithms are tools for studying and identifying the physicochemical properties that drive the peptide-sorbent interaction. Traditional RT algorithms use multiple linear regression with manually curated parameters to determine the direct contribution of each parameter, and improvements in RT prediction accuracy have relied on superior feature engineering. Deep learning led to a significant increase in RT prediction accuracy and automated feature engineering by chaining multiple learning modules. However, the significance and identity of the extracted variables are not well understood because of the inherent complexity of interpreting the "relationships-of-relationships" found in deep learning variables. To achieve accuracy and interpretability simultaneously, we isolated individual modules used in deep learning; these isolated modules are the shallow learners employed for RT prediction in this work. Using a shallow convolutional neural network (CNN) and a gated recurrent unit (GRU), we find that the spatial features obtained via the CNN correlate with real-world physicochemical properties, namely collisional cross-sections (CCS) and variations of accessible surface area (ASA). Furthermore, we determined that the discovered parameters are "micro-coefficients" that contribute to the "macro-coefficient", hydrophobicity. Manually embedding CCS and the variations of ASA into the GRU model yielded R² = 0.981 using only 525 variables and can represent 88% of the ∼110,000 tryptic peptides in our dataset. This work highlights that the feature discovery process of our shallow learners can outperform traditional RT models and offers better interpretability than the deep learning RT algorithms found in the literature.
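To make the "traditional" baseline mentioned in the abstract concrete, the following is a minimal sketch of multiple linear regression on amino acid composition, where each fitted coefficient is read as the direct contribution of one residue to RT. It is not the authors' model: the peptides and the "true" coefficients are synthetic, and every function name (`composition_features`, `fit_retention_coefficients`, `predict_retention_time`, `r_squared`) is a hypothetical helper introduced only for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(peptide: str) -> np.ndarray:
    """Residue counts for the 20 standard amino acids, plus an intercept term."""
    counts = [peptide.count(aa) for aa in AMINO_ACIDS]
    return np.array(counts + [1], dtype=float)


def fit_retention_coefficients(peptides, retention_times) -> np.ndarray:
    """Ordinary least squares: one coefficient per residue, plus an intercept."""
    X = np.vstack([composition_features(p) for p in peptides])
    y = np.asarray(retention_times, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_retention_time(peptide: str, coef: np.ndarray) -> float:
    """Predicted RT is the dot product of the features and the coefficients."""
    return float(composition_features(peptide) @ coef)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic demonstration data (assumption: NOT taken from the paper).
rng = np.random.default_rng(0)
lengths = rng.integers(6, 21, size=200)
peptides = ["".join(rng.choice(list(AMINO_ACIDS), size=n)) for n in lengths]

# Hypothetical per-residue contributions and intercept, used to simulate RTs.
true_coef = rng.uniform(-2.0, 10.0, size=len(AMINO_ACIDS) + 1)
X_all = np.vstack([composition_features(p) for p in peptides])
rts = X_all @ true_coef + rng.normal(0.0, 0.1, size=len(peptides))

coef = fit_retention_coefficients(peptides, rts)
predictions = [predict_retention_time(p, coef) for p in peptides]
r2 = r_squared(rts, predictions)
```

Because this sketch sees only residue counts, it cannot capture sequence order or spatial effects; that limitation is the gap the abstract attributes to feature engineering and, later, to learned CNN and GRU features.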

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1586/10113922/661a43f3252e/ga1.jpg
