Predicting missing proteomics values using machine learning: Filling the gap using transcriptomics and other biological features.

Author information

Ochoteco Asensio Juan, Verheijen Marcha, Caiment Florian

Affiliation

Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, Maastricht, The Netherlands.

Publication information

Comput Struct Biotechnol J. 2022 Apr 22;20:2057-2069. doi: 10.1016/j.csbj.2022.04.017. eCollection 2022.

Abstract

Proteins are often considered the main biological elements responsible for the functions and structures of a cell. However, proteomics, the global study of all expressed proteins, is usually performed by mass spectrometry, which is limited by stochastic sampling and can quantify only a limited number of proteins per sample. Transcriptomics, which allows an exhaustive analysis of all expressed transcripts, is often used as a surrogate. However, transcript levels do not correlate strongly with the corresponding protein levels, notably because of several post-transcriptional regulatory mechanisms. In this publication, we hypothesize that missing protein values in proteomics can be predicted using machine learning regression methods trained on many features extracted from transcriptomics, including known translational regulatory elements such as microRNAs and circular RNAs. After evaluating different machine learning algorithms under two different splitting strategies, we report that random forest can predict protein values in new samples from transcriptomics data with good accuracy. The proposed pre-processing and model-building scripts are available on GitHub: https://github.com/jochotecoa/ml_proteomics.
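The abstract does not say what the two splitting strategies are. As a minimal sketch, and purely as an assumption, the example below contrasts a random split over (sample, protein) rows with a split that holds out whole samples, which is the setting implied by "predict proteins in new samples". The table layout, column names (`sample`, `protein`, `mrna_tpm`, `protein_abundance`), and function names are hypothetical and are not taken from the authors' repository.

```python
import random

def random_row_split(rows, test_fraction=0.25, seed=0):
    """Shuffle all (sample, protein) rows; one sample may end up in both sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def held_out_sample_split(rows, test_fraction=0.25, seed=0):
    """Hold out whole samples, so test rows come only from unseen samples."""
    rng = random.Random(seed)
    sample_ids = sorted({row["sample"] for row in rows})
    rng.shuffle(sample_ids)
    n_test = max(1, round(len(sample_ids) * test_fraction))
    test_ids = set(sample_ids[:n_test])
    train = [row for row in rows if row["sample"] not in test_ids]
    test = [row for row in rows if row["sample"] in test_ids]
    return train, test

# Hypothetical toy table: one row per (sample, protein) pair, with one
# transcript-derived feature and the measured protein value as the target.
rows = [
    {
        "sample": f"S{s}",
        "protein": f"P{p}",
        "mrna_tpm": 10.0 * s + p,
        "protein_abundance": 5.0 * s + 0.5 * p,
    }
    for s in range(1, 9)
    for p in range(1, 6)
]

row_train, row_test = random_row_split(rows)
train, test = held_out_sample_split(rows)
train_samples = {row["sample"] for row in train}
test_samples = {row["sample"] for row in test}
```

With the held-out-sample split, a regression model fitted on `train` is scored only on samples it has never seen, which is a stricter test than the random row split. scikit-learn's `GroupShuffleSplit` offers comparable grouping behavior for anyone who would rather not hand-roll it.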
