Predicting missing proteomics values using machine learning: Filling the gap using transcriptomics and other biological features.

Author information

Ochoteco Asensio Juan, Verheijen Marcha, Caiment Florian

Affiliation

Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, Maastricht, The Netherlands.

Publication information

Comput Struct Biotechnol J. 2022 Apr 22;20:2057-2069. doi: 10.1016/j.csbj.2022.04.017. eCollection 2022.

Abstract

Proteins are often considered the main biological elements responsible for the functions and structures of a cell. However, proteomics, the global study of all expressed proteins, is usually performed by mass spectrometry, which is limited by stochastic sampling and can quantify only a limited number of proteins per sample. Transcriptomics, which allows an exhaustive analysis of all expressed transcripts, is often used as a surrogate. Yet the level of a transcript is not strongly correlated with the level of the corresponding protein, notably because of several post-transcriptional regulatory mechanisms. In this publication, we hypothesize that missing protein values in proteomics can be predicted using machine learning regression methods trained on many features extracted from transcriptomics, including known translational regulatory elements such as microRNAs and circular RNAs. After evaluating different machine learning algorithms under two different splitting strategies, we report that random forest can predict proteins in new samples from transcriptomics data with good accuracy. The proposed pre-processing and model building scripts can be accessed on GitHub: https://github.com/jochotecoa/ml_proteomics.
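
The abstract does not say what the two splitting strategies are. As an illustration only, the sketch below contrasts two common options that fit the stated goal of predicting proteins "in new samples": a random split over individual (sample, protein) rows, and a split that holds out whole samples. The function names, the toy table, and the `mrna` feature are hypothetical and are not taken from the authors' repository.

```python
import random
from collections import defaultdict


def random_row_split(rows, test_fraction=0.25, seed=0):
    """Shuffle individual rows; the same sample may land in both train and test."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def held_out_sample_split(rows, test_fraction=0.25, seed=0):
    """Hold out whole samples, so every test row comes from an unseen sample."""
    rng = random.Random(seed)
    by_sample = defaultdict(list)
    for row in rows:
        by_sample[row["sample"]].append(row)
    samples = sorted(by_sample)
    rng.shuffle(samples)
    n_test = max(1, round(len(samples) * test_fraction))
    test_samples = set(samples[:n_test])
    train = [r for r in rows if r["sample"] not in test_samples]
    test = [r for r in rows if r["sample"] in test_samples]
    return train, test


# Hypothetical toy table: 4 samples x 3 proteins, with placeholder feature values.
rows = [
    {"sample": f"S{s}", "protein": f"P{p}", "mrna": float(s * 3 + p)}
    for s in range(4)
    for p in range(3)
]

for name, split in [("row-wise", random_row_split), ("by sample", held_out_sample_split)]:
    train, test = split(rows)
    shared = {r["sample"] for r in train} & {r["sample"] for r in test}
    print(name, "| train rows:", len(train), "| test rows:", len(test),
          "| samples on both sides:", sorted(shared))
```

Only the second split guarantees that the model is scored on samples it never saw during training, which is the setting the phrase "new samples" implies. A regression model such as a random forest would then be fitted on the training rows and evaluated on the test rows.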
