
Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge.

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA.

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.

Publication information

BMC Bioinformatics. 2019 Dec 20;20(Suppl 24):669. doi: 10.1186/s12859-019-3253-z.

DOI:10.1186/s12859-019-3253-z
PMID:31861998
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6923881/
Abstract

BACKGROUND

Proteomic measurements, which closely reflect phenotypes, provide insights into the regulation of gene expression and the mechanisms underlying altered phenotypes. Further, integrating proteome-level and transcriptome-level data can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, so it is beneficial to use genomic features to predict protein abundances when matching proteomic samples, or measurements within samples, are lacking.

RESULTS

Using data from the 2017 DREAM Proteogenomics Challenge, we evaluate and compare four data-driven models for predicting proteomic data from mRNA measured in breast and ovarian cancers. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels, with median correlations between ground-truth and predicted values ranging from 0.2 to 0.5. However, the most accurately predicted proteins differ considerably between approaches.
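The evaluation metric described above, the median over proteins of the correlation between measured and predicted abundance, can be illustrated with a minimal sketch. This is not any of the four models from the paper: it swaps in a plain one-variable least-squares fit per gene (protein predicted from its own mRNA) purely to show how the score is computed. The data are synthetic, and the names `pearson`, `fit_line`, and `median_protein_correlation` are hypothetical helpers introduced for this example.

```python
import random
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def median_protein_correlation(mrna, protein, n_train):
    """Fit one line per gene on the first n_train samples and score on the rest.

    mrna and protein map gene name -> list of per-sample values in the same
    sample order. Returns the median over genes of the correlation between
    measured and predicted protein abundance on the held-out samples.
    """
    scores = []
    for gene in protein:
        x, y = mrna[gene], protein[gene]
        slope, intercept = fit_line(x[:n_train], y[:n_train])
        predicted = [slope * v + intercept for v in x[n_train:]]
        scores.append(pearson(y[n_train:], predicted))
    return median(scores)


# Synthetic data (an assumption for illustration, not the DREAM challenge data):
# each gene has its own mRNA-to-protein coupling strength plus Gaussian noise.
rng = random.Random(0)
n_samples = 80
mrna, protein = {}, {}
for i in range(50):
    gene = f"gene{i}"
    coupling = rng.uniform(0.0, 1.0)
    mrna[gene] = [rng.gauss(0, 1) for _ in range(n_samples)]
    protein[gene] = [coupling * m + rng.gauss(0, 1) for m in mrna[gene]]

score = median_protein_correlation(mrna, protein, n_train=60)
print(f"median held-out correlation: {score:.2f}")
```

Reporting the median rather than the mean keeps a handful of very well or very poorly predicted proteins from dominating the summary, which matters when per-protein accuracy varies as widely as the abstract indicates.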

CONCLUSIONS

In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/0baaa6851685/12859_2019_3253_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/5c9feb82101e/12859_2019_3253_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/b54a80ab8c8b/12859_2019_3253_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/7e2bac00ad0e/12859_2019_3253_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/b163459e778a/12859_2019_3253_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/05c2e292d066/12859_2019_3253_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/a79221bfcf21/12859_2019_3253_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/9691d441dbcb/12859_2019_3253_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c1f/6923881/6f408e203501/12859_2019_3253_Fig9_HTML.jpg
