Proteomics and machine learning: Leveraging domain knowledge for feature selection in a skeletal muscle tissue meta-analysis.

Authors

Shahin-Shamsabadi Alireza, Cappuccitti John

Affiliation

Evolved.Bio, 280 Joseph Street, Kitchener, Ontario, Canada.

Publication

Heliyon. 2024 Nov 29;10(24):e40772. doi: 10.1016/j.heliyon.2024.e40772. eCollection 2024 Dec 30.

DOI:10.1016/j.heliyon.2024.e40772
PMID:39720035
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11667615/
Abstract

Omics techniques, such as proteomics, contain crucial data for understanding biological processes, but they remain underutilized due to their high dimensionality. Typically, proteomics research focuses narrowly on a limited number of datasets, hindering cross-study comparisons, a problem that machine learning can potentially address. Despite this potential, machine learning has seen limited adoption in the field of proteomics. Here, skeletal muscle proteomics datasets from five separate studies were combined. These studies included conditions such as models (both 2D and 3D), skeletal muscle tissue, and adjacent tissues such as tendons. The collected data were preprocessed using MaxQuant and then enriched using a Python script that fetched structural and compositional details from the UniProt and Ensembl databases. This enrichment was used to handle the high-dimensional and sparsely labeled dataset by breaking it down into five smaller categories based on cellular composition information and then training a separate Random Forest model for each category. Using biological context to interpret the data improved model performance and made tailored analysis possible by reducing the dimensionality, increasing the signal-to-noise ratio, and preserving only biologically relevant features in each category. This integration of domain knowledge into data analysis and model training facilitated the discovery of new patterns while ensuring the retention of critical details that are often overlooked when blind feature selection methods are used to exclude proteins with minimal expression or variance. The approach was shown to be suitable for performing diverse analyses on individual as well as combined datasets within a broader biological context, ultimately leading to the identification of biologically relevant patterns. Besides generating new biological insights, this approach can be used to perform tasks such as biomarker discovery, cluster analysis, classification, and anomaly detection more accurately, but more datasets need to be incorporated to further expand the computational capabilities of such models in clinical settings.
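
The contrast the abstract draws between annotation-based partitioning and blind variance filtering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the protein names, intensity values, category labels, and variance threshold are hypothetical and do not come from the paper, and the `category_of` table merely stands in for the cellular composition details the authors fetched from UniProt and Ensembl. The toy example uses three categories rather than the five described in the abstract.

```python
from statistics import pvariance

# Hypothetical toy data: rows are samples, columns are protein intensities.
# Names and values are illustrative only, not taken from the paper.
proteins = ["ACTA1", "MYH7", "COL1A1", "ATP5F1A", "TTN"]
samples = [
    [10.0, 8.0, 2.0, 5.00, 9.0],
    [12.0, 3.0, 9.0, 5.01, 4.0],
    [11.0, 7.5, 2.5, 5.02, 8.5],
    [13.0, 2.5, 8.0, 5.00, 3.5],
]

# Assumed annotation table standing in for cellular composition details
# retrieved from UniProt and Ensembl (category names are hypothetical).
category_of = {
    "ACTA1": "cytoskeleton",
    "MYH7": "cytoskeleton",
    "TTN": "cytoskeleton",
    "COL1A1": "extracellular_matrix",
    "ATP5F1A": "mitochondrion",
}


def split_by_category(proteins, samples, category_of):
    """Return {category: (protein_names, sub_matrix)} using annotations only."""
    columns = {}
    for idx, name in enumerate(proteins):
        columns.setdefault(category_of[name], []).append(idx)
    return {
        cat: (
            [proteins[i] for i in idxs],
            [[row[i] for i in idxs] for row in samples],
        )
        for cat, idxs in columns.items()
    }


def variance_filter(proteins, samples, threshold):
    """Blind selection: keep proteins whose variance across samples exceeds threshold."""
    kept = []
    for idx, name in enumerate(proteins):
        if pvariance([row[idx] for row in samples]) > threshold:
            kept.append(name)
    return kept


# Domain-knowledge partition: every protein survives, grouped by category.
blocks = split_by_category(proteins, samples, category_of)

# Blind filter: the nearly constant protein is discarded regardless of its role.
blind = variance_filter(proteins, samples, threshold=0.5)

for cat, (names, matrix) in blocks.items():
    print(cat, names, f"{len(matrix)} samples x {len(names)} features")
print("kept by variance filter:", blind)
```

In this sketch the low-variance protein is dropped by the variance filter but retained in its own category block. Each block's sub-matrix could then be passed to a separate classifier, for example scikit-learn's `RandomForestClassifier`, in line with the one-model-per-category design the abstract describes.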
