Machine learning evaluation for identification of M-proteins in human serum.

Affiliations

Mathematics, Faculty of Engineering (LTH), Lund University, Lund, Sweden.

Department of Clinical Chemistry, Region Västra Götaland, Sahlgrenska University Hospital, Gothenburg, Sweden.

Publication information

PLoS One. 2024 Apr 2;19(4):e0299600. doi: 10.1371/journal.pone.0299600. eCollection 2024.

Abstract

Serum protein electrophoresis (SPEP) is a method used to analyze the distribution of the most important proteins in the blood. The major clinical question is whether monoclonal fraction(s) of antibodies (M-protein/paraprotein) are present, which is essential for the diagnosis and follow-up of hematological diseases such as multiple myeloma. Recent studies have shown that machine learning can be used to assess protein electrophoresis by, for example, examining protein glycan patterns to follow up tumor surgery. In this study, we compared 26 different decision tree algorithms for identifying the presence of M-proteins in human serum using numerical data from serum protein capillary electrophoresis. For the automated detection and clustering of data, we used an anonymized data set consisting of 67,073 samples. We found five methods with superior ability to detect M-proteins: Extra Trees (ET), Random Forest (RF), Histogram Gradient Boosting Regressor (HGBR), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGB). Additionally, we implemented a game-theoretic approach to reveal which features in the data set were indicative of the resulting M-protein diagnosis. The results verified the gamma globulin fraction and part of the beta globulin fraction as the most important features of the electrophoresis analysis, thereby further strengthening the reliability of our approach. Finally, we tested the algorithms for classifying the M-protein isotypes, where ET and XGB showed the best performance of the five algorithms tested. Our results show that serum capillary electrophoresis combined with decision tree algorithms has great potential for rapid and accurate identification of M-proteins. Moreover, these methods would be applicable to a variety of blood analyses, such as hemoglobinopathies, indicating wide-ranging diagnostic use. However, for M-protein isotype classification, combining machine learning solutions for numerical data from capillary electrophoresis with gel electrophoresis image data would be most advantageous.
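
The abstract does not name the game-theoretic method used for feature attribution. A common choice for this kind of analysis is Shapley values, and the following is a minimal sketch under that assumption. The feature names, the sample values, the all-zero baseline, and `toy_model` are all hypothetical stand-ins chosen for illustration; none of them come from the study, and the study's models were tree ensembles rather than a hand-written scoring function.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical score (not the study's model): driven mainly by the
    gamma fraction, with beta contributing only when gamma is elevated."""
    score = 0.6 * x["gamma"] + 0.1 * x["albumin"]
    if x["gamma"] > 0.5:
        score += 0.3 * x["beta"]
    return score


def shapley_values(model, sample, baseline):
    """Exact Shapley values for one sample.

    Each feature's value is its marginal contribution averaged over all
    coalitions of the other features; features outside a coalition are
    replaced by their baseline value.
    """
    names = list(sample)
    n = len(names)

    def value(coalition):
        x = {f: (sample[f] if f in coalition else baseline[f]) for f in names}
        return model(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical normalized fraction values for a single sample.
sample = {"albumin": 0.2, "beta": 0.8, "gamma": 0.9}
baseline = {"albumin": 0.0, "beta": 0.0, "gamma": 0.0}

phi = shapley_values(toy_model, sample, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.3f}")
```

The attributions sum to the difference between the model output for the sample and for the baseline, which is the property that makes Shapley values useful for ranking features. Exact enumeration grows exponentially with the number of features, so real analyses on full electrophoresis curves would need an approximate or tree-specific algorithm instead.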


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92b2/10986985/e39645520865/pone.0299600.g001.jpg
