Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia.

Affiliations

The Ohio State University College of Medicine, Columbus, Ohio, USA.

Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA.

Publication information

J Am Med Inform Assoc. 2020 Jul 1;27(7):1019-1027. doi: 10.1093/jamia/ocaa060.

DOI:10.1093/jamia/ocaa060

PMID:32483590

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7647286/

Abstract

OBJECTIVE

Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes.

METHODS

To address these challenges, we applied k-medoids clustering with 10 distance metrics in 2 experiments ("A" and "B"), in which mixed clinical features were collapsed to binary vectors, and visualized the results with both multidimensional scaling and t-distributed stochastic neighbor embedding (t-SNE). To assess prognostic utility, we performed survival analysis using a Cox proportional hazards model, the log-rank test, and Kaplan-Meier curves.
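As a rough illustration of the clustering step described above, the sketch below runs a simple alternating k-medoids procedure on a precomputed distance matrix built from binary vectors. It is not the authors' implementation: the abstract does not name the 10 distance metrics, so the Jaccard distance is used here only as one plausible choice for binary data, and the toy `patients` matrix, the function names, and the deterministic initialization (first `k` rows) are all assumptions made for the example.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Assumption for reproducibility: the first k points are the initial
    medoids. Returns (medoid_indices, labels).
    """
    n = len(dist)
    medoids = list(range(k))

    def assign(current):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid is the member with the smallest total distance to its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical binary feature vectors (rows = patients, columns = binarized features).
patients = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
dist = [[jaccard_distance(a, b) for b in patients] for a in patients]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because k-medoids only needs pairwise distances, swapping in a different metric means changing `jaccard_distance` and nothing else, which is what makes comparing several metrics on the same data straightforward.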

RESULTS

In both experiments, survival analysis revealed a statistically significant association between clusters and survival outcomes (A: overall survival, P = .0164; B: time from diagnosis to treatment, P = .0039). Multidimensional scaling separated clusters along a gradient mirroring the order of overall survival. Longer survival was associated with mutated immunoglobulin heavy-chain variable region gene (IGHV) status, absent ZAP70 expression, female sex, and younger age.
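To make the kind of test behind these P values concrete, here is a minimal sketch of a Kaplan-Meier estimator and a two-group log-rank test in plain Python. The follow-up times, event indicators, and group labels are invented for illustration and have no connection to the study data; the function names are likewise hypothetical. The study also fitted a Cox proportional hazards model, which is not shown here.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event = 1, censored = 0)."""
    event_times = sorted(set(t for t, e in zip(times, events) if e))
    curve, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        s *= 1.0 - deaths / at_risk
        curve.append((t, s))
    return curve


def logrank_two_groups(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p) with 1 df."""
    event_times = sorted(set(t for t, e in zip(times, events) if e))
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                           # at risk overall
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)  # at risk in group 1
        d = sum(1 for x, e in zip(times, events) if x == t and e)     # events overall
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)                          # events in group 1
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Invented example: group 1 tends to have earlier events than group 0.
times = [2, 3, 4, 5, 6, 8, 9, 10, 11, 12]
events = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

km = kaplan_meier(times, events)
chi2, p = logrank_two_groups(times, events, groups)
print(km)
print(chi2, p)
```

In a clustering workflow, the group labels would come from the cluster assignments, so a small P value indicates that the clusters differ in their survival experience.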

CONCLUSIONS

This approach to mixed-type data handling and selection of distance metric captured well-understood, binary, prognostic markers in chronic lymphocytic leukemia (sex, IGHV mutation status, ZAP70 expression status) with high fidelity.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b674/7647286/856ea57a78d7/ocaa060f1.jpg
