Loftus Tyler J, Shickel Benjamin, Balch Jeremy A, Tighe Patrick J, Abbott Kenneth L, Fazzone Brian, Anderson Erik M, Rozowsky Jared, Ozrazgat-Baslanti Tezcan, Ren Yuanfang, Berceli Scott A, Hogan William R, Efron Philip A, Moorman J Randall, Rashidi Parisa, Upchurch Gilbert R, Bihorac Azra
Department of Surgery, University of Florida Health, Gainesville, FL, United States.
Precision and Intelligent Systems in Medicine (PrismaP), University of Florida, Gainesville, FL, United States.
Front Artif Intell. 2022 Aug 12;5:842306. doi: 10.3389/frai.2022.842306. eCollection 2022.
Human pathophysiology is occasionally too complex for unaided hypothetico-deductive reasoning and the isolated application of additive or linear statistical methods. Clustering algorithms use patterns and distributions in the input data to form groups of similar patients or diseases that share distinct properties. Although clinicians frequently perform tasks that clustering could enhance, few receive formal training in it, and clinician-centered literature on clustering is sparse. To add value to clinical care and research, optimal clustering practice requires a thorough understanding of how to process and optimize data, select features, weigh the strengths and weaknesses of different clustering methods, select the optimal method, and apply it to solve problems. This narrative review of published literature describes these concepts and our suggestions for implementing them. All clustering methods share the weakness of finding potential clusters even when natural clusters do not exist, which underscores the importance of applying data-driven techniques as well as clinical and statistical expertise to clustering analyses. When applied properly, patient and disease phenotype clustering can reveal obscured associations that help clinicians understand disease pathophysiology, predict treatment response, and identify patients for clinical trial enrollment.
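The weakness noted in the abstract, that clustering methods return clusters even when no natural clusters exist, can be illustrated with a minimal sketch. The code below is not taken from the review: it is a plain standard-library implementation of Lloyd's k-means, and the synthetic data, the function name `kmeans`, and the choice of k = 3 are assumptions made purely for illustration. The points are drawn uniformly at random, so they contain no group structure by construction, yet the algorithm still assigns every point to one of k groups.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids as k distinct points sampled from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # Keep the old centroid if a cluster happens to be empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: 300 points drawn uniformly from the unit square,
# i.e. data with no natural clusters by construction.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(300)]

k = 3  # arbitrary choice; the algorithm accepts any k without complaint
centroids, labels = kmeans(points, k)
sizes = [labels.count(j) for j in range(k)]
print("cluster sizes:", sizes)
```

The algorithm partitions the structureless data into k groups without any warning, which is why cluster validity measures and clinical judgment are needed before the resulting groups are interpreted as phenotypes.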