Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification.

Affiliations

Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware, Newark, DE, USA.

Value Institute, Christiana Care Health System, Newark, DE, USA.

Publication information

BMC Med Inform Decis Mak. 2018 Dec 12;18(Suppl 4):125. doi: 10.1186/s12911-018-0675-x.

Abstract

BACKGROUND

Chronic Kidney Disease (CKD) is one of several conditions that affect a growing percentage of the US population; the disease is accompanied by multiple co-morbidities and is hard to diagnose in and of itself. In its advanced forms it carries severe outcomes and can lead to death. It is thus important to detect the disease as early as possible, so that effective intervention and treatment plans can be devised. Here we investigate ways to use information available in electronic health records (EHRs) from regular office visits of more than 13,000 patients in order to distinguish among several stages of the disease. While clinical data stored in EHRs provide valuable information for risk stratification, one of the major challenges in using them arises from data imbalance: records associated with a more severe condition are typically under-represented compared to those associated with a milder manifestation of the disease. To address this imbalance, we propose and develop a sampling-based ensemble approach, hierarchical meta-classification, which aims to stratify CKD patients into severity stages using simple quantitative non-text features gathered from standard office visit records.

METHODS

The proposed hierarchical meta-classification method frames the multiclass classification task as a hierarchy of two subtasks. The first is binary classification, separating records associated with the majority class from those associated with all minority classes combined, using meta-classification. The second subtask separates the records assigned to the combined minority classes into the individual constituent classes.
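The two-level structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base learner, the random undersampling of the majority class, the majority vote that stands in for the meta-classifier, and all names (`NearestCentroid`, `HierarchicalMetaClassifier`, `make_toy_data`) are assumptions chosen only to keep the example self-contained.

```python
import random
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner (assumption): predicts the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))


class HierarchicalMetaClassifier:
    """Stage 1: majority vs. all minority classes combined (sampling-based ensemble).
    Stage 2: split records sent to the minority group into individual classes."""

    def __init__(self, majority_label, n_base=5, seed=0):
        self.majority_label = majority_label
        self.n_base = n_base
        self.rng = random.Random(seed)

    def fit(self, X, y):
        maj_X = [x for x, t in zip(X, y) if t == self.majority_label]
        min_X = [x for x, t in zip(X, y) if t != self.majority_label]
        min_y = [t for t in y if t != self.majority_label]
        # Each base classifier sees a balanced set: every minority record
        # plus an equally sized random sample of majority records.
        k = min(len(maj_X), len(min_X))
        self.base = []
        for _ in range(self.n_base):
            sample = self.rng.sample(maj_X, k)
            labels = [0] * k + [1] * len(min_X)  # 0 = majority, 1 = minority group
            self.base.append(NearestCentroid().fit(sample + min_X, labels))
        # Second-level classifier is trained on minority records only.
        self.stage2 = NearestCentroid().fit(min_X, min_y)
        return self

    def predict_one(self, x):
        # A simple majority vote stands in for the meta-classifier here.
        votes = sum(clf.predict_one(x) for clf in self.base)
        if 2 * votes <= len(self.base):
            return self.majority_label
        return self.stage2.predict_one(x)


def make_toy_data(seed=1):
    """Synthetic, imbalanced 2-D data; the class names are illustrative only."""
    rng = random.Random(seed)
    spec = {"mild": ((0.0, 0.0), 200), "moderate": ((5.0, 5.0), 20), "severe": ((10.0, 0.0), 10)}
    X, y = [], []
    for label, ((cx, cy), n) in spec.items():
        for _ in range(n):
            X.append([cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)])
            y.append(label)
    return X, y
```

Calling `HierarchicalMetaClassifier("mild").fit(X, y)` on the toy data and then `predict_one` on a new record sends the record through the binary stage first; only records voted into the combined minority group reach the second-level classifier.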

RESULTS

The proposed method identifies a significant proportion of patients suffering from the more advanced stages of the condition, while also correctly identifying most of the less severe cases, maintaining high sensitivity, specificity and F-measure (≥ 93%). Our results show that the high level of performance attained by our method is preserved even when the size of the training set is significantly reduced, demonstrating the stability and generalizability of our approach.
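For reference, the three reported measures can be computed from confusion counts as in the sketch below. The abstract does not state how per-class values were aggregated, so this hypothetical `binary_metrics` helper uses the standard one-vs-rest definitions for a single class and makes no claim about the authors' averaging scheme.

```python
def binary_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity, "f_measure": f_measure}
```

In a multiclass setting such as CKD staging, the function would be called once per stage, with that stage as `positive` and all other stages as negative.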

CONCLUSION

We present a new approach to perform classification while addressing data imbalance, which is inherent in the biomedical domain. Our model effectively identifies severity stages of CKD patients, using information readily available in office visit records within the realistic context of high data imbalance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dd7e/6290512/c01e277587c8/12911_2018_675_Fig1_HTML.jpg
