Medical Science, Kawasaki Medical School, Kurashiki, Okayama, Japan.
College of Engineering, University of Michigan, Ann Arbor, MI, USA.
Sci Rep. 2024 Jan 18;14(1):1661. doi: 10.1038/s41598-024-52235-9.
A new marker reflecting the pathophysiology of chronic kidney disease (CKD) has long been sought to guide therapy. In this study, we developed a virtual space in which data expressed as medical words and data from actual CKD patients were unified by natural language processing and category theory. A virtual space of medical words was constructed from the CKD-related literature (n = 165,271) using Word2Vec, in which 106,612 words formed a network. The network supported vector arithmetic and retained the meanings of medical words. Data from CKD patients in a 3-year cohort study (n = 26,433) were mapped into the network as medical-word vectors. We defined the relationship between the patient-data vectors and the outcome (dialysis or death) as a marker (the inner product). The inner product accurately predicted the outcomes, with a C-statistic of 0.911 (95% CI 0.897, 0.924). Cox proportional hazards models showed that the risk of the outcomes in the high-inner-product group was 21.92 times (95% CI 14.77, 32.51) that in the low-inner-product group. This study showed that CKD patients can be treated as a network of medical words that reflects the pathophysiological condition of CKD and the risks of CKD progression and mortality.
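The inner-product marker described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-dimensional toy embeddings, the word choices, the averaging of word vectors into a patient vector, and the function names are hypothetical and are not taken from the study, which does not specify these details in the abstract. Real vectors would come from a Word2Vec model trained on the CKD literature.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings (assumption); real Word2Vec vectors would be
# learned from the literature and have many more dimensions.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "proteinuria": [0.9, 0.1, 0.0],
    "anemia": [0.7, 0.2, 0.1],
    "normotension": [-0.5, 0.8, 0.1],
    "dialysis": [1.0, 0.0, 0.1],
    "death": [0.8, 0.1, 0.2],
}


def mean_vector(words: List[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the embeddings of the given words (an assumed pooling rule)."""
    vectors = [embeddings[w] for w in words]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def inner_product(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def outcome_marker(
    patient_words: List[str],
    outcome_words: List[str],
    embeddings: Dict[str, Vector] = TOY_EMBEDDINGS,
) -> float:
    """Inner product between a patient vector and an outcome vector."""
    patient_vec = mean_vector(patient_words, embeddings)
    outcome_vec = mean_vector(outcome_words, embeddings)
    return inner_product(patient_vec, outcome_vec)


# Two hypothetical patients scored against the composite outcome.
high_risk = outcome_marker(["proteinuria", "anemia"], ["dialysis", "death"])
low_risk = outcome_marker(["normotension"], ["dialysis", "death"])
print(high_risk, low_risk)
```

In this toy setting, a patient whose findings lie closer to the outcome words in the embedding space receives a larger inner product, which mirrors the idea of splitting patients into high- and low-inner-product groups.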