School of Computing, University of Eastern Finland.
Int J Med Inform. 2023 Jul;175:105068. doi: 10.1016/j.ijmedinf.2023.105068. Epub 2023 Apr 11.
Early recognition and prevention are crucial for reducing the risk of disease progression. This study aimed to develop a novel technique based on a temporal disease occurrence network to analyze and predict disease progression.
This study used a total of 3.9 million patient records. Patient health records were transformed into temporal disease occurrence networks, and a supervised depth-first search was used to find frequent disease sequences for predicting the onset of disease progression. Nodes in the network represented diseases, and edges connected pairs of diseases that co-occurred in temporal order within a patient cohort. Node- and edge-level attributes stored meta-information about the patients in whom the disease occurred: gender, age group, and patient identity. These attributes guided the depth-first search toward disease sequences that were frequent in specific genders and age groups. Each patient's history was matched against the most frequent disease sequences, and the matched sequences were merged to generate a ranked list of diseases with their conditional probability and relative risk.
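The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_network`, `frequent_sequences`, `risk_scores`), the `min_support` threshold, and the exact definitions of conditional probability and relative risk are assumptions made for illustration. Restricting the search to a gender or age group is approximated here by passing in only the patients of that cohort.

```python
from collections import defaultdict


def build_network(patients):
    """Build temporal edges from patient histories.

    patients: dict mapping patient id -> list of (time, disease) pairs.
    Returns a dict mapping (earlier_disease, later_disease) -> set of patient ids.
    """
    edges = defaultdict(set)
    for pid, visits in patients.items():
        first_seen = []
        for _, disease in sorted(visits):  # temporal order
            if disease not in first_seen:  # keep first occurrence only
                first_seen.append(disease)
        for i, earlier in enumerate(first_seen):
            for later in first_seen[i + 1:]:
                edges[(earlier, later)].add(pid)
    return edges


def frequent_sequences(edges, start, min_support):
    """Depth-first search for disease sequences shared by >= min_support patients."""
    successors = defaultdict(list)
    for earlier, later in edges:
        successors[earlier].append(later)

    results = []

    def dfs(path, support):
        for nxt in sorted(successors[path[-1]]):
            if nxt in path:
                continue
            on_edge = edges[(path[-1], nxt)]
            # Patients must support every step of the path so far.
            new_support = on_edge if support is None else support & on_edge
            if len(new_support) >= min_support:
                results.append((path + [nxt], len(new_support)))
                dfs(path + [nxt], new_support)

    dfs([start], None)
    return results


def risk_scores(patients, edges, a, b):
    """Assumed definitions: P(b after a | a), and its ratio to P(b | no a)."""
    exposed = {p for p, v in patients.items() if any(d == a for _, d in v)}
    unexposed = set(patients) - exposed
    cond_prob = len(edges.get((a, b), set())) / len(exposed) if exposed else 0.0
    b_unexposed = {p for p in unexposed if any(d == b for _, d in patients[p])}
    baseline = len(b_unexposed) / len(unexposed) if unexposed else 0.0
    relative_risk = cond_prob / baseline if baseline > 0 else float("inf")
    return cond_prob, relative_risk


# Toy cohort with placeholder disease codes (not data from the study).
toy_patients = {
    "p1": [(1, "A"), (2, "B"), (3, "C")],
    "p2": [(1, "A"), (3, "B")],
    "p3": [(1, "A"), (2, "B"), (5, "C")],
    "p4": [(1, "D"), (2, "C")],
    "p5": [(1, "D")],
}
toy_edges = build_network(toy_patients)
print(frequent_sequences(toy_edges, "A", min_support=2))
print(risk_scores(toy_patients, toy_edges, "A", "C"))
```

Carrying the set of supporting patients down the search path means a sequence is counted only for patients who exhibit every step in order, and a branch is abandoned as soon as its support falls below the threshold.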
The study found that the proposed method had improved performance compared to other methods. Specifically, when predicting a single disease, the method achieved an area under the receiver operating characteristic curve (AUC) of 0.65 and an F1-score of 0.11. When predicting a set of diseases relative to ground truth, the method achieved an AUC of 0.68 and an F1-score of 0.13.
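For readers unfamiliar with set-level scoring, the sketch below shows one common way to compute an F1-score when a predicted set of diseases is compared with a ground-truth set. The abstract does not specify the evaluation protocol, so the function name `set_precision_recall_f1` and the per-patient, set-based formulation are assumptions.

```python
def set_precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for a predicted set versus a ground-truth set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with placeholder disease codes (not data from the study).
print(set_precision_recall_f1({"A", "B", "C", "D"}, {"B", "E"}))
```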
The ranked list generated by the proposed method, which includes the probability of occurrence and relative risk score, can provide physicians with valuable information about the sequential development of diseases in patients. This information can help physicians to take preventive measures in a timely manner, based on the best available information.