Evaluating robustly standardized explainable anomaly detection of implausible variables in cancer data.

Authors

Röchner Philipp, Rothlauf Franz

Affiliations

Information Systems and Business Administration, Johannes Gutenberg University, Mainz 55128, Germany.

Cancer Registry Rhineland-Palatinate, Institute for Digital Health Data, Mainz 55116, Germany.

Publication information

J Am Med Inform Assoc. 2025 Apr 1;32(4):724-735. doi: 10.1093/jamia/ocaf011.

DOI:10.1093/jamia/ocaf011
PMID:39873664
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12005620/
Abstract

OBJECTIVES

Explanations help to understand why anomaly detection algorithms identify data as anomalous. This study evaluates whether robustly standardized explanation scores correctly identify the implausible variables that make cancer data anomalous.

MATERIALS AND METHODS

The dataset analyzed consists of 18 587 truncated real-world cancer registry records containing 8 categorical variables describing patients diagnosed with bladder and lung tumors. We identified 800 anomalous records using an autoencoder's per-record reconstruction error, which is a common neural network-based anomaly detection approach. For each variable of a record, we determined a robust explanation score, which indicates how anomalous the variable is. A variable's robust explanation score is the autoencoder's per-variable reconstruction error measured by cross-entropy and robustly standardized across records; that is, large reconstruction errors have a small effect on standardization. To evaluate the explanation scores, medical coders identified the implausible variables of the anomalous records. We then compare the explanation scores to the medical coders' validation in a classification and ranking setting. As baselines, we identified anomalous variables using the raw autoencoder's per-variable reconstruction error, the non-robustly standardized per-variable reconstruction error, the empirical frequency of implausible variables according to the medical coders' validation, and random selection or ranking of variables.
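The scoring step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say which robust estimators were used, so the median and the scaled median absolute deviation (MAD) are assumptions, as are the function names `per_variable_cross_entropy` and `robust_standardize` and the input layout (one probability matrix per categorical variable).

```python
import numpy as np

def per_variable_cross_entropy(probs, true_idx):
    """Cross-entropy reconstruction error per record and variable.

    probs    : list with one array per variable, each of shape
               (n_records, n_categories_of_that_variable), holding the
               autoencoder's reconstructed category probabilities.
    true_idx : int array of shape (n_records, n_variables) with the
               observed category index of each variable.
    """
    n_records, n_vars = true_idx.shape
    rows = np.arange(n_records)
    errors = np.empty((n_records, n_vars))
    for v in range(n_vars):
        # Probability the model assigns to the category actually observed.
        p = np.clip(probs[v][rows, true_idx[:, v]], 1e-12, 1.0)
        errors[:, v] = -np.log(p)
    return errors

def robust_standardize(errors):
    """Standardize each variable (column) across records.

    Assumption: median and scaled MAD stand in for mean and standard
    deviation, so a few very large errors barely move the location and
    scale estimates.
    """
    median = np.median(errors, axis=0)
    mad = np.median(np.abs(errors - median), axis=0)
    scale = 1.4826 * mad                     # consistent with sigma under normality
    scale = np.where(scale > 0, scale, 1.0)  # guard against constant columns
    return (errors - median) / scale
```

Standardizing per variable puts the errors of different variables on a comparable scale before they are ranked within a record. Using the mean and standard deviation in `robust_standardize` instead would correspond to the non-robustly standardized baseline mentioned above.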

RESULTS

When we sort the variables by their robust explanation scores, on average, the 2.37 highest-ranked variables contain all implausible variables. For the baselines, on average, the 2.84, 2.98, 3.27, and 4.91 highest-ranked variables contain all the variables that made a record implausible.
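One way to read the ranking figures above: for each anomalous record, sort its variables by score and count how many of the highest-ranked variables are needed before every implausible variable is included, then average that count over records. The sketch below follows that reading. The names `top_k_needed` and `mean_top_k` are hypothetical, and the tie handling (stable sort by original variable order) is an assumption; the abstract does not describe how ties were resolved.

```python
import numpy as np

def top_k_needed(scores, implausible):
    """Smallest k such that the k highest-scored variables of one record
    include every variable the coders marked as implausible."""
    scores = np.asarray(scores, dtype=float)
    implausible = np.asarray(implausible, dtype=bool)
    if not implausible.any():
        return 0  # nothing to find in this record
    # Indices of variables from highest to lowest score.
    order = np.argsort(-scores, kind="stable")
    # ranks[i] is the 1-based rank of variable i.
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    # The worst-ranked implausible variable determines k.
    return int(ranks[implausible].max())

def mean_top_k(score_matrix, implausible_matrix):
    """Average of top_k_needed over records (rows)."""
    ks = [top_k_needed(s, m) for s, m in zip(score_matrix, implausible_matrix)]
    return float(np.mean(ks))
```

With 8 variables per record the value lies between 1 and 8 for any record that has at least one implausible variable, and lower is better. That is consistent with the reported 2.37 for robust explanation scores against 2.84 to 4.91 for the baselines.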

DISCUSSION

We found that explanations based on robust explanation scores were better than or as good as the baseline explanations examined in the classification and ranking settings. Due to the international standardization of cancer data coding, we expect our results to generalize to other cancer types and registries. As we anticipate different magnitudes of per-variable autoencoder reconstruction errors in data from other medical registries and domains, these may also benefit from robustly standardizing the reconstruction errors per variable. Future work could explore methods to identify subsets of anomalous variables, addressing whether individual variables or their combinations contribute to anomalies. This direction aims to improve the interpretability and utility of anomaly detection systems.

CONCLUSIONS

Robust explanation scores can improve explanations for identifying implausible variables in cancer data.
