Keikes Lotte, Medlock Stephanie, van de Berg Daniel J, Zhang Shuxin, Guicherit Onno R, Punt Cornelis J A, van Oijen Martijn G H
Department of Medical Oncology, Cancer Center Amsterdam, Academic Medical Center, University of Amsterdam, the Netherlands.
J Clin Transl Res. 2018 Jul 27;3(Suppl 3):411-423. eCollection 2018 Dec 17.
Medical specialists aim to provide evidence-based care based on the most recent scientific insights, but with the ongoing expansion of the medical literature it seems infeasible to stay up to date. "Black-box" decision support tools such as Watson for Oncology (Watson) are gaining attention because they offer a promising way to address this problem, but it is not known whether the advice they give is congruent with guidelines or clinically valid in other settings. We present a protocol for the content evaluation of black-box decision support tools and a feasibility study that tests the content and usability of Watson using this protocol.
The protocol consists of developing synthetic patient cases based on Dutch guidelines and expert opinion, entering the synthetic cases into Watson and Oncoguide, noting the response of each system, and evaluating the result using a cross-tabulation scoring system with a score range of -12 to +12. A treatment option that was not recommended according to the Dutch guideline was labeled with a "red flag" if Watson recommended it and with an "orange flag" if Watson suggested it for consideration. To test the feasibility of applying the protocol, we developed synthetic patient cases for the adjuvant treatment of stage I to stage III colon cancer based on relevant patient, clinical and tumor characteristics, and followed our protocol. For the feasibility study we also compared the recommendations from the NCCN guideline with Watson's advice, and evaluated usability with a cognitive walkthrough method.
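The flagging rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `WatsonAdvice`, `flag_option` and `flag_case` are hypothetical, and the case-level rule (a case takes the most severe flag among its treatment options) is an assumption. The cross-tabulation scoring matrix itself is not given in this abstract, so it is not reproduced here.

```python
from enum import Enum


class WatsonAdvice(Enum):
    """Advice categories for one treatment option (hypothetical names)."""
    RECOMMENDED = "recommended"
    FOR_CONSIDERATION = "for consideration"
    NOT_RECOMMENDED = "not recommended"


def flag_option(guideline_recommends: bool, watson: WatsonAdvice) -> str:
    """Flag a single treatment option against the Dutch guideline."""
    if guideline_recommends:
        return "none"    # flags apply only to options the guideline does not recommend
    if watson is WatsonAdvice.RECOMMENDED:
        return "red"     # Watson recommends an option the guideline does not
    if watson is WatsonAdvice.FOR_CONSIDERATION:
        return "orange"  # Watson suggests it for consideration only
    return "none"


def flag_case(options) -> str:
    """Assumption: a case carries the most severe flag among its options."""
    flags = {flag_option(guideline, watson) for guideline, watson in options}
    for level in ("red", "orange"):
        if level in flags:
            return level
    return "none"


# Example: one option agreed on, one that only Watson lists for consideration.
example_case = [
    (True, WatsonAdvice.RECOMMENDED),
    (False, WatsonAdvice.FOR_CONSIDERATION),
]
print(flag_case(example_case))
```

Giving each case a single, most-severe flag is one way to obtain mutually exclusive case counts; other aggregation rules are possible.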
In total, we developed 190 synthetic patient cases (stage I: n=8; stage II: n=110; and stage III: n=72). Overall concordance scores per case ranged from a minimum of -4 (n=6) to a maximum of +12 (n=17) for Watson versus Oncoguide, and from -4 (n=9) to +12 (n=24) for Watson versus the NCCN guidelines. Measured against the Dutch guideline, 69 cases (36%) were labeled with red flags, 96 cases (51%) with orange flags, and 25 cases (13%) had no flags. For the comparison of Watson with the NCCN guidelines, no red or orange flags were identified.
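As a quick consistency check on the counts reported above, the following sketch recomputes the totals and percentages. The variable names are illustrative; the numbers are taken directly from the text.

```python
# Case counts as reported in the abstract.
stage_counts = {"I": 8, "II": 110, "III": 72}
flag_counts = {"red": 69, "orange": 96, "none": 25}

total_cases = sum(stage_counts.values())

# Every case carries exactly one label, so both breakdowns share one total.
assert total_cases == sum(flag_counts.values()) == 190

# Percentages rounded to whole numbers, as in the text.
flag_percentages = {
    label: round(100 * count / total_cases)
    for label, count in flag_counts.items()
}
print(flag_percentages)  # {'red': 36, 'orange': 51, 'none': 13}
```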
We developed a research protocol for the evaluation of a black-box decision support tool, which proved useful and usable in testing the content and usability of Watson. Overall concordance scores varied considerably between synthetic cases in both comparisons (Watson versus Oncoguide and Watson versus NCCN). Non-concordance is partially attributable to guideline differences between the United States and the Netherlands. This implies that further adjustment and localization are required before Watson is implemented outside the United States.
This study describes the first steps of content evaluation of a decision support tool before implementation in daily oncological patient care. The ultimate goal of the incorporation of decision support tools in daily practice is to improve personalized medicine and quality of care.