Lensing Julia C, Choe John Y, Johnson Branden B, Wang Jingwen
Department of Industrial & Systems Engineering, University of Washington, Seattle, WA, United States of America.
Decision Science Research Institute, Inc., Springfield, OR, United States of America.
PLoS One. 2025 Jan 7;20(1):e0313259. doi: 10.1371/journal.pone.0313259. eCollection 2025.
Many practical disaster reports are published daily worldwide in various forms, including after-action reports, response plans, impact assessments, and resiliency plans. These reports serve as vital resources, allowing future generations to learn from past events and better mitigate and prepare for future disasters. However, this extensive practical literature often has limited impact on research and practice because the reports are difficult to synthesize and analyze. In this study, we 1) present a corpus of practical reports for text mining and 2) introduce an approach to extract insights from the corpus using select text mining tools. We validate the approach through a case study examining practical reports on the preparedness of the U.S. Pacific Northwest for a magnitude 9 Cascadia Subduction Zone earthquake, which has the potential to disrupt lifeline infrastructures for months. To explore opportunities and challenges associated with text mining of practical disaster reports, we conducted a brief survey of potential user groups. The case study illustrates the types of insights that our approach can extract from a corpus. Notably, it reveals potential differences in priorities between Washington and Oregon state-level emergency management, uncovers latent sentiments expressed within the reports, and identifies inconsistent vocabulary across the field. Survey results highlight that while simple tools may yield insights that are primarily interpretable by experienced professionals, more advanced tools utilizing large language models, such as Generative Pre-trained Transformer (GPT), offer more accessible insights, albeit with the known risks associated with current artificial intelligence technologies. To ensure reproducibility, all supporting data and code are made publicly available (DOI: 10.17603/ds2-9s7w-9694).