ARTCDP: An automated data platform for monitoring emerging patterns concerning road traffic crashes in China.

Author information

Department of Epidemiology and Health Statistics, Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.

Department of Psychology, University of Alabama at Birmingham, Birmingham, AL, United States.

Publication information

Accid Anal Prev. 2022 Sep;174:106727. doi: 10.1016/j.aap.2022.106727. Epub 2022 Jun 3.

Abstract

Online media reports provide valuable information for road traffic injury prevention, but technical challenges concerning data acquisition and processing limit analysis and interpretation of such data. Integrating injury epidemiology theory and big data technology, we developed a data platform consisting of four layers (data acquisition, data processing, application and data storage) to automatically collect reports from online Chinese media concerning road traffic crashes every 24 h. Using 20,000 manually annotated news stories, we built a text classification model based on Bidirectional Encoder Representations from Transformers (BERT) and then used natural language processing algorithms to extract data concerning 27 structured variables from the news sources. The accuracy of the BERT-based text classification model was 0.9271, with information extraction accuracy exceeding 80% for 22 variables. As of November 30, 2021, the data platform had collected 244,650 eligible media reports covering all 333 prefecture-level divisions in China. These reports came from 37,073 websites or social media accounts, which were geographically located in all 31 provinces and over 98% of prefecture-level divisions. Data availability varied greatly across the 27 structured variables, from 0.9% to 100%. Additionally, the platform identified 645,787 potentially relevant keywords when applying natural language processing techniques to the textual media reports. Platform data were highly correlated with road police data in province-based road traffic crash statistics (crashes, r = 0.799; non-fatal injuries, r = 0.802; deaths, r = 0.775). In particular, the platform offers valuable data (such as crashes involving electric vehicles) that are not included in official road traffic crash statistics. The new automated data platform shows great potential for timely detection of emerging characteristics of road traffic crashes. Further research is needed to improve the platform and apply it to real-time monitoring and analysis of road traffic injuries.
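To illustrate the province-level comparison reported above, the following is a minimal sketch of a correlation coefficient between platform counts and police counts. It assumes r denotes the Pearson coefficient, which the abstract does not state. The function name `pearson_r` and all counts are hypothetical placeholders, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-province crash counts (illustrative only, not study data).
platform_counts = [120, 340, 95, 410, 230]
police_counts = [1500, 3900, 1300, 5200, 2400]

r = pearson_r(platform_counts, police_counts)
print(f"r = {r:.3f}")
```

In practice, each list would hold one value per province for the same outcome (crashes, non-fatal injuries, or deaths), in the same province order.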

