Department of Epidemiology and Health Statistics; Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha, China.
Department of Child, Adolescent and Women's Health, School of Public Health, Capital Medical University, Beijing, China.
J Glob Health. 2024 Aug 23;14:04174. doi: 10.7189/jogh.14.04174.
Internet-based media stories provide valuable information on emerging product-related child injury risks that can inform prevention and control, but critical methodological challenges and the high cost of data acquisition and processing restrict their practical use by stakeholders.
We constructed a data platform through literature reviews and multi-round research group discussions. The components we developed included standard search strategies, filtering criteria, textual document classification, information extraction standards and a keyword dictionary. We used 10 000 manually labelled media stories to validate the textual document classification model, which was built on Bidirectional Encoder Representations from Transformers (BERT). We adopted multiple information extraction methods based on natural language processing algorithms to extract data for 29 structured variables from media stories, and evaluated these methods through manual validation of 1000 media stories about product-related child injury. We mapped the geographic distribution of media sources and media-reported product-related child injury events.
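The manual validation step described above amounts to comparing automatically extracted values with human-coded values, variable by variable. The following is a minimal sketch of that comparison, not the authors' implementation: the function name `per_variable_accuracy`, the variable names `product` and `age_years`, and the two toy records are all hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict


def per_variable_accuracy(extracted, manual):
    """Share of stories, per variable, where the extracted value equals the manual label.

    `extracted` and `manual` are parallel lists of dicts, one dict per media story.
    Exact-match scoring is an assumption; the abstract does not state the matching rule.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for auto_record, gold_record in zip(extracted, manual):
        for variable, gold_value in gold_record.items():
            total[variable] += 1
            if auto_record.get(variable) == gold_value:
                correct[variable] += 1
    return {variable: correct[variable] / total[variable] for variable in total}


# Hypothetical toy data: two stories, two variables.
extracted = [
    {"product": "magnetic beads", "age_years": 3},
    {"product": "scooter", "age_years": 7},
]
manual = [
    {"product": "magnetic beads", "age_years": 3},
    {"product": "electric self-balancing scooter", "age_years": 7},
]

accuracy = per_variable_accuracy(extracted, manual)
for variable, value in accuracy.items():
    print(f"{variable}: {value:.2f}")
```

Under this scoring, a variable would count as meeting the reported threshold if its value exceeded 0.70 across the validation sample.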
We developed an internet-based product-related child injury textual data platform, IPCITDP, consisting of four layers - automatic data search and acquisition, data processing, data storage, and data application - concerning product-related child injury online media stories in China. Each layer ran daily. External validation demonstrated high performance for the BERT classification model we established (accuracy = 0.9703) and for the combined information extraction strategies (accuracy >0.70 for 25 of the 29 variables). As of 31 December 2023, IPCITDP had collected 35 275 eligible product-related child injury reports from 13 261 news media websites or social media platform accounts, which were located across all 31 mainland Chinese provinces and covered over 97% of prefecture-level cities. The injury cases in IPCITDP were typically reported several months or years earlier than official announcements about the corresponding product-related child injury risks. Our data platform added data on 15 supplementary variables that the national product-related injury surveillance system lacks. Two examples, magnetic beads and electric self-balancing scooters, demonstrate the value of IPCITDP in supplementing existing data and providing early epidemiological detection of emerging signals concerning product-related child injury.
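The early-detection claim rests on the gap between the first media report of a product hazard and the later official announcement. A minimal sketch of that lead-time calculation follows; the function name `lead_time_days` and every date below are hypothetical placeholders, not data from IPCITDP or from any official announcement.

```python
from datetime import date


def lead_time_days(report_dates, announcement_date):
    """Days between the earliest media report and the official announcement.

    A positive result means the media reports came first.
    """
    first_report = min(report_dates)
    return (announcement_date - first_report).days


# Hypothetical dates for one product category, for illustration only.
reports = [date(2020, 5, 1), date(2020, 3, 15), date(2021, 1, 10)]
announcement = date(2021, 3, 15)

lead = lead_time_days(reports, announcement)
print(f"First media report preceded the announcement by {lead} days")
```

Using the earliest report rather than the median is a design choice here: it measures when a signal first became visible, not when coverage became widespread.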
Our data platform provides injury data that can support early detection of new product-related child injury characteristics and supplement existing data sources to reduce the burden of product-related injury among Chinese children.