Chen Penglai, Chai Jing, Zhang Lu, Wang Debin
School of Health Service Management, Anhui Medical University, 81, Meishan Road, Hefei, Anhui, China.
J Med Syst. 2014 Nov;38(11):88. doi: 10.1007/s10916-014-0088-z. Epub 2014 Sep 30.
This study aimed to design and pilot a convenient Chinese webpage suicide information mining system (SIMS) that helps search for and filter the required data from the internet and uncover potential features and trends of suicide.
SIMS was developed with Microsoft Visual Studio 2008, SQL Server 2008 and C#. It collects webpage data via popular search engines; cleans the data using trained models with minimal manual assistance; translates the cleaned texts into quantitative data through models and supervised fuzzy recognition; and analyzes and visualizes the resulting variables with self-programmed algorithms.
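The step of translating cleaned text into quantitative data can be illustrated with a minimal sketch. It is written in Python rather than the authors' C#, and it uses approximate string matching from the standard library as a simple stand-in for the supervised fuzzy recognition mentioned above, which the abstract does not specify. The keyword lists, the `best_category` function and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical keyword lists; the abstract does not publish the system's vocabulary.
PLACE_KEYWORDS = {
    "home": ["at home", "own apartment", "bedroom"],
    "public place": ["park", "bridge", "station"],
    "rented house": ["rented house", "rented room"],
}


def best_category(text, keywords=PLACE_KEYWORDS, threshold=0.8):
    """Return the category whose keyword best matches some window of the text.

    Returns None when no keyword reaches the (assumed) similarity threshold.
    """
    text = text.lower()
    best, best_score = None, 0.0
    for category, phrases in keywords.items():
        for phrase in phrases:
            n = len(phrase)
            # Slide a window of the phrase's length across the text so that
            # small spelling errors still produce a high similarity score.
            for i in range(max(1, len(text) - n + 1)):
                score = SequenceMatcher(None, phrase, text[i:i + n]).ratio()
                if score > best_score:
                    best, best_score = category, score
    return best if best_score >= threshold else None
```

A coded variable of this kind can then be stored alongside each report and counted like any other categorical field. A production system would need a vocabulary built from Chinese text and validation against manually coded reports.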
The resulting SIMS comprises functions for collecting suicide news and blogs; filtering, cleaning, extracting and translating data; and analyzing and presenting data. SIMS-mediated mining of one year of webpage data revealed that: web-reported suicide events peaked in June-July and at 10-11 am, and were fewest in September-October and at 1-7 am; suicide reports came mostly from Sohu, Tencent, Sina, etc.; male suicide victims outnumbered female victims in most sub-regions except southwest China; homes, public places and rented houses were the three most common places of suicide; poisoning, wrist cutting and jumping from buildings were the most commonly used methods; and love disputes, family disputes and mental illness were the leading causes.
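The peak-month and peak-hour findings above come down to tallying report timestamps. The following is a minimal Python sketch of that tally, not the authors' self-programmed algorithm; the function names are hypothetical and the sample timestamps are invented for illustration, not taken from the study.

```python
from collections import Counter
from datetime import datetime


def report_time_profile(timestamps):
    """Count reports per calendar month and per hour of day."""
    by_month = Counter(ts.month for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    return by_month, by_hour


def peak(counter):
    """Return the key with the highest count (ties go to the smallest key)."""
    return min(counter, key=lambda k: (-counter[k], k))


# Invented timestamps for illustration only; not data from the study.
sample = [
    datetime(2000, 6, 3, 10, 15),
    datetime(2000, 6, 18, 10, 40),
    datetime(2000, 7, 2, 14, 5),
    datetime(2000, 9, 9, 3, 30),
]
by_month, by_hour = report_time_profile(sample)
print(peak(by_month), peak(by_hour))  # prints "6 10"
```

Note that the timestamp of a web report reflects when it was published, which may differ from when the event occurred.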
SIMS provides a preliminary and supplementary means of monitoring and understanding suicide. It suggests useful dimensions and tools for analyzing the features and trends of suicide using data derived from Chinese webpages. Yet given the intrinsic "dual nature" of internet-based suicide information and the tremendous difficulties experienced by us and by other researchers, there is still a long way to go in expanding, refining and evaluating the system.