Torii Manabu, Tilak Sameer S, Doan Son, Zisook Daniel S, Fan Jung-Wei
Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, USA.
Biomed Inform Insights. 2016 Jun 20;8(Suppl 1):1-11. doi: 10.4137/BII.S37791. eCollection 2016.
In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights about population health. Online product reviews are a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined from online product reviews. Using natural language processing and machine learning tools, we mined 1.3 million grocery product reviews for health-related information. The objectives of the study were as follows: (1) conduct quantitative and qualitative analysis of the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights into the task characteristics and challenges for text analytics to guide future research.