An ensemble method for extracting adverse drug events from social media.

Authors

Liu Jing, Zhao Songzheng, Zhang Xiaodi

Affiliation

School of Management, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, PR China.

Publication information

Artif Intell Med. 2016 Jun;70:62-76. doi: 10.1016/j.artmed.2016.05.004. Epub 2016 Jun 6.

Abstract

OBJECTIVE

Because adverse drug events (ADEs) are a serious health problem and a leading cause of death, it is of vital importance to identify them correctly and in a timely manner. With the development of Web 2.0, social media has become a large data source for information on ADEs. The objective of this study is to develop a relation extraction system that uses natural language processing techniques to effectively distinguish between ADEs and non-ADEs in informal text on social media.

METHODS AND MATERIALS

We develop a feature-based approach that uses lexical, syntactic, and semantic features. Information-gain-based feature selection is applied to reduce the high-dimensional feature space. We then evaluate the effectiveness of four well-known kernel-based approaches (subset tree kernel, tree kernel, shortest dependency path kernel, and all-paths graph kernel) and of several ensembles built with different combination methods (majority voting, weighted averaging, and stacked generalization). All approaches are tested on three data sets: two from health-related discussion forums and one from a general social networking site (Twitter).
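As an illustration of the information-gain criterion named above, the following is a minimal sketch that ranks discrete features by how much they reduce label entropy. The feature names, the toy data, and the `select_top_k` helper are hypothetical and are not taken from the paper, which does not specify its implementation in this abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(feature_columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(
        feature_columns,
        key=lambda name: information_gain(feature_columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): 1 = ADE relation, 0 = non-ADE relation.
labels = [1, 1, 0, 0]
features = {
    "has_symptom_word": [1, 1, 0, 0],   # perfectly aligned with the label
    "has_negation": [1, 0, 1, 0],       # carries no information about the label
    "drug_before_event": [1, 1, 1, 0],  # partially informative
}
top_features = select_top_k(features, labels, k=2)
```

Features with low information gain would be dropped before training, which is one common way to shrink a sparse, high-dimensional feature space.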

RESULTS

When the contribution of each feature subset is investigated, the feature-based approach attains best area under the receiver operating characteristic curve (AUC) values of 78.6%, 72.2%, and 79.2% on the three data sets. When individual methods are used, we attain the best AUC values of 82.1%, 73.2%, and 77.0% using the subset tree kernel, shortest dependency path kernel, and feature-based approach on the three data sets, respectively. When classifier ensembles are used, we achieve the best AUC values of 84.5%, 77.3%, and 84.5% on the three data sets, outperforming the baselines.
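For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on made-up scores; it is not the evaluation code used in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical classifier scores for four candidate drug-event pairs.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
example_auc = auc(scores, labels)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of instances, which is acceptable for illustration; a rank-based formulation would be preferable on large data sets.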

CONCLUSIONS

Our experimental results indicate that ADE extraction from social media can benefit from feature selection. Among the feature subsets, lexical and semantic features improve ADE extraction. Kernel-based approaches, which avoid the feature sparsity problem, are well suited to ADE extraction. Combining individual classifiers with a suitable combination method can further improve extraction effectiveness.
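Two of the combination methods named in the methods, majority voting and weighted averaging, can be sketched as follows. The classifier outputs, the 0.5 decision threshold, the tie-breaking rule, and the weights are all assumptions made for illustration; the abstract does not state how the authors set them. Stacked generalization, which trains a second-level learner on the base outputs, is not shown.

```python
def majority_vote(predictions):
    """Combine hard 0/1 labels from several classifiers.

    predictions: one list of labels per classifier, all the same length.
    An instance is labeled 1 only if more than half the classifiers say 1
    (ties fall back to 0, an arbitrary choice for this sketch).
    """
    n_classifiers = len(predictions)
    return [1 if 2 * sum(votes) > n_classifiers else 0 for votes in zip(*predictions)]


def weighted_average(probabilities, weights):
    """Combine per-classifier probabilities using normalized weights."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*probabilities)
    ]


# Hypothetical ADE probabilities from three base classifiers on three instances.
feature_based = [0.9, 0.4, 0.2]
tree_kernel = [0.6, 0.6, 0.1]
path_kernel = [0.3, 0.7, 0.4]
base_outputs = [feature_based, tree_kernel, path_kernel]

# Majority voting works on hard labels, here obtained with a 0.5 threshold.
hard_labels = [[1 if p >= 0.5 else 0 for p in probs] for probs in base_outputs]
voted = majority_vote(hard_labels)

# Weights are illustrative; in practice they might reflect validation performance.
averaged = weighted_average(base_outputs, weights=[0.5, 0.3, 0.2])
```

Voting discards confidence information, whereas weighted averaging keeps it and lets stronger base classifiers count for more, which is one reason the two can give different ensemble results.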

