Huang Jhih-Yuan, Lee Wei-Po, Lee King-Der
Department of Information Management, National Sun Yat-sen University, Kaohsiung 80424, Taiwan.
Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung 80708, Taiwan.
Healthcare (Basel). 2022 Mar 25;10(4):618. doi: 10.3390/healthcare10040618.
Social forums offer many new channels for collecting patients' opinions, which can be used to construct predictive models of adverse drug reactions (ADRs) for post-marketing surveillance. However, the characteristics of social posts leave several challenges unsolved when deriving such models, chiefly data sparseness, high-dimensional features, and term diversity. To address these issues in identifying ADRs from social posts, we perform data analytics from the perspectives of data balance, feature selection, and feature learning. We also design a comprehensive experimental analysis to investigate the performance of different data processing techniques and data modeling methods. Most importantly, we present a deep learning-based approach that adopts the BERT (Bidirectional Encoder Representations from Transformers) model with a new batch-wise adaptive strategy to enhance predictive performance. A series of experiments was conducted to evaluate machine learning methods with both manual and automated feature engineering. The results show that both types of methods are effective in ADR prediction, each with its own advantages. In contrast to traditional machine learning methods, our feature learning approach accomplishes the task automatically, saving the manual effort otherwise required for a large number of experiments.
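As an illustration of the data-balance perspective mentioned above, the following is a minimal sketch of random oversampling, one common way to balance a dataset in which ADR-positive posts are rare. The abstract does not state which balancing technique the authors used, so the technique, the function name `random_oversample`, and the toy posts are all assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes
    until every class has as many examples as the largest one."""
    rng = random.Random(seed)

    # Group the examples by class label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy posts: 1 = mentions an ADR, 0 = does not.
posts = [
    "started the new pills last week, feeling fine",
    "refilled my prescription today",
    "doctor changed my dose again",
    "picked up my meds at the pharmacy",
    "got a bad rash two days after taking this drug",
]
labels = [0, 0, 0, 0, 1]

balanced_posts, balanced_labels = random_oversample(posts, labels)
print(Counter(balanced_labels))
```

In practice, oversampling of this kind would normally be applied to the training split only, so that duplicated posts do not leak into the evaluation data.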