IXA Group, University of the Basque Country (UPV-EHU), Spain.
Health Informatics J. 2019 Dec;25(4):1768-1778. doi: 10.1177/1460458218799470. Epub 2018 Sep 19.
This work addresses adverse drug reaction extraction under severe class imbalance. Adverse drug reactions are infrequent events in electronic health records; nevertheless, documenting them is compulsory. Text mining techniques can help retrieve this kind of valuable information from text. The class imbalance was tackled using different sampling methods, cost-sensitive learning, ensemble learning and one-class classification, with Random Forest as the classifier. The adverse drug reaction extraction model was inferred from a dataset of real electronic health records with an imbalance ratio of 1:222; that is, for each drug-disease pair that is an adverse drug reaction, there are approximately 222 that are not. Applying a sampling technique before cost-sensitive learning gave the best result. On the test set, the F-measure was 0.121 for the minority class and 0.996 for the majority class.
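To make the reported pipeline concrete, here is a minimal sketch of sampling followed by cost-sensitive weighting, with a per-class F-measure helper. The abstract does not say which sampling method, target ratio, or cost scheme was used, so the random undersampling, the 1:10 target ratio, the "balanced" weighting heuristic, and all function names below are illustrative assumptions, not the authors' method. The data are synthetic labels built only to reproduce the 1:222 ratio.

```python
import random
from collections import Counter


def undersample(labels, target_ratio, seed=0):
    """Keep every positive and at most `target_ratio` negatives per positive.

    Assumption: plain random undersampling; the paper's sampler is not specified.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), target_ratio * len(pos)))
    return sorted(pos + keep_neg)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: a common heuristic standing in for the paper's cost matrix.
    """
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * k) for cls, k in counts.items()}


def f_measure(y_true, y_pred, positive):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic labels at the reported 1:222 ratio (1 = adverse drug reaction pair).
labels = [1] * 10 + [0] * 2220
minority_share = labels.count(1) / len(labels)  # 1/223, under half a percent

# Step 1: sampling. Step 2: class costs computed on the resampled data.
kept = undersample(labels, target_ratio=10)
resampled = [labels[i] for i in kept]
weights = balanced_class_weights(resampled)
```

After undersampling, the remaining imbalance is handled by the class weights, which a cost-sensitive learner would consume (for instance, scikit-learn's `RandomForestClassifier` accepts such a dictionary through its `class_weight` parameter). Reporting `f_measure` separately for each class, as the abstract does, matters here because the majority class dominates any pooled score.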