Medical data mining in sentiment analysis based on optimized swarm search feature selection.

Author information

Zeng Daohui, Peng Jidong, Fong Simon, Qiu Yining, Wong Raymond

Affiliations

First Affiliated Hospital of Guangzhou University of TCM, Guangzhou, People's Republic of China.

Ganzhou People's Hospital, Jiangxi, People's Republic of China.

Publication information

Australas Phys Eng Sci Med. 2018 Dec;41(4):1087-1100. doi: 10.1007/s13246-018-0674-3. Epub 2018 Sep 11.

DOI:10.1007/s13246-018-0674-3

PMID:30206813

Abstract

In this paper, we propose a novel technique termed optimized swarm search-based feature selection (OS-FS), a swarm-type search function that selects an ideal subset of features to enhance classification accuracy. Sentiment prediction is becoming an increasingly crucial machine learning technique for gaining insights from unstructured medical texts, and owing to its robustness and accuracy it has recently gained popularity in the medical industry. Medical text mining is a fundamental data analytic approach for sentiment prediction. A popular preprocessing step in text mining transforms medical text strings into word vectors, forming a high-dimensional sparse matrix. However, such a sparse matrix poses problems for inducing an accurate sentiment prediction model. Feature selection, which finds a subset of the original features in the sparse matrix, is a commonly used dimensionality reduction technique and can improve the accuracy of the prediction model. The swarm search in our proposed OS-FS can be optimized by a new feature evaluation technique called clustering-by-coefficient-of-variation. We implement this method on a case scenario of 279 medical articles related to 'meaningful use functionalities on health care quality, safety, and efficiency', drawn from a systematic review of previous medical IT literature. For this medical text mining task, four sentiment classes (positive, mixed-positive, neutral and negative) are recognized from the document contents. Our experimental results demonstrate the superiority of OS-FS over traditional feature selection methods in the literature.
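
Two generic steps mentioned in the abstract can be made concrete: turning raw texts into a high-dimensional sparse document-term matrix, and scoring each feature (term) by its coefficient of variation, CV = standard deviation / mean. The Python sketch below does both with the standard library only. It is a hypothetical illustration, not the authors' clustering-by-coefficient-of-variation procedure or their swarm search, neither of which the abstract specifies; the tokenizer, the function names (`tokenize`, `build_sparse_matrix`, `coefficient_of_variation`, `top_k_by_cv`) and the toy documents are all assumptions.

```python
# Hypothetical sketch: a coefficient-of-variation (CV) filter over a sparse
# bag-of-words matrix. This is NOT the paper's OS-FS or its
# clustering-by-coefficient-of-variation procedure, which the abstract does not specify.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a document and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_sparse_matrix(docs: list[str]) -> tuple[list[str], list[dict[int, int]]]:
    """Return a sorted vocabulary and one sparse row per document.

    Each row maps a column index (term id) to that term's count, so the many
    zero entries of the document-term matrix are never stored.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    index = {term: j for j, term in enumerate(vocab)}
    rows = [{index[t]: c for t, c in cnt.items()} for cnt in counts]
    return vocab, rows


def coefficient_of_variation(rows: list[dict[int, int]], n_features: int) -> list[float]:
    """Population CV (std / mean) of each column, treating absent terms as 0."""
    n_docs = len(rows)
    sums = [0.0] * n_features
    sq_sums = [0.0] * n_features
    for row in rows:
        for j, c in row.items():
            sums[j] += c
            sq_sums[j] += c * c
    cvs = []
    for j in range(n_features):
        mean = sums[j] / n_docs
        var = sq_sums[j] / n_docs - mean * mean
        # Guard against tiny negative variances from floating-point rounding.
        cvs.append(math.sqrt(max(var, 0.0)) / mean if mean > 0 else 0.0)
    return cvs


def top_k_by_cv(docs: list[str], k: int) -> list[str]:
    """Keep the k terms with the highest CV; ties are broken alphabetically."""
    vocab, rows = build_sparse_matrix(docs)
    cvs = coefficient_of_variation(rows, len(vocab))
    ranked = sorted(range(len(vocab)), key=lambda j: (-cvs[j], vocab[j]))
    return [vocab[j] for j in ranked[:k]]


# Toy documents invented for illustration; not from the paper's 279-article corpus.
DOCS = [
    "EHR alerts improved medication safety",
    "EHR adoption had mixed effects on quality",
    "no significant effect of EHR on efficiency",
]
print(top_k_by_cv(DOCS, 3))  # → ['adoption', 'alerts', 'effect']
```

Here "ehr" occurs once in every document, so its CV is 0 and it ranks last, while terms that occur in only one document share the highest CV. Because a score like this ignores the sentiment labels, such a filter is usually only a first pass; wrapper-style searches, swarm-based ones included, instead evaluate candidate feature subsets by the accuracy of a classifier trained on them.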

Similar articles

Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.

J Biomed Inform. 2018 Aug;84:42-58. doi: 10.1016/j.jbi.2018.06.005. Epub 2018 Jun 15.

A global optimization approach to multi-polarity sentiment analysis.

PLoS One. 2015 Apr 24;10(4):e0124672. doi: 10.1371/journal.pone.0124672. eCollection 2015.

Improving the Accuracy of Feature Selection in Big Data Mining Using Accelerated Flower Pollination (AFP) Algorithm.

J Med Syst. 2019 Mar 9;43(4):96. doi: 10.1007/s10916-019-1200-1.

Adaptive feature selection using v-shaped binary particle swarm optimization.

PLoS One. 2017 Mar 30;12(3):e0173907. doi: 10.1371/journal.pone.0173907. eCollection 2017.

Automated Surgical Term Clustering: A Text Mining Approach for Unstructured Textual Surgery Descriptions.

IEEE J Biomed Health Inform. 2020 Jul;24(7):2107-2118. doi: 10.1109/JBHI.2019.2956973. Epub 2019 Dec 2.

An Evolutionary Multitasking-Based Feature Selection Method for High-Dimensional Classification.

IEEE Trans Cybern. 2022 Jul;52(7):7172-7186. doi: 10.1109/TCYB.2020.3042243. Epub 2022 Jul 4.

Sentiment analysis in medical settings: New opportunities and challenges.

Artif Intell Med. 2015 May;64(1):17-27. doi: 10.1016/j.artmed.2015.03.006. Epub 2015 May 1.

Automated feature selection of predictors in electronic medical records data.

Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.

CCFS: A Confidence-Based Cost-Effective Feature Selection Scheme for Healthcare Data Classification.

IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):902-911. doi: 10.1109/TCBB.2019.2903804. Epub 2021 Jun 3.

Cited by

LASSO Regression Modeling on Prediction of Medical Terms among Seafarers' Health Documents Using Tidy Text Mining.

Bioengineering (Basel). 2022 Mar 17;9(3):124. doi: 10.3390/bioengineering9030124.

Managing Complexity. From Documentation to Knowledge Integration and Informed Decision Findings from the Clinical Information Systems Perspective for 2018.

Yearb Med Inform. 2019 Aug;28(1):95-100. doi: 10.1055/s-0039-1677919. Epub 2019 Aug 16.