Healthcare Text Classification System and its Performance Evaluation: A Source of Better Intelligence by Characterizing Healthcare Text.

Affiliations

Department of Computer Science & Engineering, JIIT, Noida, India.

Advanced Knowledge Engineering Center, Global Biomedical Technologies, Inc., Roseville, CA, USA.

Publication information

J Med Syst. 2018 Apr 13;42(5):97. doi: 10.1007/s10916-018-0941-6.

DOI:10.1007/s10916-018-0941-6

PMID:29654417

Abstract

A machine learning (ML)-based text classification system has several classifiers. The performance evaluation (PE) of such an ML system is typically driven by the training data size and the partition protocols used. Such systems yield low accuracy because they lack the ability to model the input text data in terms of noise characteristics. This study proposes the concept of a misrepresentation ratio (MRR) for input healthcare text data and models the PE criteria for validating the hypothesis. Further, this novel system provides a platform to amalgamate several attributes of the ML system, such as data size, classifier type, partitioning protocol, and percentage MRR. Our comprehensive data analysis consisted of five types of text data sets (TwitterA, WebKB4, Disease, Reuters (R8), and SMS); five kinds of classifiers (support vector machine with linear kernel (SVM-L), multi-layer perceptron (MLP)-based neural network, AdaBoost, stochastic gradient descent, and decision tree); and five types of training protocols (K2, K4, K5, K10, and JK). In decreasing order of MRR, our ML system demonstrates mean classification accuracies of 70.13 ± 0.15%, 87.34 ± 0.06%, 93.73 ± 0.03%, 94.45 ± 0.03%, and 97.83 ± 0.01%, respectively, across all classifiers and protocols. The corresponding AUC is 0.98 for the SMS data using the MLP-based neural network. Among all the classifiers, the MLP-based neural network shows the best accuracy, 91.84 ± 0.04%, which is 6% better than previously published results. Further, we observed that as MRR decreases, system robustness increases, as validated by the standard deviations. The overall system accuracy across all data types, classifiers, and protocols is 89%, showing the entire ML system to be novel, robust, and unique. The system is also tested for stability and reliability.
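
The abstract reports accuracy as a mean ± standard deviation over partition protocols but does not define the protocol labels. The sketch below is only an illustration under stated assumptions: it assumes K2, K4, K5, and K10 denote 2-, 4-, 5-, and 10-fold cross-validation and that JK denotes a jack-knife (leave-one-out) scheme. All function names are hypothetical, and the majority-class baseline merely stands in for the five classifiers named above; it is not the authors' implementation.

```python
from statistics import mean, stdev


def kfold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def protocol_folds(name, n_samples):
    """Map a protocol label to a fold count (assumption: JK = leave-one-out)."""
    if name == "JK":
        return n_samples
    return int(name[1:])  # e.g. "K10" -> 10


def majority_baseline(X_train, y_train, X_test):
    """Placeholder classifier: always predict the most frequent training label."""
    most_common = max(sorted(set(y_train)), key=y_train.count)
    return [most_common] * len(X_test)


def evaluate(fit_predict, X, y, protocol):
    """Return (mean, standard deviation) of per-fold accuracy for one protocol."""
    k = protocol_folds(protocol, len(X))
    scores = []
    for train, test in kfold_splits(len(X), k):
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Toy data only; it has no connection to the data sets in the study.
    X = [f"message {i}" for i in range(10)]
    y = [0] * 8 + [1] * 2
    for protocol in ("K2", "K5", "JK"):
        acc, sd = evaluate(majority_baseline, X, y, protocol)
        print(f"{protocol}: {acc:.2f} ± {sd:.2f}")
```

Under these assumptions, a larger fold count means a larger share of the data is used for training in each round (50% for K2, 90% for K10, all but one sample for JK), which is one way training data size and partition protocol interact in such an evaluation.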
