
AI-based disease category prediction model using symptoms from low-resource Ethiopian language: Afaan Oromo text.

Author affiliations

Department of Computer Science and Engineering, Engineering and Technology, Wollega University, Oromia, Ethiopia.

Department of Data Science, Indian Institute of Technology Palakkad (IIT Palakkad), Palakkad, India.

Publication information

Sci Rep. 2024 May 16;14(1):11233. doi: 10.1038/s41598-024-62278-7.

Abstract

Automated disease diagnosis and prediction, powered by AI, play a crucial role in enabling medical professionals to deliver effective care to patients. While such predictive tools have been extensively explored in resource-rich languages like English, this manuscript focuses on predicting disease categories automatically from symptoms documented in the Afaan Oromo language, employing various classification algorithms. The study covers machine learning techniques such as support vector machines (SVM), random forests, logistic regression, and Naïve Bayes, as well as deep learning approaches including LSTM, GRU, and Bi-LSTM. Because no standard corpus was available, we prepared three data sets with different numbers of patient symptoms arranged into 10 categories. Two feature representations, TF-IDF and word embedding, were employed. The performance of the proposed methodology was evaluated using accuracy, recall, precision, and F1 score. The experimental results show that, among machine learning models, the SVM model using TF-IDF had the highest accuracy and F1 score of 94.7%, while among deep learning models the LSTM model using word2vec embedding achieved an accuracy of 95.7% and an F1 score of 96.0%. Several hyper-parameter tuning settings were used to optimize the performance of each model. This study shows that the LSTM model performed best among all models across the entire dataset.
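
As a rough illustration of the four evaluation measures named in the abstract, the sketch below computes accuracy together with precision, recall, and F1 score for a multi-class prediction task. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the abstract does not say how per-class scores were averaged, so macro averaging is assumed here, and the function name `macro_report`, the category names, and the toy labels are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def macro_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme the study used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels: three made-up disease categories, six samples.
y_true = ["respiratory", "respiratory", "digestive", "digestive", "skin", "skin"]
y_pred = ["respiratory", "digestive", "digestive", "digestive", "skin", "respiratory"]

report = macro_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

With macro averaging, every category contributes equally to the final score regardless of how many samples it has, which matters when the categories are imbalanced; a sample-weighted average would instead favor the larger categories.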


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c1c3/11098814/25bae293c399/41598_2024_62278_Fig1_HTML.jpg
