Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML.

Authors

Savci Pinar, Das Bihter

Affiliations

Arçelik A.Ş. Karaağaç Caddesi 2-6, Sütlüce Beyoğlu 34445 Istanbul, Turkey.

Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey.

Publication

Heliyon. 2023 May 1;9(5):e15670. doi: 10.1016/j.heliyon.2023.e15670. eCollection 2023 May.

DOI: 10.1016/j.heliyon.2023.e15670
PMID: 37187909
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10176029/
Abstract

Since Turkish is an agglutinative language and contains reduplication, idiom, and metaphor words, Turkish texts are sources of information with extremely rich meanings. For this reason, the processing and classification of Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-text classification using Autotrain were compared in a 250 K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128 k) language model on the dataset showed higher accuracy performance with a training time of 66 min compared to the other models and the CO2 emission was quite low. The ConvBERTurk mC4 (uncased) model is also the best-performing second language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish on machine learning.
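The study ranks models by accuracy on a multi-label classification task. As background, here is a minimal sketch (not from the paper; the label vectors and values are invented for illustration) of two standard ways to score multi-label predictions, where each sample's labels are a 0/1 indicator vector:

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label set is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over every (sample, label) slot, then compute a single F1 score."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            if pi == 1 and ti == 1:
                tp += 1
            elif pi == 1 and ti == 0:
                fp += 1
            elif pi == 0 and ti == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # 3 samples, 4 candidate labels each (toy data)
    y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
    y_pred = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
    print(round(subset_accuracy(y_true, y_pred), 3))  # 0.333
    print(round(micro_f1(y_true, y_pred), 3))         # 0.833
```

Subset accuracy is strict (one wrong label fails the whole sample), while micro-F1 credits partial matches, which is why the two can diverge sharply on the same predictions.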

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/09706150b220/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/9b3fd824cd60/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/b9113a6abf29/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/1c14ae1b0c3a/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/1914b565fbb0/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/1a1678e5a3cd/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/31302c27795d/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40a8/10176029/996902cca1d3/gr8.jpg

Similar articles

1
Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML.
Heliyon. 2023 May 1;9(5):e15670. doi: 10.1016/j.heliyon.2023.e15670. eCollection 2023 May.
2
Natural Language Processing for Imaging Protocol Assignment: Machine Learning for Multiclass Classification of Abdominal CT Protocols Using Indication Text Data.
J Digit Imaging. 2022 Oct;35(5):1120-1130. doi: 10.1007/s10278-022-00633-8. Epub 2022 Jun 2.
3
BioBERTurk: Exploring Turkish Biomedical Language Model Development Strategies in Low-Resource Setting.
J Healthc Inform Res. 2023 Sep 19;7(4):433-446. doi: 10.1007/s41666-023-00140-7. eCollection 2023 Dec.
4
Text data augmentation and pre-trained Language Model for enhancing text classification of low-resource languages.
PeerJ Comput Sci. 2024 Mar 29;10:e1974. doi: 10.7717/peerj-cs.1974. eCollection 2024.
5
Medical text classification based on the discriminative pre-training model and prompt-tuning.
Digit Health. 2023 Aug 6;9:20552076231193213. doi: 10.1177/20552076231193213. eCollection 2023 Jan-Dec.
6
Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods.
Am J Clin Nutr. 2023 Mar;117(3):553-563. doi: 10.1016/j.ajcnut.2022.11.022. Epub 2022 Dec 23.
7
PDF text classification to leverage information extraction from publication reports.
J Biomed Inform. 2016 Jun;61:141-8. doi: 10.1016/j.jbi.2016.03.026. Epub 2016 Apr 1.
8
Comparing human text classification performance and explainability with large language and machine learning models using eye-tracking.
Sci Rep. 2024 Jun 21;14(1):14295. doi: 10.1038/s41598-024-65080-7.
9
Evaluation of the performance of traditional machine learning algorithms, convolutional neural network and AutoML Vision in ultrasound breast lesions classification: a comparative study.
Quant Imaging Med Surg. 2021 Apr;11(4):1381-1393. doi: 10.21037/qims-20-922.
10
Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.
Front Public Health. 2022 Nov 17;10:1031147. doi: 10.3389/fpubh.2022.1031147. eCollection 2022.

Cited by

1
An improved deep convolutional neural network-based YouTube video classification using textual features.
Heliyon. 2024 Aug 10;10(16):e35812. doi: 10.1016/j.heliyon.2024.e35812. eCollection 2024 Aug 30.

References

1
Gradient-based optimization of hyperparameters.
Neural Comput. 2000 Aug;12(8):1889-900. doi: 10.1162/089976600300015187.