A novel corpus-based computing method for handling critical word-ranking issues: An example of COVID-19 research articles.

Author information

Chen Liang-Ching, Chang Kuei-Hu

Affiliations

Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung, Taiwan.

Institute of Education, National Sun Yat-sen University, Kaohsiung, Taiwan.

Publication information

Int J Intell Syst. 2021 Jul;36(7):3190-3216. doi: 10.1002/int.22413. Epub 2021 Mar 11.


DOI: 10.1002/int.22413
PMID: 38607844
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8207067/
Abstract

A corpus is a large body of structured textual data that is stored and processed electronically. It is usually combined with statistics, machine learning algorithms, or artificial intelligence (AI) technologies to explore the semantic relationships between lexical units, and it is beneficial when applied to language learning, information processing, translation, and so forth. In the face of a novel disease such as COVID-19, establishing a medical-specific corpus will enhance frontline medical personnel's information acquisition efficiency, guiding them toward the right approaches to respond to and prevent the disease. To effectively retrieve critical messages from the corpus, handling word-ranking issues appropriately is crucial. However, traditional frequency-based approaches may introduce bias in word ranking because they neither optimize the corpus nor jointly take words' frequency dispersion and concentration criteria into consideration. Thus, this paper develops a novel corpus-based approach that combines corpus software with the Hirsch index (H-index) algorithm to handle these issues simultaneously, making word-ranking processes more accurate. This paper compiled 100 COVID-19-related research articles as an empirical example of the target corpus. To verify the proposed approach, this study compared its results with those of two traditional frequency-based approaches. The results indicate that the proposed approach can refine the corpus and simultaneously account for words' frequency dispersion and concentration criteria in handling word-ranking issues.
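
The abstract does not spell out how the H-index is applied to words. One plausible reading, offered here only as an illustrative sketch and not as the authors' procedure, treats each document like a "paper" and a word's count in that document like its "citations": a word has index h if it occurs at least h times in at least h documents. A high h then requires both concentration (many occurrences) and dispersion (across many documents). The function names `word_h_index` and `rank_words`, the toy corpus, and the tie-break by total frequency are all assumptions made for this example.

```python
from collections import Counter


def word_h_index(per_doc_counts):
    """Largest h such that the word occurs >= h times in >= h documents."""
    ranked = sorted(per_doc_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def rank_words(docs):
    """Rank words by H-index, breaking ties by total frequency (an assumption).

    docs is a list of token lists, one per document.
    Returns a list of (word, h_index, total_frequency) tuples.
    """
    counters = [Counter(doc) for doc in docs]
    vocab = set().union(*counters) if counters else set()
    rows = []
    for word in vocab:
        counts = [c[word] for c in counters if word in c]
        rows.append((word, word_h_index(counts), sum(counts)))
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


# Hypothetical toy corpus: "patient" is frequent but confined to one document.
docs = [
    ["virus"] * 3 + ["mask"],
    ["virus"] * 2 + ["mask"],
    ["virus"] * 3 + ["patient"] * 9,
]
print(rank_words(docs))
# [('virus', 2, 8), ('patient', 1, 9), ('mask', 1, 2)]
```

In this toy corpus a pure frequency ranking would place "patient" (9 occurrences) above "virus" (8), whereas the H-index ranking places "virus" first because its occurrences are spread over several documents. This is the kind of bias from ignoring dispersion that the abstract attributes to frequency-based approaches.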



