Construction of English and American Literature Corpus Based on Machine Learning Algorithm.

Affiliation

School of Foreign Languages, Henan Polytechnic University, Jiaozuo 454003, Henan Province, China.

Publication information

Comput Intell Neurosci. 2022 Jun 2;2022:9773452. doi: 10.1155/2022/9773452. eCollection 2022.

DOI:10.1155/2022/9773452
PMID:35694598
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9184167/
Abstract

In China, the use of corpora in language teaching, and in English and American literature teaching in particular, is still at a preliminary research stage; it has various shortcomings and has not received due attention from front-line educators. Constructing an English and American literature corpus according to defined principles can effectively support the teaching of English and American literature. This paper investigates how to build such a corpus automatically. During keyword extraction, key phrases and keywords are combined. The TextRank algorithm is used to calculate the similarity between atomic events, and the sentences with the highest similarity are then selected and sorted. Building on ML (machine learning) text classification methods, a combined classifier based on SVM (support vector machine) and NB (Naive Bayes) is proposed. The experimental results show that, in terms of accuracy and recall, the proposed combined algorithm gives the best classification performance of the three methods compared. The best accuracy, recall, and F value are 0.87, 0.9, and 0.89, respectively. The results also show that the method can quickly, accurately, and continuously obtain high-quality bilingual mixed web pages.
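The TextRank step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors define atomic events or their similarity, so the code below makes its own assumptions: it ranks plain sentences, uses the word-overlap similarity from the original TextRank formulation, and the names `tokenize`, `similarity`, `textrank`, and `top_sentences` are hypothetical helpers, not the paper's implementation.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Word-overlap similarity (assumed here; the paper's measure is not given)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    denom = math.log(len(ta)) + math.log(len(tb))
    if denom == 0:  # both sentences contain a single distinct word
        return 0.0
    return len(ta & tb) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [
        [0.0 if i == j else similarity(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of every neighbour's score,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def top_sentences(sentences, k):
    """Return the k highest-scoring sentences, best first."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat lay on the mat",
        "a cat sat on a rug",
        "quantum physics studies particles",
    ]
    for sentence in top_sentences(docs, 2):
        print(sentence)
```

In this sketch a sentence that shares no words with any other receives only the baseline score of `1 - damping`, so it falls to the bottom of the ranking; selecting the top `k` entries corresponds to the "select and sort" step described above.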
