School of Foreign Languages, Henan Polytechnic University, Jiaozuo 454003, Henan Province, China.
Comput Intell Neurosci. 2022 Jun 2;2022:9773452. doi: 10.1155/2022/9773452. eCollection 2022.
In China, the use of corpora in language teaching, and especially in the teaching of English and American literature, is still at a preliminary research stage, and its shortcomings have not received due attention from front-line educators. Constructing an English and American literature corpus according to certain principles can effectively support the teaching of English and American literature. This paper studies how to build such a corpus automatically. In the keyword extraction step, key phrases and keywords are combined. The similarity between atomic events is calculated with the TextRank algorithm, and the top-ranked sentences with high similarity are then selected and sorted. Building on machine learning (ML) text classification methods, a combined classifier based on a support vector machine (SVM) and Naive Bayes (NB) is proposed. The experimental results show that, in terms of precision and recall, the proposed combined algorithm gives the best classification performance among the three methods. The best precision, recall, and F value are 0.87, 0.9, and 0.89, respectively. The results also show that the method can quickly, accurately, and continuously obtain high-quality bilingual mixed web pages.
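The TextRank ranking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how atomic events are represented, so the sketch assumes plain sentences as graph nodes and the word-overlap similarity from the original TextRank formulation as edge weights. The function names `tokenize`, `overlap_similarity`, `textrank`, and `top_sentences`, and the parameters `damping`, `iterations`, and `k`, are hypothetical.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def overlap_similarity(words_a, words_b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # log(1) = 0 would make the denominator degenerate
    shared = len(set(words_a) & set(words_b))
    return shared / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration on the weighted similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weight matrix; the diagonal stays 0 (no self-loops).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = overlap_similarity(tokens[i], tokens[j])
            weights[i][j] = weights[j][i] = w
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of the edge (j, i).
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def top_sentences(sentences, k):
    """Return the k highest-scoring sentences, best first."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]
```

Calling `top_sentences(sentences, k)` on a list of candidate sentences returns the `k` sentences most strongly connected to the rest. A sentence that shares no words with any other receives only the baseline score `1 - damping`, so it ranks last.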