Exploration of designing an automatic classifier for questions containing code snippets: A case study of Oracle SQL certification exam questions.

Author information

Chen Hung-Yi, Shih Po-Chou, Wang Yunsen

Affiliations

Department of Information Management, Chaoyang University of Technology, Taichung, Taiwan.

Department of Industrial Engineering and Management, National Yunlin University of Science and Technology, Yunlin, Taiwan.

Publication information

PLoS One. 2025 Jan 9;20(1):e0309050. doi: 10.1371/journal.pone.0309050. eCollection 2025.

Abstract

This study uses Oracle SQL certification exam questions to explore the design of automatic classifiers for exam questions that contain code snippets. Classifying an SQL question means assigning it a class label drawn from the exam topics. With these labels, questions can be selected from the test bank according to the testing scope to assemble a more suitable test paper. Classifying questions that contain code snippets is more challenging than classifying questions described in ordinary text. In this study, we use factorial experiments to identify how two factors, the feature representation scheme and the machine learning method, affect the performance of the question classifiers. Our experimental results showed that the classifier combining the TF-IDF scheme with a Logistic Regression model performed best on the weighted macro-average AUC and F1 indices, while the classifier combining TF-IDF with a Support Vector Machine performed best on weighted macro-average Precision. Moreover, across all performance indices, the feature representation scheme was the main factor affecting classifier performance, followed by the machine learning method.
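The best-performing pipeline named above (TF-IDF features feeding a Logistic Regression classifier, scored with weighted macro-averages) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. Several assumptions are ours: the tokenizer, the smoothed-IDF formula, plain gradient descent in place of a library solver, the hyperparameters, and the example questions and topic labels are all hypothetical. "Weighted macro-average" is read here as per-class scores averaged with weights equal to each class's support, which is how scikit-learn's `average="weighted"` behaves.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase SQL keywords/identifiers; numbers and punctuation are dropped.
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())

def fit_idf(docs):
    # Smoothed IDF ln((1 + n) / (1 + df)) + 1, the default form in scikit-learn's TfidfVectorizer.
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + len(docs)) / (1 + d)) + 1 for tok, d in df.items()}

def tfidf(doc, idf):
    # Raw term count times IDF, then L2-normalised; tokens unseen during fitting are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def predict_proba(x, W, b, classes):
    # Softmax over per-class linear scores; the max is subtracted for numerical stability.
    scores = [b[c] + sum(W[c].get(t, 0.0) * v for t, v in x.items()) for c in classes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_logreg(X, y, classes, epochs=300, lr=0.5, l2=1e-3):
    # Multinomial logistic regression fitted by full-batch gradient descent with an L2 penalty.
    W = {c: {} for c in classes}
    b = {c: 0.0 for c in classes}
    for _ in range(epochs):
        gW = {c: Counter() for c in classes}
        gb = {c: 0.0 for c in classes}
        for x, label in zip(X, y):
            for c, p in zip(classes, predict_proba(x, W, b, classes)):
                err = p - (1.0 if c == label else 0.0)  # d(cross-entropy)/d(score_c)
                gb[c] += err
                for t, v in x.items():
                    gW[c][t] += err * v
        for c in classes:
            for t, g in gW[c].items():
                W[c][t] = W[c].get(t, 0.0) * (1 - lr * l2) - lr * g / len(X)
            b[c] -= lr * gb[c] / len(X)
    return W, b

def weighted_precision_f1(y_true, y_pred, classes):
    # Per-class precision and F1, averaged with each class weighted by its support (true count).
    prec_sum = f1_sum = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_pred = sum(1 for p in y_pred if p == c)
        support = sum(1 for t in y_true if t == c)
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / support if support else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        prec_sum += support * prec
        f1_sum += support * f1
    return prec_sum / len(y_true), f1_sum / len(y_true)

def weighted_ovr_auc(y_true, probas, classes):
    # One-vs-rest AUC per class (ties count 0.5), support-weighted over classes with both outcomes.
    total = weight = 0.0
    for k, c in enumerate(classes):
        pos = [p[k] for t, p in zip(y_true, probas) if t == c]
        neg = [p[k] for t, p in zip(y_true, probas) if t != c]
        if pos and neg:
            wins = sum(1.0 if a > q else 0.5 if a == q else 0.0 for a in pos for q in neg)
            total += len(pos) * wins / (len(pos) * len(neg))
            weight += len(pos)
    return total / weight if weight else float("nan")

# Hypothetical questions and topic labels, invented for illustration (not the study's test bank).
TRAIN = [
    ("Which query joins EMPLOYEES and DEPARTMENTS? SELECT e.name FROM employees e JOIN departments d ON e.dept_id = d.id", "joins"),
    ("What does SELECT * FROM orders o LEFT OUTER JOIN customers c ON o.cust_id = c.id return?", "joins"),
    ("Evaluate: SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id HAVING COUNT(*) > 5", "aggregation"),
    ("Which statement about SUM and AVG used with GROUP BY is true?", "aggregation"),
    ("Which statement creates a table? CREATE TABLE t (id NUMBER PRIMARY KEY)", "ddl"),
    ("What happens after ALTER TABLE emp ADD CONSTRAINT pk PRIMARY KEY (id)?", "ddl"),
]
TEST = [
    ("SELECT d.name FROM departments d JOIN locations l ON d.loc_id = l.id", "joins"),
    ("SELECT job_id, AVG(salary) FROM employees GROUP BY job_id", "aggregation"),
    ("CREATE TABLE dept (id NUMBER PRIMARY KEY)", "ddl"),
]

def run_demo():
    idf = fit_idf([q for q, _ in TRAIN])
    classes = sorted({label for _, label in TRAIN})
    W, b = train_logreg([tfidf(q, idf) for q, _ in TRAIN], [label for _, label in TRAIN], classes)
    probas = [predict_proba(tfidf(q, idf), W, b, classes) for q, _ in TEST]
    y_pred = [classes[max(range(len(classes)), key=p.__getitem__)] for p in probas]
    return classes, [label for _, label in TEST], y_pred, probas

if __name__ == "__main__":
    classes, y_true, y_pred, probas = run_demo()
    print(y_pred, weighted_precision_f1(y_true, y_pred, classes), weighted_ovr_auc(y_true, probas, classes))
```

Swapping `train_logreg` for another learner while keeping `tfidf` fixed, or changing the feature scheme while keeping the learner, varies one of the two factors that a factorial experiment of this kind crosses. The pure-Python form is chosen only to keep the sketch self-contained; a real experiment would more likely rely on an established library.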
