A fine-grained labeled dataset for textual sentiment analysis in technical education.

Authors

Singh Manoj, Panwar Subhash, Choudhary Sanju

Affiliations

CSE & IT, Government Engineering College, Bikaner, India.

EE, SRMIST, Ghaziabad, India.

Publication information

Data Brief. 2024 Nov 9;57:111120. doi: 10.1016/j.dib.2024.111120. eCollection 2024 Dec.

Abstract

This paper presents a curated dataset for textual sentiment analysis in technical education, within the broader domain of Natural Language Processing and Pattern Recognition. The dataset, created in collaboration with the All India Council for Technical Education (AICTE), comprises over 14,000 records manually entered by representatives of technical institutes across India over the course of one year. The data, hosted on AICTE's in-house servers, is categorized into seven distinct labels, including Appreciation, Complaint, Support, and Suggestion. Collected through an online application, the dataset provides a foundation for sentiment analysis in technical education. Notably, it is the first publicly available dataset of its kind, offering a resource for evaluating existing models and developing new ones. The dataset consists of 14,272 records and is additionally classified into 10 distinct modules, each covering a different aspect of technical education. The paper outlines the experimental design, materials, and methods used in data collection, along with the dataset's limitations and ethical considerations. It also acknowledges the contribution of the All India Council for Technical Education in facilitating data collection. Approximately 10,000 technical institutions in India operate under the jurisdiction of AICTE. To gather comprehensive data, an extensive online application with ten modules was devised and distributed to all 10,000 institutions.
Over the course of a year, labeled data was systematically collected from these technical institutions and falls into seven distinct types, such as Appreciation, Complaint, Support, and Suggestion. The dataset has significant potential for deep learning applications, including sentiment analysis and other classification problems. The data was entered by representatives of these institutions and stored through an accurate, systematic process. The dataset is therefore well suited to both training and testing deep-learning models, particularly for applications such as sentiment analysis, where the quality and authenticity of the data are crucial for robust model development. To the best of our knowledge, this is the first dataset of its kind in the domain of technical education, with more than 14,000 samples. Because very few multiclass, multi-label datasets are available, this dataset is especially useful for Natural Language Processing applications such as sentiment analysis.
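
As a rough illustration of how records organized by label and module might be tabulated before training a classifier, the following Python sketch counts records per label and per (module, label) pair. The column names (`text`, `label`, `module`), the module names, and the sample rows are all hypothetical; the abstract does not describe the actual file layout, so these would need to be adapted to the published files.

```python
import csv
import io
from collections import Counter

# Hypothetical rows: the column names ("text", "label", "module"), the module
# names, and the texts are assumptions for illustration only. They are not
# taken from the dataset itself.
SAMPLE_CSV = """text,label,module
"The approval portal worked smoothly this year.",Appreciation,Module A
"Uploaded documents disappear after saving.",Complaint,Module A
"Please allow bulk upload of faculty details.",Suggestion,Module B
"Need help resetting the institute login.",Support,Module B
"Payment page times out repeatedly.",Complaint,Module C
"""


def label_counts(csv_text):
    """Count records per label and per (module, label) pair."""
    per_label = Counter()
    per_module_label = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_label[row["label"]] += 1
        per_module_label[(row["module"], row["label"])] += 1
    return per_label, per_module_label


per_label, per_module_label = label_counts(SAMPLE_CSV)
for label, count in per_label.most_common():
    print(f"{label}: {count}")
```

Inspecting the class distribution in this way is a common first step with multiclass data, since a strong imbalance between labels usually affects how the train and test splits should be drawn.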


