Ismail Azlan, Mutalib Sofianita, Haron Haryani
Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor, Malaysia.
School of Computing Sciences, College of Computing, Informatics and Media, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor, Malaysia.
Educ Inf Technol (Dordr). 2023 Jan 24:1-26. doi: 10.1007/s10639-022-11558-8.
This article discusses the key elements of the Data Science Technology course offered to postgraduate students enrolled in the Master of Data Science program. The course complements the existing curriculum by providing the skills needed to handle Big Data platforms and tools, in addition to data science activities. We frame the discussion of this course around three main requirements: the need to exploit key skills from two dimensions, namely Data Science and Big Data; the need for a cluster-based computing platform; and the need for that platform to be accessible. We address these requirements by presenting the course design and its assessments, the configuration of the computing platform, and the strategy used to enable flexible accessibility. In terms of course design, the course incorporates several innovative elements and covers multiple key areas of the data science body of knowledge and multiple quadrants of the job and skills matrix. In terms of the computing platform, we have established a stable deployment of a Hadoop cluster whose flexible accessibility was prompted by the pandemic situation. Furthermore, our experience implementing the cluster has shown that it can handle computing problems involving larger datasets than those used in the semesters within the scope of the study. We also provide some reflections and highlight future improvements.