Duru Ismail, Sunar Ayse Saliha, White Su, Diri Banu
Department of Software Engineering, Istanbul Sabahattin Zaim University, 34303 Istanbul, Turkey.
Department of Computer Engineering, Bitlis Eren University, 13000 Bitlis, Turkey.
Arab J Sci Eng. 2021;46(4):3613-3629. doi: 10.1007/s13369-020-05117-x. Epub 2021 Jan 6.
Analysing learners' behaviours in MOOCs has been used to identify predictive features associated with positive outcomes in engagement and learning success. Early methods predominantly analysed numerical features of behaviour such as page views, video views, and assessment grades. Baseline machine learning algorithms applied to these extracted numeric features performed well in predicting learners' future performance in MOOCs. We propose categorising learners by likely English language proficiency and extending the range of data to include the text content of comments. We compare the results to a model trained with a combined set of extracted features. Not all platforms provide this rich variety of data. We analysed a series of FutureLearn language-focused MOOCs, using data from the discussions embedded in each lesson's content. To analyse whether we gained any additional insights, we used over 420,000 comments to train the algorithm. We created a method for identifying a learner's possible first language from their country. We found that using comments alone is a weaker predictive approach than using a combination that includes features extracted from learners' activities. Our study contributes to research on the generalisability of learning algorithms. We replicated the method across different MOOCs; performance varied with the model, though it always remained above 50%. One of the deep learning architectures, a Bidirectional LSTM trained on discussions from the language-learning MOOCs, predicted learners' performance on a different MOOC with 73% success.
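The abstract does not describe how a country is mapped to a likely first language. As a minimal sketch, assuming each learner self-reports a country name and that a fixed lookup table marks countries where English is assumed to be the most likely first language, the categorisation step could look like the following. The table contents, the group labels and the function names are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from typing import Dict, Optional

# HYPOTHETICAL lookup table: countries assumed (for illustration only) to have
# English as the most likely first language. The authors' actual mapping is
# not given in the abstract and is likely more detailed.
LIKELY_ENGLISH_FIRST_LANGUAGE = {
    "United Kingdom",
    "United States",
    "Ireland",
    "Australia",
    "New Zealand",
}


def likely_language_group(country: Optional[str]) -> str:
    """Map a self-reported country to a coarse, hypothetical proficiency group."""
    if country is None or not country.strip():
        return "unknown"  # no country reported, so no inference is made
    if country.strip() in LIKELY_ENGLISH_FIRST_LANGUAGE:
        return "likely_native"
    return "likely_non_native"


def group_learners(countries: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Assign every learner ID to a group based on their reported country."""
    return {learner: likely_language_group(c) for learner, c in countries.items()}


if __name__ == "__main__":
    sample = {"l1": "Ireland", "l2": "Turkey", "l3": None}
    groups = group_learners(sample)
    print(groups)  # {'l1': 'likely_native', 'l2': 'likely_non_native', 'l3': 'unknown'}
    print(Counter(groups.values()))  # size of each group
```

Because a country only hints at a first language, the resulting group is a proxy for English proficiency rather than a measurement of it.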