Tajrian Mehedi, Rahman Azizur, Kabir Muhammad Ashad, Islam Md Rafiqul
School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia.
Heliyon. 2024 Aug 23;10(17):e36652. doi: 10.1016/j.heliyon.2024.e36652. eCollection 2024 Sep 15.
The rapid dissemination of misinformation on the internet complicates decision-making for individuals seeking reliable information, particularly parents researching child development topics. This misinformation can lead to adverse consequences, such as inappropriate treatment of children based on myths. While previous research has used text-mining techniques to predict child abuse cases, the analysis of child development myths and facts has remained a gap. This study addresses that gap by applying text-mining techniques and classification models to distinguish between myths and facts about child development, using newly gathered data from publicly available websites. The research methodology involved several stages. First, text-mining techniques were used to pre-process the data, with the aim of improving accuracy. Subsequently, the structured data was analysed using six Machine Learning (ML) classifiers and one Deep Learning (DL) model, with two feature extraction techniques applied to assess their performance across three different training-testing splits. To ensure the reliability of the results, cross-validation was performed using both k-fold and leave-one-out methods. Among the classification models tested, Logistic Regression (LR) demonstrated the highest accuracy, achieving 90% with the Bag-of-Words (BoW) feature extraction technique. LR was also fast, with a low testing time per statement (0.97 μs). These findings suggest that LR, when combined with BoW, is effective in classifying child development information, thus providing a valuable tool for combating misinformation and assisting parents in making informed decisions.
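To make the best-performing combination concrete, the following is a minimal, dependency-free sketch of Bag-of-Words features feeding a logistic regression classifier, with a leave-one-out check. It is not the authors' implementation: the toy statements, the function names (`tokenize`, `build_vocab`, `bow_vector`, `fit`, `predict`, `leave_one_out_accuracy`), the learning rate, and the epoch count are all assumptions chosen for illustration, and the study's actual pre-processing, data, and tooling are not described in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical toy statements (label 1 = myth, 0 = fact); NOT the study's data.
DATA = [
    ("sugar always makes children hyperactive", 1),
    ("baby walkers always help infants walk sooner", 1),
    ("teething always causes high fever", 1),
    ("reading aloud supports language development", 0),
    ("regular sleep supports healthy development", 0),
    ("play supports social development", 0),
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map each distinct training token to a column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_vector(text, vocab):
    """Bag-of-Words: per-token counts; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(samples, rate=0.1, epochs=200):
    """Train logistic regression with plain stochastic gradient descent."""
    vocab = build_vocab(text for text, _ in samples)
    X = [bow_vector(text, vocab) for text, _ in samples]
    y = [label for _, label in samples]
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one sample is (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - rate * err * xj for wj, xj in zip(w, xi)]
            b -= rate * err
    return vocab, w, b

def predict(text, model):
    """Return 1 (myth) if the predicted probability is at least 0.5, else 0 (fact)."""
    vocab, w, b = model
    x = bow_vector(text, vocab)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def leave_one_out_accuracy(samples):
    """Hold out each statement once, train on the rest, and score the held-out one."""
    hits = 0
    for i, (text, label) in enumerate(samples):
        model = fit(samples[:i] + samples[i + 1:])
        hits += predict(text, model) == label
    return hits / len(samples)

model = fit(DATA)
print(predict("sugar always causes high fever", model))
print(leave_one_out_accuracy(DATA))
```

With only six toy statements the leave-one-out score is not meaningful in itself; the point is the shape of the pipeline. In practice an established library implementation of the vectoriser, classifier, and cross-validation would normally replace these hand-written functions.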