Construction of a multi-modal digital human education platform based on GAN and vision transformer.

Authors

Yang Xuliang, Pan Aimin, Raga Rodolfo C

Affiliations

University and Urban Integration Development Research Center, Dongguan City University, 523419, Dongguan, China.

College of Computing and Information Technologies, National University-Manila, Manila, 1008, Philippines.

Publication information

Sci Rep. 2025 Apr 28;15(1):14850. doi: 10.1038/s41598-025-97662-4.

Abstract

With the rapid development of artificial intelligence, digital human education platforms have become a research hotspot in education. This paper proposes a method for building a multi-modal digital human education platform based on a Generative Adversarial Network (GAN) and a Vision Transformer (ViT). The platform enables high-quality avatar generation and interactive learning experiences. To evaluate the proposed method, we construct a large-scale dataset covering 1000 students and 50 teachers. Compared with existing digital human education platforms, the proposed method significantly improves avatar authenticity, interaction response speed, and learning outcomes. Specifically, the average recognition accuracy of avatars increases by 12%, the interaction response time is shortened by 25%, and students' academic performance improves by 8% on average. These results indicate that a multi-modal digital human education platform based on GAN and ViT has strong application potential and can provide new solutions for future education models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f64/12037867/f69ef31abe9b/41598_2025_97662_Fig1_HTML.jpg
