Heredia Perez Saul Alexis, Lok Tze Lun, Zhao Enduo, Harada Kanako
Graduate School of Medicine, The University of Tokyo, Hongo 7-3-1, Bunkyo City, 113-8654, Tokyo, Japan.
Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo City, 113-8654, Tokyo, Japan.
Int J Comput Assist Radiol Surg. 2025 Jun 26. doi: 10.1007/s11548-025-03465-3.
To support research on autonomous robotic micro-drilling for cranial window creation in mice, a multimodal digital twin (DT) is developed to generate realistic synthetic images and drilling sounds. The realism of the DT is evaluated using data from an eggshell drilling scenario, demonstrating its potential for training AI models with multimodal synthetic data.
The asynchronous multi-body framework (AMBF) simulator for volumetric drilling with haptic feedback is combined with the Isaac Sim simulator for photorealistic rendering. A deep audio generator (DAG) model is presented and its realism is evaluated on real drilling sounds. A convolutional neural network (CNN) trained on synthetic images is used to assess visual realism by detecting drilling areas in real eggshell images. Finally, the accuracy of the DT is evaluated by experiments on a real eggshell.
The DAG model outperformed pitch modulation methods, achieving lower Fréchet audio distance (FAD) and Fréchet inception distance (FID) scores, which indicates a closer resemblance to real drilling sounds. The CNN trained on synthetic images achieved a mean average precision (mAP) of 70.2 when tested on real drilling images. The DT had an alignment error of 0.22 ± 0.03 mm.
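For readers unfamiliar with the Fréchet-type scores reported above, the following is a minimal sketch of how such a distance is commonly computed: a Gaussian is fitted to each set of feature embeddings, and the two Gaussians are compared. The function name `frechet_distance` and the random arrays standing in for embeddings are illustrative assumptions. The embedding network and exact evaluation pipeline used by the authors are not stated here, so this should not be read as their implementation.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features), n_features >= 2.
    Hypothetical helper for illustration; real FAD/FID pipelines first map
    audio or images to embeddings with a pretrained network.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in embeddings (random data, not results from the paper).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + np.array([1.0, 0.0, 0.0, 0.0])
```

A lower value means the two embedding distributions are closer. In this toy case, comparing `real` with itself gives a distance near zero, and shifting every sample by one unit along a single axis gives a distance near one, because the covariances are unchanged and only the mean term contributes.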
A multimodal DT has been developed to simulate cranial window creation on an eggshell model, and its realism has been evaluated. The results indicate a high degree of realism in both the synthetic audio and the synthetic images, together with submillimeter alignment accuracy (0.22 ± 0.03 mm).