Seifert Tim J, Stritzke Mandy, Kasten Peer, Möller Björn, Fingscheidt Tim, Etzkorn Markus, de Wolff Timo, Schlickum Uta
Institute of Applied Physics, TU Braunschweig, 38106, Braunschweig, Germany.
Institute of Analysis and Algebra, TU Braunschweig, 38106, Braunschweig, Germany.
Small Methods. 2024 Dec;8(12):e2400549. doi: 10.1002/smtd.202400549. Epub 2024 Sep 9.
Enantiospecific effects play an increasingly important role in chemistry and technical applications. Chiral molecular networks formed by self-assembly at surfaces can be imaged by scanning probe microscopy (SPM). Low contrast and high noise in the topography map often interfere with automatic image analysis by classical methods. Long SPM image acquisition times hinder artificial-intelligence-based methods, which require large training sets, leaving only tedious manual work that introduces human-dependent errors and biased labeling. Generating realistic-looking synthetic images avoids the acquisition of real datasets. Two state-of-the-art object detection architectures are trained to localize and classify chiral unit cells in a regular chiral molecular network formed by self-assembly of linear molecular bricks. A comparison of different architectures and datasets demonstrates that models trained on purely synthetic data outperform models trained on augmented datasets. A Faster R-CNN model trained solely on synthetic data achieved a mean average precision of 99% on real data. The approach thus transfers successfully to real data and is robust against experimental noise and different zoom levels across the full experimentally reasonable parameter range. Equally high performance on a different structure demonstrates the generalizability of the approach.
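The central idea of the abstract, generating labeled synthetic images instead of annotating real SPM scans by hand, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' generator: the function names `draw_motif` and `make_synthetic_image`, the labels `"L"` and `"R"`, the L-shaped four-blob motif standing in for a chiral unit cell, the random assignment of handedness per cell, and the plain Gaussian noise model. The point is only that each rendered unit cell comes with an exact bounding box and class label, the kind of annotation an object detector such as Faster R-CNN is trained on.

```python
import math
import random

# Hypothetical class labels for the two mirror-image unit cells.
LABELS = ("L", "R")


def draw_motif(image, cx, cy, label, cell):
    """Add a chiral four-blob motif (an L shape) centred at (cx, cy)."""
    d = cell / 4.0          # spacing between blobs (assumed)
    sigma = cell / 10.0     # blob width (assumed)
    sign = 1.0 if label == "L" else -1.0
    # Three blobs in a row plus one off-axis blob; flipping `sign` mirrors the motif.
    centres = [(cx - d, cy), (cx, cy), (cx + d, cy), (cx + d, cy + sign * d)]
    half = cell // 2
    for y in range(cy - half, cy + half):
        for x in range(cx - half, cx + half):
            for bx, by in centres:
                r2 = (x - bx) ** 2 + (y - by) ** 2
                image[y][x] += math.exp(-r2 / (2.0 * sigma ** 2))


def make_synthetic_image(width, height, cell, noise_sigma, seed=None):
    """Return (image, annotations) for a regular grid of chiral unit cells.

    image is a list of rows of floats; each annotation holds a bounding box
    (x_min, y_min, x_max, y_max) and the handedness label of one unit cell.
    """
    rng = random.Random(seed)
    image = [[0.0] * width for _ in range(height)]
    annotations = []
    half = cell // 2
    for cy in range(half, height - half + 1, cell):
        for cx in range(half, width - half + 1, cell):
            label = rng.choice(LABELS)  # random handedness per cell (assumed)
            draw_motif(image, cx, cy, label, cell)
            box = (cx - half, cy - half, cx + half, cy + half)
            annotations.append({"box": box, "label": label})
    # Additive Gaussian noise as a crude stand-in for experimental noise.
    for row in image:
        for x in range(width):
            row[x] += rng.gauss(0.0, noise_sigma)
    return image, annotations


if __name__ == "__main__":
    img, ann = make_synthetic_image(64, 64, 16, noise_sigma=0.05, seed=0)
    print(len(ann), "labeled unit cells, first:", ann[0])
```

Varying `cell` and `noise_sigma` across calls would mimic different zoom levels and noise conditions, which is one plausible way to cover a range of imaging parameters in a training set.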