Benchmarking the robustness of the correct identification of flexible 3D objects using common machine learning models.

Author information

Zhang Yang, Vitalis Andreas

Affiliation

Department of Biochemistry, University of Zurich, 8057 Zurich, Switzerland.

Publication information

Patterns (N Y). 2025 Jan 10;6(1):101147. doi: 10.1016/j.patter.2024.101147.

Abstract

True three-dimensional (3D) data are prevalent in domains such as molecular science or computer vision. In these data, machine learning models are often asked to identify objects subject to intrinsic flexibility. Our study introduces two datasets from molecular science to assess the classification robustness of common model/feature combinations. Molecules are flexible, and shapes alone offer intra-class heterogeneities that yield a high risk for confusions. By blocking training and test sets to reduce overlap, we establish a baseline requiring the trained models to abstract from shape. As training data coverage grows, all tested architectures perform better on unseen data with reduced overfitting. Empirically, 2D embeddings of voxelized data produced the best-performing models. Evidently, both featurization and task-appropriate model design are of continued importance, the latter point reinforced by comparisons to recent, more specialized models. Finally, we show that the shape abstraction learned from database samples extends to samples that are evolving explicitly in time.
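Two ideas from the abstract, voxelizing 3D coordinates and blocking the training and test sets, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `voxelize` and `blocked_split`, the grid size, the spatial extent, and the `block_ids` labels are all hypothetical, and the abstract does not say how blocks are actually formed. The sketch only shows the general pattern of assigning whole groups of related samples to one side of the split.

```python
import numpy as np

def voxelize(coords, grid_size=8, extent=4.0):
    """Map an (N, 3) coordinate array to a binary occupancy grid.

    grid_size and extent are illustrative assumptions, not values from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    # Center the object so the grid is placed around its centroid.
    centered = coords - coords.mean(axis=0)
    # Scale coordinates from [-extent, extent) to voxel indices [0, grid_size).
    idx = np.floor((centered + extent) / (2 * extent) * grid_size).astype(int)
    # Points falling outside the grid are dropped.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[tuple(idx[inside].T)] = 1
    return grid

def blocked_split(block_ids, test_fraction=0.25, seed=0):
    """Assign whole blocks to train or test so that no block spans both sets.

    block_ids is a hypothetical per-sample label grouping related samples.
    """
    block_ids = np.asarray(block_ids)
    rng = np.random.default_rng(seed)
    blocks = rng.permutation(np.unique(block_ids))
    n_test = max(1, int(round(test_fraction * len(blocks))))
    test_mask = np.isin(block_ids, blocks[:n_test])
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Usage: two points become two occupied voxels; eight samples in four blocks
# are split so that one entire block is held out.
grid = voxelize([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
block_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
train_idx, test_idx = blocked_split(block_ids)
```

Splitting by block instead of by individual sample is what keeps near-duplicate shapes from appearing on both sides of the split, which is the kind of overlap the abstract says the blocking is meant to reduce.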


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afb0/11783895/81b482ba4870/gr1.jpg
