Li Jieming, Zhang Leyou, Johnson-Buck Alexander, Walter Nils G
Bristol Myers Squibb, New Brunswick, NJ, USA.
Google, New York City, NY, USA.
Res Sq. 2024 Oct 17:rs.3.rs-4970585. doi: 10.21203/rs.3.rs-4970585/v1.
Modern data-intensive techniques offer ever deeper insights into biology, but render the process of discovery increasingly complex. For example, exploiting the unique ability of single-molecule fluorescence microscopy (SMFM) to uncover rare but critical intermediates often demands manual inspection of time traces and iterative approaches that are difficult to systematize. To facilitate systematic and efficient discovery from SMFM data, we introduce META-SiM, a transformer-based foundation model pre-trained on diverse SMFM analysis tasks. META-SiM achieves high performance, rivaling best-in-class algorithms, on a broad range of analysis tasks including trace selection, classification, segmentation, idealization, and stepwise photobleaching analysis. Additionally, the model produces high-dimensional embedding vectors that encapsulate detailed information about each trace, which the web-based META-SiM Projector (https://www.simol-projector.org) casts into a lower-dimensional space for efficient whole-dataset visualization, labeling, comparison, and sharing. Combining this Projector with the objective metric of Local Shannon Entropy enables rapid identification of condition-specific behaviors, even if they are rare or subtle. By applying META-SiM to an existing single-molecule Förster resonance energy transfer (smFRET) dataset, we discover a previously unobserved intermediate state in pre-mRNA splicing. META-SiM thus removes bottlenecks, improves objectivity, and both systematizes and accelerates biological discovery in complex single-molecule data.
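The abstract does not define Local Shannon Entropy, so the following is only a minimal sketch under an assumed definition: for each trace's point in the low-dimensional projection, take its k nearest neighbours (Euclidean distance, query point included) and compute the Shannon entropy, in bits, of the experimental-condition labels in that neighbourhood. Low values would then flag regions dominated by a single condition. The function name `local_shannon_entropy`, the parameter `k`, and the neighbourhood rule are hypothetical illustrations, not META-SiM's actual implementation.

```python
import math
from collections import Counter
from typing import Sequence


def local_shannon_entropy(
    points: Sequence[tuple[float, float]],
    labels: Sequence[str],
    k: int = 5,
) -> list[float]:
    """Hypothetical per-point label entropy (bits) over k nearest neighbours.

    Assumption: neighbourhoods use Euclidean distance in the 2D projection
    and include the query point itself. Brute-force search is used for
    clarity; a large dataset would need a spatial index instead.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    k = min(k, len(points))  # cannot take more neighbours than points
    entropies = []
    for px, py in points:
        # Sort all points by squared distance to the query point; the
        # stable sort breaks ties by original index.
        order = sorted(
            range(len(points)),
            key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2,
        )
        counts = Counter(labels[j] for j in order[:k])
        # Shannon entropy H = -sum_c p_c * log2(p_c) over condition labels.
        probs = (n / k for n in counts.values())
        entropies.append(sum(-p * math.log2(p) for p in probs))
    return entropies


# Toy projection: two single-condition clusters plus one mixed pair.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5), (5, 6)]
conds = ["A", "A", "A", "B", "B", "B", "A", "B"]
print(local_shannon_entropy(pts, conds, k=2))
```

In this toy example the two well-separated clusters each contain a single condition and score 0 bits, whereas the mixed pair at (5, 5) and (5, 6) scores 1 bit, the maximum for two equally represented conditions.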