Keng Mithony, Merz Kenneth M
Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States.
J Chem Inf Model. 2025 Jul 28;65(14):7507-7515. doi: 10.1021/acs.jcim.5c00980. Epub 2025 Jul 8.
Accurately resolving a three-dimensional structure that corresponds to an experimental mass spectrometry (MS) result is valuable for outcomes such as improved analyte identification, determination of physicochemical properties relating to conformation, analyte impurity testing, and drug chemical integrity analysis. Computational approaches utilizing charge state modeling, conformational sampling, quantum mechanical optimizations, relative energy scoring, and computed ion-neutral collision cross sections (CCS) have historically succeeded at assigning equilibrium structures to ion-mobility MS-derived CCS values. Despite this success, new computational software that achieves higher throughput when modeling large systems is still lacking. A major driver of computational cost is that the number of titratable sites generally increases with molecular size, which requires additional protonation/deprotonation models to ensure that the correct charge state is captured. Here, we introduce a user-friendly machine learning program called SEER (State Ensemble Energy Recognition) to accurately and efficiently predict the equilibrium charge states of MS-relevant ions. We report that for all systems within the test set, which had an overall average of approximately seven titratable sites, SEER successfully captured the lowest relative energy charge states within its top two predicted candidates. Furthermore, the density functional theory-optimized geometries for SEER-assigned charge states produced CCS errors relative to experiment that are within the acceptable threshold (i.e., ≤3% error) set for this work. The benchmark study compared SEER to two well-established charge state prediction software packages, CREST and Epik classic, and found that SEER is on par with or better than them at consistently locating the correct charge states for the test set, with competitive efficiency.
SEER requires no additional user programming and is readily accessible through the Google Colab platform at https://github.com/mitkeng/SEER.
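The two acceptance criteria mentioned in the abstract (a CCS error of ≤3% and the reference charge state appearing among the top two predictions) can be illustrated with a minimal sketch. This is not SEER's code or API: the function names, the site labels, and the numeric values are hypothetical, and the error is assumed to be the conventional absolute percent deviation from the experimental CCS, which the abstract does not spell out.

```python
from typing import Sequence


def ccs_percent_error(ccs_computed: float, ccs_experimental: float) -> float:
    """Absolute percent deviation of a computed CCS from the experimental CCS.

    Assumed definition: |computed - experimental| / experimental * 100.
    """
    if ccs_experimental <= 0:
        raise ValueError("experimental CCS must be positive")
    return abs(ccs_computed - ccs_experimental) / ccs_experimental * 100.0


def within_threshold(ccs_computed: float, ccs_experimental: float,
                     threshold_percent: float = 3.0) -> bool:
    """True if the CCS error is at or below the threshold (3% in the abstract)."""
    return ccs_percent_error(ccs_computed, ccs_experimental) <= threshold_percent


def top_k_hit(ranked_sites: Sequence[str], reference_site: str, k: int = 2) -> bool:
    """True if the reference (lowest-energy) site is among the first k predictions."""
    return reference_site in ranked_sites[:k]


if __name__ == "__main__":
    # Hypothetical values in square angstroms, for illustration only.
    print(within_threshold(152.0, 150.0))
    # Hypothetical ranked protonation sites and reference site.
    print(top_k_hit(["N3", "O1", "N7"], "O1"))
```

Under these assumptions, a prediction counts as successful when `top_k_hit` is true for the ranked candidates and `within_threshold` is true for the optimized geometry of the assigned charge state.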