Department of BioHealth Informatics, School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States.
Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States.
Anal Chem. 2023 May 30;95(21):8189-8196. doi: 10.1021/acs.analchem.2c05244. Epub 2023 May 17.
Top-down liquid chromatography-mass spectrometry (LC-MS) analyzes intact proteoforms and generates mass spectra containing peaks of proteoforms with various isotopic compositions, charge states, and retention times. An essential step in top-down MS data analysis is proteoform feature detection, which aims to group these peaks into peak sets (features), each containing all peaks of a proteoform. Accurate proteoform feature detection improves the accuracy of MS-based proteoform identification and quantification. Here, we present TopFD, a software tool for top-down MS feature detection that integrates algorithms for proteoform feature detection and feature boundary refinement with machine learning models for proteoform feature evaluation. We performed extensive benchmarking of TopFD, ProMex, FlashDeconv, and Xtract using seven top-down MS data sets and demonstrated that TopFD outperforms the other tools in feature accuracy, reproducibility, and feature abundance reproducibility.
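To make the idea of grouping peaks into features concrete, the following is a minimal sketch and not TopFD's actual algorithm. It assumes each peak has already been reduced to a monoisotopic m/z with an assigned charge state, so isotopic envelopes are ignored. The names Peak and group_peaks and the parameters ppm_tol and max_rt_gap are hypothetical and chosen only for illustration. Peaks whose neutral masses agree within a ppm tolerance and whose retention times are close are placed in the same feature.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da, mass of a proton


@dataclass(frozen=True)
class Peak:
    mz: float         # monoisotopic m/z (assumed already deisotoped)
    charge: int       # assigned charge state
    rt: float         # retention time in minutes
    intensity: float

    @property
    def neutral_mass(self) -> float:
        # Convert m/z and charge to a neutral mass.
        return (self.mz - PROTON_MASS) * self.charge


def group_peaks(peaks, ppm_tol=10.0, max_rt_gap=0.5):
    """Greedy illustration: bin peaks by neutral mass, then split by RT gaps."""
    # Step 1: bin peaks whose neutral mass lies within ppm_tol of the
    # first (lowest-mass) peak in the current bin.
    mass_bins = []
    for p in sorted(peaks, key=lambda q: q.neutral_mass):
        if mass_bins:
            ref = mass_bins[-1][0].neutral_mass
            if abs(p.neutral_mass - ref) <= ref * ppm_tol * 1e-6:
                mass_bins[-1].append(p)
                continue
        mass_bins.append([p])

    # Step 2: within each mass bin, start a new feature whenever the
    # retention-time gap to the previous peak exceeds max_rt_gap.
    features = []
    for mass_bin in mass_bins:
        mass_bin.sort(key=lambda q: q.rt)
        current = [mass_bin[0]]
        for p in mass_bin[1:]:
            if p.rt - current[-1].rt <= max_rt_gap:
                current.append(p)
            else:
                features.append(current)
                current = [p]
        features.append(current)
    return features
```

In this sketch, two peaks of a 10 kDa proteoform observed at charge states 10 and 11 within a few seconds of each other end up in one feature, while a peak of the same mass eluting ten minutes later starts a separate feature. A production tool would additionally have to fit isotopic envelopes, refine feature boundaries, and score candidate features, as the abstract describes.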