Statistical Modeling for Enhancing the Discovery Power of Citrullination from Tandem Mass Spectrometry Data.

Author information

Huh Sunghyun, Hwang Daehee, Kim Min-Sik

Affiliations

Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea.

School of Biological Sciences, Seoul National University, Seoul 88026, Republic of Korea.

Publication information

Anal Chem. 2020 Oct 6;92(19):12975-12986. doi: 10.1021/acs.analchem.0c01687. Epub 2020 Sep 16.

Abstract

Citrullination is a post-translational modification implicated in various human diseases including rheumatoid arthritis, Alzheimer's disease, multiple sclerosis, and cancers. Due to a relatively low concentration of citrullinated proteins in the total proteome, confident identification of citrullinated proteome is challenging in mass spectrometry (MS)-based proteomic analysis. From these MS-based analyses, MS features that characterize citrullination, such as immonium ions (IMs) and neutral losses (NLs), called diagnostic ions, have been reported. However, there has been a lack of systematic approaches to comprehensively search for diagnostic ions and no statistical methods for the identification of citrullinated proteome based on these diagnostic ions. Here, we present a systematic approach to identify diagnostic IMs, internal ions (INTs), and NLs for citrullination from tandem mass (MS/MS) spectra. Diagnostic INTs mainly consisted of internal fragment ions for di- and tripeptides that contained two and three amino acids with at least one citrullinated arginine, respectively. A statistical logistic regression model was built for a confident assessment of citrullinated peptides that database searches identified (true positives) and prediction of citrullinated peptides that database searches failed to identify (false negatives) using the diagnostic IMs, INTs, and NLs. Applications of our model to complex global proteome data sets demonstrated the increased accuracy in the identification of citrullinated peptides, thereby enhancing the size and functional interpretation of citrullinated proteomes.
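The abstract does not state the model's features, coefficients, or training procedure, so the following is only a minimal sketch of the general idea: a logistic regression that scores a spectrum from counts of matched diagnostic ions. The three features (`n_IM`, `n_INT`, `n_NL`), the toy training rows, and all hyperparameters are assumptions made up for illustration; they are not data or settings from the paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit weights and bias by batch gradient descent on the log-loss.

    lr, epochs and l2 are illustrative defaults, not values from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one spectrum.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient plus a small L2 penalty on the weights.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that a spectrum comes from a citrullinated peptide."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: each row is (n_IM, n_INT, n_NL), the number of
# diagnostic immonium ions, internal ions and neutral losses matched in
# one MS/MS spectrum. Labels: 1 = citrullinated, 0 = not citrullinated.
X = [
    [1, 2, 1], [1, 1, 1], [0, 2, 1], [1, 3, 0],
    [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
p_rich = predict_proba(w, b, [1, 2, 1])  # spectrum with many diagnostic ions
p_poor = predict_proba(w, b, [0, 0, 0])  # spectrum with none
print(f"P(citrullinated | many diagnostic ions) = {p_rich:.3f}")
print(f"P(citrullinated | no diagnostic ions)   = {p_poor:.3f}")
```

Under these assumptions, such a score could be used in the two ways the abstract describes: to re-assess citrullinated peptides reported by a database search, and to flag spectra the search missed. A real application would need features extracted from actual spectra and a properly validated model.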
