Center for Molecular Medicine, National Institute of Mental Health and Neurosciences, Hosur Road, Bangalore, 560029, Karnataka, India.
Institute of Bioinformatics, International Technology Park, Bangalore, 560066, Karnataka, India.
Proteomics. 2019 Aug;19(15):e1800315. doi: 10.1002/pmic.201800315. Epub 2019 Jun 26.
Understanding the molecular profile of every human cell type is essential for understanding its role in normal physiology and disease. Technological advancements in DNA sequencing, mass spectrometry, and computational methods make multiomics analyses possible, although such approaches are not yet routine. Human umbilical vein endothelial cells (HUVECs) are a widely used model system for studying pathological and physiological processes associated with the cardiovascular system. In this study, next-generation sequencing and high-resolution mass spectrometry are employed to profile the transcriptome and proteome of primary HUVECs. Analysis of 145 million paired-end reads from next-generation sequencing confirms expression of 12 186 protein-coding genes (FPKM ≥0.1) and 439 novel long non-coding RNAs, and reveals 6089 novel isoforms that are not annotated in GENCODE. Proteomics analysis identifies 6477 proteins, including confirmation of N-termini for 1091 proteins, isoforms for 149 proteins, and 1034 phosphosites. A database search to specifically identify other post-translational modifications provides evidence for modification sites on 117 proteins, including ubiquitylation, lysine acetylation, and mono-, di-, and tri-methylation events. Evidence is provided for 11 "missing proteins," which are proteins for which there was insufficient or no protein-level evidence. Peptides supporting missing proteins and novel events are validated by comparing their MS/MS fragmentation patterns with those of synthetic peptides. Finally, 245 variant peptides derived from 207 expressed proteins are identified, along with alternate translational start sites for seven proteins and evidence for novel proteoforms of five proteins resulting from alternative splicing. Overall, the integrated approach employed in this study is believed to be widely applicable to any primary cell type for deeper molecular characterization.