Chambers Matthew C, Jagtap Pratik D, Johnson James E, McGowan Thomas, Kumar Praveen, Onsongo Getiria, Guerrero Candace R, Barsnes Harald, Vaudel Marc, Martens Lennart, Grüning Björn, Cooke Ira R, Heydarian Mohammad, Reddy Karen L, Griffin Timothy J
Department of Biochemistry, Vanderbilt University, Nashville, Tennessee.
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota.
Cancer Res. 2017 Nov 1;77(21):e43-e46. doi: 10.1158/0008-5472.CAN-17-0331.
Proteogenomics has emerged as a valuable approach in cancer research that integrates genomic and transcriptomic data with mass spectrometry-based proteomics data to directly identify expressed variant protein sequences that may have functional roles in cancer. The approach is computationally intensive and requires integrating disparate software tools into sophisticated workflows, which makes it difficult for nonexpert bench scientists to adopt. To address this need, we have developed an extensible, Galaxy-based resource intended to give more researchers access to, and training in, proteogenomic informatics. The resource brings together software from several leading research groups to address two foundational aspects of proteogenomics: (i) generation of customized, annotated protein sequence databases from RNA-Seq data; and (ii) accurate matching of tandem mass spectrometry data to putative variants, followed by filtering to confirm their novelty. Directions for accessing the software tools and workflows, along with instructional documentation, can be found at z.umn.edu/canresgithub.
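As a rough illustration of the final step in (ii), filtering matched variant peptides to confirm their novelty, the sketch below keeps only peptides that do not occur anywhere in a set of reference protein sequences. The abstract does not specify how novelty is confirmed, so this is a simplified stand-in and not the resource's actual method: it assumes exact substring matching against a reference proteome and treats isoleucine and leucine as equivalent, because the two are isobaric and cannot be distinguished by MS/MS. The function names and toy sequences are hypothetical.

```python
"""Minimal sketch of a peptide novelty filter (hypothetical; not the published workflow)."""


def normalize(seq: str) -> str:
    # Isoleucine (I) and leucine (L) have identical mass, so MS/MS cannot
    # tell them apart; map I to L so the comparison ignores that ambiguity.
    return seq.upper().replace("I", "L")


def filter_novel_peptides(peptides, reference_proteins):
    """Return peptides that are not exact substrings of any reference protein."""
    refs = [normalize(p) for p in reference_proteins]
    novel = []
    for pep in peptides:
        norm = normalize(pep)
        # A peptide explained by any reference protein is not novel.
        if not any(norm in ref for ref in refs):
            novel.append(pep)
    return novel


if __name__ == "__main__":
    # Toy data for illustration only.
    reference = ["MKTAYIAKQRQISFVKSHFSRQ", "MSLLTEVETPIRNEWGCRCNDSSD"]
    candidates = ["AYIAKQR", "AYLAKQR", "TEVETPLRNEW", "TEVETPARNEW"]
    print(filter_novel_peptides(candidates, reference))
```

With this toy data only `TEVETPARNEW` survives, since the other three candidates (including the I/L-swapped forms) are explained by the reference sequences. A production workflow would typically use an alignment-based search rather than exact matching, so that near-identical reference sequences are also caught.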