Lee Joon-Yong, Choi Hyungwon, Colangelo Christopher M, Davis Darryl, Hoopmann Michael R, Käll Lukas, Lam Henry, Payne Samuel H, Perez-Riverol Yasset, The Matthew, Wilson Ryan, Weintraub Susan T, Palmblad Magnus
Pacific Northwest National Laboratory, Richland, Washington 99352, USA.
National University of Singapore, 117547 Singapore, Singapore.
J Biomol Tech. 2018 Jul;29(2):39-45. doi: 10.7171/jbt.18-2902-003. Epub 2018 Jun 21.
This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates of the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received. Those who participated did well, although no single method performed best on all samples. However, none of the submissions included a complete R Markdown document or IPython Notebook, even though examples were provided. Future iPRG studies need to be more successful in promoting and encouraging participation. The VPS and submission validator scale easily to much larger numbers of participants in these types of studies. The unique "ground-truth" dataset for proteoform identification generated for this study is now available to the research community, as are the server-side scripts for validating and managing submissions.