Bowel Cancer and Biomarker Research, Kolling Institute, The University of Sydney, Sydney 2065, Australia.
Department of Molecular Sciences, Macquarie University, Sydney 2109, Australia.
Int J Mol Sci. 2020 Feb 19;21(4):1374. doi: 10.3390/ijms21041374.
Proteomics and genomics discovery experiments generate increasingly large result tables, requiring ever more researcher time to convert the biological data into new knowledge. Literature review is an important step in this process and can be tedious for large-scale experiments. Making an informed, strategic decision about which biomolecule targets to pursue in follow-up experiments therefore remains a considerable challenge. To streamline and formalise literature retrieval and analysis for discovery-based 'omics data, and to provide a decision-support tool for planning follow-up experiments, we present OmixLitMiner, a package written in the programming language R. The tool automates the retrieval of literature from PubMed based on UniProt protein identifiers, gene names and their synonyms, combined with a user-defined contextual keyword search (i.e., Gene Ontology-based). The search strategy can be set for either strict or more lenient literature retrieval, and the outputs are assigned to three categories describing how well characterized a regulated gene or protein is. This category helps researchers decide which genes/proteins to investigate in follow-up experiments to gain new knowledge, and which already known biomarkers to exclude from further pursuit. We demonstrate the tool's usefulness in a retrospective study assessing three cancer proteomics publications and one cancer genomics publication. Using the tool, we were able to corroborate most of the decisions made in these papers and to identify additional biomolecule leads that may be valuable for future research.
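The search strategy described above (names and synonyms combined with contextual keywords, run in a strict or lenient mode, then binned into three categories) can be illustrated with a small sketch. It is written in Python rather than R and is not OmixLitMiner's code or API: the function names `build_query`, `esearch_url` and `categorize` are hypothetical, and so are the choices that "strict" means a name match in the title field (`[ti]`) while "lenient" also accepts the abstract (`[tiab]`), that all context keywords must match, and that the three categories follow the simple hit-count rule in `categorize`. The `[ti]`/`[tiab]` field tags and the NCBI E-utilities `esearch.fcgi` endpoint are standard PubMed features; the URL is only built here, not fetched.

```python
from urllib.parse import urlencode

# Standard NCBI E-utilities search endpoint (only used to build a URL here).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(names, keywords, strict=True):
    """Build a PubMed query for one gene/protein (hypothetical helper).

    names: primary gene name followed by its synonyms.
    keywords: user-defined context terms, e.g. ["colorectal cancer"].
    strict=True requires a name match in the title [ti];
    strict=False also accepts a match in the abstract [tiab] (assumption).
    """
    field = "[ti]" if strict else "[tiab]"
    # Any one of the names/synonyms may match, so OR them together.
    name_clause = " OR ".join(f'"{n}"{field}' for n in names)
    query = f"({name_clause})"
    if keywords:
        # Assumption: every context keyword must appear, so AND them.
        kw_clause = " AND ".join(f'"{k}"[tiab]' for k in keywords)
        query += f" AND ({kw_clause})"
    return query


def esearch_url(query, retmax=100):
    """Return a PubMed esearch URL for the query; no request is sent."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def categorize(strict_hits, lenient_hits):
    """Assign one of three categories from hit counts (hypothetical rule).

    1: found by the strict (title) search -> well characterized
    2: found only by the lenient (title/abstract) search -> partly characterized
    3: no publications found -> little known, a possible follow-up lead
    """
    if strict_hits > 0:
        return 1
    if lenient_hits > 0:
        return 2
    return 3


if __name__ == "__main__":
    q = build_query(["TP53", "P53"], ["colorectal cancer"], strict=True)
    print(q)  # ("TP53"[ti] OR "P53"[ti]) AND ("colorectal cancer"[tiab])
    print(esearch_url(q))
    print(categorize(strict_hits=0, lenient_hits=4))  # 2
```

Running the strict query first and falling back to the lenient one lets a single pass separate well-studied molecules from those mentioned only in passing, which mirrors the strict/lenient distinction the abstract describes.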