Lister Hill National Center for Biomedical Communications, US National Library of Medicine.
Brief Bioinform. 2018 Nov 27;19(6):1400-1414. doi: 10.1093/bib/bbx057.
An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in the reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activity focused on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we ask whether biomedical text mining techniques can help stakeholders in the biomedical research enterprise do their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload, and accurate citation/enhanced bibliometrics. For each specific task, we review existing methods and tools where they exist, or otherwise discuss relevant research that can guide future work. Given the exponential increase in biomedical research output and the ability of text mining approaches to automate tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise.