National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA.
Implementation Science Team, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Database (Oxford). 2019 Jan 1;2019:baz010. doi: 10.1093/database/baz010.
Tracking scientific publications on the evaluation, utility and implementation of genomic applications is critical for translating basic research into clinical and population health impact. In this work, we use state-of-the-art machine learning approaches to identify translational genomics research beyond bench to bedside in the biomedical literature. We apply convolutional neural networks (CNNs) and support vector machines (SVMs) to bench/bedside article classification, using the weekly manual annotation data of the Public Health Genomics Knowledge Base database. Both classifiers use salient features to estimate the probability that a publication is eligible for curation, which can effectively reduce the workload of manual triage and curation. On an independent test set (n = 400), the CNNs and SVMs achieved F-measures of 0.80 and 0.74, respectively. We further tested the better-performing CNNs in the routine annotation pipeline for 2 weeks; they significantly reduced the curation effort and retrieved more appropriate research articles. Our approaches provide direct insight into the automated curation of genomic translational research beyond bench to bedside, and the machine learning classifiers were found to help annotators improve the efficiency of manual curation.
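The abstract reports an F-measure for each classifier and describes using predicted probabilities to triage articles for curation. The following is a minimal sketch of how such a score could be computed, not the authors' evaluation code. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall) and a hypothetical probability cutoff of 0.5; the function names `triage` and `f_measure` and the toy data are illustrative only.

```python
def triage(probabilities, threshold=0.5):
    """Turn classifier probabilities into 0/1 curation decisions.

    The 0.5 threshold is an assumption for illustration; the abstract
    does not state which cutoff was used.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def f_measure(y_true, y_pred):
    """Balanced F-measure (F1) for binary labels, where 1 = curation-eligible."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: five articles with made-up labels and probabilities.
labels = [1, 1, 0, 0, 1]
probs = [0.9, 0.4, 0.6, 0.1, 0.8]
predictions = triage(probs)
score = f_measure(labels, predictions)
print(predictions, round(score, 3))
```

In a triage setting, lowering the threshold would typically trade precision for recall, so the cutoff would be chosen according to how costly a missed eligible article is relative to extra manual review.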