Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph.

Affiliations

L3S Research Center, Leibniz University Hannover, Hanover, Germany.

Department of Information and Knowledge Engineering, Prague University of Economics and Business, nám. Winstona Churchilla 1938/4, 120 00, Prague, Czech Republic.

Publication information

J Biomed Semantics. 2023 Nov 28;14(1):18. doi: 10.1186/s13326-023-00298-4.

DOI:10.1186/s13326-023-00298-4

PMID:38017587

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10683290/

Abstract

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe work that goes beyond bibliometric metadata to predict influential scholarly documents, and we also examine the prediction task over categorized scholarly documents. We introduce a new approach that enhances the document representation with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. The study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT); of these, TF-IDF works better than the others. Of the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for categorized scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation, which we enhance with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification, but we did find an effect on influential scholarly document prediction with categorical data.
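
The idea of enhancing a BOW representation with knowledge-graph types can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_TYPES` table is a hypothetical stand-in for DBpedia lookups (a real pipeline would presumably link entities and query their `rdf:type` values), the function names are invented for illustration, and only the type enhancement is shown, not the unqualified relation or the classifiers.

```python
from collections import Counter
import re

# Hypothetical stand-in for DBpedia: surface form -> rdf:type labels.
# These entries are illustrative assumptions, not data from the paper.
TOY_TYPES = {
    "covid-19": ["dbo:Disease"],
    "remdesivir": ["dbo:Drug", "dbo:ChemicalSubstance"],
    "wuhan": ["dbo:City", "dbo:Place"],
}


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and hyphens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def bow(text):
    """Plain bag-of-words: token -> number of occurrences."""
    return Counter(tokenize(text))


def enhanced_bow(text, types=TOY_TYPES):
    """Bag-of-words plus one extra feature per type of each matched token."""
    counts = bow(text)
    enhanced = Counter(counts)  # copy, so the plain counts stay untouched
    for token, n in counts.items():
        for type_label in types.get(token, []):
            # Prefix type features so they cannot collide with word features.
            enhanced["TYPE=" + type_label] += n
    return enhanced


def to_vector(features, vocabulary):
    """Project a feature dictionary onto a fixed, ordered vocabulary."""
    return [features.get(term, 0) for term in vocabulary]


docs = [
    "Remdesivir trial for COVID-19 patients",
    "COVID-19 outbreak in Wuhan",
]
bags = [enhanced_bow(d) for d in docs]
vocabulary = sorted(set().union(*bags))
matrix = [to_vector(b, vocabulary) for b in bags]
```

The resulting `matrix` has one row per document and one column per word or type feature, so it could be passed to any classifier that accepts dense count vectors. Documents that share no words can still overlap on a type feature such as `TYPE=dbo:Disease`, which is one plausible reason such an enhancement might affect prediction.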

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb20/10683290/09acd7df7e70/13326_2023_298_Fig1_HTML.jpg
