Why was this cited? Explainable machine learning applied to COVID-19 research literature.

Authors

Beranová Lucie, Joachimiak Marcin P, Kliegr Tomáš, Rabby Gollam, Sklenák Vilém

Affiliations

Department of Econometrics, Faculty of Informatics and Statistics, VSE Praha, W Churchill sq. 4, Prague, Czech Republic.

Environmental Genomics and Systems Biology Division at Lawrence Berkeley National Laboratory, Berkeley, USA.

Publication information

Scientometrics. 2022;127(5):2313-2349. doi: 10.1007/s11192-022-04314-9. Epub 2022 Apr 9.

DOI: 10.1007/s11192-022-04314-9
PMID: 35431364
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8993675/
Abstract

Multiple studies have investigated bibliometric factors predictive of the citation count a research article will receive. In this article, we go beyond bibliometric data by using a range of machine learning techniques to find patterns predictive of citation count using both article content and available metadata. As the input collection, we use the CORD-19 corpus containing research articles, mostly from biology and medicine, applicable to the COVID-19 crisis. Our study employs a combination of state-of-the-art machine learning techniques for text understanding, including the embeddings-based language model BERT and several systems for detection and semantic expansion of entities: ConceptNet, PubTator, and ScispaCy. To interpret the resulting models, we use several explanation algorithms: random forest feature importance, LIME, and Shapley values. We compare the performance and comprehensibility of models obtained by "black-box" machine learning algorithms (neural networks and random forests) with models built with rule learning (CORELS, CBA), which are intrinsically explainable. Multiple rules were discovered that referred to biomedical entities of potential interest. Of the rules with the highest lift measure, several pointed to dipeptidyl peptidase 4 (DPP4), a known MERS-CoV receptor and a critical determinant of camel-to-human transmission of the camel coronavirus (MERS-CoV). Some other interesting patterns related to the type of animal investigated were found. Articles referring to bats and camels tend to draw citations, while articles referring to most other animal species related to coronavirus receive few citations. Bat coronavirus is the only other virus from a non-human species in the betaB clade along with the SARS-CoV and SARS-CoV-2 viruses. MERS-CoV is in a sister betaC clade, also close to human SARS coronaviruses. Thus both species linked to high citation counts harbor coronaviruses that are more phylogenetically similar to human SARS viruses.
On the other hand, feline (FIPV, FCOV) and canine (CCOV) coronaviruses are in the alpha coronavirus clade and more distant from the betaB clade with human SARS viruses. Other results include detection of apparent citation bias favouring authors with Western-sounding names. Equal performance of TF-IDF weights and a binary word incidence matrix was observed, with the latter resulting in better interpretability. The best predictive performance was obtained with a "black-box" method, a neural network. The rule-based models led to the most insights, especially when coupled with text representation using semantic entity detection methods. Follow-up work should focus on the analysis of citation patterns in the context of phylogenetic trees, as well as on patterns referring to DPP4, which is currently considered a SARS-CoV-2 therapeutic target.
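
The lift measure used above to rank rules can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `rule_lift`, the toy documents, and the "highly cited" labels are all hypothetical, chosen only to show how lift compares a rule's confidence with the base rate of the predicted class.

```python
from typing import Iterable, List, Set


def rule_lift(docs: Iterable[Set[str]], labels: Iterable[bool],
              antecedent: Set[str]) -> float:
    """Lift of the rule `antecedent -> highly cited` on a small corpus.

    lift = confidence(rule) / base rate of the "highly cited" class.
    Values above 1 mean the antecedent is associated with more citations
    than average; values below 1 mean the opposite.
    """
    doc_list: List[Set[str]] = list(docs)
    label_list: List[bool] = list(labels)
    n = len(doc_list)

    # A document is covered when it contains every entity in the antecedent.
    covered = [antecedent <= d for d in doc_list]
    n_covered = sum(covered)
    n_positive = sum(label_list)
    n_both = sum(c and y for c, y in zip(covered, label_list))

    if n_covered == 0 or n_positive == 0:
        raise ValueError("lift is undefined when the antecedent or the class never occurs")

    confidence = n_both / n_covered   # P(highly cited | antecedent)
    base_rate = n_positive / n        # P(highly cited)
    return confidence / base_rate


# Hypothetical toy data: entity sets per article and a made-up "highly cited" flag.
toy_docs = [
    {"dpp4", "camel"},
    {"dpp4", "bat"},
    {"feline"},
    {"canine"},
    {"bat"},
    {"feline", "canine"},
]
toy_labels = [True, True, False, False, True, False]

lift_dpp4 = rule_lift(toy_docs, toy_labels, {"dpp4"})
lift_feline = rule_lift(toy_docs, toy_labels, {"feline"})
print(lift_dpp4, lift_feline)
```

On this made-up data the `dpp4` rule has a lift above 1 and the `feline` rule a lift below 1, mirroring the direction, though not the magnitude, of the patterns reported in the abstract.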

