Improving long COVID-related text classification: a novel end-to-end domain-adaptive paraphrasing framework.

Affiliations

Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.

Division of Biomedical Informatics, University of California, La Jolla, San Diego, USA.

Publication information

Sci Rep. 2024 Jan 2;14(1):85. doi: 10.1038/s41598-023-48594-4.

DOI:10.1038/s41598-023-48594-4

PMID:38168099

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10761882/

Abstract

The emergence of long COVID during the ongoing COVID-19 pandemic has presented considerable challenges for healthcare professionals and researchers. The task of identifying relevant literature is particularly daunting due to the rapidly evolving scientific landscape, inconsistent definitions, and a lack of standardized nomenclature. This paper proposes a novel solution to this challenge by employing machine learning techniques to classify long COVID literature. However, the scarcity of annotated data for machine learning poses a significant obstacle. To overcome this, we introduce a strategy called medical paraphrasing, which diversifies the training data while maintaining the original content. Additionally, we propose a Data-Reweighting-Based Multi-Level Optimization Framework for Domain Adaptive Paraphrasing, supported by a Meta-Weight-Network (MWN). This innovative approach incorporates feedback from the downstream text classification model to influence the training of the paraphrasing model. During the training process, the framework assigns higher weights to the training examples that contribute more effectively to the downstream task of long COVID text classification. Our findings demonstrate that this method substantially improves the accuracy and efficiency of long COVID literature classification, offering a valuable tool for physicians and researchers navigating this complex and ever-evolving field.
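The reweighting idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the Meta-Weight-Network's architecture or how the multi-level optimization is solved, so the function names, the sigmoid form, the parameters `a` and `b`, and the loss values below are all assumptions made for illustration. The sketch shows only one piece, namely how a weight function can turn per-example paraphraser losses into weights that rescale the training objective. In the framework described above, those weights would be learned from the downstream classifier's feedback rather than fixed by hand.

```python
import math


def meta_weight(loss, a, b):
    """Hypothetical stand-in for a weight network.

    Maps one per-example training loss to a weight in (0, 1) using a
    sigmoid over a linear function of the loss. In a learned setting,
    `a` and `b` would be updated from downstream classification feedback;
    here they are fixed constants (an assumption for illustration).
    """
    return 1.0 / (1.0 + math.exp(-(a * loss + b)))


def reweighted_loss(losses, a, b):
    """Return the weight-normalized average loss and the weights used."""
    weights = [meta_weight(loss, a, b) for loss in losses]
    total = sum(weights)
    average = sum(w * loss for w, loss in zip(weights, losses)) / total
    return average, weights


# Made-up per-example losses for three paraphrase training pairs.
losses = [0.2, 1.0, 3.0]

# With a < 0, examples with larger loss receive smaller weights.
avg, weights = reweighted_loss(losses, a=-1.0, b=1.0)
unweighted = sum(losses) / len(losses)

print("weights:", [round(w, 3) for w in weights])
print("reweighted loss:", round(avg, 3), "unweighted loss:", round(unweighted, 3))
```

With these hand-picked parameters the high-loss example is down-weighted, so the reweighted objective falls below the plain average. A learned weight network could just as well favor other examples, depending on what helps the downstream long COVID classifier.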
