Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA.
Division of Biomedical Informatics, University of California, La Jolla, San Diego, USA.
Sci Rep. 2024 Jan 2;14(1):85. doi: 10.1038/s41598-023-48594-4.
The emergence of long COVID during the ongoing COVID-19 pandemic has presented considerable challenges for healthcare professionals and researchers. The task of identifying relevant literature is particularly daunting due to the rapidly evolving scientific landscape, inconsistent definitions, and a lack of standardized nomenclature. This paper proposes a novel solution to this challenge by employing machine learning techniques to classify long COVID literature. However, the scarcity of annotated data for machine learning poses a significant obstacle. To overcome this, we introduce a strategy called medical paraphrasing, which diversifies the training data while maintaining the original content. Additionally, we propose a Data-Reweighting-Based Multi-Level Optimization Framework for Domain Adaptive Paraphrasing, supported by a Meta-Weight-Network (MWN). This innovative approach incorporates feedback from the downstream text classification model to influence the training of the paraphrasing model. During the training process, the framework assigns higher weights to the training examples that contribute more effectively to the downstream task of long COVID text classification. Our findings demonstrate that this method substantially improves the accuracy and efficiency of long COVID literature classification, offering a valuable tool for physicians and researchers navigating this complex and ever-evolving field.
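The reweighting idea in the abstract (higher weights for training examples that help the downstream classifier more) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Meta-Weight-Network-style setup in which a small network maps each example's training loss to a weight in (0, 1), and the paraphrasing model is trained on the weighted mean loss. The names `TinyWeightNet` and `weighted_loss`, the hidden size, and the loss values are all hypothetical. The outer optimization level, where feedback from the long COVID classifier updates the weight network, is omitted.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyWeightNet:
    """Hypothetical one-hidden-layer MLP: per-example loss -> weight in (0, 1)."""

    def __init__(self, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Randomly initialised parameters; in a full framework these would be
        # updated using the downstream classifier's validation loss (omitted).
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, loss: float) -> float:
        # ReLU hidden layer followed by a sigmoid output unit.
        hidden = [max(0.0, w * loss + b) for w, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def weighted_loss(losses, weight_net):
    """Return the weighted mean loss and the per-example weights."""
    weights = [weight_net(loss) for loss in losses]
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total / len(losses), weights


# Made-up per-example losses for three paraphrasing training examples.
paraphrase_losses = [0.2, 1.5, 0.7]
net = TinyWeightNet()
batch_loss, example_weights = weighted_loss(paraphrase_losses, net)
```

In a complete multi-level setup, `batch_loss` would drive the paraphrasing model's update, and the weight network's parameters would in turn be adjusted so that the classifier trained on the resulting paraphrases performs better on held-out long COVID data.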