Computational Sciences and Engineering Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee, 37830, USA.
College of Medicine, University of Kentucky, Lexington, Kentucky, 24105, USA.
BMC Med Inform Decis Mak. 2024 Sep 17;24(Suppl 5):262. doi: 10.1186/s12911-024-02662-5.
Applying graph convolutional networks (GCNs) to the classification of free-form natural language texts using graph-of-words features (TextGCN) has been studied and confirmed to be an effective means of describing complex natural language texts. However, text classification models based on TextGCN have weaknesses in memory consumption and in model dissemination and distribution. In this paper, we present a fast message passing network (FastMPN), which implements a GCN with a message passing architecture. Allowing trainable node embeddings and edge weights gives the model versatility and flexibility and helps the GCN find a better solution. We applied the FastMPN model to the task of clinical information extraction from cancer pathology reports, extracting the following six properties: main site, subsite, laterality, histology, behavior, and grade.
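To make the idea of message passing over a graph of words concrete, the following is a minimal sketch of one round of weighted-sum aggregation. It is an illustration under stated assumptions, not the authors' FastMPN code: the function name `message_passing_step`, the toy node names, and the plain weighted-sum update are all hypothetical. In a framework such as PyTorch, the node embeddings and edge weights would be stored as trainable parameters and updated by backpropagation; here they are fixed numbers so the arithmetic is easy to follow.

```python
def message_passing_step(node_emb, edges, edge_weight):
    """One round of weighted-sum message passing (illustrative only).

    node_emb:    dict mapping node name -> embedding (list of floats)
    edges:       list of (source, destination) pairs
    edge_weight: dict mapping (source, destination) -> float
    Returns a new dict of embeddings; a node with no incoming edge gets zeros.
    """
    dim = len(next(iter(node_emb.values())))
    updated = {node: [0.0] * dim for node in node_emb}
    for src, dst in edges:
        w = edge_weight[(src, dst)]
        for k in range(dim):
            # Each source node sends its embedding, scaled by the edge weight.
            updated[dst][k] += w * node_emb[src][k]
    return updated


# Hypothetical toy graph: two word nodes feeding one document node.
node_emb = {
    "tumor": [1.0, 0.0],
    "left": [0.0, 2.0],
    "report_1": [0.0, 0.0],
}
edges = [("tumor", "report_1"), ("left", "report_1")]
edge_weight = {("tumor", "report_1"): 0.5, ("left", "report_1"): 0.25}

result = message_passing_step(node_emb, edges, edge_weight)
print(result["report_1"])
```

The document node ends up as a weighted combination of the words it contains. A classifier head for each of the six properties could then read that vector, which is one plausible way to frame the extraction task.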
We evaluated the clinical task performance of the FastMPN models in terms of micro- and macro-averaged F1 scores. A comparison was performed with the multi-task convolutional neural network (MT-CNN) model. Results show that the FastMPN model is equivalent to or better than the MT-CNN.
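The difference between the two averages can be shown with a short sketch. It assumes single-label multi-class predictions; the function name `f1_scores` and the toy labels are hypothetical and are not taken from the study's data. Micro-averaging pools the counts over all classes, so frequent classes dominate. Macro-averaging takes the unweighted mean of per-class F1, so rare classes count equally.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("y_true and y_pred must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels for a "main site" style task.
y_true = ["breast", "breast", "breast", "lung", "colon"]
y_pred = ["breast", "breast", "lung", "lung", "breast"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the rare class ("colon") is never predicted correctly, which pulls the macro score well below the micro score. That gap is why reporting both averages is informative for tasks with many infrequent labels.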
Our FastMPN implementation, which is based on the PyTorch platform, can train on a large corpus (667,290 training samples) with 202,373 unique words in less than 3 minutes per epoch using one NVIDIA V100 hardware accelerator. Our experiments demonstrated that, with this implementation, the clinical task performance scores for extracting tumor-related information from cancer pathology reports were highly competitive.