Naufal Tsaqif, Mahendra Rahmad, Wicaksono Alfan Farizki
Faculty of Computer Science, Universitas Indonesia, Kampus UI, 16424, Depok, West Java, Indonesia.
J Biomed Semantics. 2025 May 6;16(1):8. doi: 10.1186/s13326-025-00329-2.
Online consumer health forums offer an alternative source of health-related information for internet users seeking specific details that may not be readily available through articles or other one-way communication channels. However, the effectiveness of these forums can be constrained by the limited number of healthcare professionals actively participating, which can impact response times to user inquiries. One potential solution to this issue is the integration of a semi-automatic system. A critical component of such a system is question processing, which often involves sentence recognition (SR), medical entity recognition (MER), and keyphrase extraction (KE) modules. We posit that the development of these three modules would enable the system to identify critical components of the question, thereby facilitating a deeper understanding of the question, and allowing for the re-formulation of more effective questions with extracted key information.
This work contributes to two key aspects related to these three tasks. First, we expand and publicly release an Indonesian dataset for each task. Second, we establish a baseline for all three tasks within the Indonesian language domain by employing transformer-based models with nine distinct encoder variations. Our feature studies revealed an interdependence among these three tasks. Consequently, we propose several multi-task learning (MTL) models, both in pairwise and three-way configurations, incorporating parallel and hierarchical architectures.
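MTL models of this kind are commonly trained by minimizing a weighted sum of the per-task losses. The following is a minimal sketch of that combination step only, assuming scalar losses for SR, MER, and KE have already been computed for a batch. The function name `weighted_mtl_loss` and the example weights are hypothetical and are not taken from the paper.

```python
from typing import Dict


def weighted_mtl_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-task losses into one training objective (illustrative only)."""
    missing = set(task_losses) - set(weights)
    if missing:
        raise ValueError(f"no weight given for tasks: {sorted(missing)}")
    # Weighted sum over tasks: L = sum_t w_t * L_t
    return sum(weights[task] * loss for task, loss in task_losses.items())


# Hypothetical per-batch losses and weights for a three-way configuration.
losses = {"SR": 0.5, "MER": 1.0, "KE": 2.0}
weights = {"SR": 0.2, "MER": 0.4, "KE": 0.4}
total = weighted_mtl_loss(losses, weights)
```

Changing the weights shifts how strongly each task drives the shared encoder, which is the knob a three-way setup would vary.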
Using F1-score at the chunk level, the inter-annotator agreements for SR, MER, and KE tasks were , and respectively. In single-task learning (STL) settings, the best performance for each task was achieved by a different model, with obtained the highest average score. These results suggested that a larger model did not always perform better. We also found no indication of whether Indonesian or multilingual language models generally performed better for our tasks. In pairwise MTL settings, we found that pairing tasks could outperform the STL baseline for all three tasks. Although we varied the loss weights across our three-way MTL models, we did not identify a consistent pattern. While some configurations improved MER and KE performance, none surpassed the best pairwise MTL model for the SR task.
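To make the chunk-level F1 metric concrete, here is a minimal sketch that assumes BIO-style tags and exact matching of chunk label and boundaries. The tagging scheme, the matching criterion, and the labels `DISEASE` and `DRUG` are assumptions for illustration; the paper's exact protocol is not given here. For inter-annotator agreement, one annotator's tags would stand in for `gold` and the other's for `pred`.

```python
from typing import List, Set, Tuple

Chunk = Tuple[str, int, int]  # (label, start index, end index exclusive)


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect labeled spans from a BIO tag sequence."""
    chunks: Set[Chunk] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                chunks.add((label, start, i))
                label = None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray "I-" tag is treated as the start of a new chunk.
                label, start = tag[2:], i
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over chunks that match exactly in label and boundaries."""
    gold_chunks, pred_chunks = extract_chunks(gold), extract_chunks(pred)
    true_pos = len(gold_chunks & pred_chunks)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_chunks)
    recall = true_pos / len(gold_chunks)
    return 2 * precision * recall / (precision + recall)


gold = ["B-DISEASE", "I-DISEASE", "O", "B-DRUG"]
pred = ["B-DISEASE", "I-DISEASE", "O", "O"]
score = chunk_f1(gold, pred)  # precision 1.0, recall 0.5
```

Scoring at the chunk level means a partially overlapping span earns no credit, which is stricter than token-level F1.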
We extended an Indonesian dataset for SR, MER, and KE tasks, resulting in 1,173 labeled data points, which were split into 773 training instances, 200 validation instances, and 200 testing instances. We then used transformer-based models to set a baseline for all three tasks. Our MTL experiments suggested that additional information regarding the other two tasks could help the learning process for the MER and KE tasks, while it had only a small effect on the SR task.