Phatak Atharva, Savage David W, Ohle Robert, Smith Jonathan, Mago Vijay
Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada.
NOSM University, Thunder Bay, ON, Canada.
JMIR Med Inform. 2022 Nov 18;10(11):e38095. doi: 10.2196/38095.
In most cases, the abstracts of articles in the medical domain are publicly available. Although they are accessible to everyone, they are hard for a wider audience to comprehend because of their complex medical vocabulary. Simplifying these abstracts is therefore essential to making medical research accessible to the general public.
This study aims to develop a deep learning-based text simplification (TS) approach that converts complex medical text into a simpler version while maintaining the quality of the generated text.
A TS approach using reinforcement learning and transformer-based language models was developed. A relevance reward, a Flesch-Kincaid reward, and a lexical simplicity reward were optimized to simplify jargon-dense medical paragraphs while retaining the quality of the text. The model was trained on 3568 complex-simple medical paragraphs and evaluated on 480 paragraphs using automated metrics and human annotation.
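To make the readability signal concrete, the sketch below computes the standard Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is only an illustration: the syllable counter is a rough vowel-group heuristic, the function names are hypothetical, and how the study turns this score into a reward is not stated in the abstract.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; lower values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


simple = "The cat sat on the mat."
jargon = "Myocardial infarction necessitates immediate pharmacological intervention."
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(jargon))
```

Under this heuristic the jargon-dense sentence scores far higher than the plain one, which is the direction a readability-based reward would presumably penalize.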
The proposed method outperformed previous baselines on Flesch-Kincaid scores (11.84) and achieved comparable performance with other baselines when measured using ROUGE-1 (0.39), ROUGE-2 (0.11), and SARI scores (0.40). Manual evaluation showed that percentage agreement between human annotators was more than 70% when factors such as fluency, coherence, and adequacy were considered.
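As a rough guide to two of the quantities reported above, the following sketch computes a unigram-overlap ROUGE-1 F1 score and a simple pairwise percentage agreement. Both are simplified assumptions: whitespace tokenization stands in for the tokenization used by standard ROUGE tooling, the agreement function handles only two annotators, and the abstract does not say how agreement was aggregated across annotators.

```python
from collections import Counter
from typing import Sequence


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 from clipped unigram overlap (whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of items on which two annotators gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotation lists must be non-empty and equal in length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(rouge1_f1("the heart attack", "the heart stopped working"))
print(percent_agreement(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))
```

Percentage agreement does not correct for chance agreement, so it is best read alongside the number of rating categories used.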
A unique medical TS approach that leverages reinforcement learning was successfully developed; it accurately simplifies complex medical paragraphs, thereby increasing their readability. The proposed approach can be applied to automatically generate simplified versions of complex medical text, which would make biomedical research more accessible to a wider audience.