

Natural Language Generation Model for Mammography Reports Simulation.

Publication information

IEEE J Biomed Health Inform. 2020 Sep;24(9):2711-2717. doi: 10.1109/JBHI.2020.2980118. Epub 2020 Apr 20.

Abstract

Extending the size of labeled corpora of medical reports is a major step towards successful training of machine learning algorithms. Simulating new text reports is a key approach to report augmentation, which extends the cohort size. However, text generation in the medical domain is challenging because it needs to preserve both the content and the style typical of real reports without risking patients' privacy. In this paper, we present a conditioned LSTM-RNN architecture for simulating realistic mammography reports. We evaluated performance by analyzing the characteristics of the simulated reports and by classifying them into benign and malignant classes. An average classification AUC was calculated over two distinct test sets. A qualitative analysis was also performed, in which a masked radiologist classified 0.75 of the simulated reports as real, indicating that both the style and the content of the simulated reports were similar to those of real reports. Finally, we compared our LSTM-RNN generative model with Markov Random Fields (MRFs). The LSTM-RNN provided significantly better and more stable performance than MRFs (Wilcoxon test).
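As an illustration of the evaluation step mentioned above (an average classification AUC over two test sets), the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `roc_auc`, the label encoding (1 for malignant, 0 for benign), and the toy scores are all assumptions made up for this example.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation; ties count as half."""
    # Assumed encoding: 1 = malignant (positive), 0 = benign (negative).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (labels, classifier scores) for two test sets.
test_set_a = ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
test_set_b = ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2])

# One AUC per test set, then the unweighted mean across the two sets.
aucs = [roc_auc(y, s) for y, s in (test_set_a, test_set_b)]
mean_auc = sum(aucs) / len(aucs)
print(aucs, mean_auc)
```

The pairwise form is quadratic in the number of reports, which is acceptable for a sketch; a library routine would normally be used for larger test sets.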

