Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.
Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany.
Int J Med Inform. 2019 Sep;129:114-121. doi: 10.1016/j.ijmedinf.2019.05.019. Epub 2019 May 30.
Text summarization of clinical trial descriptions has the potential to reduce the time needed to become familiar with a study's subject by condensing long-form detailed descriptions into concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods.
We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We ran several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics using the brief summaries included in each record as references. To investigate how these metrics correlate with human judgments, four reviewers assessed the content completeness of the generated summaries and the helpfulness of both the generated and reference summaries using a Likert-scale questionnaire.
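The ROUGE evaluation described above compares n-gram overlap between a generated summary and a reference. As a minimal sketch, ROUGE-1 F1 can be computed from clipped unigram counts; this assumes simple whitespace tokenization, whereas the paper's evaluation would typically rely on a full ROUGE implementation with stemming and the ROUGE-2 and ROUGE-L variants as well.

```python
# Sketch of ROUGE-1 F1: clipped unigram overlap between candidate and
# reference summaries. Whitespace tokenization is an assumption; real
# ROUGE tooling applies stemming and stopword handling.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped matches per unigram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Precision rewards summaries that contain little extraneous text, recall rewards coverage of the reference, and the F1 score balances the two, which is why the paper reports F1 for each ROUGE variant.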
The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% of the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the overall best performance with a ROUGE-1 F1 score of 0.3531, a ROUGE-2 F1 score of 0.1723, and a ROUGE-L F1 score of 0.3003. These scores correlate with the human reviewers' assessments of helpfulness and content similarity. Inter-rater agreement for helpfulness and content similarity was slight and fair, respectively (Fleiss' kappa of 0.12 and 0.22).
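The Fleiss' kappa values reported above measure chance-corrected agreement among the four reviewers. A minimal sketch of the statistic, assuming the ratings are arranged as a matrix of per-item category counts (this layout is an illustration, not the paper's actual data structure):

```python
# Sketch of Fleiss' kappa for a fixed number of raters per item.
# counts[i][j] = number of raters who assigned item i to category j.
def fleiss_kappa(counts):
    n = len(counts)            # number of rated items
    m = sum(counts[0])         # raters per item (assumed constant)
    k = len(counts[0])         # number of rating categories
    # Mean observed per-item agreement.
    p_bar = sum(
        (sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts
    ) / n
    # Expected agreement by chance, from the category marginals.
    p_j = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

On the conventional interpretation scale, values of 0.01-0.20 indicate slight agreement and 0.21-0.40 fair agreement, which is how the reported kappas of 0.12 and 0.22 are characterized.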
Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Further, the human evaluation has shown that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way.