Extractive summarization of clinical trial descriptions.

Affiliations

Medical Informatics, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Erlangen, Germany.

Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany.

Publication information

Int J Med Inform. 2019 Sep;129:114-121. doi: 10.1016/j.ijmedinf.2019.05.019. Epub 2019 May 30.

DOI: 10.1016/j.ijmedinf.2019.05.019
PMID: 31445245

Abstract

PURPOSE

Text summarization of clinical trial descriptions has the potential to reduce the time required to familiarize oneself with the subject of studies by condensing long-form detailed descriptions to concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods.

METHODS

We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics using the brief summaries included in the record as a reference. To investigate the correlation of these metrics with human sentiments, four reviewers assessed the content-completeness of the generated summaries and the helpfulness of both the generated and reference summaries via a Likert scale questionnaire.
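
The ROUGE F1 scores used in this evaluation can be illustrated with a minimal sketch. It assumes lowercase whitespace tokenization and treats each text as a single token sequence for ROUGE-L. The function names are illustrative, and the abstract does not state which ROUGE implementation the authors used, so this is not a reproduction of their tooling.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; real ROUGE tools may stem or strip punctuation.
    return text.lower().split()


def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    # Harmonic mean of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Dynamic-programming longest common subsequence, kept one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

In this setting the candidate would be the generated summary and the reference would be the brief summary stored in the trial record.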

RESULTS

The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the overall best performance with a ROUGE-1 F1 score of 0.3531, ROUGE-2 F1 score of 0.1723, and ROUGE-L F1 score of 0.3003. These scores correlate with the assessment of the helpfulness and content similarity by the human reviewers. Inter-rater agreement for the helpfulness and content similarity was slight and fair respectively (Fleiss' kappa of 0.12 and 0.22).
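
As a rough illustration of the inter-rater agreement figures, the sketch below computes Fleiss' kappa from a table of rating counts and maps the value to the commonly used Landis and Koch labels, under which 0.12 reads as "slight" and 0.22 as "fair". The input layout and function names are assumptions made for this example; the abstract does not describe how the authors computed the statistic.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of raters who assigned item i to category j."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(ratings[0])

    # Observed agreement: share of agreeing rater pairs per item, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        return float("nan")  # undefined when all ratings fall in a single category
    return (p_bar - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    # Conventional interpretation bands from Landis and Koch (1977).
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

With four reviewers and a five-point Likert scale, each row would hold five counts that sum to four.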

CONCLUSIONS

Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Further, the human evaluation has shown that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way.
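
To make the idea of extractive summarization concrete, here is a minimal TextRank-style sketch: sentences are ranked by a PageRank-like score over a word-overlap similarity graph, and the top-ranked sentences are returned in their original order. The sentence splitter, the `ratio` default, the damping factor, and all function names are assumptions for illustration; this is not the authors' implementation or configuration.

```python
import math
import re


def split_sentences(text):
    # Assumption: naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    # Shared distinct words, normalised by the log of both sentence lengths.
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom == 0:
        return 0.0
    return len(set(wa) & set(wb)) / denom


def textrank_scores(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, weighted by edge strength.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, ratio=0.25):
    # ratio is a hypothetical knob: the share of sentences to keep.
    sentences = split_sentences(text)
    if not sentences:
        return ""
    keep = max(1, round(len(sentences) * ratio))
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))
```

Because the output consists only of sentences copied from the input, the summary cannot introduce wording that is absent from the detailed description.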

Similar articles

1. Extractive summarization of clinical trial descriptions.
   Int J Med Inform. 2019 Sep;129:114-121. doi: 10.1016/j.ijmedinf.2019.05.019. Epub 2019 May 30.
2. CERC: an interactive content extraction, recognition, and construction tool for clinical and biomedical text.
   BMC Med Inform Decis Mak. 2020 Dec 15;20(Suppl 14):306. doi: 10.1186/s12911-020-01330-8.
3. Extractive single document summarization using binary differential evolution: Optimization of different sentence quality measures.
   PLoS One. 2019 Nov 14;14(11):e0223477. doi: 10.1371/journal.pone.0223477. eCollection 2019.
4. Exploring the potential of ChatGPT in medical dialogue summarization: a study on consistency with human preferences.
   BMC Med Inform Decis Mak. 2024 Mar 14;24(1):75. doi: 10.1186/s12911-024-02481-8.
5. Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan.
   PLOS Digit Health. 2022 Sep 15;1(9):e0000099. doi: 10.1371/journal.pdig.0000099. eCollection 2022 Sep.
6. Reaching for upper bound ROUGE score of extractive summarization methods.
   PeerJ Comput Sci. 2022 Sep 26;8:e1103. doi: 10.7717/peerj-cs.1103. eCollection 2022.
7. Graph-based extractive text summarization method for Hausa text.
   PLoS One. 2023 May 9;18(5):e0285376. doi: 10.1371/journal.pone.0285376. eCollection 2023.
8. Qualitative Analysis of Text Summarization Techniques and Its Applications in Health Domain.
   Comput Intell Neurosci. 2022 Feb 9;2022:3411881. doi: 10.1155/2022/3411881. eCollection 2022.
9. Biomedical semantic text summarizer.
   BMC Bioinformatics. 2024 Apr 16;25(1):152. doi: 10.1186/s12859-024-05712-x.
10. Single document text summarization addressed with a cat swarm optimization approach.
    Appl Intell (Dordr). 2023;53(10):12268-12287. doi: 10.1007/s10489-022-04149-0. Epub 2022 Sep 24.

Cited by

1. Evaluation of the Performance of Large Language Models in the Management of Axial Spondyloarthropathy: Analysis of EULAR 2022 Recommendations.
   Diagnostics (Basel). 2025 Jun 7;15(12):1455. doi: 10.3390/diagnostics15121455.
2. Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications.
   ArXiv. 2025 May 9:arXiv:2505.05736v1.
3. Empowering PET imaging reporting with retrieval-augmented large language models and reading reports database: a pilot single center study.
   Eur J Nucl Med Mol Imaging. 2025 Jun;52(7):2452-2462. doi: 10.1007/s00259-025-07101-9. Epub 2025 Jan 23.
4. [The Medical Informatics Initiative at a glance-establishing a health research data infrastructure in Germany].
   Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2024 Jun;67(6):616-628. doi: 10.1007/s00103-024-03887-5. Epub 2024 Jun 5.
5. A systematic review of automatic text summarization for biomedical literature and EHRs.
   J Am Med Inform Assoc. 2021 Sep 18;28(10):2287-2297. doi: 10.1093/jamia/ocab143.
6. Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning.
   BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):129. doi: 10.1186/s12911-021-01492-z.
7. Application of BERT to Enable Gene Classification Based on Clinical Evidence.
   Biomed Res Int. 2020 Oct 7;2020:5491963. doi: 10.1155/2020/5491963. eCollection 2020.
8. An Ensemble Learning Strategy for Eligibility Criteria Text Classification for Clinical Trial Recruitment: Algorithm Development and Validation.
   JMIR Med Inform. 2020 Jul 1;8(7):e17832. doi: 10.2196/17832.