
Scalable Scientific Interest Profiling Using Large Language Models

Authors

Liang Yilun, Zhang Gongbo, Sun Edward, Idnay Betina, Fang Yilu, Chen Fangyi, Ta Casey, Peng Yifan, Weng Chunhua

Affiliations

Department of Biomedical Informatics, Columbia University, New York, NY, USA.

Tandon School of Engineering, New York University, Brooklyn, NY, USA.

Publication

ArXiv. 2025 Aug 19:arXiv:2508.15834v1.

PMID: 40895076
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12393255/
Abstract

OBJECTIVE

Research profiles highlight scientists' research focus, enabling talent discovery and fostering collaborations, but they are often outdated. Automated, scalable methods are urgently needed to keep these profiles current.

METHODS

In this study, we design and evaluate two Large Language Model (LLM)-based methods for generating scientific interest profiles—one summarizing researchers' PubMed abstracts and the other generating a summary from the Medical Subject Headings (MeSH) terms of their publications—and compare these machine-generated profiles with researchers' self-summarized interests. We collected the titles, MeSH terms, and abstracts of PubMed publications for 595 faculty members affiliated with Columbia University Irving Medical Center (CUIMC), for 167 of whom we obtained human-written online research profiles. Subsequently, GPT-4o-mini, a state-of-the-art LLM, was prompted to summarize each researcher's interests. Both manual and automated evaluations were conducted to characterize the similarities and differences between the machine-generated and self-written research profiles.
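The MeSH-based summarization step can be sketched as below. The prompt wording and function name are illustrative assumptions, not the paper's exact implementation; the actual LLM call is omitted.

```python
from collections import Counter

def build_mesh_profile_prompt(researcher_name, mesh_terms):
    """Assemble a summarization prompt from the MeSH terms attached to a
    researcher's PubMed publications (hypothetical wording, not the
    paper's exact prompt)."""
    # Rank terms by how often they recur across the researcher's papers,
    # so dominant topics appear first in the prompt.
    ranked = [term for term, _ in Counter(mesh_terms).most_common()]
    return (
        f"Summarize the scientific interests of {researcher_name} in one "
        "paragraph, based on these MeSH terms from their publications: "
        + "; ".join(ranked)
    )

# The resulting prompt would then be sent to an LLM such as GPT-4o-mini
# (e.g., via a chat-completions API; the call itself is omitted here).
```

A MeSH term that recurs across many of a researcher's papers is ranked ahead of one-off terms, giving the model a rough signal of topic dominance.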

RESULTS

The similarity study showed low ROUGE-L, BLEU, and METEOR scores, reflecting little overlap between terminologies used in machine-generated and self-written profiles. BERTScore analysis revealed moderate semantic similarity between machine-generated and reference summaries (F1: 0.542 for MeSH-based, 0.555 for abstract-based), despite low lexical overlap. In validation, paraphrased summaries achieved a higher F1 of 0.851. A further comparison between the original and paraphrased manually written summaries indicates the limitations of such metrics. Kullback-Leibler (KL) Divergence of term frequency-inverse document frequency (TF-IDF) values (8.56 and 8.58 for profiles derived from MeSH terms and abstracts, respectively) suggests that machine-generated summaries employ different keywords than human-written summaries. Manual reviews further showed that 77.78% rated the overall impression of MeSH-based profiling as "good" or "excellent," with readability receiving favorable ratings in 93.44% of cases, though granularity and factual accuracy varied. Overall, panel reviews favored 67.86% of machine-generated profiles derived from MeSH terms over those derived from abstracts.
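The KL-divergence comparison above can be sketched as follows: each profile is mapped to a TF-IDF weight vector, the vectors are normalized into probability distributions over a shared vocabulary, and KL(P || Q) is computed term by term. The tokenization, smoothing floor, and IDF variant here are simplifying assumptions, not the paper's exact setup.

```python
import math
from collections import Counter

def tfidf_distribution(doc_tokens, corpus):
    """TF-IDF weights for one tokenized document, normalized into a
    probability distribution over the corpus vocabulary (simplified)."""
    n_docs = len(corpus)
    vocab = {t for doc in corpus for t in doc}
    df = {t: sum(t in doc for doc in corpus) for t in vocab}
    tf = Counter(doc_tokens)
    # Smoothed IDF; the tiny floor keeps every weight strictly positive
    # so the KL divergence below stays finite.
    weights = {t: tf[t] * math.log((1 + n_docs) / (1 + df[t])) + 1e-9
               for t in vocab}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def kl_divergence(p, q):
    """KL(P || Q) over a shared vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)
```

With two profiles tokenized into word lists, `kl_divergence(tfidf_distribution(a, [a, b]), tfidf_distribution(b, [a, b]))` yields a non-negative score; larger values indicate more divergent keyword usage, as reported for the machine- versus human-written profiles.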

CONCLUSION

LLMs promise to automate scientific interest profiling at scale. Profiles derived from MeSH terms have better readability than profiles derived from abstracts. Overall, machine-generated summaries differ from human-written ones in their choice of concepts, with the latter introducing more novel ideas.


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/d9757d1e3ae6/nihpp-2508.15834v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/e1eb1fd7ff57/nihpp-2508.15834v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/691537310029/nihpp-2508.15834v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/f132432ff28c/nihpp-2508.15834v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/ecaca1512abc/nihpp-2508.15834v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/3512b1f31312/nihpp-2508.15834v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/4256cb31e021/nihpp-2508.15834v1-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/2e83f2e8f11b/nihpp-2508.15834v1-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/94be5d948c4b/nihpp-2508.15834v1-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1bb/12393255/b1b3d913e133/nihpp-2508.15834v1-f0010.jpg

Similar Articles

1. Scalable Scientific Interest Profiling Using Large Language Models.
ArXiv. 2025 Aug 19:arXiv:2508.15834v1.
2. Prescription of Controlled Substances: Benefits and Risks.
3. Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.
JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476.
4. ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.
Cochrane Evid Synth Methods. 2025 Jul 28;3(4):e70037. doi: 10.1002/cesm.70037. eCollection 2025 Jul.
5. A dataset and benchmark for hospital course summarization with adapted large language models.
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
6. Drugs for preventing postoperative nausea and vomiting in adults after general anaesthesia: a network meta-analysis.
Cochrane Database Syst Rev. 2020 Oct 19;10(10):CD012859. doi: 10.1002/14651858.CD012859.pub2.
7. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials.
Cochrane Database Syst Rev. 2014 Apr 29;2014(4):MR000034. doi: 10.1002/14651858.MR000034.pub2.
8. Developing and Evaluating Large Language Model-Generated Emergency Medicine Handoff Notes.
JAMA Netw Open. 2024 Dec 2;7(12):e2448723. doi: 10.1001/jamanetworkopen.2024.48723.
9. A comparative study of recent large language models on generating hospital discharge summaries for lung cancer patients.
J Biomed Inform. 2025 Aug;168:104867. doi: 10.1016/j.jbi.2025.104867. Epub 2025 Jun 20.
10. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
