Attention-Based Models for Classifying Small Data Sets Using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study.

Authors

Ferrell Brian J, Raskin Sarah E, Zimmerman Emily B, Timberline David H, McInnes Bridget T, Krist Alex H

Affiliations

Center for Community Engagement and Impact, Virginia Commonwealth University, Richmond, VA, United States.

L Douglas Wilder School of Government and Public Affairs, Virginia Commonwealth University, Richmond, VA, United States.

Publication

JMIR Form Res. 2022 Sep 6;6(9):e32460. doi: 10.2196/32460.

DOI:10.2196/32460
PMID:36066925
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9490525/
Abstract

BACKGROUND

Community-engaged research (CEnR) is a research approach in which scholars partner with community organizations or individuals with whom they share an interest in the study topic, typically with the goal of supporting that community's well-being. CEnR is well-established in numerous disciplines including the clinical and social sciences. However, universities experience challenges reporting comprehensive CEnR metrics, limiting the development of appropriate CEnR infrastructure and the advancement of relationships with communities, funders, and stakeholders.

OBJECTIVE

We propose a novel approach to identifying and categorizing community-engaged studies by applying attention-based deep learning models to human participants protocols that have been submitted to the university's institutional review board (IRB).

METHODS

We manually classified a sample of 280 protocols submitted to the IRB using a 3- and 6-level CEnR heuristic. We then trained an attention-based bidirectional long short-term memory unit (Bi-LSTM) on the classified protocols and compared it to transformer models such as Bidirectional Encoder Representations From Transformers (BERT), Bio + Clinical BERT, and Cross-lingual Language Model-Robustly Optimized BERT Pre-training Approach (XLM-RoBERTa). We applied the best-performing models to the full sample of unlabeled IRB protocols submitted in the years 2013-2019 (n>6000).
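As a rough illustration of the attention step that sits on top of a Bi-LSTM, the sketch below pools per-token hidden states into a single vector with softmax weights. It is a minimal sketch under stated assumptions, not the authors' architecture: the abstract does not describe the attention form, so the dot-product scoring, the `query` vector, and the toy numbers are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse per-token states into one context vector.

    hidden_states: list of T vectors, standing in for the concatenated
        forward/backward LSTM outputs at each token.
    query: hypothetical learned vector of the same width (an assumption).
    """
    # One relevance score per token: dot product with the query.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    width = len(query)
    # Weighted sum of the token states, dimension by dimension.
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states))
        for j in range(width)
    ]
    return context, weights


# Toy input: 3 tokens with 2-dimensional states (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(states, query=[1.0, 1.0])
```

In a classifier, the context vector would then feed a dense layer that outputs one score per CEnR level; the weights indicate which tokens contributed most to the pooled representation.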

RESULTS

Transfer learning performed better: every transformer model implemented received an evaluation F1 score of 0.9952, compared with 48%-80% for the attention-based Bi-LSTM. However, there were key issues with overfitting. This finding is consistent across several methodological adjustments: an augmented data set with and without cross-validation, an unaugmented data set with and without cross-validation, a 6-class CEnR spectrum, and a 3-class one.
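For readers who want to see how an F1 score and cross-validation folds are computed on a small labeled set, here is a minimal sketch in plain Python. The abstract does not state which F1 averaging was used, so macro averaging is an assumption, and the helper names `macro_f1` and `stratified_folds` as well as the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_folds(labels, k):
    """Assign example indices to k folds, round-robin within each class,
    so every fold keeps roughly the same class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % k].append(idx)
    return folds


# Toy 3-class example (made-up labels, not data from the study).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred)

folds = stratified_folds([0] * 6 + [1] * 3, k=3)
```

One common way to look for overfitting is to compare the score on the training folds with the score on the held-out fold: a large gap suggests that the model has memorized the training examples rather than learned patterns that generalize.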

CONCLUSIONS

Transfer learning is a more viable method than the attention-based Bi-LSTM for differentiating small data sets characterized by the idiosyncrasies and variability of the CEnR descriptions that principal investigators use in research protocols. Despite the overfitting issues, BERT and the other transformer models showed an understanding of our data that the attention-based Bi-LSTM model did not, promising a more realistic path toward solving this real-world application.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231c/9490525/1acd69ad361d/formative_v6i9e32460_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231c/9490525/982f986e6ffb/formative_v6i9e32460_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231c/9490525/3a4f74b0f29c/formative_v6i9e32460_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/231c/9490525/5ef5c0af6e80/formative_v6i9e32460_fig4.jpg

Similar articles

1
Attention-Based Models for Classifying Small Data Sets Using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study.
JMIR Form Res. 2022 Sep 6;6(9):e32460. doi: 10.2196/32460.
2
Fine-tuning Strategies for Classifying Community-Engaged Research Studies Using Transformer-Based Models: Algorithm Development and Improvement Study.
JMIR Form Res. 2023 Feb 7;7:e41137. doi: 10.2196/41137.
3
Calibrating a Transformer-Based Model's Confidence on Community-Engaged Research Studies: Decision Support Evaluation Study.
JMIR Form Res. 2023 Mar 20;7:e41516. doi: 10.2196/41516.
4
A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
5
Developing a classification system and algorithm to track community-engaged research using IRB protocols at a large research university.
J Clin Transl Sci. 2021 Nov 22;6(1):e6. doi: 10.1017/cts.2021.877. eCollection 2022.
6
Identifying the Perceived Severity of Patient-Generated Telemedical Queries Regarding COVID: Developing and Evaluating a Transfer Learning-Based Solution.
JMIR Med Inform. 2022 Sep 2;10(9):e37770. doi: 10.2196/37770.
7
Deep Learning Approach for Negation and Speculation Detection for Automated Important Finding Flagging and Extraction in Radiology Report: Internal Validation and Technique Comparison Study.
JMIR Med Inform. 2023 Apr 25;11:e46348. doi: 10.2196/46348.
8
Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study.
JMIR Med Inform. 2022 Apr 21;10(4):e35606. doi: 10.2196/35606.
9
Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
10
Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT).
BMC Med Inform Decis Mak. 2022 Jul 30;22(1):200. doi: 10.1186/s12911-022-01946-y.

Cited by

1
Calibrating a Transformer-Based Model's Confidence on Community-Engaged Research Studies: Decision Support Evaluation Study.
JMIR Form Res. 2023 Mar 20;7:e41516. doi: 10.2196/41516.
2
Fine-tuning Strategies for Classifying Community-Engaged Research Studies Using Transformer-Based Models: Algorithm Development and Improvement Study.
JMIR Form Res. 2023 Feb 7;7:e41137. doi: 10.2196/41137.
