The quality of data-driven hypotheses generated by inexperienced clinical researchers: A case study.

Author information

Ernst Mytchell A, Draghi Brooke N, Cimino James J, Patel Vimla L, Zhou Yuchun, Shubrook Jay H, De Lacalle Sonsoles, Weaver Aneesa, Liu Chang, Jing Xia

Affiliations

Department of Public Health Sciences, Clemson University, Clemson, SC.

Department of Biomedical Informatics and Data Science, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL.

Publication information

medRxiv. 2024 Aug 13:2024.08.12.24311877. doi: 10.1101/2024.08.12.24311877.

Abstract

OBJECTIVES

We invited inexperienced clinical researchers to analyze coded health datasets and develop hypotheses, and we recorded and analyzed their hypothesis generation process. All hypotheses generated in the process were rated by the same panel of seven experts using the same metrics. This case study examines the higher-quality (i.e., higher-rated) and lower-quality hypotheses and the participants who generated them. We characterized the contextual factors associated with the quality of the hypotheses.

METHODS

All participants (i.e., clinical researchers) completed a 2-hour study session in which they analyzed data and generated scientific hypotheses using the think-aloud method. Participants' screen activity and audio were recorded and transcribed. These transcriptions were used to measure the time taken to generate each hypothesis and to code cognitive events (i.e., the cognitive activities used when generating hypotheses; for example, "Seeking for Connection" describes an attempt to draw connections between data points). The expert panel's ratings served as the measure of hypothesis quality in the analysis. We analyzed the factors associated with (1) the five highest-rated hypotheses, (2) the five lowest-rated hypotheses, and (3) the participants who generated them, including the number of hypotheses per participant, the validity of those hypotheses, the number of cognitive events used for each hypothesis, and the participants' research experience and basic demographics.
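The per-group summaries described in the methods (spread of time per hypothesis, cognitive events per hypothesis, and valid rate) could be computed along the following lines. This is a minimal sketch, not the authors' analysis code: the `HypothesisRecord` schema, the function names, and the sample values are all assumptions made for illustration, and the "difference" in time is assumed here to mean the longest time minus the shortest time within a group.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HypothesisRecord:
    """One hypothesis taken from a session transcript (hypothetical schema)."""
    participant: str
    seconds_used: int       # time taken to generate the hypothesis
    cognitive_events: int   # number of coded cognitive events
    valid: bool             # whether the expert panel judged it valid


def parse_duration(text: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_duration(seconds: int) -> str:
    """Render a number of seconds as 'M:SS'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def summarize(records):
    """Summarize one group of hypotheses (e.g., the five highest rated)."""
    times = [r.seconds_used for r in records]
    return {
        # Assumed definition of "difference": longest minus shortest time.
        "time_range": format_duration(max(times) - min(times)),
        "mean_cognitive_events": mean(r.cognitive_events for r in records),
        "valid_rate_percent": 100 * sum(r.valid for r in records) / len(records),
    }


# Made-up values for illustration only; these are NOT data from the study.
example = [
    HypothesisRecord("P1", parse_duration("2:10"), 3, True),
    HypothesisRecord("P2", parse_duration("4:05"), 5, True),
    HypothesisRecord("P3", parse_duration("0:03:30"), 4, False),
    HypothesisRecord("P4", parse_duration("5:00"), 2, True),
]
summary = summarize(example)
print(summary)
```

Running `summarize` once on the highest-rated group and once on the lowest-rated group would yield the kind of side-by-side comparison reported in the results.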

RESULTS

Participants who generated the five highest-rated hypotheses used similar lengths of time (difference 3:03), whereas those who generated the five lowest-rated hypotheses used more varied lengths of time (difference 7:13). The five highest-rated hypotheses also involved slightly fewer cognitive events on average than the five lowest-rated hypotheses (4 vs. 4.8 per hypothesis). When we examined all hypotheses that these participants generated during their 2-hour study sessions, the participants with the five highest-rated hypotheses again had a shorter range of time per hypothesis on average (0:03:34 vs. 0:07:17). They also used fewer cognitive events per hypothesis (3.498 vs. 4.626), had a higher valid rate (75.51% vs. 63.63%), and generally had more experience with clinical research.

CONCLUSION

The quality of the hypotheses was associated with the time taken to generate them: taking too long or too short a time appears to be negatively associated with the hypotheses' quality ratings. More experience also seems to correlate positively with higher hypothesis ratings and higher valid rates. Validity is a quality dimension used by the expert panel during rating. However, we acknowledge that our results are anecdotal. The effect may not be simply linear, and future research is necessary. These results underscore the multi-factor nature of hypothesis generation.
