Language Bias in Health Research: External Factors That Influence Latent Language Patterns.

Author information

Valdez Danny, Goodson Patricia

Affiliations

Department of Applied Health Science, Indiana University School of Public Health, Bloomington, IN, United States.

Department of Health and Kinesiology, Texas A&M University, College Station, TX, United States.

Publication information

Front Res Metr Anal. 2020 Aug 20;5:4. doi: 10.3389/frma.2020.00004. eCollection 2020.

Abstract

Concerns about problematic research are primarily attributed to the statistics and methods used to support data. Language, as an extended component of problematic research in published work, is rarely given the same attention despite its equally important role in shaping the discussion and framing of presented data. This study uses a topic modeling approach to study language as a predictor of potential bias among collected publication histories of several health research areas. We applied Latent Dirichlet Allocation (LDA) topic models to dissect publication histories disaggregated by three factors commonly cited as language influencers: (1) time, to study ADHD pharmacotherapy; (2) funding source, to study sugar consumption; and (3) nation of origin, to study Pediatric Highly-Active Anti-Retroviral Therapy (P-HAART). We found notable differences in language across the corpora when they were disaggregated by each factor. For time, article content changed to reflect new trends and research practices for the commonly prescribed ADHD medication, Ritalin. For funding source, industry and federally funded studies had differing foci, despite testing the same hypothesis. For nation of origin, regulatory structures between the United States and Europe seemingly influenced the direction of research. This work presents two contributions to ethics research: (1) language and language framing should be studied as carefully as numeric data among studies of rigor, reproducibility, and transparency; and (2) the scientific community should continue to apply topic models as a means of answering hypothesis-driven research questions.
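
The approach described in the abstract, fitting LDA topic models to a corpus split by an external factor and comparing the resulting topics, can be illustrated with a minimal sketch. The abstract does not state which LDA implementation, preprocessing, or settings the authors used, so everything below is an assumption: the sampler is a plain collapsed Gibbs sampler written with the Python standard library, the function names (`tokenize`, `fit_lda`, `top_terms_by_group`), the stop-word list, and the hyperparameters are hypothetical, and the toy sentences are invented for illustration and are not data from the study.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "and", "in", "on", "to", "a", "with", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # token counts per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_terms_by_group(texts_by_group, n_terms=3, **lda_kwargs):
    """Fit a separate model per group and list each topic's top terms."""
    result = {}
    for group, texts in texts_by_group.items():
        docs = [tokenize(t) for t in texts]
        topics = fit_lda(docs, **lda_kwargs)
        result[group] = [
            [w for w, c in topic.most_common(n_terms) if c > 0] for topic in topics
        ]
    return result


# Invented toy sentences grouped by a hypothetical funding label; NOT study data.
toy_corpus = {
    "industry_funded": [
        "sugar intake and energy balance in active adults",
        "energy balance and physical activity with moderate sugar intake",
        "physical activity offsets sugar calories in adults",
    ],
    "federally_funded": [
        "sugar consumption and obesity risk in children",
        "obesity risk and diabetes with high sugar consumption",
        "diabetes prevalence and sugar sweetened beverages in children",
    ],
}

topics = top_terms_by_group(toy_corpus)
for group, group_topics in topics.items():
    for k, terms in enumerate(group_topics):
        print(group, k, terms)
```

Fitting one model per group keeps each sub-corpus's topics separate so they can be read side by side; a real analysis would need far larger corpora, careful preprocessing, and a principled choice of the number of topics, none of which this sketch attempts.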

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a477/8028389/1e13c1c6749b/frma-05-00004-g0001.jpg
