Illusions of objectivity and a recommendation for reporting data mining results.

Authors

Hauben Manfred, Reich Lester, Gerrits Charles M, Younus Muhammad

Affiliation

Department of Medicine, New York University School of Medicine, Valhalla, NY, USA.

Publication

Eur J Clin Pharmacol. 2007 May;63(5):517-21. doi: 10.1007/s00228-007-0279-3. Epub 2007 Mar 16.

Abstract

OBJECTIVE

Data mining algorithms (DMAs) are being applied to spontaneous reporting system (SRS) databases in the hope of obtaining timely insights into post-licensure safety data. Some DMAs have been characterized as "objective" screening tools. However, there are numerous available modifiable configuration parameters to choose from, including choice of vendor, that may affect results. Our objective is to compare the data mining results on pre-selected drug-event combinations (DECs) between two commonly used software programs using similar protocols.

METHODS

Two DMAs, using three thresholds, were retrospectively applied to a set of eight pre-selected DECs in the USFDA safety database, with data through Q2 2005.

RESULTS

Differences between the two vendors were found for the number of cases associated with a signal of disproportionate reporting (SDR), first year of SDRs, and the magnitude of the SDR scores for the selected DECs. These were deemed to be potentially significant for 45.8% (11/24) of the data points.
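To illustrate how vendor-level differences in data cleaning could move a DEC across an SDR threshold, the sketch below uses the proportional reporting ratio (PRR), one common disproportionality measure. The abstract does not name the algorithms or thresholds that were compared, so the choice of PRR, the threshold of 2, and every count here are hypothetical assumptions made only for illustration.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 contingency table.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with any other drug and the event
    d: reports with any other drug and any other event
    """
    return (a / (a + b)) / (c / (c + d))


def is_sdr(score, threshold=2.0):
    # The threshold is an assumption; real configurations vary.
    return score >= threshold


# Hypothetical counts for the same DEC after two different cleaning
# pipelines (for example, different duplicate-removal rules).
vendor_a_score = prr(30, 970, 1000, 99000)   # about 3.0
vendor_b_score = prr(18, 982, 1000, 99000)   # about 1.8

print(is_sdr(vendor_a_score), is_sdr(vendor_b_score))
```

With these made-up counts the same DEC is flagged under one pipeline and not under the other, even though the measure and threshold are identical.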

CONCLUSION

The observed differences between vendors could be partially explained by their differing methods of data cleaning and transformation, as well as by specific features of the individual algorithms. The range of vendors and data mining configurations maximizes the exploratory capacity of data mining, but it also raises questions about the claimed objectivity of data mining results. Given the exploratory nature of data mining in pharmacovigilance, it can also make data mining exercises susceptible to confirmation bias. When reporting results, the vendor and all data mining configuration details should be specified.
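The reporting recommendation can be made concrete as a small structured record attached to each result. This is a minimal sketch, not a format proposed by the authors: the class name, the field names, and the vendor, algorithm, threshold, and cleaning values are hypothetical placeholders. Only the database name and the Q2 2005 cutoff come from the abstract.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataMiningReport:
    """Hypothetical record of the settings behind one data mining result."""
    vendor: str          # software vendor or product used
    algorithm: str       # disproportionality algorithm
    threshold: str       # rule used to declare an SDR
    database: str        # spontaneous reporting database
    data_cutoff: str     # last period of data included
    cleaning_notes: str  # vendor data cleaning and transformation steps


report = DataMiningReport(
    vendor="Vendor A (placeholder)",
    algorithm="PRR (placeholder)",
    threshold="PRR >= 2 (placeholder)",
    database="USFDA safety database",
    data_cutoff="2005-Q2",
    cleaning_notes="duplicate removal and drug name mapping (placeholder)",
)

# Serialize so the configuration can accompany the published results.
report_json = json.dumps(asdict(report), indent=2)
print(report_json)
```

Keeping the record immutable (`frozen=True`) reflects the idea that a configuration is fixed once results are generated from it.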

