Hand David J
Department of Mathematics, Imperial College London, London, UK.
Drug Saf. 2007;30(7):621-2. doi: 10.2165/00002018-200730070-00010.
Data mining is the discovery of interesting, unexpected or valuable structures in large datasets. As such, it has two rather different aspects. One concerns large-scale, 'global' structures, where the aim is to model the shapes, or features of the shapes, of distributions. The other concerns small-scale, 'local' structures, where the aim is to detect anomalies and decide whether they are real or chance occurrences. In the context of signal detection in the pharmaceutical sector, most interest lies in the second aspect; however, because signal detection occurs relative to an assumed background model, some discussion of the first aspect is also necessary. This paper gives a brief overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.
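To make the idea of detecting a signal relative to an assumed background model concrete, the following is a minimal sketch and not a method taken from the paper. It assumes the simplest possible background model (drug and event are reported independently), compares an observed drug-event report count with the count expected under that model, and uses a Poisson tail probability as a rough guide to whether the excess could be chance. The function names and all counts are hypothetical, chosen only for illustration.

```python
import math


def expected_count(drug_total, event_total, grand_total):
    """Expected drug-event count if drug and event were reported independently."""
    return drug_total * event_total / grand_total


def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean); intended for small counts only."""
    cdf_below = sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(observed)
    )
    return max(0.0, 1.0 - cdf_below)


def screen_pair(observed, drug_total, event_total, grand_total):
    """Return (expected, observed/expected ratio, chance probability) for one pair."""
    expected = expected_count(drug_total, event_total, grand_total)
    ratio = observed / expected
    p_chance = poisson_upper_tail(observed, expected)
    return expected, ratio, p_chance


# Hypothetical counts: 12 reports of the pair, 200 reports mentioning the drug,
# 500 mentioning the event, 100000 reports in the whole database.
expected, ratio, p_chance = screen_pair(12, 200, 500, 100_000)
print(f"expected={expected:.2f} ratio={ratio:.1f} p={p_chance:.2e}")
```

A large ratio with a small tail probability marks the pair as a candidate 'local' anomaly under this background model. Whether it is a real signal still depends on how well the independence assumption describes the database, which is why the 'global' modelling aspect cannot be ignored.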