Knepper David, Lindblad Anne S, Sharma Gaurav, Gensler Gary R, Manukyan Zorayr, Matthews Abigail G, Seifu Yodit
1 Drug Development Operations, Allergan, Jersey City, NJ, USA.
2 The Emmes Corporation, Rockville, MD, USA.
Ther Innov Regul Sci. 2016 Mar;50(2):144-154. doi: 10.1177/2168479016630576.
Traditional site-monitoring techniques are not optimal for finding data fabrication and other nonrandom data distributions, which have the greatest potential to jeopardize the validity of study results. TransCelerate BioPharma conducted an experiment testing the utility of statistical methods for detecting implanted fabricated data and other signals of noncompliance.
TransCelerate tested statistical monitoring on a data set from a chronic obstructive pulmonary disease (COPD) clinical study with 178 sites and 1554 subjects. Clinicians expert in COPD selectively implanted fabricated data in 7 sites and 43 subjects. The data set was partitioned to simulate studies of different sizes. Analyses of vital signs, spirometry, visit dates, and adverse events included distributions of standard deviations, correlations, repeated values, digit preference, and outlier/inlier detection. An interpretation team, including clinicians, statisticians, and representatives from site monitoring and data management, reviewed the results and created an algorithm to flag sites for fabricated data.
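As an illustration of one of the screening techniques named above, digit preference, the following is a minimal sketch and not the study's actual algorithm. It assumes integer-recorded measurements (for example, systolic blood pressure in mmHg) and compares the terminal-digit counts at a site against a uniform distribution using a chi-square statistic. The function name `terminal_digit_chi_square` and the sample values are hypothetical.

```python
from collections import Counter


def terminal_digit_chi_square(values):
    """Chi-square statistic of last-digit counts against a uniform distribution.

    Assumption: values are recorded as whole numbers; floats are truncated.
    A larger statistic means the last digits are less evenly spread.
    """
    digits = [abs(int(v)) % 10 for v in values]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one value")
    counts = Counter(digits)
    expected = n / 10  # each digit 0-9 equally likely under uniformity
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical site where every reading ends in 0 (strong digit preference).
rounded_site = [110, 120, 130, 140, 120, 130, 110, 150, 140, 120]
# Hypothetical site where each terminal digit appears equally often.
even_site = list(range(100, 120))

stat_rounded = terminal_digit_chi_square(rounded_site)
stat_even = terminal_digit_chi_square(even_site)
```

With 9 degrees of freedom, a statistic above roughly 16.92 would be unusual at the 5% level under uniformity, though small samples make that approximation rough. A flag from a screen like this is only a prompt for review, not evidence of fabrication on its own.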
The algorithm identified 11 sites (19%), 19 sites (31%), 28 sites (16%), and 45 sites (25%) as having potentially fabricated data for studies 2A, 2, 1A, and 1, respectively. For study 2A, 3 of 7 sites with fabricated data were detected, 5 of 7 were detected for studies 2 and 1A, and 6 of 7 for study 1. Except for study 2A, the algorithm had good sensitivity and specificity (>70%) for identifying sites with fabricated data.
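The sensitivity figures implied by the detection counts above can be checked with a short calculation. This is a sketch using only the numbers reported in the abstract; specificity is not computed because the abstract does not state the total number of sites in each simulated study. The variable names are illustrative.

```python
# Sites with implanted fabricated data, as reported in the abstract.
FABRICATED_SITES = 7

# Number of those sites the algorithm flagged in each simulated study.
detected_by_study = {"2A": 3, "2": 5, "1A": 5, "1": 6}

# Sensitivity = true positives / all truly fabricated sites.
sensitivity_by_study = {
    study: detected / FABRICATED_SITES
    for study, detected in detected_by_study.items()
}

for study, value in sensitivity_by_study.items():
    print(f"Study {study}: sensitivity = {value:.0%}")
```

This gives about 43% for study 2A, 71% for studies 2 and 1A, and 86% for study 1, consistent with the statement that sensitivity exceeded 70% in every study except 2A.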
We recommend a cross-functional, collaborative approach to statistical monitoring that can adapt to study design and data source and use a combination of statistical screening techniques and confirmatory graphics.