Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA.
Stat Med. 2012 Feb 28;31(5):436-48. doi: 10.1002/sim.4422. Epub 2011 Dec 4.
We focus on the efficient use of specimen repositories for the evaluation of new diagnostic tests and for comparing new tests with existing tests. Typically, all pre-existing diagnostic tests will already have been conducted on all specimens. However, we propose retesting only a judiciously chosen subsample of the specimens with the new diagnostic test. Subsampling minimizes study costs and specimen consumption, yet estimates of agreement or diagnostic accuracy potentially retain adequate statistical efficiency. We introduce methods to estimate agreement statistics and conduct symmetry tests when the second test is conducted on only a subsample and no gold standard exists. The methods treat the subsample as a stratified two-phase sample and use inverse-probability weighting. Strata can be defined by any information available on all specimens and can be used to oversample the most informative specimens. The verification bias framework applies if the test conducted on only the subsample is a gold standard. We also present inverse-probability-weighting-based estimators of diagnostic accuracy that take advantage of stratification. We present three examples demonstrating that adequate statistical efficiency can be achieved under subsampling while greatly reducing the number of specimens requiring retesting. Naively using standard estimators that ignore subsampling can lead to drastically misleading estimates. Through simulation, we assess the finite-sample properties of our estimators and consider other possible sampling designs for our examples that could have further improved statistical efficiency. To help promote subsampling designs, our R package CompareTests computes all of our agreement and diagnostic accuracy statistics.
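The inverse-probability-weighting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the CompareTests implementation. The function names, the data layout, and all counts are hypothetical, and the sketch assumes simple random sampling within each stratum, so that every retested specimen in stratum h receives weight N_h / n_h (stratum size divided by the number retested). It covers only point estimates of raw agreement and Cohen's kappa. Variance estimation and symmetry tests are not attempted here.

```python
from collections import Counter


def cross_table(specimens, weighted=True):
    """Cross-tabulate test A against test B over the retested specimens.

    Each specimen is a dict with keys 'stratum', 'test_a', and 'test_b'
    (hypothetical layout); 'test_b' is None if the specimen was not retested.
    With weighted=True each retested specimen counts N_h / n_h times, which
    assumes simple random sampling within stratum h.
    """
    n_total = Counter(s["stratum"] for s in specimens)
    n_sampled = Counter(s["stratum"] for s in specimens if s["test_b"] is not None)
    table = Counter()
    for s in specimens:
        if s["test_b"] is None:
            continue
        weight = n_total[s["stratum"]] / n_sampled[s["stratum"]] if weighted else 1.0
        table[(s["test_a"], s["test_b"])] += weight
    return table


def agreement_and_kappa(table):
    """Return (raw agreement, Cohen's kappa) from a possibly weighted table."""
    total = sum(table.values())
    p = {cell: w / total for cell, w in table.items()}
    categories = {c for cell in table for c in cell}
    p_obs = sum(p.get((c, c), 0.0) for c in categories)
    # Chance agreement: product of the two marginal proportions per category.
    p_exp = sum(
        sum(p.get((c, b), 0.0) for b in categories)
        * sum(p.get((a, c), 0.0) for a in categories)
        for c in categories
    )
    return p_obs, (p_obs - p_exp) / (1.0 - p_exp)


def make(stratum, test_a, test_b, count):
    """Build `count` identical hypothetical specimen records."""
    return [{"stratum": stratum, "test_a": test_a, "test_b": test_b}] * count


# Made-up repository of 1000 specimens, stratified by the existing test A.
# All 100 A-positives are retested; only 90 of the 900 A-negatives are.
specimens = (
    make("A+", 1, 1, 80) + make("A+", 1, 0, 20)
    + make("A-", 0, 1, 5) + make("A-", 0, 0, 85)
    + make("A-", 0, None, 810)
)

table = cross_table(specimens)
p_obs, kappa = agreement_and_kappa(table)
naive_p_obs, naive_kappa = agreement_and_kappa(cross_table(specimens, weighted=False))

print("weighted:", round(p_obs, 3), round(kappa, 3))
print("naive:   ", round(naive_p_obs, 3), round(naive_kappa, 3))
```

In this made-up example the A-positive stratum is oversampled, so the unweighted table overrepresents A-positive specimens. The weighted raw agreement is 0.93, while the naive value that ignores the sampling design is about 0.87, a small-scale illustration of how standard estimators can mislead under subsampling.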