Hazra Avijit, Gogtay Nithya
Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal, India.
Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, Maharashtra, India.
Indian J Dermatol. 2017 Jan-Feb;62(1):18-24. doi: 10.4103/0019-5154.198047.
Crucial therapeutic decisions are based on diagnostic tests. Therefore, it is important to evaluate such tests before adopting them for routine use. Although blood tests, cultures, biopsies, and radiological imaging are obvious diagnostic tests, it should not be forgotten that specific clinical examination procedures, scoring systems based on physiological or psychological evaluation, and ratings based on questionnaires are also diagnostic tests and therefore merit similar evaluation. In the simplest scenario, a diagnostic test gives either a positive (disease likely) or negative (disease unlikely) result. Ideally, all those with the disease should be classified by the test as positive and all those without the disease as negative. Unfortunately, practically no test gives 100% accurate results. Therefore, leaving aside the economic question, the performance of diagnostic tests is evaluated on the basis of indices such as sensitivity, specificity, positive predictive value, and negative predictive value. Likelihood ratios combine information on sensitivity and specificity to express the likelihood that a given test result would occur in a subject with the disorder compared to the probability that the same result would occur in a subject without the disorder. Not all tests can be categorized simply as "positive" or "negative." Physicians are frequently faced with test results on a numerical scale, and in such cases, judgment is required in choosing a cutoff point to distinguish normal from abnormal. Naturally, a cutoff value should provide the greatest predictive accuracy, but there is a trade-off between sensitivity and specificity here - if the cutoff is too low, it will identify most patients who have the disease (high sensitivity) but will also incorrectly identify many who do not (low specificity). A receiver operating characteristic (ROC) curve plots pairs of sensitivity versus (1 - specificity) values and helps in selecting an optimum cutoff - the one lying on the "elbow" of the curve. Cohen's kappa (κ) statistic is a measure of inter-rater agreement for categorical variables. It can also be applied to assess how far two tests agree with respect to diagnostic categorization. It is generally considered more robust than a simple percent agreement calculation, since kappa takes into account agreement occurring by chance.
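As an illustration of the indices described above, the short Python sketch below computes sensitivity, specificity, the two predictive values, and the positive and negative likelihood ratios from a 2 x 2 table. The counts used (90 true positives, 15 false positives, 10 false negatives, 185 true negatives) are hypothetical and serve only to show the arithmetic.

```python
# A minimal sketch of the standard 2x2 diagnostic-test indices.
# TP, FP, FN, TN counts below are hypothetical, not from any real study.

def diagnostic_indices(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute diagnostic-test indices from a 2x2 table
    (rows: test result; columns: true disease status)."""
    sensitivity = tp / (tp + fn)               # diseased subjects correctly test-positive
    specificity = tn / (tn + fp)               # non-diseased subjects correctly test-negative
    ppv = tp / (tp + fp)                       # probability of disease given a positive test
    npv = tn / (tn + fn)                       # probability of no disease given a negative test
    lr_pos = sensitivity / (1 - specificity)   # likelihood ratio of a positive result
    lr_neg = (1 - sensitivity) / specificity   # likelihood ratio of a negative result
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "LR+": lr_pos,
        "LR-": lr_neg,
    }

# Hypothetical example: 90 true positives, 15 false positives,
# 10 false negatives, 185 true negatives.
for name, value in diagnostic_indices(tp=90, fp=15, fn=10, tn=185).items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity is 0.90 and specificity 0.925, giving a positive likelihood ratio of 12, i.e., a positive result is 12 times as likely in a diseased as in a non-diseased subject.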
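For the cutoff selection and agreement measures mentioned above, the following sketch (assuming NumPy and scikit-learn are available, and using simulated rather than real data) picks a cutoff from an ROC curve via the Youden index - one common way of locating the "elbow" - and then computes Cohen's kappa alongside simple percent agreement for two dichotomized ratings.

```python
# Sketch: ROC-based cutoff selection (Youden index) and Cohen's kappa.
# Assumes NumPy and scikit-learn; scores and labels are simulated.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, cohen_kappa_score

rng = np.random.default_rng(0)

# Simulated continuous test results: diseased subjects tend to score higher.
diseased = rng.normal(loc=2.0, scale=1.0, size=100)
healthy = rng.normal(loc=0.0, scale=1.0, size=200)
scores = np.concatenate([diseased, healthy])
truth = np.concatenate([np.ones(100, dtype=int), np.zeros(200, dtype=int)])

# ROC curve: pairs of sensitivity (tpr) versus 1 - specificity (fpr).
fpr, tpr, thresholds = roc_curve(truth, scores)
youden = tpr - fpr                # Youden index J = sensitivity + specificity - 1
best = np.argmax(youden)          # cutoff nearest the "elbow" of the curve
print(f"AUC = {roc_auc_score(truth, scores):.3f}")
print(f"Suggested cutoff = {thresholds[best]:.2f} "
      f"(sensitivity {tpr[best]:.2f}, specificity {1 - fpr[best]:.2f})")

# Cohen's kappa: chance-corrected agreement between two categorical ratings,
# here two dichotomized "tests" applied to the same subjects.
test_a = (scores > thresholds[best]).astype(int)
test_b = (scores + rng.normal(scale=0.5, size=scores.size) > thresholds[best]).astype(int)
print(f"Percent agreement = {np.mean(test_a == test_b):.2%}")
print(f"Cohen's kappa = {cohen_kappa_score(test_a, test_b):.2f}")
```

A kappa near 1 indicates near-perfect agreement beyond chance, whereas a value near 0 means the observed agreement is roughly what chance alone would produce; this is why it is preferred to raw percent agreement.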