FAY J W, ASHFORD J R
Br J Ind Med. 1960 Oct;17(4):279-92. doi: 10.1136/oem.17.4.279.
In a long-term investigation such as the National Coal Board's Pneumoconiosis Field Research (P.F.R.), it is essential to establish satisfactory and stable procedures for making the necessary observations and measurements. It is equally important to check the accuracy and consistency of those observations and measurements regularly, using suitable methods. One aspect of vital importance in the P.F.R. is the classification of the series of chest radiographs taken, at intervals, of all the men under observation. This is inevitably a subjective process, and (as in other similar fields of work) it is desirable to obtain some understanding of the basic process behind the operation. This can usefully be done with the help of “models” designed to describe the process, if necessary in simplified terms. The problem of the radiological classification of pneumoconiosis has hitherto been studied in terms of coefficients of disagreement (inter-observer variation) and inconsistency (intra-observer variation), but for various reasons that method was not considered entirely satisfactory. New methods of approach were therefore developed for studying the performance of the two doctors responsible for the film reading in the Research, and two distinct “models” were derived. The advantages and disadvantages of each are described in the paper, together with the applications of the two models to some of the problems arising in the course of the investigation. The first model is based on the assumption that if a film is selected at random from a batch representing a whole colliery population, and the film is of “true” category i, the chance of its being read as another category (j) is a constant, P, which depends upon the observer concerned, the particular batch of films being read, and the values of i and j.
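The assumption behind the first model can be illustrated with a small sketch. Everything numerical here is invented for illustration and is not taken from the paper: the four category labels, the shares of films in each “true” category, and the entries of the matrix P are all hypothetical. The sketch also assumes that each row of P includes the chance of a correct reading (j equal to i), so that every row sums to 1, and that repeat readings of the same film are independent.

```python
# Hypothetical illustration of the first ("P") model; all numbers are invented.
CATEGORIES = ["0", "1", "2", "3"]  # assumed category labels

# Assumed proportions of films in each "true" category within one batch.
true_share = [0.70, 0.15, 0.10, 0.05]

# P[i][j]: assumed chance that a film of true category i is read as category j
# by one observer on this batch. Each row sums to 1.
P = [
    [0.90, 0.10, 0.00, 0.00],
    [0.15, 0.70, 0.15, 0.00],
    [0.00, 0.15, 0.70, 0.15],
    [0.00, 0.00, 0.10, 0.90],
]


def read_distribution(true_share, P):
    """Expected share of films read into each category: q_j = sum_i share_i * P[i][j]."""
    n = len(true_share)
    return [sum(true_share[i] * P[i][j] for i in range(n)) for j in range(n)]


def repeat_agreement(true_share, P):
    """Chance that two independent readings of a randomly chosen film coincide."""
    return sum(share * sum(p * p for p in row) for share, row in zip(true_share, P))


if __name__ == "__main__":
    for label, share in zip(CATEGORIES, read_distribution(true_share, P)):
        print(f"read as category {label}: {share:.4f}")
    print(f"repeat-reading agreement: {repeat_agreement(true_share, P):.4f}")
```

Because P is tied to one observer and one batch, a monitoring scheme of this kind would keep a separate matrix for each observer and batch and compare them over time.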
This first model, the P model, enables the performance of the readers to be monitored satisfactorily, and it has also been used to investigate different methods for arriving at an agreed, or “definitive”, assessment of radiological abnormality. The P model suffers from the disadvantage of applying only to “average” films, and the assumptions made are such that it manifestly does not provide an entirely realistic representation of the reading process on any particular film. The second, “improved” model was therefore developed to overcome this criticism. Briefly, each film is considered to represent a unique degree of abnormality, located on a continuum, or abnormality scale, which covers the whole range of simple pneumoconiosis. The scale of abnormality is then chosen in such a way that, whatever the true degree of abnormality of the film, the observer's readings will be normally distributed about the true value with constant bias and variability at all points along the scale. The very large number of readings available has been analysed to determine the optimum positions of the category boundaries on the abnormality scale, and in this way the scale has been unambiguously defined. The second model enables the routine reading standards to be monitored, and it has also been used to investigate the underlying distribution of abnormality at individual collieries. Its chief disadvantage is the extensive computational work required. The “fit” of both models to the data collected in the Research is shown to be satisfactory, and on balance it appears that both have applications in this field of study. The method chosen in any given circumstance will depend upon the particular requirement and the facilities available for computational work.
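The second model can be sketched in the same spirit. Under its assumptions, a film at true position x on the abnormality scale produces a reading that is normally distributed with a constant bias and a constant standard deviation, and the category recorded is the interval between boundaries into which the reading falls. The boundary positions, bias, and standard deviation below are hypothetical values chosen only for illustration; the paper's fitted values are not reproduced here, and the function names are not from the source.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(x, boundaries, bias=0.0, sd=1.0):
    """Chance of each category for a film at true position x on the scale.

    The reading is assumed to be x + bias + sd * Z with Z standard normal;
    category k is recorded when the reading lies between the k-th and
    (k+1)-th cut points.
    """
    cuts = [-math.inf] + list(boundaries) + [math.inf]
    cdf = [normal_cdf((c - x - bias) / sd) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


if __name__ == "__main__":
    boundaries = [1.0, 2.0, 3.0]  # hypothetical category boundaries
    for x in (0.5, 1.0, 2.4):
        probs = category_probabilities(x, boundaries, bias=0.1, sd=0.4)
        print(x, [round(p, 3) for p in probs])
```

Unlike the P model, the reading probabilities here depend on where the individual film sits on the scale, so a film close to a boundary is more likely to be placed in the neighbouring category than one in the middle of a category. Estimating the boundaries from real readings would need a fitting procedure that this sketch does not attempt.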