Google Health, Palo Alto, California.
JAMA. 2019 Nov 12;322(18):1806-1816. doi: 10.1001/jama.2019.16489.
In recent years, many new clinical diagnostic tools have been developed using complex machine learning methods. Irrespective of how a diagnostic tool is derived, it must be evaluated in 3 steps: derivation, validation, and establishing the clinical effectiveness of the tool. Machine learning-based tools should also be assessed for the type of machine learning model used and its appropriateness for the input data type and data set size. Machine learning models also generally have additional prespecified settings called hyperparameters, which must be tuned on a data set independent of the validation set. On the validation set, the outcome against which the model is evaluated is termed the reference standard. The rigor of the reference standard must itself be assessed, for example by checking whether it rests on a universally accepted gold standard or on expert grading.
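The requirement that hyperparameters be tuned on data independent of the validation set can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not drawn from the article: the synthetic one-dimensional "biomarker" data, the k-nearest-neighbor classifier, the candidate values of k, and the split sizes. The point is only the data flow: the hyperparameter k is chosen on a tuning set, and the validation set is used once, against its reference-standard labels, after k is fixed.

```python
import random


def make_data(n, seed):
    """Synthetic 1-D 'biomarker' values with binary reference-standard labels (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Positive cases are centered at 1.0, negative cases at 0.0.
        value = rng.gauss(1.0 if label else 0.0, 0.7)
        data.append((value, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, evaluation, k):
    """Fraction of evaluation cases whose prediction matches the reference-standard label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


# Three disjoint subsets: derivation (training), tuning, and validation.
data = make_data(600, seed=0)
train, tune, validation = data[:300], data[300:450], data[450:]

# The hyperparameter k is selected using the tuning set only.
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, tune, k))

# The validation set is touched once, after k has been fixed.
validation_accuracy = accuracy(train, validation, best_k)
print(f"chosen k = {best_k}, validation accuracy = {validation_accuracy:.3f}")
```

Had k been chosen by maximizing accuracy on the validation set itself, the reported validation accuracy would tend to be optimistically biased, which is the situation the independence requirement is meant to avoid.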