Ingine Inc., DE, USA; The Dirac Foundation, Oxfordshire, UK.
Comput Biol Med. 2018 Apr 1;95:147-166. doi: 10.1016/j.compbiomed.2018.02.013. Epub 2018 Mar 21.
Theoretical and methodological principles are presented for the construction of very large inference nets for odds calculations, composed of hundreds, thousands, or more elements, and generated in this paper by structured data mining. It is argued that the usual small inference nets can sometimes represent rather simple, arbitrary estimates. Examples of applications are presented in clinical and public health data analysis, in medical claims data and the detection of irregular entries, and in bioinformatics data. Construction of large nets benefits from application of a theory of expected information for sparse data and from the Dirac notation and algebra. The extent to which these are important here is briefly discussed. Purposes of the study include (a) exploration of the properties of large inference nets and of perturbation and tacit conditionality models, (b) use of these to propose simpler models, including one that a physician could use routinely, analogous to a "risk score", (c) examination of the merit of describing optimal performance in a single measure that combines accuracy, specificity, and sensitivity in place of a ROC curve, and (d) examination of the relationship to methods for detecting anomalous and potentially fraudulent data.