Mining real-world high dimensional structured data in medicine and its use in decision support. Some different perspectives on unknowns, interdependency, and distinguishability.

Affiliations

Ingine Inc, Ohio, USA; The Dirac Foundation, Oxfordshire, UK.

Ingine Inc, Ohio, USA.

Publication information

Comput Biol Med. 2022 Feb;141:105118. doi: 10.1016/j.compbiomed.2021.105118. Epub 2021 Dec 11.

DOI:10.1016/j.compbiomed.2021.105118

PMID:34971979

Abstract

There are many difficulties in extracting and using knowledge for medical analytic and predictive purposes from Real-World Data, even when the data is already well structured in the manner of a large spreadsheet. Preparative curation and standardization or "normalization" of such data involves a variety of chores but underlying them is an interrelated set of fundamental problems that can in part be dealt with automatically during the datamining and inference processes. These fundamental problems are reviewed here and illustrated and investigated with examples. They concern the treatment of unknowns, the need to avoid independency assumptions, and the appearance of entries that may not be fully distinguished from each other. Unknowns include errors detected as implausible (e.g., out of range) values that are subsequently converted to unknowns. These problems are further impacted by high dimensionality and problems of sparse data that inevitably arise from high-dimensional datamining even if the data is extensive. All these considerations are different aspects of incomplete information, though they also relate to problems that arise if care is not taken to avoid or ameliorate consequences of including the same information twice or more, or if misleading or inconsistent information is combined. This paper addresses these aspects from a slightly different perspective using the Q-UEL language and inference methods based on it by borrowing some ideas from the mathematics of quantum mechanics and information theory. It takes the view that detection and correction of probabilistic elements of knowledge subsequently used in inference need only involve testing and correction so that they satisfy certain extended notions of coherence between probabilities. This is by no means the only possible view, and it is explored here and later compared with a related notion of consistency.

