Department of Bioengineering, Universidade de Sao Paulo Escola de Engenharia de Sao Carlos, Sao Carlos, Sao Paulo, Brazil.
Center of Information and Informatics of Medical School, Ribeirao Preto, Universidade de Sao Paulo Escola de Enfermagem de Ribeirao Preto, Sao Paulo, Brazil.
PLoS One. 2020 Jul 1;15(7):e0235147. doi: 10.1371/journal.pone.0235147. eCollection 2020.
Digital datasets in several health care facilities, such as hospitals and prehospital services, have accumulated data from thousands of patients over more than a decade. In general, no local team has enough experts with the different skills required to analyze these datasets in their entirety. Integrating those skills usually takes a relatively long time and is costly. Considering that scenario, this paper proposes a new Feature Sensitivity technique that can automatically deal with a large dataset. It uses a criterion-based sampling strategy from the Optimization based on Phylogram Analysis. Called FS-opa, the new approach seems suitable for dealing with any type of raw data from health centers and for handling their entire datasets. In addition, FS-opa can find the principal features for the construction of inference models without depending on expert knowledge of the problem domain. The selected features can be combined with the usual statistical or machine learning methods to perform predictions. The new method can mine entire datasets from scratch. FS-opa was evaluated using a relatively large dataset from electronic health records of mental disorder prehospital services in Brazil. Cox's approach was integrated with FS-opa to generate survival analysis models of the length of stay (LOS) in hospitals, on the assumption that LOS is a relevant measure that can improve estimates of hospital efficiency and of the quality of patient treatment. Since FS-opa can work with raw datasets, no knowledge from the problem domain was used to obtain the preliminary prediction models. Results show that FS-opa succeeded in performing a feature sensitivity analysis using only the raw data available. FS-opa can therefore find the principal features without the bias of an inference model, since the proposed method does not use one. Moreover, the experiments show that FS-opa can provide models with a useful trade-off between representativeness and parsimony. This can benefit further analyses by experts, since they can focus on the aspects that benefit problem modeling.
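To make the survival-analysis step more concrete, the following is a minimal sketch of the Cox partial log-likelihood that a Cox model maximizes once a set of features has been chosen. It is not the authors' implementation, and it does not reproduce FS-opa's sampling strategy. The function name, the toy feature matrix, the durations, and the event flags are all hypothetical, and the sketch assumes that the "event" is an observed hospital discharge and that no two events share the same time.

```python
import math


def cox_partial_log_likelihood(beta, features, durations, events):
    """Cox partial log-likelihood, assuming no tied event times.

    beta      -- one coefficient per selected feature
    features  -- one row of feature values per patient
    durations -- length of stay for each patient
    events    -- 1 if the discharge was observed, 0 if censored
    """
    # Linear predictor (beta . x) for every patient.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]

    total = 0.0
    for i, (t_i, observed) in enumerate(zip(durations, events)):
        if not observed:
            # Censored stays contribute only through other patients' risk sets.
            continue
        # Risk set: patients whose stay lasted at least until t_i.
        risk = [math.exp(scores[j]) for j, t_j in enumerate(durations) if t_j >= t_i]
        total += scores[i] - math.log(sum(risk))
    return total


# Hypothetical toy data: four patients, two already-selected features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
durations = [5.0, 8.0, 3.0, 10.0]
events = [1, 1, 0, 1]

baseline = cox_partial_log_likelihood([0.0, 0.0], features, durations, events)
```

With all coefficients set to zero, every patient in a risk set carries the same weight, so the value reduces to minus the sum of the logarithms of the risk-set sizes. In practice a fitted model would search for the coefficients that maximize this quantity, typically through a dedicated survival-analysis library.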