ORIENTATE: automated machine learning classifiers for oral health prediction and research.

Affiliations

Department of Dermatology, Stomatology, Radiology and Physical Medicine, Universidad de Murcia, Murcia, Spain.

Dept. Information Technologies and Communications, Universidad Politecnica de Cartagena (UPCT), Cartagena, Spain.

Publication information

BMC Oral Health. 2023 Jun 20;23(1):408. doi: 10.1186/s12903-023-03112-w.

DOI:10.1186/s12903-023-03112-w

PMID:37340367

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10283267/

Abstract

BACKGROUND

The application of data-driven methods is expected to play an increasingly important role in healthcare. However, a lack of personnel with the skills needed to develop these models and interpret their output is preventing wider adoption of these methods. To address this gap, we introduce and describe ORIENTATE, a software tool for the automated application of machine learning classification algorithms by clinical practitioners who lack specific technical skills. ORIENTATE allows the user to select the features and the target variable, then automatically generates a number of classification models and cross-validates them, finding the best model and evaluating it. It also implements a custom feature selection algorithm that systematically searches for the best combination of predictors for a given target variable. Finally, it outputs a comprehensive report with graphs that help explain the classification model results using global interpretation methods, and it provides an interface for predicting new input samples. The feature relevance and interaction plots provided by ORIENTATE allow it to be used for statistical inference, which can replace and/or complement classical statistical studies.
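The idea of searching feature combinations and ranking them by cross-validated performance can be illustrated with a minimal sketch. This is not ORIENTATE's custom algorithm, which the abstract does not specify. It assumes an exhaustive search over all non-empty feature subsets, a simple nearest-centroid classifier, 3-fold cross-validation scored by accuracy, and an invented toy dataset. The function names, feature names, and data are hypothetical.

```python
from itertools import combinations


def nearest_centroid_predict(train_X, train_y, test_X):
    """Predict the label whose class centroid is closest (squared Euclidean)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def closest(x):
        # On an exact tie, min() returns the first (lowest) label.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return [closest(x) for x in test_X]


def cv_accuracy(X, y, columns, k=3):
    """Mean k-fold accuracy using only the given feature columns."""
    Xs = [[row[c] for c in columns] for row in X]
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(Xs)) if i % k == fold]
        train_idx = [i for i in range(len(Xs)) if i % k != fold]
        preds = nearest_centroid_predict(
            [Xs[i] for i in train_idx],
            [y[i] for i in train_idx],
            [Xs[i] for i in test_idx],
        )
        hits = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        fold_scores.append(hits / len(test_idx))
    return sum(fold_scores) / k


# Invented toy data: column 0 separates the classes, column 1 is pure noise.
feature_names = ["informative", "noise"]
informative = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 3.0, 3.2, 2.8, 3.1, 2.9, 3.0]
noise = [10, -10, 5, -5, 8, -8, 10, -10, 5, -5, 8, -8]
X = [list(pair) for pair in zip(informative, noise)]
y = [0] * 6 + [1] * 6

# Try subsets from smallest to largest; keep a subset only if it is strictly
# better, so ties are resolved in favour of fewer features.
best_subset, best_score = None, -1.0
for size in range(1, len(feature_names) + 1):
    for columns in combinations(range(len(feature_names)), size):
        score = cv_accuracy(X, y, columns)
        if score > best_score:
            best_subset = tuple(feature_names[c] for c in columns)
            best_score = score

print(best_subset, best_score)
```

An exhaustive search grows exponentially with the number of features, so a practical tool would need a more economical search strategy; the sketch only shows how cross-validated scores can drive the choice of predictors.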

RESULTS

As a case study, we discuss its application to a dataset of healthy children and children with special health care needs (SHCN) treated under deep sedation. On this example dataset, despite its small size, the feature selection algorithm found a set of features able to predict the need for a second sedation with an F1 score of 0.83 and an area under the ROC curve (AUC) of 0.92. Eight predictive factors for both populations were found and ordered by the relevance the model assigned to them. We also discuss how to derive inferences from the relevance and interaction plots, and we compare the results with a classical study.
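For readers unfamiliar with the two metrics reported above, the following sketch shows how an F1 score and a ROC AUC are computed from binary labels and classifier scores. The labels, scores, and the 0.5 decision threshold are invented for illustration and are unrelated to the study data; the function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example: three positives and three negatives.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 3), round(auc, 3))
```

The F1 score depends on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are often reported together.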

CONCLUSIONS

ORIENTATE automatically finds suitable features and generates accurate classifiers that can be used in preventive tasks. In addition, researchers without specific skills in data methods can use it to apply machine learning classification and as a complement to classical studies for the inferential analysis of features. In the case study, a high prediction accuracy for a second sedation in SHCN children was achieved. The analysis of feature relevance showed that the number of teeth with pulpal treatments at the first sedation is a predictive factor for a second sedation.
