Frias Mario, Moyano Jose M, Rivero-Juarez Antonio, Luna Jose M, Camacho Ángela, Fardoun Habib M, Machuca Isabel, Al-Twijri Mohamed, Rivero Antonio, Ventura Sebastian
Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Córdoba, Spain.
Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain.
J Med Internet Res. 2021 Feb 24;23(2):e18766. doi: 10.2196/18766.
A dataset of genes used to predict hepatitis C virus outcome was evaluated in a previous study using conventional statistical methodology.
The aim of this study was to reanalyze the same dataset using a data mining approach in order to find models that improve the classification accuracy obtained with the genes studied.
We built predictive models using different subsets of factors, selected according to their importance in predicting patient classification. We then evaluated each independent model and also a combination of them, leading to a better predictive model.
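The workflow described above (rank factors by importance, build one model per factor subset, evaluate each model and their combination) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the importance score and the lookup-table classifier are simple stand-ins for the partial decision trees and ensemble models named in this abstract, and every function and variable name is hypothetical rather than taken from the study.

```python
import random
from collections import Counter, defaultdict


def make_data(n=400, n_factors=6, seed=0):
    """Synthetic binary factors and outcome (hypothetical, NOT the study data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(n_factors)]
        # Assumed ground truth: factor 0 alone, or factors 1 and 2 together.
        y = 1 if (x[0] or (x[1] and x[2])) else 0
        if rng.random() < 0.1:  # 10% label noise
            y = 1 - y
        rows.append(x)
        labels.append(y)
    return rows, labels


def importance(rows, labels, j):
    """Absolute difference in outcome rate between factor j = 1 and factor j = 0."""
    ones = [y for x, y in zip(rows, labels) if x[j] == 1]
    zeros = [y for x, y in zip(rows, labels) if x[j] == 0]
    if not ones or not zeros:
        return 0.0
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def fit_table(rows, labels, subset):
    """Majority-class lookup table over the value combinations of `subset`.

    A simple stand-in for a tree-based learner restricted to these factors.
    """
    counts = defaultdict(Counter)
    for x, y in zip(rows, labels):
        counts[tuple(x[j] for j in subset)][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return lambda x: table.get(tuple(x[j] for j in subset), default)


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    return sum(predict(x) == y for x, y in zip(rows, labels)) / len(labels)


rows, labels = make_data()
train_x, test_x = rows[:300], rows[300:]
train_y, test_y = labels[:300], labels[300:]

# Rank factors by importance on the training split only.
ranked = sorted(
    range(len(train_x[0])),
    key=lambda j: importance(train_x, train_y, j),
    reverse=True,
)

# One model per subset of the top-ranked factors.
subsets = [ranked[:k] for k in (1, 3, 5)]
models = [fit_table(train_x, train_y, s) for s in subsets]


def ensemble(x):
    """Majority vote over the individual models (odd count, so no ties)."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


accuracies = [accuracy(m, test_x, test_y) for m in models]
ensemble_accuracy = accuracy(ensemble, test_x, test_y)

for s, acc in zip(subsets, accuracies):
    print(f"factors {s}: held-out accuracy {acc:.2f}")
print(f"ensemble: held-out accuracy {ensemble_accuracy:.2f}")
```

Whether the combined model beats the individual ones depends on the data. The sketch shows only the mechanics of the comparison and does not reproduce the study's results.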
Our data mining approach identified genetic patterns that escaped detection using conventional statistics. More specifically, the partial decision trees and ensemble models increased the classification accuracy of hepatitis C virus outcome compared with conventional methods.
Data mining can be used more extensively in biomedicine, facilitating knowledge building and management of human diseases.