Taxonomy dimension reduction for colorectal cancer prediction.

Affiliations

College of Intelligence and Computing, Tianjin University, Tianjin, China.

Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China.

Publication information

Comput Biol Chem. 2019 Dec;83:107160. doi: 10.1016/j.compbiolchem.2019.107160. Epub 2019 Nov 9.

Abstract

A growing number of people suffer from colorectal cancer, one of the most common cancers, so diagnosing and treating it as early as possible is essential. The disease may alter the microbial communities in the gut, so gut microorganisms could serve as an efficient basis for predicting colorectal cancer. In this study, we used operational taxonomic units (OTUs), which cover several kinds of microorganisms, to predict colorectal cancer. To find the most important microorganisms and obtain the best prediction performance, we explored effective feature selection methods. Our approach has three main steps. First, we applied a single method to reduce the features. Next, to reduce the number of features further, we combined the dimension reduction methods correlation-based feature selection and maximum relevance-maximum distance (MRMD 1.0 and MRMD 2.0). Finally, we selected the important features according to the taxonomy files. We created separate training and test sets to obtain a more objective evaluation, and evaluated random forest, naïve Bayes, and decision tree classifiers. The results show that the proposed methods outperform hierarchical feature engineering. The method that combines correlation-based feature selection with MRMD 2.0 performed best on the CRC2 dataset. The dataset and methods are available at http://lab.malab.cn/data/microdata/data.html.
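The "maximum relevance-maximum distance" idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' MRMD 1.0 or MRMD 2.0 implementation. It assumes relevance is measured by the absolute Pearson correlation between a feature and the class label, that distance is the mean Euclidean distance from a feature to every other feature, and that the two terms are weighted equally; the actual tools may differ on all three points. The function names `pearson`, `euclidean`, and `rank_features` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_features(columns, labels):
    """Return feature indices sorted by relevance + normalized distance, best first.

    columns: list of feature columns, each a list of per-sample values
             (for example, OTU abundances).
    labels:  list of numeric class labels, one per sample.
    """
    k = len(columns)
    # Relevance: how strongly each feature tracks the class label.
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Distance: how different each feature is, on average, from the others.
    distance = []
    for i in range(k):
        others = [euclidean(columns[i], columns[j]) for j in range(k) if j != i]
        distance.append(sum(others) / len(others) if others else 0.0)
    # Scale distances to [0, 1] so both terms are comparable (assumed equal weights).
    max_distance = max(distance) or 1.0
    scores = [relevance[i] + distance[i] / max_distance for i in range(k)]
    return sorted(range(k), key=lambda i: scores[i], reverse=True)


# Toy example: feature 0 matches the label exactly, the others carry no signal.
toy_labels = [0, 0, 1, 1]
toy_columns = [[0, 0, 1, 1], [1, 1, 1, 1], [0, 1, 0, 1]]
ranking = rank_features(toy_columns, toy_labels)
```

In a pipeline like the one the abstract describes, the top-ranked features from such a ranking would be kept and passed to a classifier trained on the training set and scored on the held-out test set.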

