Ensemble Algorithm Based on Gene Selection, Data Augmentation, and Boosting Approaches for Ovarian Cancer Classification.

Authors

Lee Zne-Jung, Cai Jing-Xun, Wang Liang-Hung, Yang Ming-Ren

Affiliations

School of Advanced Manufacturing, Fuzhou University, Quanzhou 362200, China.

Graduate School of New Generation Electronic Information Engineer, School of Advanced Manufacturing, Fuzhou University, Quanzhou 362200, China.

Publication information

Diagnostics (Basel). 2024 Dec 10;14(24):2772. doi: 10.3390/diagnostics14242772.

Abstract

Ovarian cancer is a difficult and lethal disease whose effective treatment requires early detection and precise classification. Microarray technology permits the simultaneous assessment of the expression levels of hundreds of genes, yielding important insights into the molecular pathways that drive ovarian cancer. To reduce computational complexity and improve accuracy, it is necessary to select the differentially expressed genes most likely to explain the effects of ovarian cancer. Medical datasets, including those related to ovarian cancer, are often limited in size because of privacy concerns, data collection challenges, and the rarity of certain conditions. Data augmentation allows researchers to expand a dataset, providing a larger and more diverse set of examples for model training. Recent advances in machine learning and bioinformatics have shown promise in improving ovarian cancer classification based on gene information. In this paper, we present an ensemble algorithm based on gene selection, data augmentation, and boosting approaches for ovarian cancer classification. In the proposed approach, the initial genetic data are first subjected to feature selection. The selected target genes are then combined with data augmentation and ensemble boosting algorithms. The ten selected genes classify ovarian cancer with 98.21% accuracy. We further show that the proposed algorithm, based on clustering approaches, is effective on real-world ovarian cancer data, achieving 100% accuracy and strong performance in distinguishing between distinct ovarian cancer subtypes. The proposed algorithm may help doctors identify ovarian cancer patients early and develop individualized treatment plans.
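
The three stages named in the abstract (gene selection, data augmentation, boosting) can be illustrated with a minimal sketch. The abstract does not say which selection filter, augmentation scheme, or boosting algorithm the authors used, so everything below is an assumption chosen for illustration: a Welch-style t-score filter, Gaussian jitter for augmentation, AdaBoost with decision stumps, and a synthetic toy dataset. All function names are hypothetical, and the sketch does not reproduce the paper's method or its reported accuracies.

```python
import math
import random

def make_toy_data(n_per_class=20, n_genes=50, n_informative=5, seed=0):
    """Synthetic expression matrix; only the first n_informative genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 1.5 * label  # class-dependent shift (assumed effect size)
            X.append(row)
            y.append(label)
    return X, y

def select_genes(X, y, k):
    """Assumed filter: rank genes by a Welch-style t-score, keep the top k indices."""
    def score(g):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == -1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def augment(X, y, copies=2, noise=0.1, seed=1):
    """Assumed augmentation: append Gaussian-jittered copies, labels unchanged."""
    rng = random.Random(seed)
    Xa, ya = list(X), list(y)
    for row, lab in zip(X, y):
        for _ in range(copies):
            Xa.append([v + rng.gauss(0.0, noise) for v in row])
            ya.append(lab)
    return Xa, ya

def stump(row, g, t, s):
    """Decision stump: predict s if gene g exceeds threshold t, else -s."""
    return s if row[g] > t else -s

def fit_adaboost(X, y, rounds=10):
    """AdaBoost over decision stumps; returns a list of (alpha, gene, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for g in range(len(X[0])):
            for t in sorted({row[g] for row in X}):
                for s in (1, -1):
                    # Weighted error of this stump under the current sample weights.
                    err = sum(wi for wi, row, lab in zip(w, X, y)
                              if stump(row, g, t, s) != lab)
                    if best is None or err < best[0]:
                        best = (err, g, t, s)
        err, g, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, g, t, s))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * lab * stump(row, g, t, s))
             for wi, row, lab in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    vote = sum(a * stump(row, g, t, s) for a, g, t, s in model)
    return 1 if vote >= 0 else -1

def run_demo():
    X, y = make_toy_data()
    idx = list(range(len(X)))
    random.Random(2).shuffle(idx)
    train, test = idx[:30], idx[30:]
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    X_te, y_te = [X[i] for i in test], [y[i] for i in test]
    genes = select_genes(X_tr, y_tr, k=5)  # selection sees training data only
    keep = lambda rows: [[r[g] for g in genes] for r in rows]
    X_aug, y_aug = augment(keep(X_tr), y_tr)  # augment the training split only
    model = fit_adaboost(X_aug, y_aug)
    hits = sum(predict(model, r) == lab for r, lab in zip(keep(X_te), y_te))
    return {"genes": genes, "n_train_aug": len(X_aug),
            "test_accuracy": hits / len(y_te)}

if __name__ == "__main__":
    print(run_demo())
```

In this sketch, gene selection and augmentation are applied to the training split only, so the held-out samples influence neither the chosen genes nor the synthetic copies. Whether the paper follows the same ordering is not stated in the abstract.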

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf14/11674093/707783ea2417/diagnostics-14-02772-g001.jpg
