Mulenga Mwenge, Rajamanikam Arutchelvan, Kumar Suresh, Muhammad Saharuddin Bin, Bhassu Subha, Samudid Chandramathi, Sabri Aznul Qalid Md, Seera Manjeevan, Eke Christopher Ifeanyi
Business Studies Division, National Institute of Public Administration, Lusaka, Zambia.
Centre for Research and Emerging Technologies, New Mulungushi, Kabwe, Zambia.
PLoS One. 2025 Jan 29;20(1):e0316493. doi: 10.1371/journal.pone.0316493. eCollection 2025.
The emergence of Next Generation Sequencing (NGS) technology has catalyzed a paradigm shift in clinical diagnostics and personalized medicine, enabling unprecedented access to high-throughput microbiome data. However, the inherent high dimensionality, noise, and variability of microbiome data present substantial obstacles to conventional statistical methods and machine learning techniques. Even the promising deep learning (DL) methods are not immune to these challenges. This paper introduces a novel feature engineering method that circumvents these limitations by amalgamating two feature sets derived from input data to generate a new dataset, which is then subjected to feature selection. This innovative approach markedly enhances the Area Under the Curve (AUC) performance of the Deep Neural Network (DNN) algorithm in colorectal cancer (CRC) detection using gut microbiome data, elevating it from 0.800 to 0.923. The proposed method constitutes a significant advancement in the field, providing a robust solution to the intricacies of microbiome data analysis and amplifying the potential of DL methods in disease detection.
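The pipeline outlined in the abstract (derive two feature sets from the same input, merge them, then apply feature selection) can be illustrated with a minimal sketch. The abstract does not say which two feature sets or which selection criterion the authors use, so everything concrete here is an assumption chosen for illustration: relative abundances and log-transformed counts stand in for the two feature sets, a single-feature AUC filter stands in for feature selection, and the toy counts and labels are invented. The DNN classifier itself is omitted.

```python
import math

def relative_abundance(counts):
    """Feature set A (assumed): scale each sample's counts to sum to 1."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

def log_counts(counts):
    """Feature set B (assumed): log(1 + count) for each taxon."""
    return [[math.log1p(c) for c in row] for row in counts]

def combine(set_a, set_b):
    """Merge the two feature sets column-wise, sample by sample."""
    return [row_a + row_b for row_a, row_b in zip(set_a, set_b)]

def auc(scores, labels):
    """Rank-based AUC: chance that a positive sample outscores a
    negative one, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_top_k(features, labels, k):
    """Hypothetical filter: keep the k columns whose single-feature
    AUC lies farthest from 0.5 (chance level)."""
    n_cols = len(features[0])

    def strength(j):
        return abs(auc([row[j] for row in features], labels) - 0.5)

    ranked = sorted(range(n_cols), key=strength, reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in features]

# Invented toy data: 4 samples x 3 taxa; label 1 = CRC, 0 = control.
counts = [[30, 5, 10], [25, 7, 12], [4, 6, 11], [6, 5, 9]]
labels = [1, 1, 0, 0]

combined = combine(relative_abundance(counts), log_counts(counts))
keep, selected = select_top_k(combined, labels, k=2)
print("columns kept:", keep)
```

In a real study the selected columns would be passed to a classifier, and the AUC would be estimated on held-out samples rather than on the data used for selection.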