School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
J Pharm Biomed Anal. 2022 Sep 5;218:114873. doi: 10.1016/j.jpba.2022.114873. Epub 2022 Jun 6.
Analyzing biological data with molecular interactions taken into account may yield more accurate identification of disease-related biomarkers. This study proposes a novel feature selection method based on a molecule (feature) interactive effect network, denoted Distance Correlation Gain-Network (DCG-Net). In DCG-Net, the distance correlation gain (DCG) is defined to measure the interactive effect between a pair of features with respect to physiological and pathological changes, and it is used to infer the molecule interactive effect network. The DCG index is suitable for both discrete and continuous random variables. A greedy search strategy is then developed to find informative modules of interacting features that have high statistical dependence on the disease outcome. To evaluate its performance, DCG-Net was compared with eight representative feature selection techniques (t-test, ReliefF, SVM-RFE, mRMR, IG-RFE, INDEED, MN-PCC and Dcor-SFS) on ten public datasets. The experimental results showed that DCG-Net achieved superior classification accuracy, sensitivity, and specificity with three different classifiers. DCG-Net was subsequently applied to a lung adenocarcinoma metabolomics dataset; the selected metabolites were involved in important pathways and showed better discrimination ability. The experiments demonstrate that DCG can effectively detect molecular interactions, and that incorporating these interactions helps identify informative biomarkers reflecting the occurrence and development of complex diseases.
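The abstract does not give the formula for DCG, so the following is only an illustrative sketch of the statistic it builds on. `distance_correlation` implements the standard sample distance correlation (the V-statistic form based on double-centered distance matrices), which applies to both discrete and continuous variables. `pairwise_gain_sketch` is a hypothetical stand-in, not the paper's DCG definition: it assumes that an interactive effect can be approximated by how much the joint dependence of two features on the outcome exceeds the stronger of their individual dependences. All function and variable names are assumptions made for illustration.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of the samples, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n samples of one feature
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y (rows are samples)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                              # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())    # product of distance std-devs, squared scale
    if denom == 0.0:
        return 0.0  # a constant variable carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def pairwise_gain_sketch(xi, xj, y):
    """Hypothetical interaction score (NOT the paper's DCG formula):
    joint dependence of (xi, xj) on y minus the best single-feature dependence."""
    joint = distance_correlation(np.column_stack([xi, xj]), y)
    single = max(distance_correlation(xi, y), distance_correlation(xj, y))
    return joint - single


# Toy XOR-like example: neither feature alone tells anything about the label,
# but the pair determines it completely.
x1 = np.array([1.0, 1.0, -1.0, -1.0])
x2 = np.array([1.0, -1.0, 1.0, -1.0])
label = x1 * x2

print("dCor(x1, label):", distance_correlation(x1, label))
print("gain sketch    :", pairwise_gain_sketch(x1, x2, label))
```

On this toy data each feature is empirically independent of the label, while the pair is not, so the sketch assigns the pair a positive score. A univariate filter such as the t-test would miss this kind of pairwise effect, which is the motivation the abstract gives for building an interaction network before selecting features.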