A new feature selection method based on feature distinguishing ability and network influence.

Affiliation

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China.

Publication information

J Biomed Inform. 2022 Apr;128:104048. doi: 10.1016/j.jbi.2022.104048. Epub 2022 Mar 3.

DOI:10.1016/j.jbi.2022.104048

Abstract

The occurrence and development of diseases are related to the dysfunction of biomolecules (genes, metabolites, etc.) and to changes in molecular interactions. Identifying the key molecules related to the physiological and pathological changes of organisms from omics data is of great significance for disease diagnosis, early warning, drug-target prediction, and related tasks. A novel feature selection algorithm based on the feature individual distinguishing ability and feature influence in the biological network (FS-DANI) is proposed for identifying important biomolecules (features) that discriminate between different disease conditions. The feature individual distinguishing ability is evaluated from the overlapping area of the feature's effective ranges in different classes. FS-DANI measures the feature network influence from the importance of the feature's module in the correlation network and the feature's centrality within that module. The feature comprehensive weight is obtained by combining the individual distinguishing ability with the network influence. The crucial feature subset is then determined by sequential forward search (SFS) over the feature list sorted by comprehensive weight. FS-DANI is compared with six efficient feature selection methods on ten public omics datasets, and an ablation experiment is also conducted. Experimental results show that, overall, FS-DANI outperforms the compared algorithms in accuracy, sensitivity and specificity. In an analysis of gastric cancer miRNA expression data, FS-DANI identified two miRNAs (hsa-miR-18a* and hsa-miR-381) whose AUCs for distinguishing gastric cancer samples from normal samples are 0.959 in the discovery set and 0.879 in an independent validation set. Hence, evaluating biomolecules at both the molecular level and the network level helps identify high-performance potential disease biomarkers.
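
The abstract names the four stages of FS-DANI (distinguishing ability, network influence, comprehensive weight, SFS) but not their exact formulas. The Python sketch below strings the stages together purely for illustration, and every concrete choice in it is an assumption not taken from the paper: the "effective range" is a percentile interval, distinguishing ability is 1 minus overlap/union of the two class intervals, the correlation network thresholds |Pearson r|, modules are connected components, module importance is the mean distinguishing ability of its members, centrality is within-module degree centrality, the two scores are mixed linearly with a hypothetical `alpha`, and SFS is scored by leave-one-out nearest-centroid accuracy. All function names (`distinguishing_ability`, `network_influence`, `loo_nearest_centroid_accuracy`, `fs_dani_sketch`) are hypothetical.

```python
import numpy as np


def distinguishing_ability(X, y, lo=5.0, hi=95.0):
    """Score each feature as 1 - overlap/union of its per-class effective ranges.

    ASSUMPTION: the effective range of a feature in a class is its [lo, hi]
    percentile interval; labels are binary (0/1).
    """
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        a_lo, a_hi = np.percentile(X[y == 0, j], [lo, hi])
        b_lo, b_hi = np.percentile(X[y == 1, j], [lo, hi])
        overlap = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        union = max(a_hi, b_hi) - min(a_lo, b_lo)
        scores[j] = 1.0 - overlap / union if union > 0 else 0.0
    return scores


def network_influence(X, da, r_min=0.6):
    """ASSUMPTIONS: edges join features with |Pearson r| >= r_min, modules are
    connected components, module importance is the mean distinguishing ability
    of its members, and centrality is within-module degree centrality."""
    n = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) >= r_min
    np.fill_diagonal(adj, False)
    module = -np.ones(n, dtype=int)
    for start in range(n):  # label connected components by depth-first search
        if module[start] >= 0:
            continue
        module[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if module[v] < 0:
                    module[v] = start
                    stack.append(v)
    influence = np.zeros(n)
    for m in np.unique(module):
        members = np.flatnonzero(module == m)
        if len(members) < 2:
            continue  # isolated features get zero network influence
        importance = da[members].mean()
        degree = adj[np.ix_(members, members)].sum(axis=1) / (len(members) - 1)
        influence[members] = importance * degree
    return influence


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (ASSUMED evaluator)."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        c0 = X[mask & (y == 0)].mean(axis=0)
        c1 = X[mask & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / len(y)


def fs_dani_sketch(X, y, alpha=0.5):
    da = distinguishing_ability(X, y)
    weight = alpha * da + (1 - alpha) * network_influence(X, da)  # ASSUMED linear mix
    order = np.argsort(-weight)  # rank features by comprehensive weight
    selected, best = [], 0.0
    for j in order:  # forward search: keep a feature only if accuracy improves
        acc = loo_nearest_centroid_accuracy(X[:, selected + [int(j)]], y)
        if acc > best:
            selected, best = selected + [int(j)], acc
    return selected, best


# Usage on synthetic data: only feature 0 is shifted between the classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 6))
X[:, 0] += 4 * y
selected, acc = fs_dani_sketch(X, y)
print(selected, acc)
```

Keeping a feature only when it strictly improves accuracy is one reading of "SFS on the sorted list"; another is to evaluate every top-k prefix and keep the best k. A real evaluation would also use held-out data rather than the same samples used for ranking.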
