PPIGCF: A Protein-Protein Interaction-Based Gene Correlation Filter for Optimal Gene Selection.

Affiliations

Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata 741249, West Bengal, India.

Department of Computer Science and Engineering, Jalpaiguri Govt. Engineering College, Jalpaiguri 735102, West Bengal, India.

Publication information

Genes (Basel). 2023 May 10;14(5):1063. doi: 10.3390/genes14051063.

Abstract

Biological data at the omics level are highly complex, requiring powerful computational approaches to identify significant intrinsic characteristics and, from them, informative markers involved in the studied phenotype. In this paper, we propose a novel dimension reduction technique, protein-protein interaction-based gene correlation filtration (PPIGCF), which builds on gene ontology (GO) and protein-protein interaction (PPI) structures to analyze microarray gene expression data. PPIGCF first extracts the gene symbols with their expression values from the experimental dataset and then classifies them based on GO biological process (BP) and cellular component (CC) annotations. Every classification group inherits all the information on its CCs, corresponding to the BPs, to establish a PPI network. Then, the gene correlation filter (based on gene rank and the proposed correlation coefficient) is computed on every network and removes a few weakly correlated genes from their corresponding networks. PPIGCF then computes the information content (IC) of the remaining genes in the PPI network and retains only the genes with the highest IC values. The resulting gene set is used to prioritize significant genes. We compared PPIGCF with current methods to demonstrate its efficiency. The experiments indicate that PPIGCF needs fewer genes to reach reasonable accuracy (~99%) for cancer classification. The method reduces the computational complexity and improves the time complexity of biomarker discovery from datasets.
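The pipeline described in the abstract (group genes by GO annotation, filter each PPI network by correlation, then rank the survivors by IC) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define the proposed correlation coefficient, the gene rank, or the IC formula, so the sketch substitutes plain Pearson correlation averaged over PPI neighbours and the common annotation-frequency definition IC = -log p(term). All function names, the `min_corr` threshold, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Plain Pearson correlation (a stand-in for the paper's coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_by_annotation(annotations):
    """Map each GO term (BP or CC label) to the set of genes annotated with it."""
    groups = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            groups[term].add(gene)
    return groups


def filter_network(genes, ppi_edges, expr, min_corr):
    """Keep genes whose mean |correlation| with their PPI neighbours
    inside this group is at least min_corr (assumed filtering rule)."""
    neighbours = defaultdict(set)
    for a, b in ppi_edges:
        if a in genes and b in genes:
            neighbours[a].add(b)
            neighbours[b].add(a)
    kept = set()
    for gene in genes:
        nbrs = neighbours[gene]
        if not nbrs:
            continue  # isolated in this network: nothing to correlate with
        score = sum(abs(pearson(expr[gene], expr[n])) for n in nbrs) / len(nbrs)
        if score >= min_corr:
            kept.add(gene)
    return kept


def select_genes(expr, annotations, ppi_edges, min_corr=0.5, top_k=2):
    """Group by annotation, filter each network, rank survivors by IC."""
    groups = group_by_annotation(annotations)
    survivors = set()
    for genes in groups.values():
        survivors |= filter_network(genes, ppi_edges, expr, min_corr)

    total = len(annotations)

    def ic(gene):
        # Assumed IC: rarest annotation of the gene, -log(term frequency).
        return max(-math.log(len(groups[t]) / total) for t in annotations[gene])

    ranked = sorted(survivors, key=lambda g: (-ic(g), g))
    return ranked[:top_k]


# Toy data, invented purely for illustration.
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 1, 3, 2],
    "D": [1, 3, 2, 5],
}
annotations = {
    "A": {"BP:x", "CC:n"},
    "B": {"BP:x"},
    "C": {"BP:x"},
    "D": {"BP:y"},
}
ppi_edges = [("A", "B"), ("A", "C")]

selected = select_genes(expr, annotations, ppi_edges, min_corr=0.5, top_k=2)
print(selected)
```

On this toy input, gene C is dropped because its mean correlation with its only PPI neighbour is below the threshold, and gene D is dropped because it has no interaction partner in its group. A real analysis would read annotations and interactions from curated GO and PPI resources rather than hand-written dictionaries.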

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfb9/10218330/d141eaabf152/genes-14-01063-g001.jpg
