

Predicting gene phenotype by multi-label multi-class model based on essential functional features.

Affiliations

School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China.

College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China.

Publication details

Mol Genet Genomics. 2021 Jul;296(4):905-918. doi: 10.1007/s00438-021-01789-8. Epub 2021 Apr 29.

DOI:10.1007/s00438-021-01789-8
PMID:33914130
Abstract

Phenotype is one of the most significant concepts in genetics; it describes all the observable characteristics of a research object. Because phenotype reflects the combined effects of genotype and environmental factors, phenotype characteristics are hard to define, and unknown phenotypes are even harder to predict. Given the limitations of current biological techniques, obtaining sufficient structural information on large-scale sets of phenotype-associated genes/proteins remains expensive and time-consuming. Various bioinformatics methods have been proposed to address this problem, and researchers have confirmed the efficacy and prediction accuracy of functional network-based prediction. However, general functional descriptions have highly complex internal structures, which complicates their use for phenotype prediction. To address this issue and improve prediction across more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG, and then use node2vec to learn functional embedding features of genes from a gene-gene network. These features are analyzed with two feature selection methods, Boruta and minimum redundancy maximum relevance (mRMR), to generate a feature list. This list is fed into incremental feature selection, which incorporates multi-label classifiers built by RAkEL on several classic base classifiers, to build an optimal multi-label multi-class classification model for phenotype prediction. Consistent with recent studies, our method identified many literature-supported genes/proteins and their associated phenotypes, as well as some candidate genes re-assigned to new phenotypes, providing a new computational tool for accurate and effective phenotype prediction.
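The abstract states that node2vec is used to learn functional embedding features from a gene-gene network. node2vec first samples second-order biased random walks, controlled by a return parameter p and an in-out parameter q, and then trains a skip-gram model on those walks. The sketch below covers only the walk-sampling step in plain Python. The toy network, gene names, and parameter values are hypothetical; the paper's actual network and settings are not reproduced here.

```python
import random
from typing import Dict, List

# Weighted, undirected graph as a symmetric adjacency map: node -> {neighbor: weight}.
Graph = Dict[str, Dict[str, float]]


def node2vec_walk(graph: Graph, start: str, length: int,
                  p: float, q: float, rng: random.Random) -> List[str]:
    """Generate one second-order biased random walk in the style of node2vec."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step has no previous node, so use plain edge weights.
            weights = [graph[cur][x] for x in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for x in neighbors:
                w = graph[cur][x]
                if x == prev:
                    weights.append(w / p)  # step back to the previous node
                elif x in graph[prev]:
                    weights.append(w)      # x is also a neighbor of prev
                else:
                    weights.append(w / q)  # x moves farther away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, num_walks: int, length: int,
                   p: float, q: float, seed: int = 0) -> List[List[str]]:
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


# Toy gene-gene network; gene names and weights are hypothetical.
toy = {
    "GENE_A": {"GENE_B": 1.0, "GENE_C": 0.5},
    "GENE_B": {"GENE_A": 1.0, "GENE_C": 1.0},
    "GENE_C": {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_D": 2.0},
    "GENE_D": {"GENE_C": 2.0},
}
walks = generate_walks(toy, num_walks=2, length=5, p=1.0, q=0.5)
```

Each walk is a list of gene identifiers. In a full pipeline these walks would be treated as sentences and passed to a skip-gram model (for example, word2vec) to obtain one embedding vector per gene; that step is not shown. A small q favors outward, DFS-like exploration, while a large q keeps walks local to the starting neighborhood.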

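The model-building step in the abstract, incremental feature selection (IFS), evaluates nested subsets of a ranked feature list (the top 1, top 2, and so on) and keeps the subset with the best score. The following is a minimal sketch, assuming the ranking is already available (for example, from mRMR) and that scoring is delegated to a caller-supplied function. In the paper that role is played by RAkEL-based multi-label classifiers. The `toy_score` function and the feature names here are hypothetical stand-ins, not the authors' evaluation.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Score the top-k prefixes of a ranked list and return the best one.

    Only prefix sizes step, 2*step, ... up to len(ranked_features) are tried.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    history: List[Tuple[int, float]] = []
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        history.append((k, score))
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score, history


# Hypothetical ranked list (e.g. GO/KEGG enrichment and node2vec features).
ranked = ["go_term_1", "kegg_path_1", "n2v_dim_7", "noise_1", "noise_2"]
informative = {"go_term_1", "kegg_path_1", "n2v_dim_7"}


def toy_score(subset: List[str]) -> float:
    """Stand-in for a cross-validated multi-label metric: rewards informative
    features and lightly penalizes uninformative ones."""
    hits = sum(f in informative for f in subset)
    return hits - 0.1 * (len(subset) - hits)


best, score, history = incremental_feature_selection(ranked, toy_score)
```

Here the best subset is the first three features. Plotting `history` (subset size against score) gives an IFS curve from which the final feature count can be read off.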

Similar articles

1
Predicting gene phenotype by multi-label multi-class model based on essential functional features.
Mol Genet Genomics. 2021 Jul;296(4):905-918. doi: 10.1007/s00438-021-01789-8. Epub 2021 Apr 29.
2
Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.
BMC Bioinformatics. 2013 Sep 26;14:285. doi: 10.1186/1471-2105-14-285.
3
Identifying Functions of Proteins in Mice With Functional Embedding Features.
Front Genet. 2022 May 16;13:909040. doi: 10.3389/fgene.2022.909040. eCollection 2022.
4
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Comb Chem High Throughput Screen. 2017;20(7):612-621. doi: 10.2174/1386207320666170314103147.
5
Prediction of gene phenotypes based on GO and KEGG pathway enrichment scores.
Biomed Res Int. 2013;2013:870795. doi: 10.1155/2013/870795. Epub 2013 Nov 7.
6
Protein function prediction from protein-protein interaction network using gene ontology based neighborhood analysis and physico-chemical features.
J Bioinform Comput Biol. 2018 Dec;16(6):1850025. doi: 10.1142/S0219720018500257. Epub 2018 Sep 19.
7
An integrative multi-network and multi-classifier approach to predict genetic interactions.
PLoS Comput Biol. 2010 Sep 9;6(9):e1000928. doi: 10.1371/journal.pcbi.1000928.
8
NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab080.
9
Multiple Protein Subcellular Locations Prediction Based on Deep Convolutional Neural Networks with Self-Attention Mechanism.
Interdiscip Sci. 2022 Jun;14(2):421-438. doi: 10.1007/s12539-021-00496-7. Epub 2022 Jan 23.
10
Gene expression network reconstruction by convex feature selection when incorporating genetic perturbations.
PLoS Comput Biol. 2010 Dec 2;6(12):e1001014. doi: 10.1371/journal.pcbi.1001014.

Cited by

1
Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods.
Biomed Res Int. 2023 Jan 4;2023:5333361. doi: 10.1155/2023/5333361. eCollection 2023.
2
Computational systems biology in disease modeling and control, review and perspectives.
NPJ Syst Biol Appl. 2022 Oct 3;8(1):37. doi: 10.1038/s41540-022-00247-4.
3
Identification of protein-protein interaction associated functions based on gene ontology and KEGG pathway.
Front Genet. 2022 Sep 12;13:1011659. doi: 10.3389/fgene.2022.1011659. eCollection 2022.
4
PseAraUbi: predicting arabidopsis ubiquitination sites by incorporating the physico-chemical and structural features.
Plant Mol Biol. 2022 Sep;110(1-2):81-92. doi: 10.1007/s11103-022-01288-3. Epub 2022 Jul 1.
5
A New Risk Score Based on Eight Hepatocellular Carcinoma- Immune Gene Expression Can Predict the Prognosis of the Patients.
Front Oncol. 2021 Nov 19;11:766072. doi: 10.3389/fonc.2021.766072. eCollection 2021.
6
iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach.
Comput Math Methods Med. 2021 Oct 11;2021:7681497. doi: 10.1155/2021/7681497. eCollection 2021.
7
A Multi-Objective Multi-Label Feature Selection Algorithm Based on Shapley Value.
Entropy (Basel). 2021 Aug 22;23(8):1094. doi: 10.3390/e23081094.

References

1
Identification of Protein Subcellular Localization With Network and Functional Embeddings.
Front Genet. 2021 Jan 20;11:626500. doi: 10.3389/fgene.2020.626500. eCollection 2020.
2
Detecting the Multiomics Signatures of Factor-Specific Inflammatory Effects on Airway Smooth Muscles.
Front Genet. 2021 Jan 13;11:599970. doi: 10.3389/fgene.2020.599970. eCollection 2020.
3
Identifying Transcriptomic Signatures and Rules for SARS-CoV-2 Infection.
Front Cell Dev Biol. 2021 Jan 11;8:627302. doi: 10.3389/fcell.2020.627302. eCollection 2020.
4
iMPTCE-Hnetwork: A Multilabel Classifier for Identifying Metabolic Pathway Types of Chemicals and Enzymes with a Heterogeneous Network.
Comput Math Methods Med. 2021 Jan 4;2021:6683051. doi: 10.1155/2021/6683051. eCollection 2021.
5
Prediction of Drug Side Effects with a Refined Negative Sample Selection Strategy.
Comput Math Methods Med. 2020 May 9;2020:1573543. doi: 10.1155/2020/1573543. eCollection 2020.
6
iATC-FRAKEL: a simple multi-label web server for recognizing anatomical therapeutic chemical classes of drugs with their fingerprints only.
Bioinformatics. 2020 Jun 1;36(11):3568-3569. doi: 10.1093/bioinformatics/btaa166.
7
iATC-NRAKEL: an efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs.
Bioinformatics. 2020 Mar 1;36(5):1391-1396. doi: 10.1093/bioinformatics/btz757.
8
Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network.
Comput Struct Biotechnol J. 2018 Dec 14;17:49-60. doi: 10.1016/j.csbj.2018.12.002. eCollection 2019.
9
Identification of synthetic lethality based on a functional network by using machine learning algorithms.
J Cell Biochem. 2019 Jan;120(1):405-416. doi: 10.1002/jcb.27395. Epub 2018 Aug 20.
10
eIF2A, an initiator tRNA carrier refractory to eIF2α kinases, functions synergistically with eIF5B.
Cell Mol Life Sci. 2018 Dec;75(23):4287-4300. doi: 10.1007/s00018-018-2870-4. Epub 2018 Jul 17.