Accurate prediction of nuclear receptors with conjoint triad feature.

Authors

Wang Hongchu, Hu Xuehai

Affiliations

Department of Mathematics, South China Normal University, Guangzhou, 510631, P.R. of China.

College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, P.R. of China.

Publication information

BMC Bioinformatics. 2015 Dec 3;16:402. doi: 10.1186/s12859-015-0828-1.

DOI: 10.1186/s12859-015-0828-1
PMID: 26630876
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4668603/

Abstract

BACKGROUND

Nuclear receptors (NRs) form a large family of ligand-inducible transcription factors that regulate gene expression involved in numerous physiological phenomena, such as embryogenesis, homeostasis, cell growth, and cell death. Pathways involving nuclear receptors are important targets of marketed drugs. Designing a reliable computational model that predicts NRs from amino acid sequence has therefore become a significant biomedical problem.

RESULTS

The conjoint triad feature (CTF) captures neighbor relationships in a protein sequence by encoding the sequence as the frequency distribution of triads (three consecutive amino acids) over a 7-letter reduced alphabet. Chaos game representation (CGR) can expose patterns hidden in protein sequences and visually reveal previously unknown structure. In this paper, three methods, CTF, CGR, and amino acid composition (AAC), are applied to represent the protein samples. Considering the different combinations of the three methods gives seven groups of features, and each group is evaluated by 10-fold cross-validation. A new non-redundant dataset containing 474 NR sequences and 500 non-NR sequences is built from the latest NucleaRDB database. In the numerical experiments, the combined CTF and AAC features give the best result, with an accuracy of 96.30% for distinguishing NRs from non-NRs. A sequence classified as an NR is then passed to the second level, which assigns it to one of the eight main subfamilies. At the second level, the combined CTF and AAC features again give the best accuracy, 94.73%. The proposed predictor is then compared with two existing methods, and the comparisons show that, with the CTF-based method, accuracy increases to 98.79% at the first level (NR-2L: 92.56%; iNR-PhysChem: 98.18%) and to 93.71% at the second level (NR-2L: 88.68%; iNR-PhysChem: 92.45%). Finally, each component of the CTF features is analyzed with a statistical significance test, and a simplified model using only the resulting top-50 significant features achieves an accuracy of 95.28%.
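
The CTF and AAC encodings described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not list the seven amino acid classes, so the grouping below (the one commonly used for conjoint triads, based on dipole and side-chain volume) is an assumption, as are the function names `ctf`, `aac`, and `ctf_aac` and the choice of plain relative frequencies as the normalization.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: 7-class reduced alphabet; the abstract does not specify the classes.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible triads over the reduced alphabet, in a fixed order.
TRIADS = ["".join(t) for t in product("0123456", repeat=3)]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ctf(seq):
    """Return a 343-dimensional vector of triad relative frequencies."""
    # Map each residue to its class; unknown residues become '?' so that
    # any triad containing one is ignored rather than bridged over.
    reduced = "".join(AA_TO_CLASS.get(aa, "?") for aa in seq.upper())
    counts = Counter(reduced[i:i + 3] for i in range(len(reduced) - 2))
    total = sum(counts[t] for t in TRIADS)
    return [counts[t] / total if total else 0.0 for t in TRIADS]


def aac(seq):
    """Return a 20-dimensional amino acid composition vector."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def ctf_aac(seq):
    """Concatenate CTF and AAC into one 363-dimensional feature vector."""
    return ctf(seq) + aac(seq)


if __name__ == "__main__":
    features = ctf_aac("MKRKAGVDEC")  # hypothetical toy sequence
    print(len(features))  # 363
```

A vector built this way could be fed to any standard classifier and evaluated with 10-fold cross-validation; the abstract does not name the classifier, so that step is left out here.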

CONCLUSIONS

The experimental results demonstrate that the CTF-based method is an effective way to predict nuclear receptor proteins. Furthermore, based on the analysis of relative importance, the top-50 significant features obtained from the statistical significance test are regarded as the "intrinsic features" for predicting NRs.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09bd/4668603/343ffcbf11c4/12859_2015_828_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09bd/4668603/cbca3c407b75/12859_2015_828_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09bd/4668603/549a9ee34391/12859_2015_828_Fig3_HTML.jpg
