TNFPred: identifying tumor necrosis factors using hybrid features based on word embeddings.

Affiliations

Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 32003, Taiwan.

Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City, 106, Taiwan.

Publication information

BMC Med Genomics. 2020 Oct 22;13(Suppl 10):155. doi: 10.1186/s12920-020-00779-w.

DOI:10.1186/s12920-020-00779-w
PMID:33087125
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7579990/
Abstract

BACKGROUND

Cytokines are a class of small proteins that act as chemical messengers and play a significant role in essential cellular processes, including immune regulation, hematopoiesis, and inflammation. As one important family of cytokines, tumor necrosis factors are associated with the regulation of various biological processes such as cell proliferation and differentiation, apoptosis, lipid metabolism, and coagulation. These cytokines are also implicated in various diseases, such as insulin resistance, autoimmune diseases, and cancer. Given the interdependence between this kind of cytokine and others, distinguishing tumor necrosis factors from other cytokines is a challenge for biological scientists.

METHODS

In this research, we employed a word embedding technique to create hybrid features, which proved to identify tumor necrosis factors efficiently given cytokine sequences. We segmented each protein sequence into protein words and created a corresponding word embedding for each word. Then, a word embedding-based vector was created for each sequence and fed into machine learning classification models. When extracting feature sets, we not only varied the segmentation sizes of the protein sequences but also tried different combinations of the resulting n-grams to find the features that gave the best prediction. Furthermore, our methodology follows a well-defined procedure to build a reliable classification tool.
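The feature construction described above can be illustrated with a minimal sketch. It is not the authors' code: the overlapping sliding window, the averaging of word vectors, the concatenation across gram sizes, and the hash-based `toy_embedding` (a hypothetical stand-in for a trained word embedding model) are all assumptions made for illustration.

```python
import hashlib
from typing import List, Sequence


def split_into_words(sequence: str, n: int) -> List[str]:
    """Slide a window of length n over the sequence (assumed overlapping n-grams)."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def toy_embedding(word: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a trained embedding lookup.

    Derives a deterministic pseudo-vector in [0, 1] from a hash of the word,
    so the example runs without any trained model.
    """
    digest = hashlib.sha256(word.encode("ascii")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def sequence_vector(sequence: str, n: int, dim: int = 4) -> List[float]:
    """Average the embeddings of all n-gram words in one sequence (assumption)."""
    words = split_into_words(sequence, n)
    if not words:  # sequence shorter than n: fall back to a zero vector
        return [0.0] * dim
    vectors = [toy_embedding(w, dim) for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hybrid_features(sequence: str,
                    gram_sizes: Sequence[int] = (1, 2, 3),
                    dim: int = 4) -> List[float]:
    """Concatenate per-gram-size sequence vectors into one feature vector."""
    features: List[float] = []
    for n in gram_sizes:
        features.extend(sequence_vector(sequence, n, dim))
    return features


if __name__ == "__main__":
    example = "MSTESMIRDV"  # arbitrary short amino-acid string for illustration
    print(split_into_words(example, 3))
    print(len(hybrid_features(example)))
```

For a 10-residue sequence with gram sizes 1, 2 and 3 and 4-dimensional embeddings, `hybrid_features` returns 12 values, which could then be passed to any classifier.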

RESULTS

With our proposed hybrid features, prediction models achieve better performance than with seven prominent sequence-based feature types. Results from 10 independent runs on the surveyed dataset show that, on average, our optimal models obtain an area under the curve of 0.984 on 5-fold cross-validation and 0.998 on the independent test.
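As a rough illustration of the evaluation terms used here, the sketch below computes the area under the ROC curve from labels and scores and builds 5-fold train/test index splits. It is a generic, standard-library-only example; the function names, the shuffling seed, and the unstratified split are assumptions and do not reproduce the authors' evaluation pipeline.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices and return k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))
```

Averaging the per-run AUC over repeated runs with different seeds would correspond to the "10 independent runs" summary reported above.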

CONCLUSIONS

These results show that biologists can use our model to efficiently distinguish tumor necrosis factors from other cytokines. Moreover, this study shows that natural language processing techniques can reasonably be applied to help biologists solve bioinformatics problems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4cb8/7579990/a2c1102c365f/12920_2020_779_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4cb8/7579990/3b53eaf6c8d5/12920_2020_779_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4cb8/7579990/4f1e9115488d/12920_2020_779_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4cb8/7579990/c4d6bbade4a1/12920_2020_779_Fig3_HTML.jpg
