BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo.

Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.

Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China.

Publication information

PLoS Comput Biol. 2023 Jun 20;19(6):e1011214. doi: 10.1371/journal.pcbi.1011214. eCollection 2023 Jun.

DOI:10.1371/journal.pcbi.1011214
PMID:37339155
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10313010/
Abstract

As the key to biological sequence structure and function prediction and to disease diagnosis and treatment, biological sequence similarity analysis has attracted increasing attention. However, existing computational methods fail to analyse biological sequence similarities accurately because of the diversity of data types (DNA, RNA, protein, disease, etc.) and their low sequence similarities (remote homology). New concepts and techniques are therefore needed to solve this challenging problem. Biological sequences (DNA, RNA and protein sequences) can be considered the sentences of "the book of life", and their similarities can be considered biological language semantics (BLS). In this study, we turn to semantics analysis techniques from natural language processing (NLP) to analyse biological sequence similarities comprehensively and accurately. Twenty-seven semantics analysis methods derived from NLP were introduced to analyse biological sequence similarities, bringing new concepts and techniques to the field. Experimental results show that these methods advance protein remote homology detection, circRNA-disease association identification and protein function annotation, outperforming other state-of-the-art predictors in these fields. Based on these methods, we constructed a platform called BioSeq-Diabolo, named after a popular traditional sport in China. Users only need to input embeddings of their biological sequence data; BioSeq-Diabolo then intelligently identifies the task and accurately analyses the biological sequence similarities based on biological language semantics. BioSeq-Diabolo integrates different biological sequence similarities in a supervised manner using Learning to Rank (LTR), and evaluates and analyses the performance of the constructed methods so as to recommend the best methods to users. The web server and stand-alone package of BioSeq-Diabolo can be accessed at http://bliulab.net/BioSeq-Diabolo/server/.
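The abstract describes a pipeline in which users supply sequence embeddings, several similarity measures are computed between them, and those measures are combined in a supervised way with Learning to Rank. The Python sketch below illustrates only that general idea; it is not BioSeq-Diabolo's implementation or API. It assumes fixed-length embedding vectors, uses two hypothetical measures (cosine similarity and negated Euclidean distance) as stand-ins for the paper's 27 NLP-derived methods, and swaps in a generic RankNet-style pairwise logistic ranker as one possible LTR variant. All function names and the toy data are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def neg_euclidean_similarity(a: Vector, b: Vector) -> float:
    """Negated Euclidean distance, so that a larger value means 'more similar'."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical stand-ins for the paper's 27 NLP-derived similarity measures.
SIMILARITY_MEASURES = (cosine_similarity, neg_euclidean_similarity)


def similarity_features(query: Vector, candidate: Vector) -> List[float]:
    """One feature per similarity measure for a (query, candidate) pair."""
    return [measure(query, candidate) for measure in SIMILARITY_MEASURES]


def train_pairwise_ranker(
    pairs: List[Tuple[List[float], List[float]]],
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """RankNet-style pairwise logistic training of a linear scoring function.

    Each pair is (features of a related candidate, features of an unrelated one);
    the weights are pushed so that the related candidate scores higher.
    """
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            margin = sum(w * d for w, d in zip(weights, diff))
            # d/dw of log(1 + exp(-margin)) is -sigmoid(-margin) * diff;
            # the exponent is clamped to avoid overflow for very large margins.
            step = 1.0 / (1.0 + math.exp(min(margin, 50.0)))
            weights = [w + lr * step * d for w, d in zip(weights, diff)]
    return weights


def rank_candidates(
    query: Vector, candidates: Dict[str, Vector], weights: List[float]
) -> List[Tuple[str, float]]:
    """Score every candidate with the learned weights, highest score first."""
    scored = [
        (name, sum(w * f for w, f in zip(weights, similarity_features(query, emb))))
        for name, emb in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, for illustration only.
    query = [1.0, 0.0, 0.0]
    candidates = {
        "close_homolog": [0.9, 0.1, 0.0],
        "remote_homolog": [0.5, 0.5, 0.1],
        "unrelated": [0.0, 0.2, 1.0],
    }
    train_pairs = [
        (similarity_features(query, candidates["close_homolog"]),
         similarity_features(query, candidates["unrelated"])),
        (similarity_features(query, candidates["remote_homolog"]),
         similarity_features(query, candidates["unrelated"])),
    ]
    weights = train_pairwise_ranker(train_pairs)
    for name, score in rank_candidates(query, candidates, weights):
        print(f"{name}: {score:.3f}")
```

Running it prints the three toy candidates in the order close_homolog, remote_homolog, unrelated. A pairwise objective is used here because LTR learns from relative orderings of candidates rather than from absolute similarity values; the learned weights indicate how much each similarity measure contributes to the final ranking.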
