UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.

School of Chemical Engineering, Tianjin University, Tianjin 300350, China.

Publication information

Int J Mol Sci. 2017 Nov 14;18(11):2400. doi: 10.3390/ijms18112400.

DOI: 10.3390/ijms18112400
PMID: 29135934
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5713368/

Abstract

With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences of various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program is needed. In this paper, we propose UltraPse as a universal software platform for this problem. The function of UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming work in developing novel representation modes. The extensibility of UltraPse is particularly enhanced: it allows users to define their own representation modes, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.
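The central idea of the abstract, turning sequences of any length into fixed-length numerical vectors while letting users plug in their own representation modes, physicochemical properties, and sequence types, can be illustrated with a small Python sketch. This is not UltraPse's code or API. The registries `SEQUENCE_TYPES` and `MODES`, the `register_mode` decorator, and the `represent` function are hypothetical names chosen for illustration. The two modes shown are a normalized k-mer composition and a simplified Chou-style (type-1) pseudo composition; the toy property table, the default weight `w = 0.05`, and the assumption that property values are already standardized are all illustrative choices.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical registries (illustrative only, not UltraPse's API).
# Users may add their own sequence types by adding an alphabet here.
SEQUENCE_TYPES: Dict[str, str] = {
    "dna": "ACGT",
    "protein": "ACDEFGHIKLMNPQRSTVWY",
}
MODES: Dict[str, Callable[..., List[float]]] = {}


def register_mode(name: str):
    """Decorator that adds a representation mode to the MODES registry."""
    def wrap(fn):
        MODES[name] = fn
        return fn
    return wrap


@register_mode("kmer")
def kmer_composition(seq: str, alphabet: str, k: int = 2) -> List[float]:
    """Normalized k-mer counts: always len(alphabet)**k values, whatever len(seq)."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # skip k-mers containing symbols outside the alphabet
            counts[index[word]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


@register_mode("pse")
def pseudo_composition(seq: str, alphabet: str, props: List[Dict[str, float]],
                       lam: int = 2, w: float = 0.05) -> List[float]:
    """Simplified Chou-style type-1 pseudo composition: len(alphabet) + lam values.

    Assumes every symbol of seq is in the alphabet and in each property table,
    and that the user-supplied property values are already standardized.
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    freq = [seq.count(a) / len(seq) for a in alphabet]
    thetas = []
    for j in range(1, lam + 1):
        # Mean squared property difference between residues j positions apart,
        # averaged over all supplied property tables.
        s = sum(
            sum((p[seq[i + j]] - p[seq[i]]) ** 2 for p in props) / len(props)
            for i in range(len(seq) - j)
        )
        thetas.append(s / (len(seq) - j))
    denom = sum(freq) + w * sum(thetas)
    return [f / denom for f in freq] + [w * t / denom for t in thetas]


def represent(seq: str, seq_type: str, mode: str, **params) -> List[float]:
    """Look up the alphabet and the mode by name, then encode the sequence."""
    return MODES[mode](seq.upper(), SEQUENCE_TYPES[seq_type], **params)


if __name__ == "__main__":
    SEQUENCE_TYPES["rna"] = "ACGU"  # a user-defined sequence type
    toy = {"A": 1.0, "C": -1.0, "G": 0.5, "U": -0.5}  # hypothetical property
    short = represent("ACGUACGU", "rna", "kmer", k=2)
    long_ = represent("ACGUACGUAAGGCCUU", "rna", "kmer", k=2)
    pse = represent("ACGUACGUAA", "rna", "pse", props=[toy], lam=2)
    print(len(short), len(long_), len(pse))  # prints: 16 16 6
```

In this sketch, every mode returns a vector whose length depends only on the alphabet and the mode's parameters (`len(alphabet)**k` for k-mers, `len(alphabet) + lam` for the pseudo composition), so sequences of different lengths become inputs of equal size for a standard classifier. Adding a new mode means writing one function and decorating it, which loosely mirrors the extensibility argument made above.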

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c93/5713368/6fc9d5ef72f3/ijms-18-02400-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c93/5713368/222c79d9d824/ijms-18-02400-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c93/5713368/6426052e76c4/ijms-18-02400-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c93/5713368/b93d5917419a/ijms-18-02400-g004.jpg
