Interpol: An R package for preprocessing of protein sequences.

Affiliation

Department of Bioinformatics, Center for Medical Biotechnology, University of Duisburg-Essen, Universitaetsstr. 2, 45141 Essen, Germany.

Publication information

BioData Min. 2011 Jun 17;4:16. doi: 10.1186/1756-0381-4-16.

DOI:10.1186/1756-0381-4-16

PMID:21682849

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3138420/

Abstract

BACKGROUND

Most machine learning techniques currently applied in the literature need a fixed dimensionality of input data. However, this requirement is frequently violated by real input data, such as DNA and protein sequences, that often differ in length due to insertions and deletions. It is also notable that performance in classification and regression is often improved by numerical encoding of amino acids, compared to the commonly used sparse encoding.

RESULTS

The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used to preprocess amino acid sequences for classification or regression.
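The two preprocessing steps described above, descriptor encoding followed by length normalization, can be illustrated with a minimal Python sketch. This is not Interpol's R API: the function names `encode` and `interpolate_linear` are hypothetical, the Kyte-Doolittle hydropathy scale stands in for one of the 532 descriptors, and only plain linear interpolation is shown, whereas the package offers five linear or non-linear methods.

```python
# Illustrative sketch only; names and structure are assumptions, not Interpol's API.

# Kyte-Doolittle hydropathy values, used here as a single example descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, descriptor=KYTE_DOOLITTLE):
    """Map each residue of an amino acid sequence to its descriptor value."""
    return [descriptor[residue] for residue in sequence.upper()]


def interpolate_linear(values, target_length):
    """Resample a numeric vector to target_length points by linear interpolation."""
    if not values:
        raise ValueError("cannot interpolate an empty vector")
    if target_length < 2:
        raise ValueError("target_length must be at least 2")
    n = len(values)
    if n == 1:
        # A single value carries no slope; repeat it.
        return [float(values[0])] * target_length

    # Spread target_length sample positions evenly over the index range [0, n - 1].
    step = (n - 1) / (target_length - 1)
    resampled = []
    for i in range(target_length):
        position = i * step
        left = min(int(position), n - 2)  # index of the left neighbour
        fraction = position - left        # relative distance from the left neighbour
        resampled.append(values[left] * (1 - fraction) + values[left + 1] * fraction)
    return resampled


# Two sequences of different length end up as vectors of the same dimension.
short_vector = interpolate_linear(encode("MKTAYIAK"), 10)
long_vector = interpolate_linear(encode("MKTAYIAKQRQISFVK"), 10)
```

Both `short_vector` and `long_vector` have ten elements, so they can be passed to a learner that expects a fixed input dimensionality, which is the requirement the Background section describes.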

CONCLUSIONS

The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and it will in many cases improve their performance in classification and regression.
