Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.

Author information

Yang Zheng Rong

Affiliation

Department of Computer Science, Exeter University, United Kingdom.

Publication information

Bioinformatics. 2005 Jun 1;21(11):2644-50. doi: 10.1093/bioinformatics/bti404. Epub 2005 Mar 29.

Abstract

MOTIVATION

Although the outbreak of severe acute respiratory syndrome (SARS) is currently over, the disease is expected to return. A critical challenge for scientists across disciplines worldwide is to study the specificity of the cleavage activity of SARS-related coronavirus (SARS-CoV) and to use that knowledge for effective inhibitor design. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. If a sub-sequence is denoted P2-P1-P1'-P2', a conventional inductive programming method may produce a rule such as 'if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved'. If the site P1 is not orthogonal to the others (for instance, P2, P1' and P2'), the predictive power of this kind of rule may be limited. This study therefore aims to develop a novel method for constructing non-orthogonal decision trees for mining protease data.
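The single-site rule quoted above can be written out as a minimal sketch. The function name and the example sub-sequences are hypothetical and chosen only for illustration; the point is that a rule testing P1 alone gives the same answer regardless of what occupies P2, P1' and P2'.

```python
def single_site_rule(subsequence):
    """Predict cleavage from P1 alone; subsequence is read as P2-P1-P1'-P2'."""
    p1 = subsequence[1]  # index 0 is P2, index 1 is P1
    return "cleaved" if p1 == "Q" else "non-cleaved"

# Hypothetical 4-mers: the first two share P1 = Q but differ at every other site.
examples = ["LQSG", "KQDE", "LASG"]
predictions = {s: single_site_rule(s) for s in examples}
```

Because the rule never inspects the neighbouring sites, any dependence between P1 and its context is invisible to it.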

RESULTS

Eighteen coronavirus polyprotein sequences were downloaded from NCBI (http://www.ncbi.nlm.nih.gov). These sequences contain 252 experimentally determined cleavage sites. The sequences were scanned using a sliding window of size k to generate about 50,000 k-mer sub-sequences (k-mers for short). The value of k ranges from 4 to 12 in steps of two. The bio-basis function proposed by Thomson et al. was used to transform the k-mers into a high-dimensional numerical space, to which an inductive programming method was applied to derive a decision tree. This transformation is referred to as bio-mapping. The constructed decision trees select about 10 of the 50,000 k-mers, and this small set of selected k-mers is regarded as a set of decisive templates. Non-orthogonal decision trees are then constructed from the selected templates, and the prediction accuracy is significantly improved.
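The pipeline described above (sliding window, bio-mapping, template selection) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the bio-basis form exp(gamma * (s(x, t) - s(t, t)) / s(t, t)) is the one usually given in the bio-basis function literature and should be checked against the original; `toy_score` is a hypothetical stand-in for a real amino-acid substitution matrix; the input string, templates and labels are invented; and `best_stump` is only a one-level split, since the abstract does not say which inductive programming method was used.

```python
import math

def kmers(sequence, k):
    """Return every length-k window of the sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def toy_score(a, b):
    """Hypothetical stand-in for a substitution-matrix entry."""
    return 2 if a == b else -1

def similarity(x, t):
    """Sum of per-position scores between two equal-length k-mers."""
    return sum(toy_score(a, b) for a, b in zip(x, t))

def bio_basis(x, template, gamma=1.0):
    """Assumed bio-basis form: equals 1.0 when x matches the template exactly."""
    s_xt = similarity(x, template)
    s_tt = similarity(template, template)
    return math.exp(gamma * (s_xt - s_tt) / s_tt)

def bio_map(subsequences, templates, gamma=1.0):
    """Map each k-mer to a vector with one bio-basis value per template."""
    return [[bio_basis(x, t, gamma) for t in templates] for x in subsequences]

def best_stump(features, labels):
    """Return (template index, threshold, accuracy) of the best single split,
    where the rule is: feature >= threshold means cleaved."""
    best = (None, None, -1.0)
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            hits = sum((row[j] >= threshold) == label
                       for row, label in zip(features, labels))
            accuracy = hits / len(labels)
            if accuracy > best[2]:
                best = (j, threshold, accuracy)
    return best

# Hypothetical toy data: one short string, two candidate templates,
# and invented labels marking a single window as cleaved.
windows = kmers("AVLQSGFR", 4)
templates = ["VLQS", "SGFR"]
features = bio_map(windows, templates)
labels = [w == "VLQS" for w in windows]
stump = best_stump(features, labels)
```

Each split here tests similarity to a whole template k-mer rather than the residue at one position, which is the sense in which a tree built on bio-mapped features is non-orthogonal. In this toy data the split picks template index 0, a much-reduced version of selecting a few decisive templates from many candidates.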

References

1. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics. 2005 Aug 15;21(16):3369-76. doi: 10.1093/bioinformatics/bti534. Epub 2005 Jun 9.
2. Bio-basis function neural network for prediction of protease cleavage sites in proteins. IEEE Trans Neural Netw. 2005 Jan;16(1):263-74. doi: 10.1109/TNN.2004.836196.
3. Prediction of caspase cleavage sites using Bayesian bio-basis function neural networks. Bioinformatics. 2005 May 1;21(9):1831-7. doi: 10.1093/bioinformatics/bti281. Epub 2005 Jan 25.
4. Biological applications of support vector machines. Brief Bioinform. 2004 Dec;5(4):328-38. doi: 10.1093/bib/5.4.328.
5. Comparing two K-category assignments by a K-category correlation coefficient. Comput Biol Chem. 2004 Dec;28(5-6):367-74. doi: 10.1016/j.compbiolchem.2004.09.006.
6. Reduced bio-basis function neural networks for protease cleavage site prediction. J Bioinform Comput Biol. 2004 Sep;2(3):511-31. doi: 10.1142/s0219720004000715.
7. Predicting genetic regulatory response using classification. Bioinformatics. 2004 Aug 4;20 Suppl 1:i232-40. doi: 10.1093/bioinformatics/bth923.
8. Coronavirus 3CLpro proteinase cleavage sites: possible relevance to SARS virus pathology. BMC Bioinformatics. 2004 Jun 6;5:72. doi: 10.1186/1471-2105-5-72.
10. Bio-support vector machines for computational proteomics. Bioinformatics. 2004 Mar 22;20(5):735-41. doi: 10.1093/bioinformatics/btg477. Epub 2004 Jan 29.
