Locating protein coding regions in human DNA using a decision tree algorithm.

Author information

Salzberg S

Affiliation

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.

Publication information

J Comput Biol. 1995 Fall;2(3):473-85. doi: 10.1089/cmb.1995.2.473.

DOI:10.1089/cmb.1995.2.473
PMID:8521276
Abstract

Genes in eukaryotic DNA cover hundreds or thousands of base pairs, while the regions of those genes that code for proteins may occupy only a small percentage of the sequence. Identifying the coding regions is of vital importance in understanding these genes. Many recent research efforts have studied computational methods for distinguishing between coding and noncoding regions, and several promising results have been reported. We describe here a new approach, using a machine learning system that builds decision trees from the data. This approach combines several coding measures to produce classifiers with consistently higher accuracies than previous methods, on DNA sequences ranging from 54 to 162 base pairs in length. The algorithm is very efficient, and it can easily be adapted to different sequence lengths. Our conclusion is that decision trees are a highly effective tool for identifying protein coding regions.
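
The idea of combining several coding measures in a decision tree can be illustrated with a minimal sketch. The abstract does not name the coding measures or the tree-building system, so everything below is an assumption made for illustration: the two features (`stop_codon_fraction`, `gc3_fraction`), the Gini-based axis-parallel splitting, and the synthetic 54-bp windows produced by the hypothetical helper `synthetic_window`. It is not the paper's method or data.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}

def stop_codon_fraction(seq):
    """Fraction of frame-0 codons that are stop codons (illustrative measure)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(c in STOPS for c in codons) / len(codons)

def gc3_fraction(seq):
    """G+C fraction at third codon positions in frame 0 (illustrative measure)."""
    third = seq[2::3]
    return sum(b in "GC" for b in third) / len(third)

def features(seq):
    return [stop_codon_fraction(seq), gc3_fraction(seq)]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (impurity, feature index, threshold) of the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Leaves are 0/1 class labels; inner nodes are (feature, threshold, left, right)."""
    majority = int(2 * sum(labels) >= len(labels))
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def synthetic_window(rng, coding, n_codons=18):
    """Toy data only: 'coding' windows avoid stop codons and favor G/C at position 3."""
    if not coding:
        return "".join(rng.choice("ACGT") for _ in range(3 * n_codons))
    codons = []
    while len(codons) < n_codons:
        c = rng.choice("ACGT") + rng.choice("ACGT") + rng.choice("GGCCAT")
        if c not in STOPS:
            codons.append(c)
    return "".join(codons)

def training_accuracy(seed=0, n=200):
    """Train on n synthetic windows and report accuracy on the same windows."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    rows = [features(synthetic_window(rng, bool(y))) for y in labels]
    tree = build_tree(rows, labels)
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / n
```

Calling `training_accuracy()` trains a depth-limited tree on 54-bp synthetic windows (18 codons, the shortest length mentioned in the abstract) and returns the fraction classified correctly. Because the data are artificial and the score is measured on the training set, the number says nothing about performance on real human DNA.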

Similar articles

1. Locating protein coding regions in human DNA using a decision tree algorithm.
   J Comput Biol. 1995 Fall;2(3):473-85. doi: 10.1089/cmb.1995.2.473.
2. A decision tree system for finding genes in DNA.
   J Comput Biol. 1998 Winter;5(4):667-80. doi: 10.1089/cmb.1998.5.667.
3. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach.
   Proc Natl Acad Sci U S A. 1991 Dec 15;88(24):11261-5. doi: 10.1073/pnas.88.24.11261.
4. Finding genes in DNA using decision trees and dynamic programming.
   Proc Int Conf Intell Syst Mol Biol. 1996;4:201-10.
5. Determination of eukaryotic protein coding regions using neural networks and information theory.
   J Mol Biol. 1992 Jul 20;226(2):471-9. doi: 10.1016/0022-2836(92)90961-i.
6. A Fourier characteristic of coding sequences: origins and a non-Fourier approximation.
   J Comput Biol. 2005 Nov;12(9):1153-65. doi: 10.1089/cmb.2005.12.1153.
7. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.
8. Human coding and noncoding DNA: compositional correlations.
   Mol Phylogenet Evol. 1996 Feb;5(1):2-12. doi: 10.1006/mpev.1996.0002.
9. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.
   Mol Cell Probes. 2014 Oct-Dec;28(5-6):228-36. doi: 10.1016/j.mcp.2014.04.002. Epub 2014 Apr 29.
10. Recognizing shorter coding regions of human genes based on the statistics of stop codons.
    Biopolymers. 2002 Mar;63(3):207-16. doi: 10.1002/bip.10054.

Cited by

1. Characterization and Identification of Natural Antimicrobial Peptides on Different Organisms.
   Int J Mol Sci. 2020 Feb 2;21(3):986. doi: 10.3390/ijms21030986.
2. Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features.
   BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):66. doi: 10.1186/s12859-017-1472-8.
3. A Random Forest Approach for Counting Silicone Oil Droplets and Protein Particles in Antibody Formulations Using Flow Microscopy.
   Pharm Res. 2017 Feb;34(2):479-491. doi: 10.1007/s11095-016-2079-x. Epub 2016 Dec 19.
4. Gene expression prediction using low-rank matrix completion.
   BMC Bioinformatics. 2016 Jun 17;17(1):243. doi: 10.1186/s12859-016-1106-6.
5. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction.
   Adv Bioinformatics. 2014;2014:261362. doi: 10.1155/2014/261362. Epub 2014 Jul 15.
6. The use of classification trees for bioinformatics.
   Wiley Interdiscip Rev Data Min Knowl Discov. 2011 Jan;1(1):55-63. doi: 10.1002/widm.14. Epub 2011 Jan 6.
7. Classification of genomic islands using decision trees and their ensemble algorithms.
   BMC Genomics. 2010 Nov 2;11 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2164-11-S2-S1.
8. Computational gene finding in plants.
   Plant Mol Biol. 2002 Jan;48(1-2):39-48.