Assessment of protein coding measures.

Author information

Fickett J W, Tung C S

Affiliation

Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, NM 87545.

Publication information

Nucleic Acids Res. 1992 Dec 25;20(24):6441-50. doi: 10.1093/nar/20.24.6441.

Abstract

A number of methods for recognizing protein coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms is one or more coding measures--functions which produce, given any sample window of sequence, a number or vector intended to measure the degree to which a sample sequence resembles a window of 'typical' exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure--counting oligomers--is more effective than any of the more sophisticated measures. Different measures contain different information. However, there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
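To make the idea of a coding measure concrete, here is a minimal Python sketch of an oligomer-counting measure: a function that takes a sequence window and returns a vector of k-mer counts, plus one possible way to reduce that vector to a single number. This is an illustration only, not the benchmark or the exact measure evaluated in the paper. The function names, the default k = 6, the log-odds scoring, and the `floor` value for unseen oligomers are all assumptions introduced here, and the reference frequency tables are hypothetical inputs that would have to be estimated from known coding and noncoding sequence.

```python
from collections import Counter
from math import log


def count_oligomers(window: str, k: int = 6) -> Counter:
    """Count every overlapping k-mer in a sequence window (the 'vector' form)."""
    window = window.upper()
    # A window shorter than k yields an empty range and therefore no k-mers.
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def log_odds_score(window: str,
                   coding_freq: dict,
                   noncoding_freq: dict,
                   k: int = 6,
                   floor: float = 1e-6) -> float:
    """Reduce the k-mer counts to one number (the 'scalar' form).

    coding_freq and noncoding_freq are hypothetical reference tables mapping
    each k-mer to its relative frequency in known coding / noncoding DNA.
    floor is an assumed stand-in frequency for k-mers missing from a table.
    """
    score = 0.0
    for kmer, n in count_oligomers(window, k).items():
        p_coding = coding_freq.get(kmer, floor)
        p_noncoding = noncoding_freq.get(kmer, floor)
        score += n * log(p_coding / p_noncoding)
    return score


if __name__ == "__main__":
    # Toy window and toy tables, purely for demonstration.
    print(count_oligomers("ATGATG", k=3))
    print(log_odds_score("ATG", {"ATG": 0.5}, {"ATG": 0.25}, k=3))
```

Under this sketch, a positive score means the window's oligomers are more frequent in the coding table than in the noncoding one. The window length, the value of k, and whether counts are taken in a fixed reading frame are design choices left open here.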



Cited by

6
AI applications in functional genomics.
Comput Struct Biotechnol J. 2021 Oct 11;19:5762-5790. doi: 10.1016/j.csbj.2021.10.009. eCollection 2021.

