Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis.

Affiliation

MRC Virology Unit, Institute of Virology, Church Street, Glasgow G11 5JR, UK.

Publication information

Evol Bioinform Online. 2007 Sep 17;3:211-36.

Abstract

Genome signatures are data vectors derived from the compositional statistics of DNA. The self-organizing map (SOM) is a neural network method for the conceptualisation of relationships within complex data, such as genome signatures. The various parameters of the SOM training phase are investigated for their effect on the accuracy of the resulting output map. It is concluded that larger SOMs, as well as taking longer to train, are less sensitive in phylogenetic classification of unknown DNA sequences. However, where a classification can be made, a larger SOM is more accurate. Increasing the number of iterations in the training phase of the SOM only slightly increases accuracy, without improving sensitivity. The optimal length of the DNA sequence k-mer from which the genome signature should be derived is 4 or 5, but shorter values are almost as effective. Overall, these results indicate that small, rapidly trained SOMs are generally as good as larger, longer-trained ones for the analysis of genome signatures. They may also be more generally applicable to the use of SOMs for other complex data sets, such as microarray data.
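
To make the parameters discussed in the abstract concrete (k-mer length, map size, number of training iterations), here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The signature is assumed to be a plain normalised k-mer frequency vector, which may differ from the definition used in the paper. The function names (`genome_signature`, `train_som`, `best_matching_unit`), the linear decay schedules, and the Gaussian neighbourhood are likewise illustrative assumptions.

```python
from itertools import product

import numpy as np


def genome_signature(seq, k=4):
    """Normalised k-mer frequency vector of a DNA sequence (assumed definition)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows containing N etc. are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def best_matching_unit(weights, x):
    """Grid position (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


def train_som(data, rows=5, cols=5, iterations=1000, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used for neighbourhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        lr = 0.5 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]    # one randomly chosen training vector
        bmu = np.array(best_matching_unit(weights, x))
        d2 = ((grid - bmu) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood function
        weights += lr * h[..., None] * (x - weights)
    return weights
```

In this sketch, `rows` and `cols` correspond to the SOM size, `iterations` to the length of the training phase, and `k` to the k-mer length. An unknown sequence would be placed on the trained map by computing its signature and calling `best_matching_unit`.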

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1381/2684143/46f072bbe5f3/EBO-03-211-g001.jpg
