GANN: genetic algorithm neural networks for the detection of conserved combinations of features in DNA.

Author information

Beiko Robert G, Charlebois Robert L

Affiliation

Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Australia.

Publication information

BMC Bioinformatics. 2005 Feb 22;6:36. doi: 10.1186/1471-2105-6-36.

Abstract

BACKGROUND

The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence.

RESULTS

GANN (available at http://bioinformatics.org.au/gann) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions.
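The feature-extraction and validation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GANN's code. The index set here is hypothetical: GC content plus the 16 dinucleotide frequencies, which covers sequence composition only. GANN also computes structural indices and binding-site indicators. The helper names `to_indices`, `replicated_splits` and `shuffled_labels` are illustrative. Label permutation is assumed here as one common way to build a negative control.

```python
import random
from itertools import product

# Hypothetical index set (not GANN's): the 16 overlapping dinucleotides,
# in a fixed order so that every feature vector lines up column by column.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def gc_content(seq: str) -> float:
    """Fraction of G or C bases (a simple sequence-composition index)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def dinucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of each overlapping dinucleotide; pairs with N etc. are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return {d: (c / total if total else 0.0) for d, c in counts.items()}


def to_indices(seq: str) -> list:
    """Hypothetical index vector: GC content followed by 16 dinucleotide frequencies."""
    freqs = dinucleotide_frequencies(seq)
    return [gc_content(seq)] + [freqs[d] for d in DINUCLEOTIDES]


def replicated_splits(n: int, n_replicates: int, test_fraction: float, seed: int = 0):
    """Yield (train, test) index lists from repeated random partitions of n items."""
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    for _ in range(n_replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


def shuffled_labels(labels: list, seed: int = 0) -> list:
    """Negative control: same class sizes, but labels no longer tied to sequences."""
    permuted = list(labels)
    random.Random(seed).shuffle(permuted)
    return permuted


if __name__ == "__main__":
    seqs = ["ACGTACGT", "GGGCCCAA", "ATATATAT", "CGCGCGCG"]
    labels = [1, 1, 0, 0]
    X = [to_indices(s) for s in seqs]
    print(f"{len(X)} sequences x {len(X[0])} indices")
    for train, test in replicated_splits(len(seqs), n_replicates=3, test_fraction=0.25):
        print("train", train, "test", test)
    print("control labels:", shuffled_labels(labels))
```

A model trained on the shuffled labels should score near chance. If it scores about as well as a model trained on the real labels, the apparent signal is more likely an artifact of the data than a conserved feature.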

CONCLUSION

GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.
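The search over feature combinations can be illustrated with a genetic algorithm that evolves binary masks over the available indices. This is a sketch under stated assumptions, not GANN's implementation. A nearest-centroid classifier stands in for the neural network to keep the example short and dependency-free. The evolution of the network architecture, which GANN also performs, is omitted. The tournament size, crossover scheme, mutation rate and per-feature size penalty are illustrative choices, and the function names are hypothetical. Every candidate mask is scored on the same set of replicated train/test splits.

```python
import random


def nearest_centroid_accuracy(X, y, mask, train, test):
    """Test accuracy of a nearest-centroid classifier using only the features set in mask.

    Stand-in for a neural network; an empty mask scores 0.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y[i] for i in train):
        members = [i for i in train if y[i] == label]
        centroids[label] = [sum(X[i][j] for i in members) / len(members) for j in cols]
    correct = 0
    for i in test:
        point = [X[i][j] for j in cols]
        pred = min(centroids, key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])))
        correct += pred == y[i]
    return correct / len(test)


def evolve_feature_subsets(X, y, n_generations=30, pop_size=20, n_replicates=5,
                           test_fraction=0.25, mutation_rate=0.1, size_penalty=0.01, seed=0):
    """Return (best_mask, fitness) found by a simple generational genetic algorithm."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    # Fixed replicated splits so every candidate is scored on identical data.
    n_test = max(1, round(n * test_fraction))
    splits = []
    for _ in range(n_replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        splits.append((idx[n_test:], idx[:n_test]))

    cache = {}

    def fitness(mask):
        # Mean test accuracy over replicates, minus a small penalty per selected feature.
        key = tuple(mask)
        if key not in cache:
            acc = sum(nearest_centroid_accuracy(X, y, mask, tr, te) for tr, te in splits) / len(splits)
            cache[key] = acc - size_penalty * sum(mask)
        return cache[key]

    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:]]  # elitism: the best mask survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) of two parents.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover, then independent bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [i % 2 for i in range(40)]
    # Synthetic data: feature 0 tracks the class label, features 1-4 are pure noise.
    X = [[label + rng.uniform(-0.1, 0.1)] + [rng.random() for _ in range(4)] for label in y]
    mask, score = evolve_feature_subsets(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit], "fitness:", round(score, 3))
```

Fitness is cached by mask because the same subsets recur across generations. With a real neural network, each evaluation is far more expensive. The size penalty is an assumption that favors compact feature sets when accuracies are close.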
