Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria.

Author information

Coelho Rafael Vieira, de Avila E Silva Scheila, Echeverrigaray Sergio, Delamare Ana Paula Longaray

Affiliations

Rio Grande do Sul Federal Institute of Education, Science and Technology (IFRS), Farroupilha Campus, Farroupilha, RS, Brazil.

Biotechnology Institute, University of Caxias do Sul (UCS), Caxias do Sul, RS, Brazil.

Publication information

Data Brief. 2018 May 13;19:264-270. doi: 10.1016/j.dib.2018.05.025. eCollection 2018 Aug.

Abstract

This paper presents promoter prediction using a Support Vector Machine system. The literature contains less information on promoter sequences of Gram-positive bacteria than on those of Gram-negative bacteria, even though promoter sequence identification is essential for studying gene expression. We first collected the genome sequence from the NCBI database and identified promoters by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data, we developed a Python script to search for promoters in the genome. After processing the data, we obtained 767 promoter sequences for Bacillus subtilis, most of which are recognized by the sigma factor SigA. To validate these data, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of LibSVM parameters for predicting promoter regions in the bacterium used in the simulation. All data gathered, as well as the BacSVM+ software, are available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.
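The genome search step described in the abstract could look roughly like the sketch below. It is not the authors' script: the function names (`find_promoters`, `reverse_complement`), the exact-match strategy, and the toy sequences are all assumptions made for illustration. It simply reports every position where a known promoter sequence occurs on either strand of a genome string.

```python
from __future__ import annotations

# Hypothetical sketch: locate known promoter sequences in a genome on both strands.
# All names and the toy data are illustrative assumptions, not the authors' code.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_promoters(genome: str, promoters: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (name, strand, 0-based start) for every exact match, overlaps included.

    For '-' strand hits, start is the leftmost forward-strand coordinate of the match.
    """
    genome = genome.upper()
    hits = []
    for name, seq in promoters.items():
        seq = seq.upper()
        if not seq:
            continue  # skip empty entries rather than matching everywhere
        rc = reverse_complement(seq)
        # A palindromic sequence equals its reverse complement; report it once as '+'.
        targets = [("+", seq)] if rc == seq else [("+", seq), ("-", rc)]
        for strand, target in targets:
            start = genome.find(target)
            while start != -1:
                hits.append((name, strand, start))
                start = genome.find(target, start + 1)
    return hits


if __name__ == "__main__":
    # Toy genome containing one forward and one reverse-strand copy of a made-up promoter.
    toy_genome = "GGTTGACATTTTATAATGGCCATTATAAAATGTCAACC"
    toy_promoters = {"toy_sigA": "TTGACATTTTATAAT"}
    for hit in find_promoters(toy_genome, toy_promoters):
        print(hit)
```

Exact string matching is the simplest choice here. A real pipeline would read the genome from a FASTA file and might need to tolerate mismatches, neither of which this sketch attempts.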
