Machine learning and statistics shape a novel path in archaeal promoter annotation.

Author information

Programa de Pós-Graduação em Biotecnologia, Universidade de Caxias do Sul, Av. Francisco Getúlio Vargas, 1130, Caxias do Sul, RS, CEP 95070-560, Brazil.

Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica de Yucatán, Yucatán, Mérida, Mexico.

Publication information

BMC Bioinformatics. 2022 May 10;23(1):171. doi: 10.1186/s12859-022-04714-x.

Abstract

BACKGROUND

Archaea are a vast and largely unexplored domain. Bioinformatic techniques may open the way to higher-quality genome annotation across varied organisms. Archaeal promoter sequences are acted upon by a plethora of proteins. The structural-level conservation of the binding sites of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes archaeal promoters amenable to exploration with statistical and machine learning techniques.

RESULTS AND DISCUSSIONS

In this study, experimentally verified promoter sequences from the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e., numerical variables) and classified with Artificial Neural Networks and an in-house statistical classification method, each tested against three forms of controls. Recognizing these promoters made it possible to use the models to validate unannotated promoter sequences in other organisms. As a result, the binding sites of basal transcription factors were located through the DNA duplex stability encoding. Additionally, the classification yielded satisfactory results (above 90%) across varied levels of control.
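The conversion of a sequence into DNA duplex stability attributes can be illustrated with a minimal sketch. The abstract does not state which thermodynamic parameters or windowing the authors used, so everything below is an assumption: the table holds the commonly cited unified nearest-neighbor free energies (kcal/mol at 37 °C) and should be checked against a primary source before any real use, and the names `NN_DG`, `stability_profile`, and `mean_stability` are hypothetical.

```python
# Assumed nearest-neighbor free energies (kcal/mol, 37 C), keyed by the
# dinucleotide read 5'->3' on one strand. Reverse-complement pairs share
# a value. These are illustrative, not taken from the paper.
NN_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def stability_profile(seq):
    """Return one free-energy value per overlapping dinucleotide step."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]


def mean_stability(seq):
    """Average free energy over all dinucleotide steps of the sequence."""
    profile = stability_profile(seq)
    return sum(profile) / len(profile)


if __name__ == "__main__":
    # A TATA-box-like stretch is less stable (less negative) than a
    # GC-rich stretch of the same length.
    print(stability_profile("TATAAA"))
    print(mean_stability("TATAAA"), mean_stability("GCGCGC"))
```

A sequence of length n yields n - 1 numerical variables, which is the kind of fixed-length numeric vector a neural network or a statistical classifier can take as input once all sequences are trimmed to the same length.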

CONCLUDING REMARKS

The classification models were used to annotate the genomes of the archaea Aciduliprofundum boonei and Thermofilum pendens, in which potential promoters were identified and uploaded to public repositories.
