A high-order representation and classification method for transcription factor binding sites recognition in Escherichia coli.

Authors

Sun Shiquan, Zhang Xiongpan, Peng Qinke

Affiliations

Systems Engineering Institute, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an, Shaanxi 710049, China; Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA.

Systems Engineering Institute, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an, Shaanxi 710049, China.

Publication information

Artif Intell Med. 2017 Jan;75:16-23. doi: 10.1016/j.artmed.2016.11.004. Epub 2016 Dec 1.

DOI:10.1016/j.artmed.2016.11.004

PMID:28363453

Abstract

BACKGROUND

Identifying transcription factor binding sites (TFBSs) plays an important role in understanding gene regulatory processes. The mechanism underlying the specific binding of transcription factors (TFs) is still poorly understood. Previous machine learning-based approaches to identifying TFBSs commonly map a known TFBS to a one-dimensional vector using its physicochemical properties. However, when the dimension-to-sample ratio (i.e., number of dimensions/number of samples) is large, concatenating different physicochemical properties into a one-dimensional vector is not only likely to lose structural information but also poses significant challenges for recognition methods.
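
To make the one-dimensional encoding and the dimension-to-sample ratio concrete, the following is a minimal sketch, not the encoding used by any particular prior method. The property table, the number of properties, the example sequences, and the helper names `encode_flat` and `dimension_sample_ratio` are all assumptions introduced for illustration; the numeric property values are placeholders, not measured constants.

```python
from itertools import product

# Hypothetical dinucleotide property table. The numbers are illustrative
# placeholders, NOT real physicochemical measurements.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
N_PROPERTIES = 3
PROPERTY_TABLE = {
    dn: [round(0.1 * (i + 1) * (k + 1), 2) for k in range(N_PROPERTIES)]
    for i, dn in enumerate(DINUCLEOTIDES)
}


def encode_flat(site):
    """Concatenate the property values of each dinucleotide step into one 1-D list."""
    vector = []
    for i in range(len(site) - 1):
        vector.extend(PROPERTY_TABLE[site[i:i + 2]])
    return vector


def dimension_sample_ratio(sites):
    """Number of feature dimensions divided by number of samples."""
    n_dims = len(encode_flat(sites[0]))
    return n_dims / len(sites)


# Toy equal-length sequences standing in for known binding sites.
sites = ["TTGACA", "TTGACT", "TAGACA"]
flat = encode_flat(sites[0])
print(len(flat))                      # 5 steps x 3 properties = 15
print(dimension_sample_ratio(sites))  # 15 dimensions / 3 samples = 5.0
```

Every property added to the table lengthens the vector by one value per dinucleotide step while the number of known sites stays fixed, which is how the ratio grows. Flattening also discards which values belonged to the same position or the same property.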

MATERIALS AND METHOD

In this paper, we introduce a purely geometric representation, the tensor (also called a multidimensional array), to represent TFs using their physicochemical properties. Alongside the multidimensional array representation, we also develop a tensor-based recognition method, the tensor partial least squares classifier (TPLSC). Intuitively, multidimensional arrays make it possible to borrow more information than one-dimensional arrays. The performance of each method is evaluated by the average F-measure on 51 Escherichia coli TFs from the RegulonDB database.
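
As a rough illustration of the multidimensional array idea, the sketch below stacks one (position x property) matrix per binding site into a three-way array. The axis layout (samples x dinucleotide steps x properties), the placeholder property table, and the helper names `encode_matrix`, `encode_tensor`, and `shape` are assumptions made for this example; the abstract does not specify the authors' tensor layout, and the TPLSC classifier itself is not implemented here.

```python
from itertools import product

# Hypothetical dinucleotide property table. The numbers are illustrative
# placeholders, NOT real physicochemical measurements.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
N_PROPERTIES = 3
PROPERTY_TABLE = {
    dn: [round(0.1 * (i + 1) * (k + 1), 2) for k in range(N_PROPERTIES)]
    for i, dn in enumerate(DINUCLEOTIDES)
}


def encode_matrix(site):
    """Return a (steps x properties) matrix: one row per dinucleotide step."""
    return [list(PROPERTY_TABLE[site[i:i + 2]]) for i in range(len(site) - 1)]


def encode_tensor(sites):
    """Stack per-site matrices into a (samples x steps x properties) nested list."""
    return [encode_matrix(site) for site in sites]


def shape(tensor):
    """Shape of a regular three-way nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


# Toy equal-length sequences standing in for known binding sites.
sites = ["TTGACA", "TTGACT", "TAGACA"]
tensor = encode_tensor(sites)
print(shape(tensor))  # (3, 5, 3)
```

Unlike a flattened vector, this layout keeps position and property as separate axes, so a tensor-based method can model structure along each mode rather than treating every entry as an unrelated feature.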

RESULTS

In our first experiment, the results show that multiple nucleotide properties yield more predictive power than dinucleotide properties. In the second experiment, the results demonstrate that our method increases prediction power, improving on the best result from existing methods by roughly 33%.
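
The comparisons above are scored by F-measure averaged over TFs. As a reminder of how such a score can be computed, here is a minimal sketch that assumes the balanced F1 form (the abstract does not state which F-measure variant is used). The confusion counts and the TF labels `TF_A` and `TF_B` are made up for illustration and are not results from the paper.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, where F1 is zero or undefined.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(counts_per_tf):
    """Unweighted mean of per-TF F-measures."""
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in counts_per_tf.values()]
    return sum(scores) / len(scores)


# Made-up (true positive, false positive, false negative) counts per TF.
counts = {"TF_A": (8, 2, 2), "TF_B": (5, 5, 0)}
print(round(average_f_measure(counts), 3))  # 0.733
```

Averaging per TF gives each transcription factor equal weight regardless of how many known sites it has; pooling all predictions before computing a single F-measure would weight TFs with many sites more heavily.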

CONCLUSION

The representation method for TFs is an important step in TFBS recognition. We illustrate the benefits of this representation on real data through a series of experiments. This method can provide further insight into the mechanism of TF binding and be of great use in metabolic engineering applications.
