Detection and identification of cis-regulatory elements using change-point and classification algorithms.
Affiliations
School of Mathematics and Statistics, The University of Melbourne, Melbourne, 3010, VIC, Australia.
School of Mathematics, Monash University, Melbourne, 3800, VIC, Australia.
Publication information
BMC Genomics. 2022 Jan 25;23(1):78. doi: 10.1186/s12864-021-08190-0.
BACKGROUND
Transcriptional regulation is primarily mediated by the binding of factors to non-coding regions in DNA. Identification of these binding regions enhances understanding of tissue formation and potentially facilitates the development of gene therapies. However, successful identification of binding regions is made difficult by the lack of a universal biological code for their characterisation.
RESULTS
We extend an alignment-based method, changept, and identify clusters of biological significance through ontology and de novo motif analysis. Further, we apply a Bayesian method to estimate and combine binary classifiers on the clusters we identify, producing a better-performing composite classifier.
CONCLUSIONS
The analysis we describe provides a computational method for identification of conserved binding sites in the human genome and facilitates an alternative interrogation of combinations of existing data sets with alignment data.