Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China.
Network and Information Technology Center, Southern Medical University, Guangzhou, China.
Nat Protoc. 2019 Mar;14(3):795-818. doi: 10.1038/s41596-018-0115-5.
Abundant long, noncoding RNAs (lncRNAs) in mammals can bind to DNA sequences and recruit histone- and DNA-modifying enzymes to binding sites to epigenetically regulate target genes. However, most lncRNAs' binding motifs and target sites are unknown. The large numbers of lncRNAs and target sites in the whole genome make it infeasible to examine lncRNA binding to DNA purely experimentally. Here, we report a protocol for lncRNA/DNA-binding analysis that is built upon a database containing the GENCODE-annotated human and mouse lncRNAs, the orthologs of these lncRNAs in 17 mammals, and the genome sequences of the 17 mammals. Cross-species and genome-wide lncRNA/DNA-binding analysis begins with and is driven by database search. The predicted DNA-binding motifs and binding sites answer the general question of which lncRNAs may epigenetically regulate which genes, and can be used to identify potential sites for genome and epigenome editing. Using the protocol requires preliminary knowledge of the base-pairing rules that guide the binding of noncoding RNAs to DNA to form triplexes, as well as the skills required to use the UCSC Genome Browser. A genome-wide prediction takes from 2 to 10 days, and the results are sent to users automatically by e-mail. The platform is updated continuously, making it possible to study more lncRNAs and larger genomic regions in less computational time.