Analyzing genomic data using tensor-based orthogonal polynomials with application to synthetic RNAs.

Authors

Nafees Saba, Rice Sean H, Wakeman Catherine A

Affiliation

Department of Biological Sciences, Texas Tech University, 2901 Main St, Lubbock, TX 79409, USA.

Publication information

NAR Genom Bioinform. 2020 Dec 11;2(4):lqaa101. doi: 10.1093/nargab/lqaa101. eCollection 2020 Dec.

Abstract

An important goal in molecular biology is to quantify both the patterns across a genomic sequence and the relationship between phenotype and underlying sequence. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given sequence and map corresponding phenotypes onto the sequence space. We have applied this method to a previously published case of small transcription activating RNAs. Covariance patterns along the sequence showcased strong correlations between nucleotides at the ends of the sequence. However, when the phenotype is projected onto the sequence space, this pattern does not emerge. When doing second order analysis and quantifying the functional relationship between the phenotype and pairs of sites along the sequence, we identified sites with high regressions spread across the sequence, indicating potential intramolecular binding. In addition to quantifying interactions between different parts of a sequence, the method quantifies sequence-phenotype interactions at first and higher order levels. We discuss the strengths and constraints of the method and compare it to computational methods such as machine learning approaches. An accompanying command line tool to compute these polynomials is provided. We show proof of concept of this approach and demonstrate its potential application to other biological systems.
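
The first-order quantities described in the abstract (covariance patterns between sites, and the projection of phenotype onto sequence space) can be illustrated with a minimal sketch. This is not the authors' command line tool and omits the orthogonalization and second-order steps. The one-hot encoding, the function names `one_hot`, `site_covariance_norms` and `phenotype_projection`, and the toy sequences and phenotype values are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed nucleotide ordering for the one-hot vectors


def one_hot(seqs):
    """Encode equal-length RNA strings as an array of shape (n_seqs, n_sites, 4)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seqs), len(seqs[0]), len(ALPHABET)))
    for s, seq in enumerate(seqs):
        for j, base in enumerate(seq):
            encoded[s, j, index[base]] = 1.0
    return encoded


def site_covariance_norms(encoded):
    """Frobenius norm of the 4x4 covariance matrix for every pair of sites."""
    centered = encoded - encoded.mean(axis=0)
    # cov[j, k] is the mean outer product of the centered vectors at sites j and k
    cov = np.einsum("sja,skb->jkab", centered, centered) / encoded.shape[0]
    return np.sqrt((cov ** 2).sum(axis=(2, 3)))


def phenotype_projection(encoded, phenotype):
    """Covariance between the phenotype and each nucleotide at each site."""
    centered = encoded - encoded.mean(axis=0)
    deviation = phenotype - phenotype.mean()
    return np.einsum("s,sja->ja", deviation, centered) / encoded.shape[0]


# Hypothetical toy data: site 0 (A vs G) tracks the phenotype, site 3 does not.
seqs = ["ACGU", "ACGA", "GCGU", "GCGA"]
phenotype = np.array([1.0, 1.0, 3.0, 3.0])

encoded = one_hot(seqs)
norms = site_covariance_norms(encoded)
proj = phenotype_projection(encoded, phenotype)
print(norms.round(3))
print(proj.round(3))
```

In this toy data set the projection is nonzero only at site 0, and sites 0 and 3 vary independently, so their cross-covariance norm is zero. A full treatment along the lines of the abstract would additionally orthogonalize each site's vector against the preceding sites and extend the regression to pairs of sites.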

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae86/7731874/5a3f4082965f/lqaa101fig1.jpg
