Philip Melcy, Chen Tyrone, Tyagi Sonika
School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia.
Monash eResearch Centre, Monash University, Clayton, VIC 3800, Australia.
Noncoding RNA. 2021 Jun 8;7(2):33. doi: 10.3390/ncrna7020033.
Phenotypes are driven by regulated gene expression, which in turn is mediated by complex interactions between diverse biological molecules. Protein-DNA interactions, such as histone and transcription factor binding, are well studied, as are the RNA-RNA interactions that underlie gene silencing by short RNAs. In contrast, the mechanisms of long noncoding RNA (lncRNA)-protein interaction (LPI) remain comparatively poorly understood, likely because LPI are difficult to study. However, LPI are emerging as key interactions in epigenetic mechanisms, playing a role in development and disease. Their importance is further highlighted by their conservation across kingdoms. Hence, interest in LPI research is increasing. We therefore review the current state of the art in lncRNA-protein interactions. We specifically survey recent computational methods and databases that researchers can exploit for LPI investigation. We find that algorithm development relies heavily on a few generic databases containing curated LPI information. Additionally, these databases store annotations at the gene level rather than the transcript level. We show that early methods, which predict LPI using molecular docking, are limited in scope and slow, creating a data-processing bottleneck. Recently, machine learning has become the strategy of choice for LPI prediction, likely owing to the rapid growth in machine learning infrastructure and expertise. While many of these methods have notable limitations, machine learning is expected to form the basis of modern LPI prediction algorithms.