Plant-LncPipe: a computational pipeline providing significant improvement in plant lncRNA identification.

Author information

Tian Xue-Chan, Chen Zhao-Yang, Nie Shuai, Shi Tian-Le, Yan Xue-Mei, Bao Yu-Tao, Li Zhi-Chao, Ma Hai-Yao, Jia Kai-Hua, Zhao Wei, Mao Jian-Feng

Affiliations

State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China.

Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, China.

Publication information

Hortic Res. 2024 Feb 8;11(4):uhae041. doi: 10.1093/hr/uhae041. eCollection 2024 Apr.

Abstract

Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, such as chromatin remodeling, post-transcriptional regulation, and epigenetic modifications. Despite their critical functions in regulating plant growth, root development, and seed dormancy, the identification of plant lncRNAs remains a challenge due to the scarcity of specific and extensively tested identification methods. Most mainstream machine learning-based methods used for plant lncRNA identification were initially developed using human or other animal datasets, and their accuracy and effectiveness in predicting plant lncRNAs have not been fully evaluated or exploited. To overcome this limitation, we retrained several models, including CPAT, PLEK, and LncFinder, using plant datasets and compared their performance with mainstream lncRNA prediction tools such as CPC2, CNCI, RNAplonc, and LncADeep. Retraining these models significantly improved their performance, and two of the retrained models, LncFinder-plant and CPAT-plant, alongside their ensemble, emerged as the most suitable tools for plant lncRNA identification. This underscores the importance of model retraining in tackling the challenges associated with plant lncRNA identification. Finally, we developed a pipeline (Plant-LncPipe) that incorporates an ensemble of the two best-performing models and covers the entire data analysis process, including read mapping, transcript assembly, lncRNA identification, classification, and origin analysis, for the efficient identification of lncRNAs in plants. The pipeline, Plant-LncPipe, is available at: https://github.com/xuechantian/Plant-LncRNA-pipline.
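
The abstract does not state how the outputs of LncFinder-plant and CPAT-plant are combined. As an illustration only, the Python sketch below shows one simple ensemble rule under stated assumptions: each model is assumed to supply a per-transcript probability of being protein-coding, and a transcript is kept as a candidate lncRNA only when both models call it non-coding. The function name `consensus_lncrnas` and the 0.5 cutoffs are hypothetical placeholders, not Plant-LncPipe's actual code or thresholds.

```python
from typing import Dict, List

# Hypothetical cutoffs: the thresholds used by Plant-LncPipe are not given
# in the abstract, so 0.5 is only a placeholder for illustration.
CPAT_CUTOFF = 0.5
LNCFINDER_CUTOFF = 0.5


def consensus_lncrnas(
    cpat_coding_prob: Dict[str, float],
    lncfinder_coding_prob: Dict[str, float],
    cpat_cutoff: float = CPAT_CUTOFF,
    lncfinder_cutoff: float = LNCFINDER_CUTOFF,
) -> List[str]:
    """Return transcript IDs that BOTH models call non-coding (hypothetical rule).

    Each argument maps a transcript ID to that model's probability that the
    transcript is protein-coding. A transcript scored by only one model is
    dropped, because the rule requires agreement of both models.
    """
    # Transcripts each model considers non-coding (probability below cutoff).
    cpat_noncoding = {tid for tid, p in cpat_coding_prob.items() if p < cpat_cutoff}
    lncfinder_noncoding = {
        tid for tid, p in lncfinder_coding_prob.items() if p < lncfinder_cutoff
    }
    # Set intersection implements the consensus (logical AND) of the two calls.
    return sorted(cpat_noncoding & lncfinder_noncoding)


if __name__ == "__main__":
    # Toy scores for four assembled transcripts (made-up values).
    cpat = {"tx1": 0.08, "tx2": 0.91, "tx3": 0.30}
    lncfinder = {"tx1": 0.12, "tx2": 0.05, "tx3": 0.77, "tx4": 0.10}
    print(consensus_lncrnas(cpat, lncfinder))  # ['tx1']
```

Requiring agreement tends to make fewer lncRNA calls than either model alone, trading some sensitivity for specificity; averaging the two probabilities (soft voting) would be an equally plausible design, and the rule Plant-LncPipe actually applies should be checked in its repository.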

Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f034/11024640/eeb3ea2aaadf/uhae041f1.jpg