Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China.
State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China.
Nucleic Acids Res. 2021 Jun 4;49(10):e60. doi: 10.1093/nar/gkab122.
Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. The rapid accumulation of sequences requires an equally rapid development of new predictive models, which in turn depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization, all without programming. iLearnPlus includes a wide range of feature sets that encode information from the input sequences, and over twenty machine-learning algorithms, including several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters both to experienced bioinformaticians, given the broad range of options, and to biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.
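To make "feature sets which encode information from the input sequences" concrete, the sketch below shows one common kind of sequence encoding, k-mer composition, which turns a variable-length sequence into a fixed-length numeric vector that a machine-learning model can consume. This is a minimal illustration under stated assumptions, not iLearnPlus's own code or API: the function name `kmer_composition` and its parameters are hypothetical, and it assumes an RNA alphabet by default and simply skips windows containing symbols outside the alphabet (e.g. `N`).

```python
from itertools import product
from typing import List


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGU") -> List[float]:
    """Hypothetical helper: normalized frequency of every k-mer over `alphabet`.

    The output has len(alphabet) ** k entries, ordered lexicographically
    by the order of symbols in `alphabet` (AA, AC, AG, AU, CA, ... for k=2).
    """
    seq = seq.upper()
    # Enumerate all possible k-mers so every sequence maps to the same vector layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k across the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in index:  # windows with non-alphabet symbols are ignored
            counts[index[window]] += 1
            total += 1
    # Normalize counts to frequencies; a sequence with no valid window yields all zeros.
    return [c / total if total else 0.0 for c in counts]


# Usage: dinucleotide composition of a short RNA fragment.
vec = kmer_composition("ACGUAC", k=2)
```

The same function with `k=1` and the 20-letter amino acid alphabet `"ACDEFGHIKLMNPQRSTVWY"` gives a simple amino acid composition vector for protein chains. Fixed-length vectors like these are what allow one pipeline to feed many different classifiers.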