Delineating the impact of machine learning elements in pre-microRNA detection.

Author information

Saçar Demirci Müşerref Duygu, Allmer Jens

Affiliations

Department of Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey.

Department of Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir, Turkey; Bionia Incorporated, IZTEKGEB A8, Urla, Izmir, Turkey.

Publication information

PeerJ. 2017 Mar 29;5:e3131. doi: 10.7717/peerj.3131. eCollection 2017.

Abstract

Gene regulation modulates RNA expression via transcription factors. Post-transcriptional gene regulation, in turn, influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA-target interactions. Therefore, computational approaches have been proposed. Many such tools rely on machine learning (ML), which involves example selection, feature extraction, model training, algorithm selection, and parameter optimization. Different ML algorithms have been used for model training on various example sets, more than 1,000 features describing pre-miRNAs have been proposed, and different training and testing schemes have been used for model establishment. For pre-miRNA detection, negative examples cannot easily be established, causing a problem for two-class classification algorithms. There is also no consensus on what ML approach works best and, therefore, we set forth and established the impact of the different parts involved in ML on model performance. Furthermore, we established two new negative datasets and analyzed the impact of using them for training and testing. It was our aim to attach an order of importance to the parts involved in ML for pre-miRNA detection, but instead we found that all parts are intricately connected and their contributions cannot be easily untangled, leading us to suggest that when attempting ML-based pre-miRNA detection, many scenarios need to be explored.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7f1/5374968/8e29c5c405d5/peerj-05-3131-g001.jpg
