An ensemble strategy for piRNA identification through hybrid moment-based feature modeling.

Authors

Rasheed Mansoor Ahmed, Alkhalifah Tamim, Alturise Fahad, Khan Yaser Daanial

Affiliations

School of Systems and Technology, University of Management and Technology, Lahore, Pakistan.

Department of Computer Engineering, College of Computer, Buraydah, Saudi Arabia.

Publication

Sci Rep. 2025 Aug 18;15(1):30157. doi: 10.1038/s41598-025-14194-7.


DOI: 10.1038/s41598-025-14194-7
PMID: 40820010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12358548/
Abstract

This study aims to improve the accuracy of predicting transposon-derived piRNAs by developing a novel computational method called TranspoPred. TranspoPred leverages positional, frequency, and moment-based features extracted from RNA sequences. By integrating multiple deep learning networks, the objective is to create a robust tool for predicting transposon-derived piRNAs, thereby contributing to a deeper understanding of their biological functions and regulatory mechanisms. Piwi-interacting RNAs (piRNAs) are currently considered the most diverse and abundant class of small non-coding RNA (ncRNA) molecules. Accurate identification of transposon-associated piRNAs can considerably advance the study of small ncRNAs and support the understanding of the gametogenesis process. First, a number of moments were adopted to convert the primary sequences into feature vectors. Bagging-, boosting-, and stacking-based ensemble classification approaches were then employed. The bagging approach used Random Forest (RF), Extra Trees (ET), and Decision Tree classifiers. The boosting approach used XGBoost (XGB), AdaBoost, and Gradient Boost. For the stacking method, k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Tree classifiers served as base learners, with a Neural Network (NN) as the meta-learner. The computational models were evaluated through 2 × 5-fold cross-validation, 10-fold cross-validation, and independent testing on datasets from three species: human, mouse, and Drosophila. The evaluation metrics were Accuracy (ACC), Specificity (SP), Sensitivity (SN), Matthews Correlation Coefficient (MCC), and the F1 measure. The ensemble methods outperformed the others in almost all testing scenarios.
Notably, stacking achieved perfect scores for accuracy, specificity, sensitivity, and MCC in independent set testing on the human and Drosophila datasets, and nearly perfect scores on the mouse dataset. Independent set testing across species evaluates the generalizability and adaptability of the model to diverse data samples. The proposed method, TranspoPred, achieved strong results on the human, mouse, and Drosophila datasets. Our methods exhibited superior performance compared to other state-of-the-art techniques for predicting transposon-derived piRNAs. The proposed approaches show great potential for improving the accuracy of piRNA prediction, aiding future research and the scientific community in the in-silico identification of piRNAs. The source code and datasets used in this study are available at https://github.com/MansoorAhmadRasheed/piRNA-codes-and-result.
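The evaluation metrics named in the abstract (ACC, SP, SN, MCC, and F1) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the function name `binary_metrics`, the label convention (1 = piRNA, 0 = non-piRNA), and the toy labels are assumptions made purely for illustration.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return ACC, SN, SP, F1 and MCC for binary labels.

    Assumed convention (not from the paper): 1 = piRNA, 0 = non-piRNA.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0          # specificity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sn / (precision + sn)) if (precision + sn) else 0.0

    # MCC is defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"ACC": acc, "SN": sn, "SP": sp, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, a classifier that labels every sample correctly scores 1.0 on all five metrics, which is what "perfect scores" means in the independent-set results reported above.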

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/69d3ce13e6a1/41598_2025_14194_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/84e927c73dd6/41598_2025_14194_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/0f78e12af9ba/41598_2025_14194_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/ac2854859c80/41598_2025_14194_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/17f381544bdf/41598_2025_14194_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/7548b8e1af87/41598_2025_14194_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/0b1bf20a8b36/41598_2025_14194_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/e198ab6706bc/41598_2025_14194_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47f5/12358548/6999adb0f576/41598_2025_14194_Fig9_HTML.jpg
