Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree.

Affiliation

Information Technology Department, Faculty of Computers and Information, Mansoura University, Mansoura, 35516, Egypt.

Publication information

Sci Rep. 2022 Jun 15;12(1):10004. doi: 10.1038/s41598-022-14127-8.

DOI: 10.1038/s41598-022-14127-8

PMID: 35705654

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9200794/

Abstract

Identifying genes related to Parkinson's disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Many recent studies have proposed techniques for predicting disease-related genes, but only a few of them are designed or developed specifically for PD gene prediction. Most of these PD techniques identify only protein-coding genes and ignore long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system that identifies both protein and lncRNA genes related to PD and can aid early diagnosis. First, we preprocessed the genes into DNA sequences in FASTA format obtained from the University of California Santa Cruz (UCSC) genome browser and removed redundant sequences. Second, we extracted significant features from these DNA FASTA sequences using the PyFeat method, with AdaBoost used for feature selection. The selected features achieved promising results compared with features extracted by several state-of-the-art feature extraction techniques. Finally, the selected features were fed to a gradient-boosted decision tree (GBDT) classifier to classify the test cases. Seven performance metrics were used to evaluate the proposed system, which achieved an average accuracy of 78.6%, an area under the ROC curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. These results compare favorably with those of other systems. The top-ranked predicted protein and lncRNA genes were verified through a literature review.
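The abstract names the pipeline stages (sequence features, AdaBoost-based feature selection, GBDT classification) and seven evaluation metrics but gives no formulas. The following is a minimal, stdlib-only Python sketch of two of those pieces, under stated assumptions: a k-mer frequency vector, used here as a hypothetical stand-in for one of the feature families a tool like PyFeat can produce (it is not claimed to be PyFeat's exact definition), and the seven metrics computed from true labels, hard predictions, and classifier scores. All function names are illustrative. AUPR is approximated by average precision, and the AdaBoost selection and GBDT training steps are not reproduced; a library implementation such as scikit-learn's `AdaBoostClassifier` and `GradientBoostingClassifier` would be one assumed option for those.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Relative frequency of each of the 4**k DNA k-mers, in lexicographic
    order (AA, AC, ..., TT). Hypothetical stand-in for one feature family."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other non-ACGT symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def threshold_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """ACC, SEN, SPC, F1 and MCC from the binary confusion matrix (1 = PD-related)."""
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    sen = tp / (tp + fn) if tp + fn else 0.0  # recall on PD-related genes
    spc = tn / (tn + fp) if tn + fp else 0.0  # recall on non-PD genes
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sen / (prec + sen) if prec + sen else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": (tp + tn) / len(pairs), "SEN": sen, "SPC": spc, "F1": f1, "MCC": mcc}


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(P*N), adequate for a small example."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: list[int], scores: list[float]) -> float:
    """AUPR approximated as average precision: mean precision at the rank of
    each positive after sorting by descending score (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


if __name__ == "__main__":
    print(kmer_frequencies("AACGTTGCA", k=2))
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # toy classifier scores
    y_pred = [int(s >= 0.5) for s in scores]
    print(threshold_metrics(y_true, y_pred))
    print("AUC", roc_auc(y_true, scores), "AUPR", average_precision(y_true, scores))
```

In a full pipeline the scores would come from the trained GBDT's predicted probabilities, and the metrics would typically be averaged over cross-validation folds; the abstract does not state the exact evaluation protocol, so that part is an assumption.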

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3704/9200794/55711b6ec6e7/41598_2022_14127_Fig1_HTML.jpg

Similar articles

LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA-protein interaction identification.

BMC Bioinformatics. 2021 Oct 4;22(1):479. doi: 10.1186/s12859-021-04399-8.

Predicting Parkinson's disease using gradient boosting decision tree models with electroencephalography signals.

Parkinsonism Relat Disord. 2022 Feb;95:77-85. doi: 10.1016/j.parkreldis.2022.01.011. Epub 2022 Jan 15.

LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification.

BMC Bioinformatics. 2021 Nov 26;22(1):568. doi: 10.1186/s12859-021-04485-x.

Predictors of rapid eye movement sleep behavior disorder in patients with Parkinson's disease based on random forest and decision tree.

PLoS One. 2022 Jun 16;17(6):e0269392. doi: 10.1371/journal.pone.0269392. eCollection 2022.

A random forest based computational model for predicting novel lncRNA-disease associations.

BMC Bioinformatics. 2020 Mar 27;21(1):126. doi: 10.1186/s12859-020-3458-1.

Recognition of bovine milk somatic cells based on multi-feature extraction and a GBDT-AdaBoost fusion model.

Math Biosci Eng. 2022 Apr 7;19(6):5850-5866. doi: 10.3934/mbe.2022274.

Predicting potential miRNA-disease associations by combining gradient boosting decision tree with logistic regression.

Comput Biol Chem. 2020 Apr;85:107200. doi: 10.1016/j.compbiolchem.2020.107200. Epub 2020 Jan 28.

Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer.

Int J Med Inform. 2020 Apr;136:104068. doi: 10.1016/j.ijmedinf.2019.104068. Epub 2019 Dec 28.

gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network.

BMC Bioinformatics. 2022 Jan 4;23(1):11. doi: 10.1186/s12859-021-04548-z.

Cited by

Cutting-edge AI tools revolutionizing scientific research in life sciences.

BioTechnologia (Pozn). 2025 Mar 31;106(1):77-102. doi: 10.5114/bta/200803. eCollection 2025.

Prediction of radiation-induced acute skin toxicity in breast cancer patients using data encapsulation screening and dose-gradient-based multi-region radiomics technique: A multicenter study.

Front Oncol. 2022 Nov 10;12:1017435. doi: 10.3389/fonc.2022.1017435. eCollection 2022.
