AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data.

Authors

Shukla Rohit, Singh Tiratha Raj

Affiliations

Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India.

Center of Excellence for Aging and Brain Repair, Morsani College of Medicine, University of South Florida, Tampa, 33613, FL, USA.

Publication details

Sci Rep. 2024 Dec 5;14(1):30294. doi: 10.1038/s41598-024-82208-x.

DOI:10.1038/s41598-024-82208-x
PMID:39639110
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11621786/
Abstract

Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by memory loss. Advances in next-generation sequencing have made an enormous amount of AD-associated genomics data available. However, how these genes are involved in AD remains an open research question. We therefore developed AlzGenPred to identify AD-associated genes using machine learning. A total of 13,504 features derived from eight sequence-encoding schemes were generated and evaluated using 16 machine learning algorithms. Network-based features significantly outperformed sequence-based features and effectively distinguished AD-associated genes, whereas sequence-based features failed to classify them accurately. To improve performance, we generated 24 fused features (6020 dimensions) from the sequence-based encodings; a two-step LightGBM-based recursive feature selection method increased accuracy by 5-7%. However, accuracy remained below 70% even after hyperparameter tuning. Network-based features were therefore used to build AlzGenPred, a CatBoost-based machine learning method that achieved 96.55% accuracy and an area under the receiver operating characteristic curve (AUROC) of 98.99%. The method was tested on the AlzGene dataset, where it achieved 96.43% accuracy, and was then validated using a transcriptomics dataset. AlzGenPred provides a reliable and user-friendly tool for identifying potential AD biomarkers, accelerating biomarker discovery, and advancing our understanding of AD. It is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred.
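The abstract does not name the eight sequence-encoding schemes or say how the fused features were assembled. As a hedged illustration of what one sequence encoding and one fused feature might look like, the sketch below computes two common protein encodings, amino-acid composition (AAC, 20 dimensions) and dipeptide composition (DPC, 400 dimensions), and concatenates them into a single fused vector. The choice of AAC and DPC, the assumption that each gene is represented by its protein sequence, and the helper names `aac`, `dpc` and `fuse` are all illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Hypothetical example encodings; NOT the AlzGenPred feature set.
# Standard 20-letter amino-acid alphabet; non-standard residues are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino-acid composition: fraction of each of the 20 residues (20-D)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: frequency of each adjacent residue pair (400-D)."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    # Keep only pairs made of two standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def fuse(seq: str) -> list[float]:
    """Fused feature vector: AAC and DPC concatenated (20 + 400 = 420-D)."""
    return aac(seq) + dpc(seq)


vec = fuse("MKTAYIAKQR")  # toy sequence, for illustration only
print(len(vec))  # 420
```

Concatenation is the simplest way to fuse encodings; dimensionality grows additively, which is how a few encodings can quickly reach thousands of dimensions.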

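Recursive feature selection repeatedly scores the remaining features, discards the weakest, and rescores the survivors. The minimal single-pass sketch below shows that control flow. It assumes a hypothetical importance function (absolute difference of class means) standing in for the LightGBM importances the authors used. It also does not reproduce their two-step procedure, and the toy matrix is made up.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def mean_diff_importance(X: Matrix, y: Sequence[int], cols: list[int]) -> dict[int, float]:
    """Hypothetical stand-in for model importance: |mean(class 1) - mean(class 0)|."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = {}
    for c in cols:
        m1 = sum(r[c] for r in pos) / len(pos)
        m0 = sum(r[c] for r in neg) / len(neg)
        scores[c] = abs(m1 - m0)
    return scores


def recursive_feature_elimination(X, y, n_keep, step=1, importance=mean_diff_importance):
    """Rescore remaining features each round and drop the `step` least important."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        scores = importance(X, y, cols)
        n_drop = min(step, len(cols) - n_keep)
        # Ascending by score: the first n_drop columns are the weakest this round.
        weakest = sorted(cols, key=lambda c: scores[c])[:n_drop]
        cols = [c for c in cols if c not in weakest]
    return cols


# Toy data: column 0 separates the classes, columns 1 and 2 barely do.
X = [
    [1.0, 0.10, 5.0],
    [0.9, 0.20, 5.1],
    [0.1, 0.15, 4.9],
    [0.0, 0.10, 5.0],
]
y = [1, 1, 0, 0]
print(recursive_feature_elimination(X, y, n_keep=1))  # [0]
```

In practice, scikit-learn's `RFE` wrapped around a LightGBM `LGBMClassifier` runs this loop using the model's `feature_importances_`. With a model-based importance, rescoring after each round matters because importances can shift once correlated features are removed. The univariate stand-in here never changes between rounds, so it only demonstrates the loop itself.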
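AlzGenPred's performance is reported as accuracy (96.55%) and AUROC (98.99%). The sketch below shows how these two metrics are computed from true labels and predicted scores in a binary task (AD-associated versus non-associated gene). The toy labels, scores, and the 0.5 decision threshold are made-up illustration values, not the paper's data. AUROC is computed here as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """AUROC via pairwise ranking: P(score of positive > score of negative)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5
print(accuracy(y_true, y_pred))  # 0.5
print(auroc(y_true, scores))     # 0.75
```

Accuracy depends on the chosen threshold, while AUROC summarizes ranking quality across all thresholds. `sklearn.metrics.accuracy_score` and `roc_auc_score` return the same values on this example; the pairwise loop is O(P·N) and is meant only to make the definition concrete.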

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/c90ef8d422aa/41598_2024_82208_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/598a5c0142a4/41598_2024_82208_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/ddce025545eb/41598_2024_82208_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/a13522e88d4b/41598_2024_82208_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/f3b510438c16/41598_2024_82208_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/f2476f43ade2/41598_2024_82208_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/e4b48d915e90/41598_2024_82208_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/696fcc3c0b6c/41598_2024_82208_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/68f52fc97b71/41598_2024_82208_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/d2e538a9ada8/41598_2024_82208_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c5f/11621786/397896f50fc1/41598_2024_82208_Fig11_HTML.jpg

Similar articles

1
AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data.
Sci Rep. 2024 Dec 5;14(1):30294. doi: 10.1038/s41598-024-82208-x.
2
Deciphering the role of lipid metabolism-related genes in Alzheimer's disease: a machine learning approach integrating Traditional Chinese Medicine.
Front Endocrinol (Lausanne). 2024 Oct 23;15:1448119. doi: 10.3389/fendo.2024.1448119. eCollection 2024.
3
AITeQ: a machine learning framework for Alzheimer's prediction using a distinctive five-gene signature.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae291.
4
Integrating network, sequence and functional features using machine learning approaches towards identification of novel Alzheimer genes.
BMC Genomics. 2016 Oct 18;17(1):807. doi: 10.1186/s12864-016-3108-1.
5
Graph Convolutional Network for AD and MCI Diagnosis Utilizing Peripheral DNA Methylation.
Can J Psychiatry. 2024 Dec;69(12):869-879. doi: 10.1177/07067437241300947. Epub 2024 Nov 25.
6
seqQscorer: automated quality control of next-generation sequencing data using machine learning.
Genome Biol. 2021 Mar 5;22(1):75. doi: 10.1186/s13059-021-02294-2.
7
hdWGCNA and Cellular Communication Identify Active NK Cell Subtypes in Alzheimer's Disease and Screen for Diagnostic Markers through Machine Learning.
Curr Alzheimer Res. 2024;21(2):120-140. doi: 10.2174/0115672050314171240527064514.
8
Glypred: Lysine Glycation Site Prediction via CCU-LightGBM-BiLSTM Framework with Multi-Head Attention Mechanism.
J Chem Inf Model. 2024 Aug 26;64(16):6699-6711. doi: 10.1021/acs.jcim.4c01034. Epub 2024 Aug 9.
9
Interpretable machine learning-driven biomarker identification and validation for Alzheimer's disease.
Sci Rep. 2024 Dec 28;14(1):30770. doi: 10.1038/s41598-024-80401-6.
10
VEPAD - Predicting the effect of variants associated with Alzheimer's disease using machine learning.
Comput Biol Med. 2020 Sep;124:103933. doi: 10.1016/j.compbiomed.2020.103933. Epub 2020 Aug 5.