
Scuba: scalable kernel-based gene prioritization.

Affiliations

CRIBI Biotechnology Center, University of Padova, viale G. Colombo, 3, Padova, Italy.

Department of Women's and Children's Health, University of Padova, via Giustiniani, 3, Padova, Italy.

Publication information

BMC Bioinformatics. 2018 Jan 25;19(1):23. doi: 10.1186/s12859-018-2025-5.

DOI:10.1186/s12859-018-2025-5
PMID:29370760
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5785908/
Abstract

BACKGROUND

Uncovering genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability.

RESULTS

We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it can deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of its scalability, Scuba also integrates a new, efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods.
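The general idea of kernel-based prioritization over several data sources can be illustrated with a minimal sketch. This is not Scuba's algorithm: the fixed uniform kernel weights and the mean-similarity scoring rule are simplifying assumptions standing in for the learned multiple kernel combination and margin-distribution optimization described above, and all function names and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def normalize_kernel(K):
    """Cosine-normalize so that every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices (weights rescaled to sum to 1)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

def prioritize(K, seed_idx):
    """Rank non-seed genes by mean kernel similarity to the known disease genes."""
    seed_idx = np.asarray(seed_idx)
    scores = K[:, seed_idx].mean(axis=1)
    candidates = np.setdiff1d(np.arange(K.shape[0]), seed_idx)
    order = candidates[np.argsort(-scores[candidates])]
    return order, scores

# Toy example: six genes described by two hypothetical data sources.
source_a = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [3.0, 3.0], [3.1, 2.9], [5.0, 0.0]])
source_b = np.array([[1.0], [1.1], [0.9], [4.0], [4.2], [2.5]])

# One kernel per data source; gamma = 0.5 is an arbitrary illustrative choice.
kernels = [normalize_kernel(rbf_kernel(source_a, 0.5)),
           normalize_kernel(rbf_kernel(source_b, 0.5))]

# Uniform weights are an assumption; a multiple kernel learning method
# would learn them from the data instead.
K = combine_kernels(kernels, [1.0, 1.0])

# Genes 0 and 1 play the role of known disease genes (the few positives).
ranking, scores = prioritize(K, seed_idx=[0, 1])
print(ranking)
```

In this toy setting, gene 2 lies close to both seed genes in both data sources, so it should come out on top of the ranking. Swapping the uniform weights for learned ones is the step where a multiple kernel learning method would differ from this sketch.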

CONCLUSIONS

Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a003/5785908/4ae6aedd2712/12859_2018_2025_Fig1_HTML.jpg