Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology.

Affiliations

Instituto Nacional de Saúde Doutor Ricardo Jorge, Avenida Padre Cruz, Lisboa, Portugal.

BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.

Publication information

PLoS One. 2018 Dec 10;13(12):e0208626. doi: 10.1371/journal.pone.0208626. eCollection 2018.

DOI:10.1371/journal.pone.0208626
PMID:30532199
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6287949/
Abstract

Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. In addition, complex diseases present highly heterogeneous genotypes, which makes biological marker identification difficult. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0.80 for predicting new ASD genes, higher than the previously reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. The predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD). We also developed a KNIME workflow implementing the proposed methodology, which allows users to configure and execute it without machine learning or programming skills.
Machine learning is an effective and reliable technique for deciphering ASD mechanisms by identifying novel disease genes, and this study further demonstrates that its performance can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.
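
The idea of scoring genes by GO-based functional similarity and then measuring ranking quality with AUC can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the GO-like term IDs, the gene names, and the annotations are toy values, the term measure is a simple Jaccard overlap of ancestor sets rather than any of the specific semantic similarity measures used in the study, and the scorer (mean similarity to known disease genes) is a stand-in for the Random Forest classifier described in the abstract, not the authors' implementation.

```python
# Toy GO-like DAG: term -> set of parent terms (hypothetical IDs, not real GO terms).
PARENTS = {
    "GO:root": set(),
    "GO:signaling": {"GO:root"},
    "GO:synaptic": {"GO:signaling"},
    "GO:metabolism": {"GO:root"},
    "GO:lipid": {"GO:metabolism"},
    "GO:chromatin": {"GO:root"},
}

# Toy gene annotations (hypothetical genes).
ANNOTATIONS = {
    "geneA": {"GO:synaptic"},
    "geneB": {"GO:synaptic", "GO:chromatin"},
    "geneC": {"GO:signaling"},
    "geneD": {"GO:lipid"},
    "geneE": {"GO:metabolism"},
    "geneF": {"GO:chromatin"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(t1, t2):
    """Jaccard overlap of the two terms' ancestor sets (simple graph-based measure)."""
    a, b = ancestors(t1), ancestors(t2)
    return len(a & b) / len(a | b)


def gene_similarity(terms1, terms2):
    """Best-match average: each term is paired with its best match in the other gene."""
    best1 = [max(term_similarity(t, u) for u in terms2) for t in terms1]
    best2 = [max(term_similarity(u, t) for t in terms1) for u in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


def score_candidate(candidate, known_genes):
    """Mean functional similarity of a candidate gene to the known disease genes."""
    sims = [gene_similarity(ANNOTATIONS[candidate], ANNOTATIONS[g]) for g in known_genes]
    return sum(sims) / len(sims)


def auc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    known = ["geneA", "geneB"]  # toy "known disease" genes
    candidates = {"geneC": 1, "geneD": 0, "geneE": 0, "geneF": 1}  # toy labels
    scores = [score_candidate(g, known) for g in candidates]
    print("AUC on toy data:", auc(scores, list(candidates.values())))
```

In a fuller pipeline, each gene's row of the pairwise similarity matrix would serve as its feature vector for a supervised classifier; the mean-similarity scorer here only keeps the example dependency-free.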

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b506/6287949/a04b3eec870c/pone.0208626.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b506/6287949/07939e6565c5/pone.0208626.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b506/6287949/2a592a658ad6/pone.0208626.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b506/6287949/3fc58e2465b4/pone.0208626.g003.jpg
