Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype-phenotype interactions.

Authors

Guo Xinpeng, Han Jinyu, Song Yafei, Yin Zhilei, Liu Shuaichen, Shang Xuequn

Affiliations

School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an, China.

School of Air and Missile Defense, Air Force Engineering University, Xi'an, China.

Publication information

Front Genet. 2022 Aug 15;13:921775. doi: 10.3389/fgene.2022.921775. eCollection 2022.

Abstract

A central goal of current biology is to establish a complete functional link between genotype and phenotype, known as the genotype-phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this offers new opportunities to uncover the mechanisms linking single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics analysis still faces several challenges: 1) when the sample size is large enough, the number of omics types is often too small to meet the requirements of multi-omics analysis; 2) the correlations within each omics layer are often unclear, such as the correlations between genes in genomics; 3) when a large number of traits is analyzed, the sample size is often smaller than the number of traits, which hinders the application of machine learning methods to the classification of disease outcomes. To address these issues and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which uses sparse connectivity between network layers to prevent overfitting. The correlation within each omics layer is also taken into account so that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analyses using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. The results show that the proposed method achieves high classification accuracy and yields easily interpretable feature selection. This method extends genotype-phenotype association analysis to deep learning networks.
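
The abstract does not spell out how the sparse connectivity between layers is implemented. As a rough illustration of the general idea only, the sketch below masks a dense layer with a binary SNP-to-gene matrix, so that each hidden "gene" unit receives input only from the SNPs assumed to be associated with it. The function name `masked_dense_forward`, the toy mask, the 0/1/2 genotype coding, and the ReLU activation are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def masked_dense_forward(x, weights, mask, bias):
    """Forward pass of a dense layer restricted by a binary connectivity mask.

    Hypothetical helper: entries of `weights` where `mask` is 0 are ignored,
    so only the prior-supported connections contribute to the output.
    """
    z = x @ (weights * mask) + bias
    return np.maximum(z, 0.0)  # ReLU activation (an assumption)

# Made-up prior: 4 SNPs (rows) and 3 genes (columns).
# mask[i, j] == 1 means SNP i is assumed to be associated with gene j.
snp_gene_mask = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

rng = np.random.default_rng(0)
weights = rng.normal(size=snp_gene_mask.shape)
bias = np.zeros(snp_gene_mask.shape[1])

# Two samples with made-up genotypes coded as minor-allele counts (0, 1, 2).
genotypes = np.array([[0, 1, 2, 0],
                      [2, 0, 1, 1]], dtype=float)

hidden = masked_dense_forward(genotypes, weights, snp_gene_mask, bias)

# Number of connections kept by the mask versus a fully connected layer.
n_active = int(snp_gene_mask.sum())
n_dense = snp_gene_mask.size
print(hidden.shape, n_active, n_dense)
```

In this toy setting the mask keeps 5 of the 12 possible connections. Cutting the number of free parameters in this way is one common motivation for sparse layers when the sample size is smaller than the number of features.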

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/754c/9421127/cfe68866aec4/fgene-13-921775-g001.jpg
