Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods.

Affiliation

Department of Electrical Engineering and Computer Science, Center for Biotechnology Khalifa University, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.

Publication information

BMC Bioinformatics. 2021 Apr 19;22(1):198. doi: 10.1186/s12859-021-04077-9.

DOI: 10.1186/s12859-021-04077-9

PMID: 33874881

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8056510/

Abstract

BACKGROUND

Genotype-phenotype predictions are of great importance in genetics. They can help identify the genetic mutations that cause variation among humans. Approaches for finding genotype-phenotype associations fall broadly into two classes: statistical techniques and machine learning. Statistical techniques are well suited to identifying the specific SNPs that cause variation, whereas machine learning techniques are preferable when the goal is simply to classify individuals into categories. In this article, we examined two phenotypes: eye color and type 2 diabetes. The proposed technique is a hybrid approach that combines statistical techniques with machine learning.

RESULTS

The main dataset for the eye-color phenotype consists of 806 people: 404 with blue-green eyes and 402 with brown eyes. After preprocessing, we generated eight datasets containing different numbers of SNPs by thresholding the mutation difference at each individual SNP. At each SNP we calculated one of three mutation types: no mutation, partial mutation, or full mutation. The data were then transformed into a form suitable for the machine learning algorithms. We used nine classifiers: Random Forest, Extreme Gradient Boosting, ANN, LSTM, GRU, BiLSTM, 1D CNN, ensembles of ANN, and ensembles of LSTM. The best accuracies reported for these classifiers were 0.91, 0.9286, 0.945, 0.94, 0.94, 0.92, 0.95, and 0.96. Stacked ensembles of LSTM outperformed the other algorithms on 1560 SNPs, with an overall accuracy of 0.96, AUC = 0.98 for brown eyes, and AUC = 0.97 for blue-green eyes.

The main dataset for type 2 diabetes consists of 107 people, of whom 30 are classified as cases and 74 as controls. We used different linear thresholds to find the optimal number of SNPs for classification. The final model achieved an accuracy of 0.97.
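The abstract does not spell out how the three mutation types are encoded or how the per-SNP threshold is applied, so the following is only a minimal sketch under stated assumptions. It assumes that "no / partial / full mutation" means zero, one, or two copies of a non-reference allele in a two-letter genotype call (for example "AG"), and that the "mutation difference" is the absolute difference in mean mutation level between the two classes. The reference alleles, the toy data, and the function names `encode_genotype`, `encode_matrix`, `mutation_difference`, and `select_snps` are hypothetical and are not the authors' code.

```python
import numpy as np

# Hypothetical encoding (an assumption, not the authors' code):
# 0 = no mutation      (both alleles match the reference allele)
# 1 = partial mutation (one non-reference allele)
# 2 = full mutation    (two non-reference alleles)
VALID_ALLELES = set("ACGT")


def encode_genotype(genotype: str, ref: str) -> int:
    """Count non-reference alleles in a two-letter call such as 'AG'."""
    if len(genotype) != 2 or not set(genotype) <= VALID_ALLELES:
        raise ValueError(f"unsupported genotype call: {genotype!r}")
    return sum(allele != ref for allele in genotype)


def encode_matrix(calls, refs):
    """Encode a people x SNPs table of genotype strings into a 0/1/2 matrix."""
    return np.array(
        [[encode_genotype(g, r) for g, r in zip(row, refs)] for row in calls],
        dtype=np.int8,
    )


def mutation_difference(X, y):
    """Assumed definition: |mean mutation level in class 1 - mean in class 0| per SNP."""
    return np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))


def select_snps(X, y, threshold):
    """Indices of SNPs whose mutation difference is at least `threshold`."""
    return np.flatnonzero(mutation_difference(X, y) >= threshold)


if __name__ == "__main__":
    # Toy data: 4 people x 3 SNPs with hypothetical reference alleles.
    calls = [
        ["AA", "CT", "GG"],
        ["AG", "TT", "GG"],
        ["GG", "CC", "GA"],
        ["AA", "CT", "AA"],
    ]
    refs = ["A", "C", "G"]
    y = np.array([0, 0, 1, 1])  # 0 = brown, 1 = blue-green (illustrative labels)

    X = encode_matrix(calls, refs)
    print(X.tolist())  # [[0, 1, 0], [1, 2, 0], [2, 0, 1], [0, 1, 2]]
    # One SNP subset per threshold; higher thresholds keep fewer SNPs.
    for t in (0.25, 0.75, 1.25):
        print(t, select_snps(X, y, t).tolist())
    # 0.25 [0, 1, 2]
    # 0.75 [1, 2]
    # 1.25 [2]
```

Sweeping several thresholds yields nested SNP subsets of decreasing size, which is one plausible way to produce datasets of different sizes such as the eight described above. For recurrent models such as an LSTM, a selected matrix is commonly reshaped to (samples, SNPs, 1), for example `X[:, idx][..., np.newaxis]`, so that each SNP is treated as one time step; whether the authors used this layout is not stated in the abstract.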

CONCLUSION

Genotype-phenotype predictions are very useful, especially in forensics. They can help identify associations between SNP variants and traits or diseases. With more data, the predictive performance of machine learning models could be improved. Moreover, the non-linearity of machine learning models and the use of combinations of SNP mutations during training improve prediction. We considered binary classification problems, but the proposed approach can be extended to multi-class classification.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fc1/8056510/0a97b0b33191/12859_2021_4077_Fig1_HTML.jpg

Correction

Correction to: Eye-color and Type-2 diabetes phenotype prediction from genotype data using deep learning methods.

BMC Bioinformatics. 2021 Jun 11;22(1):319. doi: 10.1186/s12859-021-04218-0.
