Bi-stream CNN Down Syndrome screening model based on genotyping array.

Author information

Feng Bing, Hoskins William, Zhang Yan, Meng Zibo, Samuels David C, Wang Jiandong, Xia Ruofan, Liu Chao, Tang Jijun, Guo Yan

Affiliations

College of Education, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.

Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA.

Publication information

BMC Med Genomics. 2018 Nov 20;11(Suppl 5):105. doi: 10.1186/s12920-018-0416-0.

Abstract

BACKGROUND

Human Down syndrome (DS) is usually caused by genomic micro-duplications and dosage imbalances of human chromosome 21, and it is associated with many genomic and phenotypic abnormalities. DS occurs in about 1 per 1,000 births worldwide, a very high rate, yet researchers have not found any effective method to cure it. Currently, the most effective means of DS prevention are screening and early detection.

METHODS

In this study, we applied deep learning techniques to a set of Illumina genotyping array data and built a bi-stream convolutional neural network (CNN) model to screen for and predict the occurrence of DS. First, we built image input data by converting the intensities of each SNP site into chromosome SNP maps. Next, we proposed a bi-stream CNN architecture with nine layers and two branch models. We then merged the two CNN branch models into one model at the fourth convolutional layer and output the prediction in the last layer.
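The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the map dimensions, kernel sizes, filter counts, or the pooling and output layers, so the 12 x 12 maps, 3 x 3 kernels, global average pooling, and sigmoid output below are all hypothetical, as are the names `snp_map`, `conv_relu`, and `bi_stream_forward`. The sketch does not reproduce the full nine-layer layout. Only two points are taken from the text: SNP intensities are arranged into a 2-D map per chromosome, and the two branches are merged at the fourth convolutional layer.

```python
import numpy as np


def snp_map(intensities, width):
    """Lay a 1-D vector of per-SNP intensities out as a 2-D map (zero-padded tail)."""
    values = np.asarray(intensities, dtype=float)
    rows = -(-values.size // width)  # ceiling division
    padded = np.zeros(rows * width)
    padded[: values.size] = values
    return padded.reshape(rows, width)


def conv_relu(x, kernels):
    """Valid 2-D cross-correlation followed by ReLU.

    x has shape (channels, height, width);
    kernels has shape (filters, channels, kh, kw).
    """
    _, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.zeros((kernels.shape[0], h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i : i + kh, j : j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)


def bi_stream_forward(map_a, map_b, params):
    """Run both branches, merge them at the fourth conv layer, return a score in (0, 1)."""
    a, b = map_a[np.newaxis], map_b[np.newaxis]  # add a channel axis
    for kernels_a, kernels_b in zip(params["branch_a"], params["branch_b"]):
        a, b = conv_relu(a, kernels_a), conv_relu(b, kernels_b)
    merged = np.concatenate([a, b], axis=0)  # stack branch feature maps by channel
    merged = conv_relu(merged, params["merged"])  # fourth convolutional layer
    pooled = merged.mean(axis=(1, 2))  # global average pooling (assumed)
    logit = pooled @ params["dense_w"] + params["dense_b"]
    return float(1.0 / (1.0 + np.exp(-logit)))  # sigmoid output (assumed)


rng = np.random.default_rng(0)


def random_kernels(filters, channels):
    """Untrained 3 x 3 kernels; sizes and counts are illustrative only."""
    return rng.normal(scale=0.1, size=(filters, channels, 3, 3))


params = {
    "branch_a": [random_kernels(4, 1), random_kernels(4, 4), random_kernels(4, 4)],
    "branch_b": [random_kernels(4, 1), random_kernels(4, 4), random_kernels(4, 4)],
    "merged": random_kernels(8, 8),
    "dense_w": rng.normal(scale=0.1, size=8),
    "dense_b": 0.0,
}

# Synthetic intensities stand in for real genotyping array data.
map_a = snp_map(rng.random(140), width=12)  # 140 values padded to a 12 x 12 map
map_b = snp_map(rng.random(144), width=12)
probability = bi_stream_forward(map_a, map_b, params)
print(f"score from untrained weights: {probability:.3f}")
```

Because the weights are random, the printed score carries no diagnostic meaning; the point is only the shape of the computation, in which each chromosome map passes through its own three convolutional layers before the feature maps are concatenated and convolved jointly.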

RESULTS

Our bi-stream CNN model achieved an average accuracy of 99.3% with very low false-positive and false-negative rates, which is necessary for further applications in disease prediction and medical practice. We visualized the feature maps and learned filters from intermediate convolutional layers, which showed the genomic patterns and correlated SNP variations in human DS genomes. We also compared our method with other CNN and traditional machine learning models, and analyzed and discussed the characteristics and strengths of our bi-stream CNN model.
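For readers unfamiliar with the metrics named above, the following sketch shows how accuracy, false-positive rate, and false-negative rate are conventionally computed for a binary screen. The function name `screening_metrics` and the toy labels are invented for illustration and are not data from the study; the abstract does not state how the authors computed or averaged their metrics.

```python
def screening_metrics(y_true, y_pred):
    """Compute accuracy, false-positive rate, and false-negative rate.

    Labels are 1 for DS and 0 for non-DS.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of true negatives that were wrongly flagged as DS.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of true DS cases that the screen missed.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting the two error rates matter separately: a false negative is a missed case, while a false positive triggers unnecessary follow-up testing, so a high accuracy alone does not establish that both are low.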

CONCLUSIONS

Our bi-stream model used two branch CNN models to learn the local genome features and regional patterns among adjacent genes and SNP sites from two chromosomes simultaneously. It achieved the best performance on all evaluation metrics when compared with two single-stream CNN models and three traditional machine learning algorithms. The visualized feature maps also provide opportunities to study the genomic markers and pathway components associated with human DS, offering insights for the development of gene therapy and genomic medicine.
