Department of Experimental Statistics, Louisiana State University, 45 Martin D. Woodin Hall, Baton Rouge, LA 70802, United States.
Department of Mathematics, Texas State University, 601 University Drive, San Marcos, TX 78666, United States.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae566.
Recent developments in artificial intelligence (AI), especially advances in deep neural network (DNN) technology, have revolutionized many fields. Although DNNs play a central role in modern AI, they have rarely been used in genetic data analysis because of the analytical and computational challenges posed by high-dimensional genetic data and growing sample sizes. To facilitate the use of AI in genetic data analysis, we developed a C++ package, AIGen, based on two newly developed neural networks (i.e. kernel neural networks and functional neural networks) that can model complex genotype-phenotype relationships (e.g. interactions) while remaining robust to high-dimensional genetic data. Moreover, the package implements computationally efficient algorithms (e.g. a minimum norm quadratic unbiased estimation approach and batch training) to accelerate computation, making it practical for analyzing large-scale datasets with thousands or even millions of samples. By applying AIGen to the UK Biobank dataset, we demonstrate that it can efficiently analyze large-scale genetic data, attain improved accuracy, and maintain robust performance. Availability: AIGen is developed in C++, and its source code, along with reference libraries, is publicly accessible on GitHub at https://github.com/TingtHou/AIGen.