Liu Chun-Chi, Lin Chin-Chung, Li Ker-Chau, Chen Wen-Shyen E, Chen Jiun-Ching, Yang Ming-Te, Yang Pan-Chyr, Chang Pei-Chun, Chen Jeremy J W
Department of Computer Science, National Chung-Hsing University, Taichung, Taiwan, ROC.
BMC Bioinformatics. 2007 May 22;8:164. doi: 10.1186/1471-2105-8-164.
Genome-wide identification of specific oligonucleotides (oligos) is a computationally intensive task and is required for designing microarray probes, primers, and siRNAs. An artificial neural network (ANN) is a machine learning technique that can effectively process complex, high-noise data. Here, ANNs are applied to the distribution of unique subsequences to predict specific oligos.
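To make the idea of a "unique subsequence distribution" concrete, here is a minimal sketch, not the published IAB implementation. It assumes that a unique marker is a length-`k` substring (k-mer) that occurs in exactly one gene of the gene set, and that an oligo's distribution is a 0/1 profile marking which of its k-mers are unique to its source gene. The k-mer length, the function names `unique_markers` and `unique_profile`, and the profile encoding are hypothetical; reverse complements are ignored for simplicity. A Python dict serves as the hash table.

```python
from collections import defaultdict
from typing import Dict, List, Set


def unique_markers(genes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each gene ID to the k-mers that occur in exactly one gene.

    The dict `owners` is the hash table: each k-mer points to the set of
    genes that contain it, so a uniqueness check is a single lookup.
    Assumption (illustrative): forward strand only, no reverse complements.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(gene_id)

    markers: Dict[str, Set[str]] = {gene_id: set() for gene_id in genes}
    for kmer, gene_ids in owners.items():
        if len(gene_ids) == 1:  # k-mer seen in only one gene: a unique marker
            markers[next(iter(gene_ids))].add(kmer)
    return markers


def unique_profile(oligo: str, gene_markers: Set[str], k: int) -> List[int]:
    """Hypothetical encoding: position i is 1 if the k-mer starting at i
    is a unique marker of the oligo's source gene, else 0."""
    return [1 if oligo[i:i + k] in gene_markers else 0
            for i in range(len(oligo) - k + 1)]
```

For example, with `genes = {"g1": "ACGTACGA", "g2": "ACGTTTGA"}` and `k = 4`, the k-mer `ACGT` occurs in both genes and is dropped from both marker sets, so `unique_profile("ACGTACGA", unique_markers(genes, 4)["g1"], 4)` gives `[0, 1, 1, 1, 1]`. A vector of this kind is one plausible form of ANN input.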
We present a novel and efficient algorithm, named the integration of ANN and BLAST (IAB) algorithm, to identify specific oligos. Using a hash table algorithm, we establish unique marker databases for the human and rat gene index databases. We then use the unique marker database to create input vectors for training and testing the ANN. The trained ANN predicted specific oligos with high efficiency, and these oligos were subsequently verified by BLAST. To improve prediction performance, ANN over-fitting was avoided by early stopping at the best observed error, and k-fold validation was also applied. For 70-mer, 50-mer, and 25-mer specific oligos, the IAB algorithm was about 5.2, 7.1, and 6.7 times faster, respectively, than a BLAST search without the ANN. In addition, polymerase chain reaction (PCR) results showed that primers predicted by the IAB algorithm could specifically amplify the corresponding genes. The IAB algorithm has been integrated into a previously published comprehensive web server that supports microarray analysis and genome-wide iterative enrichment analysis, through which users can identify a group of genes of interest and then find specific oligos for these genes.
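The abstract does not say exactly how early stopping or the k-fold split was implemented. A minimal sketch, assuming a patience-based rule (halt once `patience` consecutive epochs fail to beat the best validation error observed so far, and keep that best epoch) and contiguous, unshuffled folds; the names `k_fold_indices` and `early_stop`, the `patience` parameter, and the fold layout are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds.

    Returns one (train, validation) pair per fold; fold sizes differ by
    at most one, with the larger folds first.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


def early_stop(val_errors: Iterable[float], patience: int) -> Tuple[int, float]:
    """Consume per-epoch validation errors; return (best_epoch, best_error).

    Stops reading once `patience` epochs in a row fail to improve on the
    best observed error. If `val_errors` is a generator that trains one
    epoch per step, breaking out of the loop also halts training.
    """
    best_epoch, best_err, waited = -1, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_err
```

In one common arrangement, each fold in turn is held out, a fresh network is trained on the remaining folds, and the weights from `best_epoch` are kept; the held-out errors are then averaged across folds. Note that `early_stop([0.5, 0.4, 0.35, 0.36, 0.37, 0.38, 0.2], patience=3)` returns `(2, 0.35)`: the later 0.2 is never seen, which is the trade-off a patience rule accepts.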
The IAB algorithm was developed to construct SpecificDB, a web server that provides a database of specific and valid oligos for probe, siRNA, and primer design for the human genome. We also demonstrate, through polymerase chain reaction experiments, the ability of the IAB algorithm to predict specific oligos. SpecificDB provides comprehensive information and a user-friendly interface.