Department of Pediatrics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-7088, USA.
Mol Genet Metab. 2010 Oct-Nov;101(2-3):134-40. doi: 10.1016/j.ymgme.2010.06.005. Epub 2010 Jun 22.
Genetic databases contain a variety of annotation errors that often go unnoticed because modern genetic data sets are so large. Interpreting these data sets requires bioinformatics tools, which may themselves contribute to the problem. Although assigning gene symbol annotations to identifiers (IDs) such as microarray probe set, RefSeq, GenBank, and Entrez Gene IDs is seemingly trivial, the accuracy of those annotations is fundamental to any subsequent conclusions. We examine gene symbol annotations and results from three commercial pathway analysis software (PAS) packages: Ingenuity Pathways Analysis, GeneGO, and Pathway Studio. We compare gene symbol annotations and canonical pathway results over time and among different input ID types. We find that PAS results can be affected by variation in gene symbol annotations across software releases and by the input ID type analyzed. Based on these findings, we offer suggestions for using commercial PAS and for reporting microarray results to improve research quality. We also propose a wiki-type website to facilitate communication of bioinformatics software problems within the scientific community.
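The comparison of gene symbol annotations across software releases can be illustrated with a minimal sketch. It assumes each release's annotations have already been exported as a plain mapping from input ID to gene symbol. The function name `compare_annotations`, the `AnnotationDiff` type, and the sample IDs and symbols are hypothetical placeholders, not data or methods from the study.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class AnnotationDiff(NamedTuple):
    """IDs whose gene symbol differs between two annotation releases."""
    changed: Dict[str, Tuple[str, str]]  # ID -> (old symbol, new symbol)
    lost: Set[str]                       # annotated before, unannotated now
    gained: Set[str]                     # unannotated before, annotated now


def compare_annotations(
    old: Dict[str, Optional[str]],
    new: Dict[str, Optional[str]],
) -> AnnotationDiff:
    """Compare ID-to-symbol mappings from two releases (hypothetical helper)."""
    changed: Dict[str, Tuple[str, str]] = {}
    lost: Set[str] = set()
    gained: Set[str] = set()
    # Walk the union of IDs so that IDs present in only one release are seen.
    for identifier in old.keys() | new.keys():
        before = old.get(identifier)
        after = new.get(identifier)
        if before and after and before != after:
            changed[identifier] = (before, after)
        elif before and not after:
            lost.add(identifier)
        elif after and not before:
            gained.add(identifier)
    return AnnotationDiff(changed, lost, gained)


# Made-up placeholder data, for illustration only.
release_a = {"probe_1": "GENE_A", "probe_2": "GENE_B", "probe_3": None}
release_b = {"probe_1": "GENE_A", "probe_2": "GENE_B2", "probe_3": "GENE_C"}

diff = compare_annotations(release_a, release_b)
print(diff.changed)
print(diff.gained)
```

The same function could be applied to mappings derived from different input ID types for one gene list, so that discrepancies are flagged before pathway results are interpreted.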