

Evaluating annotations of an Agilent expression chip suggests that many features cannot be interpreted.

Affiliation

Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, DHHS, Bethesda, MD 20892, USA.

Publication information

BMC Genomics. 2009 Nov 30;10:566. doi: 10.1186/1471-2164-10-566.

Abstract

BACKGROUND

While attempting to reanalyze published data from Agilent 4 x 44 human expression chips, we found that some of the 60-mer oligonucleotide features could not be interpreted as representing single human genes. For example, some of the oligonucleotides align with the transcripts of more than one gene. We decided to check the annotations for all autosomes and the X chromosome systematically using bioinformatics methods.
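The multi-gene problem described above can be illustrated with a minimal sketch. It assumes alignment hits are already available as (reporter ID, gene) pairs; the probe and gene names are hypothetical placeholders, and this is not the authors' pipeline.

```python
from collections import defaultdict

# Hypothetical alignment hits: (reporter ID, gene whose transcript was hit).
hits = [
    ("probe_1", "GENE_A"),
    ("probe_1", "GENE_A"),  # second transcript of the same gene
    ("probe_2", "GENE_B"),
    ("probe_2", "GENE_C"),  # a different gene: probe_2 is ambiguous
]

def genes_per_reporter(hits):
    """Collect the set of distinct genes hit by each reporter."""
    genes = defaultdict(set)
    for reporter_id, gene in hits:
        genes[reporter_id].add(gene)
    return dict(genes)

def ambiguous_reporters(hits):
    """Return, in sorted order, reporters that align to more than one gene."""
    return sorted(r for r, g in genes_per_reporter(hits).items() if len(g) > 1)

print(ambiguous_reporters(hits))
```

Several hits on transcripts of the same gene do not make a reporter ambiguous here; only hits on distinct genes do.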

RESULTS

Of 42683 reporters, 25505 (60%) passed all our tests and are considered "fully valid". A further 9964 (23%) reporters did not have a meaningful identifier, mapped to the wrong chromosome, or did not pass basic alignment tests, preventing us from correlating the expression values of these reporters with a unique annotated human gene. The remaining 7214 (17%) reporters could be associated with either a unique gene or a unique intergenic location, but could not be mapped to a transcript in RefSeq. These 7214 reporters are further partitioned into three different levels of validity.
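As a quick check of the arithmetic, the three counts can be confirmed to partition the full reporter set and to round to the stated percentages. The category labels below are shorthand introduced for this sketch, not the paper's terminology.

```python
# Reporter counts as stated in the RESULTS paragraph.
total = 42683
counts = {
    "fully valid": 25505,
    "not interpretable": 9964,
    "unique locus, no RefSeq transcript": 7214,
}

def percentages(counts, total):
    """Share of each category, rounded to the nearest whole percent."""
    return {label: round(100 * n / total) for label, n in counts.items()}

# The three categories should account for every reporter exactly once.
assert sum(counts.values()) == total

shares = percentages(counts, total)
for label, pct in shares.items():
    print(f"{label}: {counts[label]} ({pct}%)")
```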

CONCLUSION

Expression array studies should evaluate the annotations of reporters and remove those reporters that have suspect annotations. This evaluation can be done systematically and semi-automatically, but one must recognize that data sources are frequently updated, so validation results change slightly over time.
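A filtering step of this kind might look like the sketch below. The record fields, the class labels, and the order of the checks are assumptions made for illustration; the abstract does not specify the individual alignment tests, so they are collapsed into a single boolean, and the three sub-levels of partial validity are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reporter:
    reporter_id: str              # hypothetical probe name
    identifier: Optional[str]     # None when there is no meaningful identifier
    annotated_chrom: str          # chromosome claimed by the annotation
    aligned_chrom: Optional[str]  # chromosome of the alignment, None if none found
    passes_alignment: bool        # stand-in for the unspecified alignment tests
    in_refseq: bool               # True if the oligo maps to a RefSeq transcript

def classify(r: Reporter) -> str:
    """Assign one of three illustrative validity classes to a reporter."""
    if r.identifier is None:
        return "invalid"
    if r.aligned_chrom != r.annotated_chrom:
        return "invalid"
    if not r.passes_alignment:
        return "invalid"
    if not r.in_refseq:
        return "partially valid"
    return "fully valid"

# Hypothetical example records.
reporters = [
    Reporter("probe_1", "GENE_A", "1", "1", True, True),
    Reporter("probe_2", None, "2", "2", True, True),
    Reporter("probe_3", "GENE_C", "3", "7", True, True),
    Reporter("probe_4", "GENE_D", "X", "X", True, False),
]

# Keep only the reporters whose annotation passes every check.
kept = [r.reporter_id for r in reporters if classify(r) == "fully valid"]
print(kept)
```

Because the underlying data sources are updated frequently, a real workflow would rerun such a classification against current annotation releases rather than cache the result.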

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fb3/2791105/39bddeecf095/1471-2164-10-566-1.jpg
