Decision Systems Group, Brigham & Women's Hospital, Harvard Medical School, Boston, MA, USA.
BMC Bioinformatics. 2010 Oct 28;11 Suppl 9(Suppl 9):S8. doi: 10.1186/1471-2105-11-S9-S8.
The volume of data deposited in the Gene Expression Omnibus (GEO) has grown substantially. For these data to be useful in future analyses, they must be properly annotated with clinical data and descriptions of experimental conditions. This study assesses how adequately asthma markers are documented in GEO. Three objective measures (coverage, consistency and association) were used to evaluate the annotations contained in 17 asthma studies.
There were 918 asthma samples with 20,640 annotated markers. Of these markers, only 10,419 had documented values (approximately 50% coverage). In the one study examined in detail for consistency, drug names were used inconsistently: brand names and generic names appeared in different sections to refer to the same drug. Annotated markers showed adequate association with other relevant variables (i.e., a medication was recorded as used only when its corresponding disease state was present).
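The coverage figure above is the ratio of markers with a documented value to all annotated markers. A minimal sketch of that calculation follows, assuming each sample is a mapping from marker name to value and that a missing value appears as `None` or an empty string. The function name `coverage` and the toy records are hypothetical and not taken from the study; only the two totals come from the abstract.

```python
from typing import Iterable, Mapping, Optional


def coverage(samples: Iterable[Mapping[str, Optional[str]]]) -> float:
    """Return the fraction of annotated marker slots that hold a documented value.

    Assumption: a value of None or a blank string counts as undocumented.
    """
    total = 0
    documented = 0
    for sample in samples:
        for value in sample.values():
            total += 1
            if value is not None and str(value).strip() != "":
                documented += 1
    return documented / total if total else 0.0


# Hypothetical toy records (not data from the study): 3 of 6 slots are filled.
toy_samples = [
    {"age": "34", "fev1": None, "medication": "albuterol"},
    {"age": "", "fev1": "82", "medication": None},
]
toy_coverage = coverage(toy_samples)

# Totals reported in the abstract: 10,419 documented out of 20,640 annotated.
reported_coverage = 10419 / 20640
print(f"toy coverage: {toy_coverage:.1%}, reported coverage: {reported_coverage:.1%}")
```

With the reported totals, the ratio comes out slightly above one half, which matches the rounded 50% figure.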
Variable coverage within GEO is inadequate, and term usage lacks consistency. Association between relevant variables, however, was adequate.
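One way to surface the kind of brand-name versus generic-name inconsistency described here is to map every drug mention to a canonical generic name and flag any drug that appears under more than one spelling. The sketch below is illustrative only: the abstract does not name the drugs involved, so the `BRAND_TO_GENERIC` table, the helper names and the example mentions are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical lookup table; the study does not list the drug names it found.
BRAND_TO_GENERIC = {
    "ventolin": "albuterol",
    "flovent": "fluticasone",
}


def canonical(name: str) -> str:
    """Normalise a drug mention to a lower-case generic name where known."""
    key = name.strip().lower()
    return BRAND_TO_GENERIC.get(key, key)


def inconsistent_terms(mentions: Iterable[str]) -> Dict[str, List[str]]:
    """Return generic names that were referred to by more than one distinct term."""
    seen = defaultdict(set)
    for mention in mentions:
        seen[canonical(mention)].add(mention.strip().lower())
    return {generic: sorted(terms) for generic, terms in seen.items() if len(terms) > 1}


# Hypothetical mentions drawn from different sections of one study record.
mentions = ["Ventolin", "albuterol", "Flovent", "Albuterol "]
flagged = inconsistent_terms(mentions)
print(flagged)
```

In this toy input only albuterol is flagged, because it appears under both its generic name and a brand name, while fluticasone appears under a single term.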