The Scripps Research Institute, Department of Molecular and Experimental Medicine, La Jolla, CA, United States.
JMIR Serious Games. 2014 Jul 29;2(2):e7. doi: 10.2196/games.3350.
Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before.
The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the players' prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game.
We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach and used to create rankings of genes. The top genes from these rankings were evaluated in three ways: by annotation enrichment analysis, by comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10-year survival.
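The voting-based aggregation described above might be sketched as follows. This is only an illustration under stated assumptions, not the study's actual implementation: the function name `rank_genes_by_votes`, the one-vote-per-gene-per-game rule, the alphabetical tie-breaking, and the example gene selections are all hypothetical.

```python
from collections import Counter


def rank_genes_by_votes(games):
    """Rank genes by how many games they were selected in.

    `games` is an iterable of gene-symbol lists, one list per game played.
    Assumption: each gene receives at most one vote per game.
    """
    votes = Counter()
    for selected in games:
        votes.update(set(selected))  # de-duplicate within a single game
    # Highest vote count first; ties broken alphabetically (an assumption).
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical selections from three games (illustrative data only).
games = [
    ["ESR1", "ERBB2"],
    ["ESR1", "MKI67"],
    ["ESR1", "ERBB2", "TP53"],
]

ranking = rank_genes_by_votes(games)
top_genes = [gene for gene, _ in ranking[:2]]
print(ranking)
print(top_genes)
```

The resulting `top_genes` list stands in for the top-ranked gene set that would then be passed to enrichment analysis or used as features for a survival classifier.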
Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence. In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet.
The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.