BD2K Training Coordinating Center's ERuDIte: the Educational Resource Discovery Index for Data Science.

Author information

Ambite José Luis, Fierro Lily, Gordon Jonathan, Burns Gully A, Geigl Florian, Lerman Kristina, Van Horn John D

Affiliations

University of Southern California's Information Sciences Institute (ISI), Marina del Rey, CA 90292.

Work performed as a visiting Ph.D. student at ISI.

Publication information

IEEE Trans Emerg Top Comput. 2021 Jan-Mar;9(1):316-328. doi: 10.1109/tetc.2019.2903466. Epub 2019 Mar 6.

Abstract

Data science is a field that has developed to enable efficient integration and analysis of increasingly large data sets in many domains. In particular, big data in genetics, neuroimaging, mobile health, and other subfields of biomedical science promises new insights but also poses challenges. To address these challenges, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, including a Training Coordinating Center (TCC) tasked with developing a resource for personalized data science training for biomedical researchers. The BD2K TCC web portal is powered by ERuDIte, the Educational Resource Discovery Index, which collects training resources for data science, including online courses, videos of tutorials and research talks, textbooks, and other web-based materials. While the availability of so many potential learning resources is exciting, they are highly heterogeneous in quality, difficulty, format, and topic, making the field intimidating to enter and difficult to navigate. Moreover, data science is rapidly evolving, so there is a constant influx of new materials and concepts. We leverage data science techniques to build ERuDIte itself, using data extraction, data integration, machine learning, information retrieval, and natural language processing to automatically collect, integrate, describe, and organize existing online resources for learning data science.
