Kowald Axel, Schmeier Sebastian
Protagen AG, Dortmund, Germany.
Methods Mol Biol. 2011;696:305-18. doi: 10.1007/978-1-60761-987-1_19.
The yearly output of scientific papers keeps rising, which often makes it impossible for an individual researcher to keep up. Text mining of scientific publications is therefore an interesting way to automate the retrieval of knowledge and data from the literature. In this chapter, we discuss the specific tasks required for text mining, including their problems and limitations. The second half of the chapter demonstrates the various aspects of text mining with a practical example: publications are transformed into a vector space representation, and support vector machines are then used to classify papers according to their content of kinetic parameters, which are required for model building in systems biology.
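The two steps named in the abstract, a vector space representation followed by a support vector machine, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the four toy "publications", the TF-IDF weighting, and the linear SVM trained by plain subgradient descent on the hinge loss. The abstract does not state which features, kernel, or software the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(documents):
    """Return {term: idf} with idf = log(N / document frequency)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(documents)
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def vectorize(text, idf, vocab):
    """Map a text to an L2-normalised TF-IDF vector over the vocabulary.

    Terms that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularisation shrink
            if margin < 1.0:  # margin violated: take a hinge subgradient step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(text, w, b, idf, vocab):
    """Return +1 (kinetic parameters likely present) or -1 (likely absent)."""
    x = vectorize(text, idf, vocab)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy corpus: +1 = reports kinetic parameters, -1 = does not.
train_docs = [
    ("The enzyme has Km of 0.5 mM and kcat of 12 per second", 1),
    ("We review the history of systems biology and its funding", -1),
    ("Kinetic parameters Km and Vmax were measured for the purified enzyme", 1),
    ("The conference covered the ethics of publishing and peer review", -1),
]
texts = [text for text, _ in train_docs]
labels = [label for _, label in train_docs]

idf = fit_idf(texts)
vocab = sorted(idf)  # fixed term order defines the vector space axes
X = [vectorize(text, idf, vocab) for text in texts]
w, b = train_linear_svm(X, labels)

new_kinetic = "The Km and kcat of the mutant enzyme were determined"
new_other = "Ethics and funding in systems biology publishing"
predictions = [predict(t, w, b, idf, vocab) for t in (new_kinetic, new_other)]
print(predictions)
```

On a real corpus one would normally use an established SVM implementation and far more training documents. The hand-written trainer above only keeps the sketch dependency-free and makes the margin-based update visible.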