Xia Jingbo, Zhang Xing, Yuan Daojun, Chen Lingling, Webster Jonathan, Fang Alex Chengyu
College of Science, Huazhong Agricultural University, Wuhan 430070, Hubei, China; Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong.
The Halliday Centre for Intelligent Applications of Language Studies, City University of Hong Kong, Kowloon, Hong Kong.
Biomed Res Int. 2013;2013:853043. doi: 10.1155/2013/853043. Epub 2013 Nov 25.
To assess effectively whether an uncharacterized rice protein is resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed that enhances gene prioritization by combining text mining with a sequence-based approach. Term frequency-inverse document frequency (TF-IDF), a text mining technique, is first used to measure the importance of distinctive terms that reflect biomedical activity in rice; candidate genes are then screened and key terms are produced. Afterwards, a built-in classifier based on the chaos game representation algorithm is used to select the most likely candidate gene. Our experimental results show that combining these two methods improves gene prioritization.
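The term-weighting step can be illustrated with a minimal sketch of TF-IDF. This is not the authors' implementation: the function name `tf_idf`, the toy token lists, and the specific weighting (relative term frequency multiplied by the natural log of the inverse document frequency) are assumptions chosen for illustration, and the abstract does not state which TF-IDF variant was used.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting (one common variant, not confirmed by the source):
    tf = term count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical token lists standing in for literature abstracts
# (made up for illustration, not data from the paper).
docs = [
    ["rice", "resistance", "blight", "kinase"],
    ["rice", "yield", "drought"],
    ["rice", "resistance", "receptor"],
]
scores = tf_idf(docs)
```

In this toy corpus a term that occurs in every document, such as "rice", receives a score of zero, while a term confined to a single document, such as "kinase", scores highest within that document. This is the property that lets TF-IDF surface distinctive terms.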