Gladki Arek, Siedlecki Pawel, Kaczanowski Szymon, Zielenkiewicz Piotr
Bioinformatics Department, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, ul. Pawinskiego 5a, 02-106, Warszawa, Poland.
Bioinformatics. 2008 Apr 15;24(8):1115-7. doi: 10.1093/bioinformatics/btn086. Epub 2008 Mar 5.
Literature databases can reveal not only known, well-established relations between biological processes but also less studied, non-obvious associations. The main problem in discovering this kind of relevant biological information is 'selection'. The ability to distinguish a true correlation (e.g. between different types of biological processes) from one that appears statistically significant only by random chance is crucial for any biomedical research, literature mining being no exception. This problem is especially visible when searching for information that has been studied and described in only a few publications. Therefore, a novel bio-linguistic statistical method is required, capable of 'selecting' true correlations even when they are low-frequency associations. In this article, we present such a statistical approach, based on the Z-score and implemented in the web-based application 'e-LiSe'.
The software is available at http://miron.ibb.waw.pl/elise/
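The abstract does not give the scoring formula, so the following is only a minimal sketch of how a Z-score could separate a real term association from chance co-occurrence. It assumes a simple binomial null model in which a term appears in each retrieved abstract with its background frequency in the whole database. The function name and all parameter names are hypothetical and are not taken from e-LiSe.

```python
import math


def cooccurrence_z_score(n_query_docs, n_cooccur, n_total_docs, n_term_docs):
    """Z-score of a term's frequency in a query result set (illustrative only).

    Assumed null model (not from the paper): each of the n_query_docs
    abstracts contains the term independently with probability
    p = n_term_docs / n_total_docs, i.e. its background frequency.
    """
    if n_total_docs <= 0 or n_query_docs <= 0:
        raise ValueError("document counts must be positive")

    p = n_term_docs / n_total_docs          # background probability of the term
    expected = n_query_docs * p             # expected co-occurrences under the null
    variance = n_query_docs * p * (1 - p)   # binomial variance

    if variance == 0:
        # Term is in no document or in every document: no information.
        return 0.0
    return (n_cooccur - expected) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical counts: a rare term seen 5 times among 100 retrieved abstracts.
    z = cooccurrence_z_score(
        n_query_docs=100, n_cooccur=5, n_total_docs=1_000_000, n_term_docs=1_000
    )
    print(f"z = {z:.2f}")
```

Under this assumed model a rare term can reach a high score from only a handful of co-occurrences, because both the expected count and its standard deviation are small. That is the low-frequency situation the abstract describes.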