College of Informatics, Department of Information Management, Yuan Ze University, Chung-Li, Taiwan, Republic of China.
BMC Med Inform Decis Mak. 2012 Jul 18;12:72. doi: 10.1186/1472-6947-12-72.
Online psychiatric texts are natural language texts expressing depressive problems, published by Internet users via community-based web services such as web forums, message boards and blogs. Understanding the cause-effect relations embedded in these psychiatric texts can provide insight into the authors' problems, thus increasing the effectiveness of online psychiatric services.
Previous studies have proposed the use of word pairs extracted from a set of sentence pairs to identify cause-effect relations between sentences. A word pair is made up of two words, one from the cause text span and the other from the effect text span. Analyzing the relationship between these words captures individual word associations between cause and effect sentences. For instance, (broke up, life) and (boyfriend, meaningless) are two word pairs extracted from the sentence pair: "I broke up with my boyfriend. Life is now meaningless to me". The major limitation of word pairs is that individual words usually cannot reflect the exact meaning of the cause and effect events, and may therefore produce semantically incomplete word pairs, as these examples show. This study therefore proposes the use of inter-sentential language patterns such as <<broke up, boyfriend>, <life, meaningless>> to detect causality between sentences. Inter-sentential language patterns capture associations among multiple words both within and between sentences, and can thus provide more precise information than word pairs. To acquire inter-sentential language patterns, we develop a text mining framework that extends the classical association rule mining algorithm so that it can discover frequently co-occurring patterns across sentence boundaries.
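The contrast between word pairs and inter-sentential language patterns can be illustrated with a minimal sketch. It is not the framework developed in this study: it is a brute-force, support-counting simplification, and the toy corpus, the function names `word_pairs` and `mine_patterns`, the fixed pattern size, and the `min_support` threshold are all assumptions made for illustration. Each sentence pair is assumed to be already reduced to content words.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy corpus: (cause-sentence words, effect-sentence words),
# assumed to be pre-tokenized and reduced to content words.
corpus = [
    (["broke_up", "boyfriend"], ["life", "meaningless"]),
    (["broke_up", "boyfriend"], ["life", "meaningless", "cry"]),
    (["lost", "job"], ["life", "meaningless"]),
]

def word_pairs(cause, effect):
    """Return every (cause word, effect word) pair for one sentence pair."""
    return list(product(cause, effect))

def mine_patterns(corpus, size=2, min_support=2):
    """Count <cause word set, effect word set> patterns across sentence pairs.

    A pattern is counted at most once per sentence pair, so its count is the
    number of sentence pairs that contain it (its support). Only patterns
    with support >= min_support are returned.
    """
    counts = Counter()
    for cause, effect in corpus:
        cause_sets = list(combinations(sorted(set(cause)), size))
        effect_sets = list(combinations(sorted(set(effect)), size))
        for c in cause_sets:
            for e in effect_sets:
                counts[(c, e)] += 1
    return {p: n for p, n in counts.items() if n >= min_support}

pairs = word_pairs(*corpus[0])
patterns = mine_patterns(corpus)
print(pairs)
print(patterns)
```

In this toy setting, the word pairs include fragments such as (broke_up, life), whereas the only pattern that survives the support threshold keeps both cause words and both effect words together. A practical miner would prune candidates level by level, as Apriori-style algorithms do, rather than enumerate every combination.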
Performance was evaluated on a corpus of texts collected from PsychPark (http://www.psychpark.org), a virtual psychiatric clinic maintained by a group of volunteer professionals from the Taiwan Association of Mental Health Informatics. Experimental results show that the use of inter-sentential language patterns outperformed the use of word pairs proposed in previous studies.
This study demonstrates the acquisition of inter-sentential language patterns for causality detection from online psychiatric texts. Such semantically more complete and precise features can improve causality detection performance.