Yoon Hong-Jun, Tourassi Georgia
Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA (phone: 865-241-2626; fax: 865-574-6275;
Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA (phone: 865-576-4829; fax: 865-574-6275;
IEEE EMBS Int Conf Biomed Health Inform. 2016 Feb;2016:557-560. doi: 10.1109/BHI.2016.7455958. Epub 2016 Apr 21.
Openly available online sources can be valuable for conducting in silico case-control epidemiological studies. Such studies must adjust for confounding factors to isolate the association between the factor under observation and the disease. However, information on these factors is not always readily available online. This paper proposes natural language processing methods for extracting socio-demographic information from openly available online content. The feasibility of the proposed methods is demonstrated with a case-control study of the association between age, gender, and income level and lung cancer risk. The study finds that older age and lower socioeconomic status are associated with higher lung cancer risk, which is consistent with findings reported in traditional cancer epidemiology studies.
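As a rough illustration of the kind of association measure a case-control study reports, the sketch below computes a crude odds ratio with a Woolf (log-based) 95% confidence interval, and a Mantel-Haenszel pooled odds ratio across strata of a confounder. It is not the authors' method: the abstract does not say which estimator was used, and the function names and all counts here are hypothetical.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    Hypothetical helper for illustration; z=1.96 gives a ~95% interval.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over confounder strata.

    Each stratum is a tuple in the same order as odds_ratio's arguments:
    (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls).
    """
    numerator = 0.0
    denominator = 0.0
    for exp_case, unexp_case, exp_ctrl, unexp_ctrl in strata:
        n = exp_case + unexp_case + exp_ctrl + unexp_ctrl
        numerator += exp_case * unexp_ctrl / n
        denominator += unexp_case * exp_ctrl / n
    return numerator / denominator

# Made-up counts: exposure = low income, strata = two age groups.
crude, low, high = odds_ratio(40, 60, 20, 80)
adjusted = mantel_haenszel_or([(25, 30, 10, 40), (15, 30, 10, 40)])
print(f"crude OR = {crude:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"age-adjusted OR = {adjusted:.2f}")
```

Stratifying on a confounder such as age group and pooling with Mantel-Haenszel is one common way to adjust an odds ratio; logistic regression is another, and either could apply to socio-demographic variables extracted from text.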