Iran University of Science and Technology, Tehran, Iran.
Tehran University of Medical Sciences, Tehran, Iran.
BMC Res Notes. 2023 Aug 22;16(1):179. doi: 10.1186/s13104-023-06458-0.
Social media text mining has been widely used to extract information about patients' experiences and needs regarding various diseases, especially cancer. Understanding these issues is necessary for further management in primary care. Researchers have found that lifestyle factors such as diet, exercise, alcohol, and smoking are associated with cancer risk, particularly for women's cancer. Given the growing global burden of women's cancer, it is essential to monitor up-to-date data sources using text mining.
We have prepared six datasets regarding lifestyle components and women's cancer. Five are independent: (1) a nutrition dataset containing 10,161 tweets; (2) an exercise dataset containing 9412 tweets; (3) an alcohol dataset containing 2132 tweets; (4) a smoking dataset containing 4316 tweets; and (5) a lifestyle (term) dataset containing 1861 tweets. We also constructed an additional dataset: (6) a combined dataset, formed by summing the other five, containing 27,882 tweets. These data are provided to help discover people's perspectives, knowledge, and experiences regarding lifestyle and women's cancer. Hence, they should be valuable for healthcare providers developing more efficient patient management approaches.