From Tweets to Wellness: Wellness Event Detection from Twitter Streams

 

Social media platforms have become one of the most popular ways for users to share what is happening around them. The abundance and growing usage of social media have produced a large repository of users’ daily happenings and activities, which acts as a stethoscope for inferring individuals’ lifestyles and wellness. Because users’ social accounts implicitly reflect their habits, preferences, and feelings, harvesting social media data makes it feasible to monitor and understand users’ wellness and, ultimately, to promote healthier lifestyles. As a first step towards this goal, we aim to automatically extract personal wellness events (PWEs) from the content users publish on social media.

Extracting personal wellness events provides significant insight into individuals’ wellness and community lifestyle behaviours. At the individual level, it summarizes a person’s wellness information, facilitating lifestyle management, user health profiling, targeted online advertising, and so on. At the community level, aggregating the wellness information of a large set of individuals makes it feasible to analyze the lifestyle patterns and wellness of social groups at a scale that traditional methods could not reach in either time or cost.

 

Task

Given the microblogging posts from a user’s Twitter account, we want to automatically extract the tweets that pertain to wellness events and categorize them. We use a taxonomy of 14 distinct wellness events grouped into three high-level wellness categories, namely diet, exercise & activities (exercise for brevity), and health, as shown in Table 1.

 

Table 1: Taxonomy of wellness events with examples

Event     Sub-event                 Example
Diet      Meals                     Dinner just salad
          Alcoholic Beverages       Too much drink in party
          Non-alcoholic Beverages   Talking about hot chocolates, I might just go and make myself one :D
          Snacks                    found Taylor’s pretzels in my backpack and I’m so happy wow
          Fruit                     almost eat all the strawberries
          Others                    Eat 20g carbs and go fo running
Exercise  Walking                   20 mins walk around office...
          Running                   after 1 hour run #bgnow 130
          Biking                    I just finished 1 hour biking
          Swimming                  BGnow 95, thanks swimming pool
          Others                    Shopping and having a little dinner URL
Health    Examinations              #BGnow 100
          Symptoms                  Feel too much Fatigue
          Treatment                 Insulin injection
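The taxonomy in Table 1 maps naturally onto a flat label space for a classifier. The Python sketch below is one way to encode it; the label strings and the `Null` name for the non-wellness class are illustrative choices, not the format of the released dataset. Prefixing each sub-event with its category keeps the two `Others` sub-events distinct, which is why the taxonomy yields 14 labels even though only 13 sub-event names are unique.

```python
# Illustrative encoding of the Table 1 taxonomy: category -> sub-events.
TAXONOMY = {
    "Diet": ["Meals", "Alcoholic Beverages", "Non-alcoholic Beverages",
             "Snacks", "Fruit", "Others"],
    "Exercise": ["Walking", "Running", "Biking", "Swimming", "Others"],
    "Health": ["Examinations", "Symptoms", "Treatment"],
}
NULL_LABEL = "Null"  # hypothetical name for the non-wellness (null) class


def flat_labels(taxonomy):
    """Return 'Category/Sub-event' labels so both 'Others' stay distinct."""
    return [f"{cat}/{sub}" for cat, subs in taxonomy.items() for sub in subs]


LABELS = flat_labels(TAXONOMY)
ALL_CLASSES = LABELS + [NULL_LABEL]
print(len(LABELS))       # 14
print(len(ALL_CLASSES))  # 15
```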

 

Dataset

The dataset contains tweets labelled with the 14 wellness events above, plus a null class for non-health tweets. To construct it, we first crawled users who had used the #BGnow hashtag in their tweets. This hashtag is popular among diabetic patients for posting information about diabetes and their health status. In this way, we gathered 2,500 distinct diabetic users. We then removed accounts with high daily traffic to exclude spammers, leaving 1,987 diabetic users. Next, we crawled all historical tweets of these users with the Twitter API, which yielded about 3 million tweets (2,997,897). To extract candidate tweets, we applied the bootstrapping approach of Thelen and Riloff (2002), which produced 11,217 tweets. Because extraction was run separately for each category, one tweet can be a candidate for several categories.
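The spam filtering and per-category extraction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ released code: the 50-tweets-per-day cut-off is hypothetical (the threshold actually used is not given here), the account and tweet dictionaries are made-up formats, and plain keyword matching stands in for the lexicons that the Thelen and Riloff (2002) bootstrapping method would learn. It does show why candidate sets overlap: each category is matched independently, so a tweet such as “after 1 hour run #bgnow 130” can be a candidate for both running and examinations.

```python
from collections import defaultdict
from datetime import date

# Hypothetical cut-off; the threshold actually used is not stated.
MAX_TWEETS_PER_DAY = 50.0


def avg_daily_traffic(n_tweets, first_day, last_day):
    """Average tweets per day over an account's active period (inclusive)."""
    days = (last_day - first_day).days + 1
    return n_tweets / days


def filter_spammers(accounts, max_per_day=MAX_TWEETS_PER_DAY):
    """Drop accounts whose average daily traffic exceeds max_per_day."""
    return [a for a in accounts
            if avg_daily_traffic(a["n_tweets"], a["first"], a["last"]) <= max_per_day]


def extract_candidates(tweets, lexicons):
    """Match each category's lexicon separately; a tweet may match several.

    `lexicons` maps a category to a set of lowercase trigger terms: a toy
    stand-in for lexicons learned by bootstrapping.
    """
    candidates = defaultdict(set)  # tweet id -> set of matched categories
    for category, terms in lexicons.items():
        for tweet_id, text in tweets.items():
            if set(text.lower().split()) & terms:
                candidates[tweet_id].add(category)
    return dict(candidates)


accounts = [
    {"id": "u1", "n_tweets": 900, "first": date(2015, 1, 1), "last": date(2015, 3, 31)},
    {"id": "u2", "n_tweets": 20000, "first": date(2015, 1, 1), "last": date(2015, 1, 10)},
]
print([a["id"] for a in filter_spammers(accounts)])  # ['u1']

tweets = {"t1": "after 1 hour run #bgnow 130", "t2": "good morning everyone"}
lexicons = {"Exercise/Running": {"run", "running"},
            "Health/Examinations": {"#bgnow", "bgnow"}}
print(extract_candidates(tweets, lexicons))
```

Here `u1` averages 10 tweets per day and is kept, while `u2` averages 2,000 and is dropped; `t2` matches no lexicon and therefore never appears among the candidates.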

Download here: Dataset and source code (Python).
Note: Due to Twitter’s developer agreement and policy, we can share only tweet IDs and user IDs, not the tweet contents.
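Because only IDs are distributed, tweet texts have to be re-fetched (“hydrated”) from the API. A minimal sketch of the batching step, assuming the ID file holds one ID per line (the actual layout of the released files may differ) and that the target is the Twitter API v2 tweet lookup endpoint (`GET /2/tweets?ids=...`), which accepts up to 100 comma-separated IDs per request. The code only builds request URLs; authentication, rate limiting, and current API access terms are left out and should be checked against the official documentation.

```python
from urllib.parse import urlencode

# Assumed per-request ID limit of the v2 tweet lookup endpoint.
MAX_IDS_PER_REQUEST = 100
LOOKUP_URL = "https://api.twitter.com/2/tweets"


def read_ids(lines):
    """Parse one tweet ID per line (assumed file layout), skipping blanks."""
    return [line.strip() for line in lines if line.strip()]


def batches(ids, size=MAX_IDS_PER_REQUEST):
    """Split IDs into consecutive chunks of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def lookup_urls(ids):
    """Build one lookup URL per batch; no request is sent here."""
    return [f"{LOOKUP_URL}?{urlencode({'ids': ','.join(b)})}" for b in batches(ids)]


ids = read_ids(str(n) for n in range(1, 251))
urls = lookup_urls(ids)
print(len(urls))  # 3
```

With 250 IDs this yields three requests of 100, 100, and 50 IDs. Tweets that have been deleted or made private since collection cannot be hydrated, so the recovered set may be smaller than the ID list.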

 

Publication

Mohammad Akbari, Xia Hu, Nie Liqiang, Tat-Seng Chua, From Tweets to Wellness: Wellness Event Detection from Twitter Streams, AAAI 2016.

 

Contact

Mohammad Akbari: akbari [AT] u.[National University of Singapore: i.e. nus] [DOT] edu