Date Published: July 19, 2017
Publisher: Public Library of Science
Author(s): Oliver Gruebner, Sarah R. Lowe, Martin Sykora, Ketan Shankardass, S. V. Subramanian, Sandro Galea, Donald R. Olson.
Disasters have substantial consequences for population mental health. Social media data present an opportunity for mental health surveillance after disasters to help identify areas of mental health needs. We aimed to 1) identify specific basic emotions from Twitter for the greater New York City area during Hurricane Sandy, which made landfall on October 29, 2012, and to 2) detect and map spatial temporal clusters representing excess risk of these emotions.
We applied an advanced sentiment analysis to 344,957 tweets in the study area over eleven days, from October 22 to November 1, 2012, to extract basic emotions. We then used a space-time scan statistic (SaTScan) and a geographic information system (QGIS) to detect and map excess risk of these emotions.
Sadness and disgust were among the most prominent emotions identified. Furthermore, we noted 24 spatial clusters of excess risk of basic emotions over time: four for anger, one for confusion, three for disgust, five for fear, five for sadness, and six for surprise. Of these, anger, confusion, disgust, and fear clusters appeared pre-disaster, a cluster of surprise was found peri-disaster, and a cluster of sadness emerged post-disaster.
We proposed a novel syndromic surveillance approach for mental health based on social media data that may support conventional approaches by providing useful additional information in the context of disaster. We showed that excess risk of multiple basic emotions could be mapped in space and time as a step towards anticipating acute stress in the population and identifying community mental health needs rapidly and efficiently in the aftermath of disaster. More studies are needed to better control for bias, to identify associations with reliable and valid instruments measuring mental health, and to explore computational methods for continued model-fitting, causal relationships, and ongoing evaluation. Our study may also be a starting point for more fully elaborated models that can either prospectively detect mental health risk using real-time social media data or detect excess risk of emotional reactions in areas that lack efficient infrastructure during and after disasters. As such, social media data may be used for mental health surveillance after large-scale disasters to help identify areas of mental health need and to show where interventions may most effectively reduce the mental health consequences of disasters.
Human-made and natural disasters happen with regularity around the globe and are increasing with the growing influence of environmental climate change [1,2]. Together with the current upsurge of violent conflicts, war, and terrorism, these life-threatening events put an increasing number of communities at risk of experiencing the mental health consequences of disasters, with posttraumatic stress disorder (PTSD) and depression being the most commonly reported problems [3–5]. Exposure factors for post-disaster mental health include female gender, low socioeconomic status (SES), minority status, lack of social support, and pre-disaster mental health problems, along with traumatic experiences (actual or threatened death, serious injury, or sexual violation) and stressors (e.g., lack of food, water, medical care, or displacement) [3,4,6–11].
Data included geo-located, English-language tweets from Twitter sent within a rectangular grid around NYC during the eleven days around the time when Hurricane Sandy made landfall (October 29, 2012), i.e., between October 22 and November 1, 2012. Data were obtained from the Harvard Center for Geographical Analysis Geo-tweet archive (CGA) and combined with additional Twitter data from GeoFeedia (https://geofeedia.com) to fill some missing dates in the CGA data. Although we aimed to compile data covering a 14-day cycle that would have captured the typical timeframe in many standard psychiatric questionnaires, our combined data from these sources restricted us to only 11 days. The combined dataset included 423,931 tweets, of which 344,957 tweets (81%) were deemed suitable for our analysis, i.e., were in English as identified by an ensemble vote of two language identification models [50,51].
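The selection criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the bounding-box coordinates, the tweet field names (`lon`, `lat`, `day`, `lang`), and the assumption that the language-identification result is already stored in `lang` are all hypothetical. Only the date window and the two tweet totals come from the text.

```python
from datetime import date

# Hypothetical bounding box around NYC (illustrative values, not the study's grid).
BBOX = (-74.5, 40.4, -73.5, 41.1)  # (min_lon, min_lat, max_lon, max_lat)

# Study window as stated in the text.
START, END = date(2012, 10, 22), date(2012, 11, 1)


def keep_tweet(tweet):
    """Return True if a tweet dict lies in the box, in the window, and is English.

    The keys 'lon', 'lat', 'day', and 'lang' are assumed field names.
    """
    min_lon, min_lat, max_lon, max_lat = BBOX
    return (
        min_lon <= tweet["lon"] <= max_lon
        and min_lat <= tweet["lat"] <= max_lat
        and START <= tweet["day"] <= END
        and tweet["lang"] == "en"  # assumed to hold the language-ID outcome
    )


# Length of the study window in days, counting both end dates.
study_days = (END - START).days + 1

# Share of the combined dataset retained for analysis (totals from the text).
retained_share = 344_957 / 423_931
```

With the totals reported above, `study_days` evaluates to 11 and `retained_share` rounds to 81%, matching the figures in the text.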
Specific emotions were extracted from the Twitter activity of users in the greater NYC area during Hurricane Sandy. We found 1,952 tweets (0.6%) classified as anger, 603 (0.2%) as confusion, 2,627 (0.8%) as disgust, 1,715 (0.5%) as fear, 5,457 (1.6%) as sadness, 350 (0.1%) as shame, and 2,192 (0.6%) as surprise. From Fig 1, we noted that sadness was the most pronounced emotion during the entire time frame and was particularly elevated one day after the hurricane made landfall in the greater NYC area. Percentages of fear and surprise were temporarily elevated in the Twitter activity of users on the day of the disaster. We further noted that anger and disgust were slightly elevated in the three days after the disaster. Fig 2 describes the spatial distribution of absolute numbers of tweets and percentages of single emotions (displayed in quintiles) at the census tract level across the greater NYC area for the time frame of 11 days, i.e., from October 22 to November 1, 2012. We noted that most tweets were sent from locations that are highly populated (e.g., lower Manhattan), transit places (e.g., airports), or places of recreation (e.g., Central Park). Anger, fear, sadness, and surprise were elevated (highest quintile) in areas with a waterfront or in exposed areas (e.g., Sandy Hook, Englewood, Rikers Island, Coney Island) but to some extent also in scattered areas in the hinterland of New Jersey.
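The reported percentages can be reproduced from the counts with a short Python sketch, assuming each percentage is the emotion count divided by the 344,957 analyzed tweets and rounded to one decimal place. The variable and function names are illustrative.

```python
# Total analyzed tweets and per-emotion counts, as reported in the text.
TOTAL_TWEETS = 344_957
emotion_counts = {
    "anger": 1952,
    "confusion": 603,
    "disgust": 2627,
    "fear": 1715,
    "sadness": 5457,
    "shame": 350,
    "surprise": 2192,
}


def percent_of_total(count, total=TOTAL_TWEETS):
    """Share of all analyzed tweets, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


# Percentage of analyzed tweets per emotion.
percentages = {emotion: percent_of_total(n) for emotion, n in emotion_counts.items()}

# Emotion with the largest number of classified tweets.
most_common = max(emotion_counts, key=emotion_counts.get)

# Number of tweets carrying any of the seven listed emotions.
emotional_tweets = sum(emotion_counts.values())
```

Under this assumption the computed values agree with the percentages quoted above, and `most_common` is sadness. Summing the counts also shows that only a small minority of the analyzed tweets was assigned one of the seven listed emotions.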
We demonstrated a novel space-time syndromic surveillance approach for disaster mental health by extracting multiple emotions from Twitter for the greater NYC area over eleven days, October 22 to November 1, 2012, and detecting clusters that represented excess risk of these emotions in space and time as a way of anticipating acute stress in the population.
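To make the notion of excess risk concrete, the following Python sketch computes the log-likelihood ratio that a scan statistic assigns to one candidate space-time cylinder. It assumes a discrete Poisson model in Kulldorff's formulation, scanning for high rates only. The probability model actually configured in SaTScan is not stated in this section, so the formula is an illustration and the function name and example numbers are hypothetical.

```python
import math


def poisson_llr(observed, expected, total):
    """Log-likelihood ratio for one candidate cluster (high rates only).

    observed: emotion tweets inside the candidate space-time cylinder
    expected: emotion tweets expected inside it under the null hypothesis
    total:    emotion tweets in the whole study region and period
    """
    # No excess inside the cylinder means no evidence of a high-rate cluster.
    if expected <= 0 or observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    outside_cases = total - observed
    if outside_cases > 0:
        outside = outside_cases * math.log(outside_cases / (total - expected))
    else:
        outside = 0.0  # limit of x * log(x) as x approaches zero
    return inside + outside


# Made-up candidates: (observed, expected) pairs out of 1,000 emotion tweets.
candidates = [(12, 10.0), (20, 10.0), (30, 10.0)]
scores = [poisson_llr(c, e, 1000) for c, e in candidates]
best = candidates[scores.index(max(scores))]
```

The candidate with the highest ratio is the most likely cluster. Statistical significance is then usually judged by recomputing the maximum ratio on many datasets simulated under the null hypothesis, which this sketch does not include.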