Date Published: July 13, 2015
Publisher: Public Library of Science
Author(s): Su Yeon Han, Ming-Hsiang Tsou, Keith C. Clarke, Alejandro Raul Hernandez Montoya.
Dynamic social media content, such as Twitter messages, can be used to examine individuals’ beliefs and perceptions. By analyzing Twitter messages, this study examines how Twitter users exchanged and recognized toponyms (city names) for different cities in the United States. The frequency and variety of city names found in their online conversations were used to identify the unique spatiotemporal patterns of “geographical awareness” for Twitter users. A new analytic method, Knowledge Discovery in Cyberspace for Geographical Awareness (KDCGA), is introduced to help identify the dynamic spatiotemporal patterns of geographical awareness in social media conversations. Twitter data were collected for 50 U.S. cities. Thousands of city names from around the world were extracted from a large volume of Twitter messages (over 5 million tweets) using the Twitter Application Programming Interface (API) and computer programs written in Python. The percentages of distant city names (cities located in distant states or in other countries far from the locations of Twitter users) were used to estimate the level of global geographical awareness of Twitter users in each U.S. city. A Global Awareness Index (GAI) was developed to quantify the level of geographical awareness of Twitter users within the same city. Our findings are that: (1) the level of geographical awareness varies depending on when and where Twitter messages are posted, but in general Twitter users from big cities are more aware of the names of international cities or distant U.S. cities than users from mid-size cities; (2) Twitter users have an increased awareness of city names far from their home city during holiday seasons; and (3) Twitter users are more aware of nearby city names than of distant city names, and more aware of big city names than of small city names.
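The abstract describes the awareness estimate as the percentage of mentioned city names that are distant from the users' home city. The sketch below illustrates that percentage-based idea only; it is not the paper's GAI formula, which is not given here. The coordinate table, the 1,000 km threshold standing in for "distant states or other countries", and the function names are all hypothetical.

```python
import math

# Hypothetical coordinates (latitude, longitude in degrees), for illustration only.
CITY_COORDS = {
    "San Diego": (32.7157, -117.1611),
    "Los Angeles": (34.0522, -118.2437),
    "New York": (40.7128, -74.0060),
    "London": (51.5074, -0.1278),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def global_awareness_index(home_city, mentioned_cities, threshold_km=1000.0):
    """Percentage of city mentions located farther than threshold_km from home_city.

    threshold_km is an assumed cut-off for 'distant'; names missing from
    CITY_COORDS are ignored rather than guessed.
    """
    home = CITY_COORDS[home_city]
    known = [c for c in mentioned_cities if c in CITY_COORDS]
    if not known:
        return 0.0
    distant = sum(1 for c in known if haversine_km(home, CITY_COORDS[c]) > threshold_km)
    return 100.0 * distant / len(known)
```

For example, a San Diego user who mentions Los Angeles twice, New York once, and London once gets 50.0 under the assumed 1,000 km threshold: the two Los Angeles mentions (about 180 km away) count as nearby, and the other two count as distant.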
Internet-based human communication and messaging have expanded rapidly with the rise of social networking sites (SNS), where people share ideas, images, news, memes, and advertisements at a prodigious and increasing rate. SNS (such as Twitter, Foursquare, and Flickr) dynamically produce huge quantities of instant messages that often reveal the locations of their users and the locations where the messages were created. The locations of users can be found in their user profiles if they disclose this information publicly. In some cases, global positioning systems (GPS) in mobile phones record the latitude and longitude where a message was created; these coordinates are known as geotags. Social media data with geotags or location profiles are increasingly used in geographical research projects and in the emerging business of location-based services [1–7]. However, few studies have used the actual geographical names (i.e., toponyms or place names) mentioned in social media content to estimate the geographical awareness of particular groups and individuals from the same city.
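The two location sources described above (a GPS geotag and a self-reported profile location) can be told apart with a short check. This sketch assumes tweet objects shaped like those returned by Twitter's v1.1 API, where a geotag appears in the top-level `coordinates` field as a GeoJSON Point in [longitude, latitude] order and the profile location is the free-text `user.location` field. The function name and sample tweets are hypothetical.

```python
from typing import Any, Tuple


def tweet_location(tweet: dict) -> Tuple[str, Any]:
    """Classify where a tweet's location comes from.

    Returns ('geotag', (lat, lon)) when GPS coordinates are present,
    ('profile', text) when only a user-profile location is disclosed,
    and ('unknown', None) otherwise.
    """
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order is [longitude, latitude]
        return "geotag", (lat, lon)
    profile = (tweet.get("user") or {}).get("location")
    if profile and profile.strip():
        return "profile", profile.strip()  # free text; still needs geocoding
    return "unknown", None
```

A geotag is preferred because it records where the message was created, whereas a profile location is whatever the user chose to type and may be vague or out of date.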
Social media applications such as Twitter and Facebook have gained spectacular popularity in recent years. According to Twitter, over 500 million active Twitter users produced about 350 million messages per day in 2013. To exploit these massive social media databases, extensive studies have been conducted to extract knowledge about human behavior, including analyses of human communication and networks [15–17], sentiment analysis [18–22], homeland security [23–25], prediction of election results, prediction of the stock market, crisis management [5, 27], and tracking of infectious diseases [6, 7, 28]. These research projects have demonstrated the tremendous value of social media analytics. The rapid growth of social networking sites has triggered not only extensive research in academia but also the development of commercial applications of social media data in the private sector, such as mobile phone apps and location-aware devices.
Twitter is an Internet-based social networking and microblogging service that lets registered users broadcast and receive very short text messages (tweets) of up to 140 characters. Users are identified by user names, and topics or collective conversations are tagged with hashtag identifiers such as “#katyperry”. Messages can be resent or forwarded (retweeted), and users can connect with one another (as friends) to send and receive tweets within their groups. References to Internet URLs are compressed into short links, for example with TinyURL.
Focusing on big data analysis of social media messages, this study adopted the research framework of Knowledge Discovery in Cyberspace (KDC). KDC is specifically designed for knowledge discovery in social media and big data, focusing on the interdependent relationships among place, time, and content. A distinctive aspect of KDC is its use of highly scalable information mining algorithms, geographic information systems (GIS), visualization tools, and spatial statistical methods to discover the dynamic spatiotemporal patterns in massive amounts of social media messages.
This research analyzed Twitter messages to identify spatiotemporal differences in the level of geographical awareness of Twitter users living in various regions across the U.S. KDCGA, consisting of four steps (querying tweets, filtering, geocoding, and spatial analysis), led to the discovery of the following new facts. First, Twitter users living in heavily populated areas have more geographical awareness of distant cities than those living in less populated areas. This indicates a hierarchical effect in global awareness: so-called global or world cities have a far longer reach than other cities. Second, Twitter users are more aware of distant cities during the holiday season in late December than during the rest of the year. This may be because of travel planning, gift giving, or simply staying in touch with distant family and friends. It is possible that other holiday events, such as the Muslim Hajj, the Chinese Lunar New Year celebration, and the U.S. spring break, show similar patterns. Third, Twitter users are much more aware of nearby cities than of distant cities, as predicted by Tobler’s first law of geography. This leads us to conclude that the first law of geography applies equally to fixed geographic space and to the cyberspace of internet-based social media. However, the distance decay of Tobler’s law appears to apply only within one level of the urban hierarchy, so that the largest major cities, even when distant, can appear virtually closer than smaller nearby cities. Thus, to a New Yorker, London, England, may seem closer than Albany, the state capital just 250 km away.
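The four KDCGA steps named above can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the study's implementation: step 1 (querying the Twitter API) is replaced by an in-memory list of sample texts, the retweet filter (dropping texts that start with "RT @") is an assumed rule, the four-entry gazetteer and the distance bands are hypothetical, and the function names are illustrative. Counting mentions per distance band is one simple way to look for the distance-decay pattern discussed above.

```python
import math
import re
from collections import Counter

# Hypothetical mini-gazetteer: toponym -> (lat, lon). A real gazetteer is far
# larger and must resolve ambiguous names (e.g. several U.S. "Springfield"s).
GAZETTEER = {
    "new york": (40.7128, -74.0060),
    "albany": (42.6526, -73.7562),
    "boston": (42.3601, -71.0589),
    "london": (51.5074, -0.1278),
}

# Assumed distance bands (km) used in the spatial-analysis step.
BANDS = [(0, 500, "near"), (500, 3000, "regional"), (3000, float("inf"), "distant")]


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def filter_tweets(tweets):
    """Step 2: drop empty texts and retweets (assumed 'RT @' prefix)."""
    return [t for t in tweets if t.strip() and not t.startswith("RT @")]


def geocode_toponyms(text):
    """Step 3: return (name, coords) for each gazetteer toponym found in text."""
    lowered = text.lower()
    return [(name, coords) for name, coords in GAZETTEER.items()
            if re.search(r"\b" + re.escape(name) + r"\b", lowered)]


def distance_band(km):
    """Map a distance in km to its band label."""
    for low, high, label in BANDS:
        if low <= km < high:
            return label
    raise ValueError(f"distance out of range: {km}")


def spatial_analysis(tweets, home):
    """Step 4: count toponym mentions per distance band from the home city."""
    counts = Counter()
    for text in filter_tweets(tweets):
        for _name, coords in geocode_toponyms(text):
            counts[distance_band(haversine_km(home, coords))] += 1
    return counts


# Step 1 stand-in: hypothetical tweets posted by users in New York.
SAMPLE_TWEETS = [
    "Road trip from New York to Albany this weekend",
    "RT @friend: Boston is cold",
    "Flying to London for the holidays",
    "Albany traffic again",
    "",
]
```

With these made-up tweets, `spatial_analysis(SAMPLE_TWEETS, GAZETTEER["new york"])` counts three nearby mentions and one distant mention (London); the retweet and the empty text are discarded in step 2. The numbers only show how the steps fit together and say nothing about the study's actual results.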
We acknowledge that publicly available tweets are only a sample of all tweets, and that only a few of them carry a geotag that allows verification that the user is at their stated location or at one of the places mentioned in the text. We further recognize that culling all tweet geography from what are usually transient and fickle information exchanges may reveal more about human intent and interest than about actual travel or communication desires. Nevertheless, there is abundant opportunity to learn more about geography from the large numbers of exchanges now enabled by location-aware social media.