Estimating Toponym Content of Social Media Data of the Northern Triangle



Phillips, Molly



Twitter is a microblogging social media platform where users post tweets. Despite the 280-character limit, Twitter data can be harvested and analyzed to gain valuable information. While geolocated tweets give in-depth location information, they comprise only a small percentage of tweets. This thesis uses a Twitter dataset collected using Northern Triangle-based cartel keywords and a bounding box of the world. The Northern Triangle, known for drug and gang violence, is the area of Central America consisting of Guatemala, El Salvador, and Honduras. Violence stemming from this region has been known to migrate north through Mexico and into the United States. This thesis aims to examine the presence of toponyms on Twitter, their resolution, the types of user accounts that tweet toponyms, and how toponym usage changes over time. To examine these toponym-related issues, using the Northern Triangle region as a case study, 15.3 million tweets related to the Northern Triangle, collected over a period of one year, were processed and analyzed. The data processing included two primary steps: data enrichment using a Named Entity Recognition (NER) tool, and data analysis in which the enriched data was examined to explore key trends in toponym prevalence across space, time, and user characteristics. Results show that roughly 1 in 4 tweets contains a toponym, that a country was 10 times more likely to be mentioned than a city, and that the most prolific users were individuals. This work is a novel application of geolocation to a new social media dataset.
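The analysis step described in the abstract can be illustrated with a minimal sketch, assuming an NER tool has already attached a list of toponyms, each with a resolution level, to every tweet. The field names, the level labels ("country", "city"), and the sample records are hypothetical and are not taken from the thesis; the sketch only shows how a toponym share and per-level mention counts might be computed.

```python
from collections import Counter

# Hypothetical NER output: each tweet carries a list of (text, level) tags.
# These records are illustrative only and are not data from the thesis.
tweets = [
    {"id": 1, "toponyms": [("Honduras", "country")]},
    {"id": 2, "toponyms": []},
    {"id": 3, "toponyms": [("Guatemala", "country"), ("San Salvador", "city")]},
    {"id": 4, "toponyms": []},
]


def toponym_share(tweets):
    """Return the fraction of tweets that contain at least one toponym."""
    if not tweets:
        return 0.0
    tagged = sum(1 for t in tweets if t["toponyms"])
    return tagged / len(tweets)


def level_counts(tweets):
    """Count toponym mentions by resolution level across all tweets."""
    return Counter(level for t in tweets for _, level in t["toponyms"])


if __name__ == "__main__":
    print(f"share of tweets with a toponym: {toponym_share(tweets):.2f}")
    print(dict(level_counts(tweets)))
```

Comparing the counts for two levels, for example country against city, gives the kind of ratio the abstract reports, although the thesis may define and compute that ratio differently.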


This thesis has been embargoed for 2 years and will not be available until November 2021 at the earliest.


Social media, Wikipedia, Toponym, Northern Triangle