social media

Discovering in-depth tourist behaviour and demand using social media data in Bonaire island

Tourism has brought economic prosperity to Bonaire, but it also has a detrimental effect on the island's natural ecosystem. Studying tourist behaviour is a useful precautionary step that can help stakeholders manage tourism on Bonaire better. This internship research used machine learning on social media data to study tourist behaviour and to estimate future tourist demand.

This internship research collected 13,706 geotagged Flickr records from 2003 to 2019, cleaned them, and converted them into keywords to study tourist behaviour. The cleaned keywords were then weighted using TF-IDF (Term Frequency-Inverse Document Frequency) and clustered by keyword similarity with DBSCAN (Density-Based Spatial Clustering of Applications with Noise). The most and least relevant keywords in each cluster, judged relative to the keywords in all clusters, then determined the tourist activities or interests that the cluster represents. The research found nine clusters that are meaningful and useful for interpreting tourist behaviour on Bonaire.
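The weighting and clustering steps can be illustrated with a minimal, standard-library sketch. It is not the code used in the research: the IDF variant (ln(N/df) + 1), the cosine distance, the `eps` and `min_samples` values, and the helper names `tfidf`, `dbscan` and `top_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Turn keyword lists into L2-normalised {term: weight} vectors.

    Assumed variant: tf = count / len(doc), idf = ln(N / df) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two L2-normalised sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(vectors, eps=0.6, min_samples=2):
    """Plain DBSCAN; returns one label per vector, -1 meaning noise."""
    n = len(vectors)
    neighbours = [
        [j for j in range(n)
         if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: expand cluster
    return labels


def top_keywords(vectors, labels, cluster, k=3):
    """Keywords with the highest summed TF-IDF weight inside one cluster."""
    totals = Counter()
    for vec, label in zip(vectors, labels):
        if label == cluster:
            totals.update(vec)
    return [term for term, _ in totals.most_common(k)]
```

Each photo's cleaned keywords would form one entry of `docs`. Photos labelled `-1` belong to no cluster, and `top_keywords` gives a rough view of what a cluster is about.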

For tourism demand, this internship research forecast time series of tourist arrivals using both Flickr data and CBS (Centraal Bureau voor de Statistiek) data. Although the absolute numbers were unrealistic, the Flickr data could show which continent tourists came from in each season (winter, spring, summer, autumn) from 2015 to the end of 2021. The CBS data could not show which continent tourists came from, but could show in which seasons they arrived from 2012 until the end of 2021, with realistic numbers based on official government data.
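The abstract does not name the forecasting model. As a stand-in, the sketch below aggregates monthly arrivals into seasons and applies a seasonal-naive forecast, where each season repeats the value of the same season in the most recent earlier year. The meteorological Northern Hemisphere season mapping and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed meteorological seasons (Northern Hemisphere convention).
SEASONS = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def arrivals_by_season(monthly):
    """Sum {(year, month): arrivals} into {(year, season): arrivals}.

    December is counted towards the winter of the following year.
    """
    totals = defaultdict(int)
    for (year, month), arrivals in monthly.items():
        season_year = year + 1 if month == 12 else year
        totals[(season_year, SEASONS[month])] += arrivals
    return dict(totals)


def seasonal_naive_forecast(seasonal, target_year):
    """Forecast each season as its latest observed value before target_year."""
    forecast = {}
    for season in ("Winter", "Spring", "Summer", "Autumn"):
        years = [y for (y, s) in seasonal if s == season and y < target_year]
        if years:
            forecast[season] = seasonal[(max(years), season)]
    return forecast
```

A seasonal-naive forecast is mainly useful as a baseline: a more elaborate model is only worth using if it beats this one on held-out seasons.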

This research could give Bonaire's stakeholders insight into managing tourism by studying tourist behaviour with free social media data and automated machine learning methods.

Date
2020
Data type
Other resources
Theme
Research and monitoring
Geographic location
Bonaire

Tracking digital footprints in Bonaire's landscapes - spatial distribution and characterisation of tourists on Bonaire using social media

Introduction and aims
With the introduction of smartphones that combine cameras with GPS tracking applications, more tourists are able to take geotagged photos during their travels. Because these photos can be uploaded to online platforms, collecting data from the Internet opens up new research opportunities. These digital footprints, together with their geolocation metadata, provide much information that is useful in spatio-temporal research.
A previous study on Bonaire showed the potential of this kind of research. It gave some insight into the spatial movement of tourists and the number of tourists visiting different parts of Bonaire. To make these kinds of data usable in, for example, potential impact studies, we aimed to follow up on that work.
In the present study the approach from Schep et al. (2016) was revisited with the following goals:

  • Update distribution maps with the latest data (2016-2020) and evaluate the reproducibility of the maps.
  • Detect whether distribution patterns and densities have changed following the recently developed trails and the resulting spread of tourists.
  • Study if densities at specific locations can be related to local characteristics, such as:
    • Spatial characteristics, such as distance to roads
    • Landscape characteristics (landscapes)
    • Tourist type (cruise versus stay-over)

Methods
For this study, FLICKR was the only online platform used to collect photographs and their metadata; others were no longer available or were unsuitable. All photographs taken between November 2002 and October 2019 within a bounding box surrounding Bonaire were collected, including their metadata. This resulted in 13,026 photos from 421 photographers.
Using a self-built Python application, “PhotoCategoriser”, each photo was assigned to a category (coastal, seascape, wildlife, underwater, terrestrial, other). The metadata of each photo and its assigned category made it possible to analyse the origin of the photographer, to estimate the type of tourist (cruise or stay-over) and their interests, and to identify differences in spatial and temporal distribution. The data were aggregated to grid cells with a mean surface area of 0.301 km².
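Assigning a geotagged photo to a grid cell can be sketched as follows. The report does not describe how the grid was constructed, so this example assumes square cells of exactly 0.301 km², a hypothetical grid origin, and a simple equirectangular approximation (about 111.32 km per degree of latitude). The function name `grid_cell` is illustrative.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_cell(lat, lon, origin_lat, origin_lon, cell_area_km2=0.301):
    """Return the (row, col) index of the square grid cell containing a point.

    The origin is the south-west corner of the grid (an assumption here).
    """
    side_km = math.sqrt(cell_area_km2)
    # A degree of longitude shrinks with latitude; use the origin latitude.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    row = math.floor((lat - origin_lat) * KM_PER_DEG_LAT / side_km)
    col = math.floor((lon - origin_lon) * km_per_deg_lon / side_km)
    return row, col
```

At Bonaire's latitude (about 12° N) the approximation error over a bounding box of a few tens of kilometres is small compared with the cell size of roughly 0.55 km.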
Photographer intensity was determined by condensing photographs into Photo User Days (PUD). One PUD stands for one or more photographs taken on a given day by a specific photographer for a category in a grid cell.
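The PUD definition amounts to counting unique combinations of photographer, day, category and grid cell. A minimal sketch, assuming each photo is a dictionary with the hypothetical keys `user`, `date`, `category` and `cell`:

```python
from collections import Counter


def photo_user_days(photos):
    """Collapse photos into Photo User Days and count them per grid cell.

    Several photos by the same photographer on the same day, in the same
    category and grid cell, count as a single PUD.
    """
    unique = {
        (p["user"], p["date"], p["category"], p["cell"]) for p in photos
    }
    return Counter(cell for _user, _date, _category, cell in unique)


photos = [
    {"user": "a", "date": "2019-01-01", "category": "coastal", "cell": (0, 0)},
    {"user": "a", "date": "2019-01-01", "category": "coastal", "cell": (0, 0)},
    {"user": "a", "date": "2019-01-01", "category": "wildlife", "cell": (0, 0)},
    {"user": "b", "date": "2019-01-02", "category": "coastal", "cell": (0, 1)},
]
pud_per_cell = photo_user_days(photos)
# pud_per_cell[(0, 0)] == 2 and pud_per_cell[(0, 1)] == 1
```

The two identical coastal photos by photographer "a" collapse into one PUD, which keeps prolific photographers from dominating the intensity maps.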
Results and conclusions
Overall results

  • The report provides various figures and maps presenting the spatial distribution of PUD as a proxy for tourist distribution. Temporal aspects in PUDs reflect the annual dynamics in tourist numbers.
  • Trends in tourist numbers are not equally reflected in the numbers of PUDs. PUDs are therefore a proxy of tourist distribution, but not a strong indicator for trends in absolute numbers and intensity.

Reproducibility

  • The additional ~4,000 photos on top of the estimated ~10,000 FLICKR photos that were analyzed by Schep did not add much extra information. Also, the applied resolution did not improve the possibilities for performing risk assessments on habitats or species, because of the limited amount of data in those areas. Distribution patterns and intensity trends were similar. Category distribution, however, differed slightly. This can be explained by differences in the datasets used and by the boundary criteria for assigning categories.

Detection of (changed) distribution and relation to local characteristics

  • The overall distribution of PUDs shows higher intensity along the west coast, near Kralendijk and its tourist area. In addition, some higher intensity spots are visible near Sorobon in the east, and Seru Largu in the middle of the island. The hotspots such as Goto and Washington Slagbaai in the north are clearly highlighted, as well as several scenic spots along the southern flats (Salt pans, Slavery houses, Lighthouse).
  • Less frequented regions mainly include landscapes on the eastern part of the island. The low numbers of PUDs in these regions did not allow additional analysis of changes in PUD distribution between years; such analysis was considered not to be of added value. Hence, the effectiveness of the recently established trails could not be assessed any further, and an additional preliminary risk assessment for habitats or species was left out of the study.
  • The distribution of PUDs mainly reflects the accessibility of regions: roads and hotspots are clearly visible, and only a limited number of PUDs were plotted further away. Analysis of the distribution of tourists in specific habitats or near the living areas of certain species was therefore considered not to be of added value.
  • The interests of the photographers (reflected by the categories) vary slightly over the years and within a year, both by origin and by tourist type (also reflected by the cruise season). The distribution and intensity of tourist types and origins also seem to vary slightly. Details are provided in the report.

Future application and methodological issues

  • We suggest that studies using these data sources first look into the general distribution and intensity of the photos/PUDs collected (data coverage) before making the effort to categorize. Based on that general overview, subsequent analysis steps such as categorization and environmental risk assessment could be added.
  • Manual assignment of categories to photos is a subjective exercise. Assigning categories requires strict criteria and midterm evaluation of results.
  • Online platforms are variable in their existence and terms of use, leading to an uncertain accessibility and application of these kinds of data in future studies.
Date
2020
Data type
Research report
Theme
Research and monitoring
Report number
c052/20
Geographic location
Bonaire