Project title

Proposal

Dataset

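library(tidyverse)

# Read the two Tidy Tuesday files
ratings <- read_csv("data/ratings.csv")
details <- read_csv("data/details.csv")

# Join on the game id, keep games owned by more than 500 users,
# and drop the duplicated or unused columns
comprehensive <- full_join(details, ratings, by = "id") |>
  filter(owned > 500) |>
  select(-num.x, -num.y, -year, -name)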

We chose the Tidy Tuesday dataset Board Games, published on January 25th, 2022. Originally collected from Kaggle, it combines board game descriptions and reviews from BoardGameGeek. Founded in 2000, BoardGameGeek is an online community for board gaming hobbyists. It hosts a game database and has gained popularity and viewership from its annual Golden Geek Award, which honors the best new board game of the year.

This dataset stood out to us because we all share fond memories of playing board games but have never stopped to think about the industry as a whole and the trends within it. The size of the collection was also appealing, given its expansion by 4 million additional reviews in January 2022.

With 28 variables and 8,202 board games, the data can answer a multitude of questions. Specifically, we are interested in using data visualizations to explore the relationships between variables such as number of players, rating, number of expansions, and game ownership.

Questions

Question 1: Do games with larger differentials in player counts (i.e., games that support a wider range of players) tend to be more highly rated?

Question 2: How has the number of expansions for board games changed over time, and has it had an impact on game ownership count?

Analysis plan

For Question 1, we would like to explore whether the difference between the min players and max players of a board game is correlated with the ratings of that game. Could having more flexibility in the number of players that can play a game be associated with higher ratings? To determine the relationship between board game ratings and player differentials, we will first create a new variable, differential, calculated by subtracting min_players from max_players inside a mutate() call. We will then plot differential against the board game ratings as a scatter plot with smoothed trend lines, with ratings on the y-axis and differential on the x-axis, and assess whether there is any relationship between the two. Next, we will facet by another new variable that records whether the board game was published before or after 1995, and we will use color to make the distinction between the two periods clear. This will allow us to compare the relationship between player-count flexibility and rating across generations of games.
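The steps for Question 1 can be sketched on a small toy data frame. This is a minimal illustration, not our final code: the values are invented, the column names minplayers, maxplayers, and average are assumptions about how the joined data names the player counts and the rating, and the choice to place games published in 1995 itself in the later group is also an assumption. A linear smoother is used here only because the toy data is too small for the default smoother.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the joined data; all values are invented for illustration.
games <- data.frame(
  minplayers    = c(1, 2, 2, 3, 2, 1),
  maxplayers    = c(4, 2, 6, 8, 5, 1),
  average       = c(7.1, 6.4, 7.8, 6.9, 7.3, 5.9),
  yearpublished = c(1989, 1992, 2005, 2015, 1995, 2019)
)

games_diff <- games |>
  mutate(
    # Range of supported player counts
    differential = maxplayers - minplayers,
    # Assumption: games published in 1995 itself go in the later group
    era = if_else(yearpublished < 1995, "Before 1995", "1995 and later")
  )

# Scatter plot with a trend line, faceted and colored by era
p <- ggplot(games_diff, aes(x = differential, y = average, color = era)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~era) +
  labs(
    x = "Player differential (max players - min players)",
    y = "Average rating",
    color = "Publication era"
  )
```

On the full data, dropping method = "lm" would let geom_smooth() pick its default smoother, which is closer to the smoothed trend lines described above.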

For Question 2, we will create a numerical variable, num_expansions, which counts the number of unique entries in the boardgameexpansion variable for each game. We will exclude games published before 2001, since including them would bias our results: BoardGameGeek only launched in late 2000. After grouping by yearpublished, we will compute the number of expansions released per game published (num_expansions_per_game). We will then visualize the relationship between yearpublished and num_expansions_per_game with a line graph to show how prevalent expansions have become over time. We will also overlay the total number of games published per year as a second line. This provides additional context, since knowing how the industry as a whole is growing is useful for understanding the proliferation of game expansions. We will use different colors and alpha values so the two lines are easy to tell apart, add separate y-axis labels on the left and right sides (the left side will track expansions per game and the right side will track total games published), and add annotations for additional visual clarity.
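A rough sketch of this pipeline follows, again on invented toy data. It assumes, for illustration only, that boardgameexpansion holds a comma-separated string of expansion names and that names contain no commas; the real column may need more careful parsing. The helper count_expansions() and the variable scale_factor are hypothetical names introduced here. Because ggplot2 only supports a secondary axis that is a transformation of the primary one, the games-published line is rescaled onto the left axis and the right axis undoes that rescaling.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: count unique, non-empty entries in a comma-separated string.
count_expansions <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE), function(items) {
    items <- trimws(items)
    items <- items[!is.na(items) & items != ""]
    length(unique(items))
  }, integer(1))
}

# Toy stand-in for the joined data; values are invented.
games <- data.frame(
  yearpublished      = c(1999, 2005, 2005, 2010, 2010, 2010),
  boardgameexpansion = c("A", "B, C", NA, "D, E, F", "G, H, I", NA),
  stringsAsFactors   = FALSE
)

per_year <- games |>
  mutate(num_expansions = count_expansions(boardgameexpansion)) |>
  filter(yearpublished >= 2001) |>          # drop games from before 2001
  group_by(yearpublished) |>
  summarize(
    games_published         = n(),
    num_expansions_per_game = sum(num_expansions) / n()
  )

# Factor that maps the games-published series onto the left-hand axis
scale_factor <- max(per_year$games_published) /
  max(per_year$num_expansions_per_game)

p <- ggplot(per_year, aes(x = yearpublished)) +
  geom_line(aes(y = num_expansions_per_game, color = "Expansions per game")) +
  geom_line(aes(y = games_published / scale_factor, color = "Games published"),
            alpha = 0.6) +
  scale_y_continuous(
    name     = "Expansions per game",
    sec.axis = sec_axis(~ . * scale_factor, name = "Games published")
  ) +
  labs(x = "Year published", color = NULL)
```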

The second part of our question can be answered with a density plot. Here, we will group games by the num_expansions variable to see how their sales (approximated by total ownership, the owned variable) change over time. The plot will use the publication year as the explanatory variable, different linetypes for the num_expansions groups, and density as a reflection of total sales.
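One possible way to set this up is sketched below on invented toy data. Since linetypes need a discrete variable, the sketch bins num_expansions into three groups; the cut points (0, 1-2, 3+) are an assumption made for illustration, not a decision we have settled on. Weighting the density by owned is one way to make the curve reflect ownership rather than the number of games.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the per-game data; values are invented.
games <- data.frame(
  yearpublished  = c(2003, 2006, 2008, 2011, 2014, 2016, 2018, 2019),
  num_expansions = c(0, 0, 1, 2, 5, 3, 0, 1),
  owned          = c(800, 1200, 5000, 3000, 20000, 9000, 600, 2500)
)

games_binned <- games |>
  mutate(
    # Assumed bins: no expansions, one or two, three or more
    expansion_group = cut(
      num_expansions,
      breaks = c(-Inf, 0, 2, Inf),
      labels = c("0", "1-2", "3+")
    )
  )

# Total ownership per group, as a quick numeric check alongside the plot
ownership_by_group <- games_binned |>
  group_by(expansion_group) |>
  summarize(total_owned = sum(owned))

# Density over publication year, weighted by ownership, one linetype per group
p <- ggplot(
  games_binned,
  aes(x = yearpublished, weight = owned, linetype = expansion_group)
) +
  geom_density() +
  labs(
    x        = "Year published",
    y        = "Density (weighted by ownership)",
    linetype = "Number of expansions"
  )
```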