Collegiate Sports Budget Analysis

Proposal

library(tidyverse)

Dataset

sports <- readr::read_csv('data/data.csv')

This data on collegiate sports budgets comes from Equity in Athletics Data Analysis. It has 132,327 rows and 28 columns, including both numeric and categorical variables. The dataset covers most colleges in the US, regardless of sports division, and records metrics such as revenue and expenditure for each sport at each school from 2015 to 2019. We chose this dataset because it has a large number of variables to analyze and because, as college sports fanatics, we find the topic of collegiate sports budgets interesting, relevant to us, and worthy of exploration. The variables we are primarily concerned with are exp_men, the school’s expenditure on men’s sports; rev_men, the revenue generated by men’s sports for the school; exp_women, the school’s expenditure on women’s sports; rev_women, the revenue generated by women’s sports for the school; total_exp_menwomen, the total expenditure on men’s and women’s teams for a given sport; year, the year of the athletics data; and sector_name, the school type (for example, “Private nonprofit, 4-year or above”).

Questions

  1. How did the relationship between expenditure and revenue of collegiate sports change over time for different genders?

  2. How does total expenditure on men’s and women’s sports combined differ from year to year, and how does this pattern differ between public and private colleges that are 4-year or above?

Analysis

Question 1

We want to investigate how the relationship between college expenditure on collegiate sports and revenue generated from collegiate sports changed over the years for men and women. We plan to use two plots to investigate this relationship. For our first plot, a time series, we will put year on the x-axis and the mean expenditure and mean revenue for men’s and women’s sports on the y-axis. We will use color to distinguish between men’s and women’s sports, and point shape to distinguish between expenditure and revenue. We will then draw a line between the expenditure and revenue points for each gender in each year to show how those gaps differ (men’s sports on average earn more than they spend, while the opposite is true for women’s sports).

For our second plot, we will look directly at the relationship between expenditure and revenue by gender. We will plot data from all schools, with expenditure on the x-axis and revenue on the y-axis. To do this, we plan to pivot our data frame to a longer format and create a new gender variable, which we will use to color the points of the scatter plot. This way we can analyze how the relationship between expenditure and revenue differs by gender.
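The two reshaping steps described above could be sketched roughly as follows. This is a minimal illustration, not our final code: toy_sports is a small made-up stand-in for the real sports data frame (its values are invented purely so the example runs), and the object names sports_long, yearly_means, sports_by_gender, p_gap, and p_scatter are hypothetical.

```r
library(tidyverse)

# Toy stand-in for `sports`; the values are made up for illustration only.
toy_sports <- tibble(
  year      = c(2015, 2015, 2016, 2016),
  exp_men   = c(100, 300, 120, 320),
  rev_men   = c(150, 350, 160, 380),
  exp_women = c(90, 110, 95, 125),
  rev_women = c(60, 80, 70, 90)
)

money_cols <- c("exp_men", "rev_men", "exp_women", "rev_women")

# Plot 1: one row per (year, gender, measure), then average within each group.
sports_long <- toy_sports |>
  pivot_longer(
    cols = all_of(money_cols),
    names_to = c("measure", "gender"),
    names_sep = "_",
    values_to = "amount"
  ) |>
  mutate(measure = if_else(measure == "exp", "Expenditure", "Revenue"))

yearly_means <- sports_long |>
  group_by(year, gender, measure) |>
  summarize(mean_amount = mean(amount, na.rm = TRUE), .groups = "drop")

p_gap <- ggplot(yearly_means, aes(x = year, y = mean_amount, color = gender)) +
  # Vertical segment joining expenditure and revenue for each gender in each year
  geom_line(aes(group = interaction(year, gender))) +
  geom_point(aes(shape = measure), size = 3) +
  scale_x_continuous(breaks = unique(yearly_means$year)) +
  labs(x = "Year", y = "Mean amount", color = "Gender", shape = "Measure")

# Plot 2: keep exp and rev as separate columns, with a new gender variable.
sports_by_gender <- toy_sports |>
  pivot_longer(
    cols = all_of(money_cols),
    names_to = c(".value", "gender"),
    names_sep = "_"
  )

p_scatter <- ggplot(sports_by_gender, aes(x = exp, y = rev, color = gender)) +
  geom_point(alpha = 0.5) +
  labs(x = "Expenditure", y = "Revenue", color = "Gender")
```

The special ".value" name in the second pivot_longer() call keeps exp and rev as separate columns, which is the shape the scatter plot needs, while the first call stacks them into a single amount column for the time series.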

Question 2

For our second question, we plan to investigate how total expenditure for men’s and women’s sports combined differs from year to year between 2015 and 2019. We will use a box plot with the year variable on the x-axis, treated as a factor, and the numeric total_exp_menwomen variable on the y-axis. We will also look at a line graph with year on the x-axis and expenditure on the y-axis to observe how expenditure has changed over the years for certain sports. Because our data from TidyTuesday is very large, with over 130,000 observations, we will keep only Division I-FBS schools so that we assess college athletics on a similar playing field. Division I-FBS schools are typically bigger and have more money to spend, so it would not be fair to compare the athletic budget of The University of Alabama to that of a community college. After filtering the data, we are left with 10,052 observations. We are also interested in whether total expenditure by year differs between the two types of 4-year-or-above sectors, private and public. So, we will filter for only the sectors “Public, 4-year or above” and “Private nonprofit, 4-year or above”. Then, we plan to facet our box plot by sector so that we can compare the distribution of total expenditure over the years across the two sectors.

# Keep only 4-year-or-above public and private nonprofit schools
sports <- sports |>
  filter(sector_name %in% c("Private nonprofit, 4-year or above",
                            "Public, 4-year or above"))

The code above is an example of how we plan to filter our data to keep only those observations whose sector name is “Private nonprofit, 4-year or above” or “Public, 4-year or above”.
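Building on this filter, the faceted box plot planned for Question 2 might look something like the sketch below. It rests on several assumptions: toy_sports is a small made-up stand-in for the real data (its values are invented for illustration only), and we assume the division is stored in a column named classification_name with the label "NCAA Division I-FBS", which should be checked against the actual data. The names fbs_sports and p_box are hypothetical.

```r
library(tidyverse)

# Toy stand-in for `sports`; the values are made up for illustration only.
# ASSUMPTION: the division lives in `classification_name` with this FBS label.
toy_sports <- tibble(
  year = rep(2015:2016, each = 4),
  classification_name = rep(
    c("NCAA Division I-FBS", "NCAA Division I-FBS",
      "NCAA Division I-FBS", "NJCAA Division I"),
    times = 2
  ),
  sector_name = rep(
    c("Public, 4-year or above", "Private nonprofit, 4-year or above",
      "Public, 4-year or above", "Public, 2-year"),
    times = 2
  ),
  total_exp_menwomen = c(500, 700, 650, 40, 520, 760, 690, 45)
)

# Keep FBS schools in the two 4-year-or-above sectors of interest.
fbs_sports <- toy_sports |>
  filter(
    classification_name == "NCAA Division I-FBS",
    sector_name %in% c("Private nonprofit, 4-year or above",
                       "Public, 4-year or above")
  )

# Box plot of total expenditure by year, with one panel per sector.
p_box <- ggplot(fbs_sports, aes(x = factor(year), y = total_exp_menwomen)) +
  geom_boxplot() +
  facet_wrap(~ sector_name) +
  labs(x = "Year", y = "Total expenditure (men's and women's teams)")
```

Converting year with factor() gives one box per year instead of a single box over a continuous axis, and facet_wrap() places the two sectors side by side on a shared y-axis so their distributions can be compared directly.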