Stats FM

Proposal

library(tidyverse)

#read in CSVs

#radio_station <- tidytuesdayR::tt_load('2022-11-08')
#station_info <- read_csv(file = 'data/station_info.csv')
state_stations <- read_csv(file = 'data/state_stations.csv')

Dataset

The data come from a Wikipedia list of lists titled List of Radio Stations in the United States. The list contains 50 links, one per state, and each link leads to a table of FCC-licensed radio stations in that state. Here is an example of Alabama’s. The Wikipedia tables have 5 columns: Call Sign, Frequency, City of License, Licensee, and Format. Here is a description of the columns:

Call sign: a unique identifier for a transmitter station
Frequency: the frequency at which the station transmits, i.e. the number of radio-wave cycles per second
City of License: the city where the station is officially licensed to serve
Licensee: holder of the license
Format: general content broadcast by the station

The data collected from all 50 states were then aggregated to give the final data this project is using.

The raw data, before wrangling, can be found in the FCC’s predicted service contour data points, which are released daily at approximately 10:00 AM Eastern Time. Details on how to read the data and its formats are given on the website. In addition, the FCC’s FM Query Broadcast Station Search can be used to look up a list of licensed stations.

The data loaded from the Tidy Tuesday 2022-11-08 release contain two datasets: state_stations and station_info. state_stations contains 17186 observations with 6 columns: the 5 described above plus a state column that records the state of each station. station_info contains 256 rows of station information with 6 columns: call_sign, facility_id, service, licensee, status, and details. Here are the details for the columns:

call_sign: a unique identifier for a transmitter station
facility_id: facility id, 2,065 unique ids
service: frequency (FM)
licensee: holder of the license
status: status of the station (licensed, license canceled, licensed and silent)
details: link to further details (non-functional) (Click here for details)

Motivation

We selected the radio stations dataset because we thought it would be interesting to examine the relationships between radio stations and their regions. We were especially interested in how the location of a radio station may be related to the category of its broadcast (called “format” in this dataset). For example, we were curious about the regions in which religious-leaning radio stations are most concentrated, and about which areas of the country have dense clusters of particular radio station genres. On its own, this dataset doesn’t have many variables to work with. However, we plan to bring in external data with population statistics and demographic information for regions so we can analyze how these factors may interact with the characteristics of a radio station.

Questions

Question 1

In 2021, Statista’s Global Consumer Survey identified the radio genres that Americans enjoy most. We are interested in how the distribution and density of radio stations vary across the country for each of the top genres identified in the Consumer Survey: “Rock/alternative/indie music”, “Pop/adult contemporary music”, “Country music”, “News/talk”, and “Urban music (hip hop, R&B etc.)”.

Question 2

Another question we are interested in answering is, “What is the relationship between the population size of a particular region and the number of radio stations in that region?” Given the variation in population density across the US, we would like to identify any differences in this relationship based on region (West, Midwest, South, Northeast). We expect to observe that the densely populated regions (such as the Northeast) will have more radio stations than the less populated regions (such as the West and Midwest).

Analysis plan

Plan for Question 1

Because the format column of our dataset has over 1000 unique values, we will create new categories that match the Consumer Survey’s top genres by adapting code that builds a new categorical column from string patterns in the format column. The top 5 genres from the survey are “Rock/alternative/indie music”, “Pop/adult contemporary music”, “Country music”, “News/talk”, and “Urban music (hip hop, R&B etc.)”. Using this method, we will group radio stations by whether the key words from each top genre occur in the station’s format. We will review each genre to ensure that the correct stations were sorted into it, and we will review the stations that were not sorted into any genre to ensure that none were missed.

For the national map we will not merge in any external data, but we will join state_stations.csv and station_info.csv with the data in fm_service_contour_current.zip using the script from the Tidy Tuesday repo. The variables we plan to use are site_lat (latitude of the station site), site_long (longitude of the station site), application_id (ID of the station), and the new genre column we will create. With this final dataset, we will create a choropleth map that shows the density of radio stations in each state, faceted by our new genre column.

Additionally, we would like to make a second map of just North Carolina. This map will plot major cities in NC and radio stations colored by genre. For this map we anticipate needing to merge in external data: the latitude/longitude of major cities in North Carolina.
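The genre recoding described above could look roughly like the following sketch. It is not the final code: the toy `stations` tibble, its call signs and format strings, the keyword patterns, and the "Other" fallback label are all illustrative assumptions, and the real patterns will need to be tuned against the actual values of the format column.

```r
library(dplyr)
library(tibble)
library(stringr)

# Toy stand-in for state_stations; call signs and formats are made up
# for illustration and are not taken from the real data.
stations <- tibble(
  call_sign = c("AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF"),
  state     = c("Alabama", "Alabama", "Ohio", "Ohio", "Texas", "Texas"),
  format    = c("Classic rock", "Adult contemporary", "Country",
                "News/Talk", "Urban contemporary", "Religious")
)

# Hypothetical keyword patterns. case_when() uses the first matching
# condition, so the order of the rules matters for formats that contain
# keywords from more than one genre.
stations_genre <- stations %>%
  mutate(
    fmt = str_to_lower(format),
    genre = case_when(
      str_detect(fmt, "rock|alternative|indie") ~ "Rock/alternative/indie music",
      str_detect(fmt, "pop|adult contemporary") ~ "Pop/adult contemporary music",
      str_detect(fmt, "country")                ~ "Country music",
      str_detect(fmt, "news|talk")              ~ "News/talk",
      str_detect(fmt, "urban|hip hop|r&b")      ~ "Urban music (hip hop, R&B etc.)",
      TRUE                                      ~ "Other"
    )
  ) %>%
  select(-fmt)

# Stations per state and genre: the quantity the faceted map would show.
genre_counts <- stations_genre %>%
  count(state, genre, name = "n_stations")

# Formats that matched no genre, most frequent first, for manual review.
unsorted <- stations_genre %>%
  filter(genre == "Other") %>%
  count(format, sort = TRUE)
```

Listing the unmatched formats by frequency is one way to carry out the review step: common formats that land in "Other" point to keywords that are missing from the patterns.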

Plan for Question 2

To examine the relationship between a state’s population and its number of radio stations, we will create a scatter plot in which each point is a state. The x-axis will represent the state’s population and the y-axis will represent the number of radio stations in the state. To accomplish this, we will need to import another dataset that contains the population of each state in 2016. We will also need to group the rows by state and count the rows in each group so that there are only 50 data points. As stated before, we would also like to check for any differences in this relationship across regions of the US (Regions are defined here), which will be distinguished by color. We think another interesting plot could be a layered histogram comparing the distributions of population by state/city and of the number of radio stations, faceted by region. This would let us see whether the two distributions share a similar overall shape or pattern.
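A minimal sketch of the per-state aggregation and scatter plot is given below, under stated assumptions. The toy `stations` tibble is illustrative. As a stand-in for the 2016 population data we have not yet chosen, it uses R's built-in `state.x77` matrix, whose Population column is a 1975 estimate in thousands, and it takes regions from the built-in `state.region` factor, relabeling "North Central" as "Midwest". The real analysis would replace both with the external datasets named in the plan.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Toy stand-in for state_stations: one row per station (illustrative only).
stations <- tibble(
  state = c("Alabama", "Alabama", "Alabama", "Ohio", "Ohio", "Texas")
)

# Collapse to one row per state by counting stations.
station_counts <- stations %>%
  count(state, name = "n_stations")

# Placeholder lookup built from R's built-in state datasets.
# state.x77[, "Population"] is a 1975 estimate in thousands, NOT the
# 2016 population the proposal calls for.
state_lookup <- tibble(
  state      = state.name,
  region     = as.character(state.region),
  population = unname(state.x77[, "Population"]) * 1000
) %>%
  mutate(region = if_else(region == "North Central", "Midwest", region))

# Attach population and region to the per-state station counts.
state_summary <- station_counts %>%
  inner_join(state_lookup, by = "state")

# One point per state, colored by region.
p <- ggplot(state_summary,
            aes(x = population, y = n_stations, color = region)) +
  geom_point() +
  labs(x = "State population", y = "Number of radio stations",
       color = "Region")
```

With the full dataset, `state_summary` would have one row per state, and printing `p` would draw the scatter plot described above.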