Exploring Pell Grant Data From 1999-2017
Proposal
Dataset
Data Description:
This data set pell
contains information on the distribution of Pell Grant awards to college students in the United States from the years of 1999-2017 and was featured in tidytuesday in 2022. The data set is a compilation of yearly data sets collected by the US Department of Education. It is important to note that the original yearly data was reported in different ways across the years but was later compiled and cleaned to the pell
data set used in this project.
The pell
data set contains 100474 cases of pell grant grants and only 4 missing values.
It contains 6 columns titled STATE, AWARD, RECIPIENT, NAME, SESSION, YEAR . Essentially, the data set explores the award amount in each school for each respective year.
The variables NAME, STATE, and SESSION are all factors. Specifically, it is important to note that the variable STATE includes US states and territories (su) . Finally, the difference between SESSION and YEAR is that SESSION indicates the academic school year (typically in between two years).
The reason we chose this data set:
A pell grant is a grant from the United States government covering the cost of an undergraduate or graduate-level education. Recipients of pell grants typically display strong financial need. As college students, we and our families are currently making financial decisions pertaining to financing our education. Pell grants are of interest to us, as we understand the impact receiving a pell grant can have on one’s financial situation and educational opportunity. This dataset will allow us to explore various trends of receiving pell grants over time and across locations.
Questions
Question 1:
How does the average Pell grant award change over time based on the region?
Question 2:
Which states have seen the biggest change in the percentage of awarded Pell grants to total students enrolled in university over time?
Analysis plan
Plan To Solve Question 1:
The raw data set contains six columns of information: the state in which the school’s located, the total Pell grant award the school gave, the number of students at the school who received a Pell grant, the name of the school, the session year, and the year. In order to solve the question of how the average Pell grant award changes over time based on the region, we first need two additional variables: the average pell grant for school and region. The ‘average pell grant’ variable will be calculated by dividing the total Pell grant award by the number of students who received the award. The ‘region’ variable will be mutated by returning the region based on the school’s state.
We plan to find a data set containing two columns, the name of each of the 50 states and the region in the United States for each other the 50 states. We will join that dataset with our pell grant data set by state. A line graph will most likely be implemented to answer this question, since we’re seeking to check a numerical variable’s progression over time. In the end, we hope to have a region-by-region understanding of Pell grant awards and their changes over time. Then we can differentiate the line graphs by region through faceting for each region. Another option would be implementing one individual line graph that has all the lines for each region, so that we can see more clearly how each region compares to one another. The x-variable would be each year provided in the data.
Additionally, we plan to make a second, more visually appealing spatial graph to evaluate change over time. We plan to group the colleges by individual states and different time periods, finding their average pell grant. The time periods will be broken up by presidential terms, hoping to find some interesting correlations and trends concerning real-world events that have taken place. The visualization will display a map of the United States, and each individual state will be a particular shade of a certain color. The darker shade on the graph will represent a higher average pell grant. We hope to facet by the time period (election term) for comparison purposes. Solving this will require extracted information on the location coordinates for each state, for the purposes of making the map visualization. Possibly implementing a Google maps API or the ‘ggmap’ library package will assist us in creating this graph.
In the end, we hope to have a region-by-region understanding of pell grant awards and their changes over time.
Plan To Solve Question 2:
From the raw data set, we will use the state in which the school is located, the total amount of Pell grant recipients for each year, and the year. We will summarize the data to just get the amount of Pell recipients by state by year. From there, we will incorporate enrollment data for the total amount of students enrolled in university by state by year. With that data, we will be able to calculate the proportion of students who are enrolled in university receiving a Pell Grant by dividing amount of recipients by state by year by total undergrads enrolled by state by year.
For our first visualization, we plan to use a line graph to visualize the proportion of students receiving a pell grant over time in each state. We will do this by having a line for each of the 50 states and having a very low alpha value for most of them, with a few that we want to highlight having a higher alpha level. This will allow us to clearly show a few notable stats over the top of the general trend for all of the states. For our second visualization, we plan to get the coordinates for each college by using a google maps API and then plotting a dot at each college. The dot will be sized based on the amount of pell grant recipients at the school and colored by the proportion of students receiving a pell grant. We then will put a few of these maps side by side each representing a different time period, possibly corresponding to presidential cycle. This will allow us to see trends in states and areas in both amounts of students and proportions.