Investigating Brand Reputation of America’s Top 100 Companies

Proposal

Dataset

The dataset that we are using comes from the TidyTuesday project and comes from the Axis-Harris Poll which investigated the reputation of the most visible brands in America. The Harris Poll conducted a survey in February 2022 among a representative sample of the American population to identify the companies that were most prominent in the public’s mind. The top 100 companies with the highest number of nominations were included in the “Most Visible” list.

The poll actually resulted in two datsets: polls and reputation. The polls dataset has 8 variables and 500 observations. This dataset has each company’s overall ranking and RQ score in 2022 as well as information about each company’s rating from 2017 to 2021. RQ scores are a metric that is a combination of each company’s rating for each specific attribute. The attributes are: trust, ethics, growth, p&s, citizenship, vision, and culture. It is specifically calculated using the formula: [ (Sum of ratings of each of the 9 attributes)/(the total number of attributes answered x 7) ] x 100. Next, the reputation dataset has 10 variables and 700 rows or observations. This dataset is only from the year 2022, and shows the individual breakdown for how companies were rated based on certain rating attributes.

We’ve interacted with many of these companies before, yet our familiarity with these entities is limited solely to their name recognition. In an effort to expand our knowledge and gain a more comprehensive understanding of their respective positions, we aim to delve deeply into the available data from the 2022 Axios-Harris Poll and draw deeper insights. We’re interested in investigating companies across different industries that are highly reputable, but may have experienced either a decline or rise in the public’s eye.

Questions

  1. What did individuals value in a company in 2022?

  2. How does company RQ score change between the years of 2017 to 2022 and are the trends consistent with their overall industry?

Analysis plan

Our analysis of question 1 will come from the reputation dataset. To see what individuals valued in a company in 2022, we will examine the different rating attributes (i.e. growth, vision, p&s etc) and see which ones tended to score the highest. However, because companies in different industries probably experience differences in rating attributes, we want to first see which industries ranked the highest for each rating attribute by creating an average rating variable for each attribute for each industry. It is important to note that we may re-categorize some of the industries or place some companies in different industries to have better categories for comparison in the visualization. This is because many grocery chains, like Trader Joe’s and HEB Grocery, are categorized as retail and not under the category for grocery. For our first visualization, we will use the name, industry, and score variables to show the distribution of company scores for the industry with the highest average score for each of the 7 rating attributes. To do so, we will graph a boxplot for each industry that scored the highest for each rating attribute and examine the differences in distributions across these top score industries for each attribute. We also thought that it is possible that rating attributes could be correlated with one another. Thus, for our second visualization, we wanted to examine the relationship between two specific rating attributes. Specifically, we will decide which rating attributes to focus on by looking at the top two rating attributes based on score. We will plot those two rating attributes together on the same graph using geom_point() to see if there is any sort of relationship.

For question 2, our analysis will come from the polls dataset. Variables involved include company, rq, 2022_rq, and year. To examine how the company RQ score changes between 2017 and 2022, we will first remove all companies for which there is an NA value for RQ score for any of these years. Alternatively, we could impute values for missing data using mean RQ scores for company, but this would result in unreliable data. We will then wrangle the data so that 2022_rq is in the same column as rq since the 2022 values of RQ exist in a separate column in the data set than the RQ values for years 2017 to 2021. There are 19 companies in the dataset, and it’d be difficult to visualize all of them in one graph. Instead, we will filter for the top 3 companies who had the greatest increase in RQ score and top 3 companies with the greatest decrease in RQ score from 2017-2022. Next, we intend on created a progressive line chart to visualize their change over time, where each line represents a company. We will utilize the gganimate package with the transition_reveal() function set to yearly to demonstrate this, and include the geom_point() function grouping each company so for each year, an RQ score is illustrated clearly for each company. Our second visualization for question 2 will examine the industry trends for the companies with the greatest increase and decrease in RQ value. From our first visualization, we will select the company with the greatest increase and the company with the greatest decrease in RQ value from 2017-2022. The second visualization will analyze how these company’s respective industries have changed in RQ values across the same time period. This visualization will let us see if the companies’ changes aligned or didn’t align with how their industries changed. We will use geom_line() to map the changes in RQ between years and geom_ribbon() to show variability of the companies within that industry.