Handshake Data Analysis

Introduction:

This was a project for my IT Data Analyst position. The data covers interviews, career events, and career fairs that companies held with the University of Arizona, exported from the Handshake platform. The main goal of the analysis was to understand which companies have engaged with the University of Arizona and how frequently, and to uncover the time periods (years, quarters, and months) in which these engagements took place. The dataset consists of 3 worksheets: the first contains 250+ records of company interviews, the second 1,500+ records of company career events, and the third 2,300+ records of company career fairs.

Insights:

The analysis set out to answer several key questions:

  • How many interviews, career events, and career fairs did each company engage in?

  • During which years, quarters, and months did most of these activities take place?

  • Which event types were the most popular for each company?

  • Did companies that hosted the most career events also have the most attendees?

Tools:

The data was analyzed using Excel, SQL, Tableau, and Python:

  • Excel was used because the key stakeholder requested it and the data wasn't too large, making it a good fit for the tool. I started by cleaning the 3 worksheets: removing duplicates, using TRIM to strip extra spaces, and converting data types to dates, text, and numbers (a pandas sketch of the equivalent cleaning steps appears after this list). Then I answered the key questions using pivot tables, building 10 pivot tables and 6 bar graphs. I compiled these visuals into an Excel dashboard with key bullet points and presented it to the stakeholder interested in the various company engagements with the UofA.

  • SQL was used to join the data for more effective analysis and to uncover additional insights with SQL's functions. I started by cleaning the 3 worksheets, which included renaming columns, dropping columns, changing data types, adding new columns, and updating blank values. Then I outlined a total of 25 questions and wrote a query for each one (a representative query appears after this list). The clauses and functions used included SELECT, FROM, WHERE, GROUP BY, SUM, COUNT, HAVING, ORDER BY, LIMIT, CTEs, SUBSTRING_INDEX, CASE WHEN, subqueries, and UNION ALL.

  • Tableau was used to visualize the results of the SQL analysis and to explore the data further for trends and patterns not surfaced with Excel and SQL. I started by importing the 3 Excel worksheets as 3 separate data sources, since they are unrelated. I created a total of 16 worksheets, each with a visual for the interviews, career events, or career fairs data; the visuals included bar graphs, line charts, treemaps, and stacked column charts. I then built 4 dashboards: 1 for the interviews, 1 for the career fairs, and 2 for the career events, and compiled them into a story with 4 slides.

  • Python was used because my supervisor requested it for its efficiency and its large ecosystem of libraries. I started by cleaning the data with a process similar to the one used in Excel and SQL. I then outlined a total of 13 questions, answered them with Pandas, and visualized the results with Matplotlib (see the sketch after this list). The main Pandas methods used were groupby and sort_values; the main Matplotlib visuals were bar graphs and line graphs.
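
The cleaning steps described for Excel translate naturally into pandas. The sketch below mirrors them: deduplication, trimming whitespace, and converting date columns. The file name, sheet names, and the "Event Date" column are assumptions for illustration; the actual Handshake export may differ.

    import pandas as pd

    # Load the three worksheets from the Handshake export.
    # File and sheet names are placeholders, not the actual export.
    sheets = pd.read_excel(
        "handshake_data.xlsx",
        sheet_name=["Interviews", "Career Events", "Career Fairs"],
    )

    cleaned = {}
    for name, df in sheets.items():
        df = df.drop_duplicates()  # mirrors Excel's Remove Duplicates
        # Trim stray whitespace from every text column (mirrors TRIM).
        text_cols = df.select_dtypes(include="object").columns
        df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())
        # "Event Date" is a hypothetical column name used for illustration.
        if "Event Date" in df.columns:
            df["Event Date"] = pd.to_datetime(df["Event Date"])
        cleaned[name] = df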
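
Next, a representative query from the SQL step, combining several of the clauses listed above (a CTE, GROUP BY, HAVING, CASE WHEN, ORDER BY, and LIMIT). The table and column names are hypothetical, and SQLite is used here only to keep the example self-contained; the original analysis may have run on a different database.

    import sqlite3

    import pandas as pd

    # Load the cleaned career-events worksheet into an in-memory database.
    # File, sheet, table, and column names are illustrative only.
    events = pd.read_excel("handshake_data.xlsx", sheet_name="Career Events")
    conn = sqlite3.connect(":memory:")
    events.to_sql("career_events", conn, index=False)

    query = """
    WITH company_totals AS (          -- CTE summarizing engagement per company
        SELECT "Company"        AS company,
               COUNT(*)         AS events_hosted,
               SUM("Attendees") AS total_attendees
        FROM career_events
        GROUP BY "Company"
        HAVING COUNT(*) > 5           -- keep frequently engaged companies
    )
    SELECT company,
           events_hosted,
           total_attendees,
           CASE WHEN events_hosted >= 20 THEN 'High'
                ELSE 'Moderate' END AS engagement_level
    FROM company_totals
    ORDER BY events_hosted DESC
    LIMIT 10;
    """
    top_companies = pd.read_sql(query, conn)
    print(top_companies)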
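
Finally, a minimal sketch of the Pandas and Matplotlib workflow: groupby and sort_values rank companies by engagement count, and a bar graph shows the top 10. Again, the file, sheet, and column names are assumptions for illustration.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Count interviews per company and plot the top 10 as a bar graph.
    # File, sheet, and column names are assumptions for illustration.
    interviews = pd.read_excel("handshake_data.xlsx", sheet_name="Interviews")
    per_company = (
        interviews.groupby("Company")      # one group per employer
                  .size()                  # number of interviews engaged in
                  .sort_values(ascending=False)
                  .head(10)
    )

    fig, ax = plt.subplots(figsize=(10, 5))
    per_company.plot(kind="bar", ax=ax)
    ax.set_title("Top 10 Companies by Interview Count")
    ax.set_xlabel("Company")
    ax.set_ylabel("Interviews")
    fig.tight_layout()
    plt.show()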

Impact:

The impact of this comprehensive analysis was that stakeholders could make key decisions to enhance employer and student engagement, drawing on the detailed insights each tool produced. The analysis identified the companies most actively engaged in interviews, career events, and career fairs, and showed when these activities were most popular by year, quarter, and month. This enabled detailed planning and schedule optimization well ahead of the next event or career fair.
