
US Crime Data Analysis

Introduction:

This was a personal project I completed on my own. I was interested in crime data, specifically murder and manslaughter across the United States, so I looked for a dataset on the topic. I found a Kaggle dataset with 21 columns and 636,670 rows covering murder and manslaughter cases in all US states from 1980 to 2014. Because the dataset was so large, I decided to analyze it with SQL, and MySQL was well suited to the job. The main goal of this project was to find trends and patterns in US crime data.

Description:

I started by cleaning the data, which included renaming columns, changing data types, and checking for duplicates. This process was extremely time-consuming: I had to choose between character data types (CHAR vs. VARCHAR), handle null values in INT columns, standardize values such as sex (for example, "male" to "m"), remove 1,676 duplicates, and much more. The next step was digging in and analyzing the data. I answered a total of 36 questions using SQL queries. These ranged from simple queries using WHERE and GROUP BY to more complex ones using CTEs, subqueries, and temporary tables.
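
As a rough illustration of two of the cleaning steps described above (standardizing the sex column and removing duplicates), here is a minimal sketch. It is not the project's actual code: the table name `homicides`, its columns, and the sample rows are all hypothetical, and it runs against an in-memory SQLite database through Python's built-in `sqlite3` module so it works without a MySQL server. The SQL statements themselves are kept generic enough that they should need little or no change for MySQL.

```python
import sqlite3

# Hypothetical schema and rows for illustration only;
# the real dataset has 21 columns and 636,670 rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE homicides (
        record_id  INTEGER,
        state      TEXT,
        year       INTEGER,
        victim_sex TEXT,
        weapon     TEXT
    )
""")
conn.executemany(
    "INSERT INTO homicides VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Texas", 1980, "Male", "Handgun"),
        (2, "Texas", 1980, "Female", "Knife"),
        (2, "Texas", 1980, "Female", "Knife"),  # exact duplicate row
        (3, "Ohio", 1981, "Unknown", "Rifle"),
    ],
)

# Standardize the sex column to a single-character code.
conn.execute("""
    UPDATE homicides
    SET victim_sex = CASE victim_sex
        WHEN 'Male'   THEN 'M'
        WHEN 'Female' THEN 'F'
        ELSE 'U'
    END
""")

# Remove exact duplicates by copying only distinct rows into a clean table.
conn.execute("""
    CREATE TABLE homicides_clean AS
    SELECT DISTINCT * FROM homicides
""")

rows = conn.execute(
    "SELECT record_id, victim_sex FROM homicides_clean ORDER BY record_id"
).fetchall()
print(rows)
```

Copying distinct rows into a new table is only one way to deduplicate. It leaves the raw table untouched, which makes it easy to check how many rows were dropped before committing to the cleaned version.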

Results:

The end result was a very detailed and comprehensive analysis of US murder/manslaughter crimes from 1980 to 2014. Some key insights include:

There were 636,670 total murder/manslaughter crimes from 1980 to 2014.
LA and NY had the most crimes.
About 70% of crimes were solved, while 30% remain unsolved.
Handguns were associated with the most crimes: 316,526.
The average victim age was 34 years old, and the average perpetrator age was 31 years old.
248 cities and 17 states had an above-average number of crimes.
Drugs had the highest solve rate, while firearms had the lowest solve rate.
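
Insights like the solve rate per weapon and the list of above-average states can be computed with a GROUP BY query and a CTE, respectively. The sketch below shows one way to do that. It is a hypothetical illustration rather than the queries used in the project: the table `homicides`, the columns `state`, `weapon`, and `crime_solved`, and the handful of sample rows are assumptions, and the numbers it produces have nothing to do with the real results above. It uses Python's built-in `sqlite3` module so it can run anywhere; the same SQL should also work on MySQL 8.0 or later, which supports CTEs.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homicides (state TEXT, weapon TEXT, crime_solved TEXT)"
)
conn.executemany(
    "INSERT INTO homicides VALUES (?, ?, ?)",
    [
        ("California", "Handgun", "Yes"),
        ("California", "Handgun", "No"),
        ("California", "Knife", "Yes"),
        ("California", "Handgun", "Yes"),
        ("New York", "Handgun", "No"),
        ("New York", "Knife", "Yes"),
        ("Ohio", "Knife", "Yes"),
    ],
)

# Solve rate per weapon: share of rows where the crime was solved.
solve_rates = conn.execute("""
    SELECT weapon,
           ROUND(100.0 * SUM(crime_solved = 'Yes') / COUNT(*), 1) AS solve_rate_pct
    FROM homicides
    GROUP BY weapon
    ORDER BY solve_rate_pct DESC
""").fetchall()

# States with an above-average crime count, using a CTE for per-state totals.
above_average_states = conn.execute("""
    WITH state_counts AS (
        SELECT state, COUNT(*) AS crimes
        FROM homicides
        GROUP BY state
    )
    SELECT state, crimes
    FROM state_counts
    WHERE crimes > (SELECT AVG(crimes) FROM state_counts)
    ORDER BY crimes DESC
""").fetchall()

print(solve_rates)
print(above_average_states)
```

The CTE keeps the per-state totals in one named place, so the same totals feed both the average and the filter without repeating the GROUP BY.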

SQL