Boston Crime Analysis

            • Performed spatial and time series analysis on a 6-year dataset of incidents reported by the Boston Police Department

            • Built a data processing pipeline based on Pandas DataFrames and Spark SQL for big data online analytical processing (OLAP)

            • Explored and visualized how the spatial distribution of different crimes varied from 2016 to 2021, and found that west and south Boston had higher crime rates than east or north Boston

            • Trained and fine-tuned an ARIMA model with a 12-month forecast accuracy of 93% to forecast the trend of crimes in 2022

Background

Big data analysis is an essential skill for data scientists. This project is based on crime data from the Boston area. It establishes a data analysis workflow that covers data collection, cleaning, storage and analysis. Based on the analysis and modeling of the crime data, a model for predicting possible crime events is built.
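The cleaning step of such a workflow might look like the minimal sketch below. It assumes the column names of the public export (INCIDENT_NUMBER, OFFENSE_DESCRIPTION, DISTRICT, OCCURRED_ON_DATE). The helper clean_incidents, its cleaning rules and the sample rows are illustrative assumptions, not the project's actual code or data.

```python
import pandas as pd


def clean_incidents(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw incident table (illustrative rules only)."""
    df = raw.copy()
    # Parse timestamps; unparseable values become NaT and are dropped below.
    df["OCCURRED_ON_DATE"] = pd.to_datetime(df["OCCURRED_ON_DATE"], errors="coerce")
    # Keep rows with no district, but make the gap explicit.
    df["DISTRICT"] = df["DISTRICT"].fillna("UNKNOWN")
    df = df.dropna(subset=["OCCURRED_ON_DATE"])
    # Assumption: a repeated (incident, offense) pair is a duplicate report.
    df = df.drop_duplicates(subset=["INCIDENT_NUMBER", "OFFENSE_DESCRIPTION"])
    # Restrict to the study window, 2016 through 2021.
    in_window = df["OCCURRED_ON_DATE"].dt.year.between(2016, 2021)
    return df.loc[in_window].reset_index(drop=True)


# Hand-made sample rows; in practice the table would come from pd.read_csv(...).
raw = pd.DataFrame(
    {
        "INCIDENT_NUMBER": ["I1", "I1", "I2", "I3", "I4"],
        "OFFENSE_DESCRIPTION": ["LARCENY", "LARCENY", "VANDALISM", "LARCENY", "ASSAULT"],
        "DISTRICT": ["B2", "B2", None, "D4", "C11"],
        "OCCURRED_ON_DATE": [
            "2016-03-01 10:00:00",
            "2016-03-01 10:00:00",
            "2018-07-04 23:15:00",
            "2015-12-31 08:00:00",
            "not a date",
        ],
    }
)
cleaned = clean_incidents(raw)
```

In this sample, the duplicate report, the row outside the study window and the row with an unparseable date are all removed, while the row with a missing district is kept and labeled.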

Keywords: Spark SQL, Data Pipeline, Time Series, Data Visualization, OLAP

Introduction

This is a data analysis of crime in Boston from 2016 to 2021. The Greater Boston Area is one of the largest and most famous metropolitan regions in the United States, and crime is always high on residents' list of concerns. I have been living in the Greater Boston Area for more than two years, and people here pay close attention to crime. It strongly influences residents’ decisions about which community to live in and which schools to choose for their children.

Thanks to the large amount of publicly available crime data, data scientists can gain insights through data mining and help people make decisions. This project shows the crime situation in different regions, forecasts crime occurrence and analyzes how crime rates have changed over the years.

[data source link] https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system

As shown in the figure of crime by category, crimes are not evenly distributed across categories. Motor vehicle accidents, larceny and medical assistance are the 3 most frequent crime response types in the past 6 years. There were over 35,000 vehicle accidents and around 25,000 larceny cases in Boston from 2016 to 2021.

As shown in the figure of crime by district, B2 (Roxbury), C11 (Dorchester) and D4 (South End) are the 3 highest-risk areas, with over 65,000 crimes each in the past 6 years. There are 5 districts in Boston with over 55,000 crimes, and A15 (Charlestown), A7 (East Boston) and E5 (West Roxbury) are the 3 safest areas to live in.

As shown in the figure of crime by street, Washington Street has by far the most crime of any street in Boston, with over 25,000 cases from 2016 to 2021. That is over 10,000 more than Blue Hill Avenue, which has the second most crimes in Boston. Every street except Washington Street has fewer than 15,000 crime events in the past 6 years.
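Rankings by category, district and street like the ones above come down to a GROUP BY with a count. The sketch below runs such a query with Python's built-in sqlite3 module as a stand-in for Spark SQL, so that it needs no cluster. The table name crime, the helper top_values, the simplified column names and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Columns that may be grouped on; checked before building the query text.
COLUMNS = ("DISTRICT", "STREET", "OFFENSE")

# Made-up rows (district, street, offense), not taken from the real dataset.
SAMPLE_ROWS = [
    ("B2", "WASHINGTON ST", "LARCENY"),
    ("B2", "WASHINGTON ST", "M/V ACCIDENT"),
    ("B2", "BLUE HILL AVE", "M/V ACCIDENT"),
    ("D4", "BOYLSTON ST", "LARCENY"),
    ("D4", "WASHINGTON ST", "LARCENY"),
    ("A7", "MERIDIAN ST", "M/V ACCIDENT"),
]


def top_values(rows, column, limit=3):
    """Return the `limit` most frequent values of `column` with their counts."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) AS incidents FROM crime "
        f"GROUP BY {column} ORDER BY incidents DESC, {column} LIMIT ?"
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE crime (DISTRICT TEXT, STREET TEXT, OFFENSE TEXT)")
        conn.executemany("INSERT INTO crime VALUES (?, ?, ?)", rows)
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


top_districts = top_values(SAMPLE_ROWS, "DISTRICT", limit=2)
top_streets = top_values(SAMPLE_ROWS, "STREET", limit=1)
```

The query text is ordinary SQL, so apart from the ? placeholder it should carry over to a Spark SQL table with little change.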

Distribution of Car Accidents on the Map
Distribution of Larceny on the Map

The accident map looks almost like a drawing of every road in the Greater Boston Area. We can see that there are more traffic accidents on the main roads, such as Washington Street, Blue Hill Avenue and Boylston Street.

We can observe that most larceny cases are concentrated in the city center, especially in shopping areas. This makes sense because it is relatively easy for thieves to find targets and commit crimes in a crowd. The densest areas are around Downtown and Back Bay, which have the most popular retail shops in Boston.

According to the yearly crime information, there were around 100,000 crime cases in each of 2016, 2017 and 2018. The crime situation did not change much within these years, apart from some tiny fluctuations. From 2019 onward, however, crime events decreased sharply, and the trend across 2019, 2020 and 2021 is similar to that across 2016, 2017 and 2018. This makes sense because the COVID-19 pandemic struck at the end of 2019. Before the pandemic, the United States had also encountered a serious flu epidemic from autumn to winter. Whether the virus was common flu or COVID-19, it significantly affected people’s daily lives. Crime cases decreased as people spent less time outdoors. As people adapted to the pandemic, crime rates reached a new balance point. Unfortunately, there are still over 65,000 cases each year in Boston, and crime is trending upward as people try to go back to normal life.

However, we have to admit that the crime number in 2019 looks unreasonably low. As the transitional year into the pandemic, 2019 should have had a higher crime number than 2020 and 2021. We will figure out the reason later.

Monthly Trend

The crime rate is clearly higher in summer and lower in winter. This is reasonable because no one, neither criminal nor potential victim, wants to be outside in Boston’s cold winter, especially when there is a snowstorm.

Weekly Trend

There seems to be a small crime peak on Friday, and everyone takes a break on Sunday.

Hourly Trend

The crime rate clearly increases after 7 am and reaches a peak at noon (12 pm), when people have lunch. After 4 pm, as night closes in, the crime rate goes up again, then gradually decreases once most people have come home by 9 pm. However, there is a sharp increase at midnight. This is easy to explain: it is when young people finish their night out. Under the influence of alcohol or drugs, they may lose their belongings or become easy targets for criminals. They may also be the ones at fault in a fight.
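The monthly, weekly and hourly breakdowns above amount to counting incidents per calendar unit. A minimal standard-library sketch follows; the helper count_by and the three sample timestamps are hypothetical and only illustrate the bucketing, not the real counts.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# One extractor per time unit used in the trend figures.
EXTRACTORS = {
    "month": lambda ts: ts.month,                 # 1..12
    "weekday": lambda ts: WEEKDAYS[ts.weekday()],  # weekday name
    "hour": lambda ts: ts.hour,                   # 0..23
}


def count_by(timestamps, unit):
    """Count incidents per month, weekday name or hour of day."""
    extract = EXTRACTORS[unit]
    return Counter(extract(ts) for ts in timestamps)


# Made-up occurrence times: two Fridays in July and one Sunday in January.
sample = [
    datetime(2018, 7, 6, 12, 5),
    datetime(2018, 7, 13, 0, 30),
    datetime(2018, 1, 7, 12, 45),
]
by_month = count_by(sample, "month")
by_weekday = count_by(sample, "weekday")
by_hour = count_by(sample, "hour")
```

Plotting each Counter as a bar chart gives the kind of monthly, weekly and hourly profile discussed in this section.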

As the monthly trend figure shows, each year follows a similar pattern, which reaches a peak during summer and goes down in winter. Although the magnitude of the crime numbers differs significantly between the pre-pandemic and pandemic periods, the pattern remains the same. We can already use this regularity to make some predictions that help citizens protect their personal property.

We also notice that three months of data are missing in 2019, from October to December, which helps explain why the crime situation looks strange in this transitional year. During most of 2019, especially before winter, people led a normal life. The crime numbers in 2019 should therefore have been much higher than in 2020 or 2021, and close to those of 2018. At the very beginning of the flu epidemic and the COVID-19 pandemic that followed in winter 2019, neither the government nor citizens could react to the situation quickly and efficiently. With the world in disorder, the government likely had no time to maintain or even collect the crime data. The three missing months show that the sudden drop in crime numbers in 2019 is an artifact of the data rather than a real decline. With this quarter’s data, even if the counts were lower than in the same period of previous years, the 2019 crime level would be close to a pre-pandemic one. Unfortunately, we can also predict that crime numbers will increase substantially in the post-pandemic period and eventually return to a pre-pandemic level. In fact, we can already see a gradual increase from 2020 to 2021.
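A gap such as the missing quarter in 2019 can be found mechanically by listing every calendar month in the study window and checking which ones have no records. The sketch below assumes incidents have already been reduced to (year, month) pairs; the helper missing_months and the sample data are hypothetical.

```python
def missing_months(observed, start, end):
    """List (year, month) pairs between start and end, inclusive, that are absent.

    observed: set of (year, month) pairs that have at least one incident.
    start, end: (year, month) tuples bounding the window.
    """
    missing = []
    year, month = start
    while (year, month) <= end:
        if (year, month) not in observed:
            missing.append((year, month))
        # Step to the next calendar month, rolling over at December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing


# Made-up coverage: January to September 2019 plus January 2020.
observed = {(2019, m) for m in range(1, 10)} | {(2020, 1)}
gaps = missing_months(observed, start=(2019, 1), end=(2020, 1))
```

Running such a check before comparing yearly totals would flag an incomplete year early, so that it is not mistaken for a genuine drop.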

These are the seasonal trends (90 days) of the 6 most frequent crime types (from top left to bottom right) during the pre-pandemic period (from 2016-01-01 to 2018-12-31). Motor vehicle accidents are consistently frequent, and the government should strengthen public education on traffic safety. Almost all crimes follow a relatively fixed pattern in which summer is the yearly peak, which corresponds to how often citizens are outdoors. Interestingly, all crimes remain at a relatively low level from February to April 2018. After searching the Internet, we did not find any significant event that could have reduced crime. This may be explained by missing crime data for that period or by other unknown reasons.

The figure of drug violations shows a dramatic decrease starting in 2016, reaching a historic low at the beginning of 2017. This apparently matches the timeline of the decriminalization of marijuana and the legalization of recreational cannabis sales. Because illegal and recreational cannabis differ in both quality and price, however, demand for the illegal kind increased again and reached a new balance level, which is lower than before but still high. Drugs remain a serious problem in Boston and across the United States. Many crime cases, including some vehicle accidents and even shootings, are directly or indirectly caused by drugs. The government should pay more attention to drug-related issues.

Gun-related crime is always the topic that concerns citizens most. Now let us look at the 10 most frequent gun-related crime types. Aggravated assault (using a deadly weapon, targeting a particularly vulnerable victim, or causing serious injury) is by far the type most often related to guns, and its count is even higher than the combined total of the other 9 types. Homicide also ranks among these types, a reminder of how dangerous these incidents are for victims. It is easy for criminals to get a gun and hurt unarmed citizens. The government should make a serious effort to control illegal guns.

Gun-related violations are more frequent on weekends and from 9 pm to late at night every day. Unless necessary, citizens should avoid outdoor activities after dark. Home is always the safest place.

B2 (Roxbury), B3 (Mattapan) and C11 (Dorchester) are the 3 most dangerous districts, with over 700 gun violations each in the past 6 years. Even worse, Roxbury and Mattapan each had more than 1,000 gun-related crimes from 2016 to 2021, which is over 150 events per year on average. Accordingly, most gun-related cases happen on Washington Street in Roxbury and Blue Hill Avenue in Mattapan. A1 and A15 (Downtown and Charlestown) are clearly the districts least affected by gun violence in Boston.

Gun violations were relatively infrequent from 2016 to 2018, and people in Boston led a steady and smooth life during those 3 years. However, with the start of the U.S. presidential election campaign and the spread of inflammatory political opinions, the entire United States went through a period of social instability. This reached its peak in 2020, as conflicts between the two parties heated up and a violent law-enforcement incident drew national attention. At that time, the government lost control of some citizens to some degree, and violent crimes broke out. The situation became steadier after the election, but it did not improve much. The impact of the election remained, and the number of gun-related racial hate crimes rose.

Distribution of Shooting on the Map

Gun violations in the past 6 years are mainly distributed in the southern and western parts of Boston. The heatmap shows that Roxbury and Dorchester are the high-frequency areas for shooting cases, which corresponds to the previous analysis.

Distribution of Drug-Related Crimes on the Map

Not surprisingly, one of the high-frequency areas for drug-related crimes is the crowded Downtown area. In south and west Boston, there is also a large area where drug-related and gun-related crimes overlap.

Boston is also one of the oldest cities in the United States, with a rich historical and cultural heritage. It is one of the birthplaces of the Revolutionary War, has one of the earliest public transportation systems, and is a politically, financially and educationally developed place. There are many world-famous universities in the Greater Boston Area, such as the Massachusetts Institute of Technology and Harvard University in Cambridge, Tufts University in Medford and Somerville, and Boston University, Boston College and Northeastern University in Boston. This area attracts large numbers of international students from all over the world every year. Thus, public safety is always a hot topic. Now let us look at the security situation around those universities.

There are over 2,000 cases in the Boston University area in the past 6 years. North Cambridge and West Boston are also high-risk university areas, with over 1,200 cases. Northeastern University is relatively safe from violence because it is in one of the city centers, near Back Bay. Surprisingly, the MIT area does not have any crime record, which may reflect that the campus lies in Cambridge, outside the Boston Police Department's jurisdiction, rather than a true absence of crime.
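The text does not say how a "university area" is delimited. One plausible reading is a fixed radius around a campus point, which can be sketched with the haversine great-circle distance. The campus coordinate, the 0.5 km radius, the helper names and the sample points below are all assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def incidents_near(incidents, center, radius_km):
    """Keep the (lat, lon) incidents that lie within radius_km of center."""
    lat0, lon0 = center
    return [
        (lat, lon)
        for lat, lon in incidents
        if haversine_km(lat, lon, lat0, lon0) <= radius_km
    ]


# Approximate, illustrative campus point and made-up incident locations.
campus = (42.34, -71.09)
incidents = [(42.341, -71.088), (42.36, -71.06)]
nearby = incidents_near(incidents, campus, radius_km=0.5)
```

A fixed radius is a blunt choice for a campus that stretches along a street, so the resulting counts depend heavily on where the center point and radius are set.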

It seems that the university areas are still safe places, far from gun violence. There are only 5 cases near these universities in the past 6 years: 2 cases near Boston College, 3 cases near Northeastern University, and none near the others. However, four of them happened in 2020. As a student at Northeastern University, I clearly remember receiving email alerts from NUPD (Northeastern University Police Department) after those crime events. Fortunately, those cases were not school shootings. They just happened to occur near the campus.

Distribution of Crimes Near Boston University on the Map
Distribution of Crimes on Boston University Streets on the Map

It looks like there is an uncommonly high crime rate near Boston University. However, Boston University stretches for about 1 mile along Commonwealth Avenue and Beacon Street and has no closed campus. Although many crime cases happen in the Boston University area, we cannot assert that all of them threaten campus safety. Some of them simply happen on Commonwealth Avenue or Beacon Street. The figure of crime types in the Boston University area shows again that car accidents are the most frequent events along streets.

Harvard University
Boston College
Northeastern University

Whichever university we study at, car accidents are always the most frequent safety issue. Students cannot be too careful while crossing the road or driving. Since Northeastern University is near Back Bay, which is a business zone in Boston as mentioned before, it makes sense that theft is also a high-frequency crime type there.

Finally, we apply an ARIMA model to capture the monthly pattern in the crime data and forecast the situation in 2022. The trained model predicts a trend in 2022 similar to that in 2021. However, since residents of the Greater Boston Area will gradually return to normal life in the post-pandemic period, we draw the pessimistic conclusion that crime counts will return to the pre-pandemic level while keeping the same seasonal trend.
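The write-up does not give the model orders or say how the forecast accuracy was measured. The sketch below shows one way this could be done: a seasonal ARIMA fitted with statsmodels, scored as 1 minus the mean absolute percentage error (MAPE) on held-out months. The orders (1, 1, 1) and (1, 1, 1, 12), the accuracy definition and both helper names are assumptions, not the project's actual settings.

```python
def forecast_accuracy(actual, predicted):
    """Return 1 - MAPE for two equal-length sequences of nonzero actual values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 1.0 - sum(errors) / len(errors)


def fit_and_forecast(monthly_counts, steps=12,
                     order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)):
    """Fit a seasonal ARIMA to monthly counts and forecast `steps` months ahead.

    The default orders are placeholders; in practice they would be tuned.
    """
    # Imported here so the metric helper above works without statsmodels installed.
    from statsmodels.tsa.arima.model import ARIMA

    model = ARIMA(monthly_counts, order=order, seasonal_order=seasonal_order)
    fitted = model.fit()
    return list(fitted.forecast(steps=steps))


# Metric check on made-up numbers: each prediction is off by 10 percent.
example_accuracy = forecast_accuracy([100, 200], [90, 220])
```

To evaluate a 12-month forecast this way, one would fit on all months except the last 12, call fit_and_forecast on that training part, and pass the held-out counts and the forecast to forecast_accuracy.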