Have crime rates changed under Mayor DeBlasio? I examine this question by looking at shootings, using an NYC OpenData extract of NYPD Shooting Incident reports that covers 2006 to 2019. Students who are interested in exploring this data can download a CSV version of the set from NYC OpenData, and can review my R Markdown file here.
Loading the Set
The data set is distributed as a comma-separated values file, and so we call it up using read.csv():
data <- read.csv("NYPD_Shooting_Incident_Data__Historic_.csv")
# Trimming data set, to ease display in this post:
data <- data[c(1,2,4,5,8)]
Appraising the Data for Cleaning Strategy
After loading the data, my next step is to examine the data and develop a strategy for cleaning it into a tidy, compact table:
head(data)
##   ï..INCIDENT_KEY OCCUR_DATE          BORO PRECINCT STATISTICAL_MURDER_FLAG
## 1       201575314 08/23/2019        QUEENS      103                   false
## 2       205748546 11/27/2019         BRONX       40                   false
## 3       193118596 02/02/2019     MANHATTAN       23                   false
## 4       204192600 10/24/2019 STATEN ISLAND      121                    true
## 5       201483468 08/22/2019         BRONX       46                   false
## 6       198255460 06/07/2019      BROOKLYN       73                   false
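The stray `ï..` in front of INCIDENT_KEY is typically a sign that the CSV begins with a UTF-8 byte-order mark, which read.csv() folds into the first column name. A minimal sketch of one way to avoid it, assuming that is the cause here: pass fileEncoding = "UTF-8-BOM" when reading. The tiny file below is a made-up stand-in (tmp and demo are hypothetical names), not the real extract:

```r
# Write a two-line CSV that starts with a UTF-8 byte-order mark (EF BB BF),
# mimicking the kind of file that produces the odd prefix on the first column.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("INCIDENT_KEY,BORO\n201575314,QUEENS\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark while reading.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

Since this post refers to columns by position when trimming, the prefix does no harm to the analysis; it only matters if you want to refer to the first column by name.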
Each row is a separate shooting, so I can get a count of shootings by counting rows. Between January 1, 2006 and December 31, 2019, how many shootings were there in NYC? I count using nrow():
nrow(data)
## [1] 21626
The answer is 21,626. Of these, how many were flagged as murders? We use table() to get counts of a dichotomous variable:
table(data$STATISTICAL_MURDER_FLAG)
##
## false true
## 17499 4127
The answer is 4,127.
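To see those counts as shares, prop.table() divides each count by the total. A short sketch that re-enters the two counts printed above by hand (the vector name counts is my own), so it runs without the full data set:

```r
# Counts copied from the table() output above.
counts <- c(false = 17499, true = 4127)

# Share of shootings in each category, rounded to three decimals.
shares <- round(prop.table(counts), 3)
shares
```

On the full data frame, the equivalent call would be prop.table(table(data$STATISTICAL_MURDER_FLAG)). By this measure, roughly 19% of recorded shootings were flagged as murders.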
My strategy is to collapse these tables into annual counts. To do that, I am going to extract the year from the date field, and then use frequency tables of that new variable to get row counts:
#Create new variable with year extracted from OCCUR_DATE
#As dates are all same number of characters with year in same
#position, you can extract using substring()
data$YEAR <- as.numeric(substring(data$OCCUR_DATE,7,10))
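The substring() approach relies on every date having exactly the same layout. An alternative sketch, assuming OCCUR_DATE is always month/day/year as in the rows shown earlier, parses the strings as dates first, so that a malformed value shows up as NA instead of a wrong year. The vector occur_date is a stand-in built from three of the rows printed above:

```r
# Three OCCUR_DATE values copied from the head() output.
occur_date <- c("08/23/2019", "11/27/2019", "02/02/2019")

# Parse as month/day/year; anything that does not fit becomes NA.
parsed <- as.Date(occur_date, format = "%m/%d/%Y")

# Pull the four-digit year back out as a number.
year <- as.numeric(format(parsed, "%Y"))
year
```

On the real data, the same two steps would be applied to data$OCCUR_DATE, and sum(is.na(parsed)) gives a quick check for dates that failed to parse.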
Have Shootings Become More or Less Common?
A count of rows will give us a count of how many reported shootings are in the database by year:
table(data$YEAR)
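One thing to watch for at this step: R column names are case-sensitive, so data$Year and data$YEAR are different requests. Asking for a column that does not exist returns NULL without raising an error, and table() of NULL prints as an empty table. A minimal sketch on a made-up data frame (toy is a hypothetical name):

```r
# A tiny stand-in with the same column name used in this post.
toy <- data.frame(YEAR = c(2018, 2019, 2019))

# Wrong case: there is no column called "Year", so `$` returns NULL.
missing_col <- toy$Year
is.null(missing_col)

# table() of NULL has length zero, which R prints as "table of extent 0".
empty_tab <- table(missing_col)
length(empty_tab)

# Correct case: one count per year.
year_tab <- table(toy$YEAR)
year_tab
```

A quick guard is to check "YEAR" %in% names(data) before tabulating.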
Visualized, using ggplot2:
data.1 <- data.frame(table(data$YEAR))
library(ggplot2)
ggplot(data.1, aes(x = Var1, y = Freq, group = 1)) + geom_point() + geom_line()

Shootings seem to be way down over this period. Shootings may have gone up during COVID-19, but any such rise seems more likely to have been precipitated by COVID-19 itself than by a policy change independently initiated by DeBlasio.
How about Murders?
We can perform the same operations, only looking at shootings flagged as murders:
data.2 <- subset(data, STATISTICAL_MURDER_FLAG == "true")
data.3 <- data.frame(table(data.2$YEAR))
data.3
## Var1 Freq
## 1 2006 445
## 2 2007 373
## 3 2008 362
## 4 2009 348
## 5 2010 403
## 6 2011 373
## 7 2012 287
## 8 2013 223
## 9 2014 248
## 10 2015 283
## 11 2016 223
## 12 2017 174
## 13 2018 202
## 14 2019 183
ggplot(data.3, aes(x = Var1, y = Freq, group = 1)) + geom_point() + geom_line()

Murders also seem down: shootings flagged as murders fell from 445 in 2006 to 183 in 2019.
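To put a rough number on that, one option is to compare the average annual count before and after a cutoff year. The sketch below re-enters the yearly murder counts from the table above and assumes 2014 as the cutoff, since DeBlasio took office at the start of that year (a fact from outside this data set). The names murders and period are my own:

```r
# Yearly counts of shootings flagged as murders, copied from data.3 above.
murders <- data.frame(
  year  = 2006:2019,
  count = c(445, 373, 362, 348, 403, 373, 287, 223,
            248, 283, 223, 174, 202, 183)
)

# Label each year by period, using 2014 as the assumed cutoff.
murders$period <- ifelse(murders$year >= 2014, "2014-2019", "2006-2013")

# Average annual count in each period.
avg <- aggregate(count ~ period, data = murders, FUN = mean)
avg
```

The averages come out to about 352 per year for 2006-2013 and about 219 per year for 2014-2019. This simple split only describes the two periods; it does not separate a trend that was already under way from the effect of any policy.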
Verdict: Improvements until 2019
Overall, our analysis concludes that shootings have not gotten worse under DeBlasio. This may change with 2020 data, although one might question whether such changes are attributable to DeBlasio, the COVID-19 crisis, political unrest, or some other factor.
Photo Credit. By Gage Skidmore, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=81590858