This project was completed as a Capstone Project by University of Pennsylvania graduate student Sean McClellan. The Master’s of Urban Spatial Analytics (MUSA) program is a one year program focused on data science, spatial analytics, statistics, GIS, and data visualization.
More information on the program can be found here. More of Sean’s work can be found here.
NFL Football is a sport filled with passion and aggression. These qualities are shared by the fans. While passion at the stadium can make for a fun and competitive atmosphere, these qualities can just as easily result in antinormative behavior such as crime. For instance, assaults, vandalism, arrests for disorderly conduct, and arrests for alcohol-related offenses are all common arrests on football game days according to Rees & Schnepel (2009) which Deindividuation Theory would attribute to due to large crowds and stadium atmospheres. These settings and circumstances contribute to a collective mind, which can cause individuals to lose the ability of evaluating themselves or their actions and result in irrational or irresponsible behavior.
Exploring crime incidents across time and space in cities with professional football teams answers questions about the effect of NFL football on crime. Furthermore, patterns of spatial distribution and fluctuations in offense type totals can be connected to the result of that city’s team.
Research has been conducted on this topic in the past, however most of this research has been strictly on levels of crime and included no spatial analysis. My project would fill this void by observing clustering and hot/cold spots of crime in each city. While the analysis will include basic measurements and comparisons of crime that is covered in previous research, this hot spot analysis will add a new component to NFL-related crime. I hypothesize that crime such as theft and assault will increase in density or cluster closer around each team’s stadium on game days. In addition to the spatial components of this analysis, it would be interesting to view scatter plots of points allowed by the home team and counts of crime, specifically violent crimes, to see if there is a correlation between performance and criminality.
Game day related crime has been tied to multiple theories of crime and behavior. One of these theories, Routine Activity Theory, would best describe crime around Lincoln Financial Field on game days. Routine Activity Theory suggest crime occurs in situations where three things present themselves at the same time.
Applying these three criteria to a stadium event could mean:
This very quickly and easily becomes an environment that welcomes criminal behavior, according to Routine Activity theory.
As mentioned in Section 1.2, the location of crime on game day lacks any significant amount of research, but Merlo et al. (2010) analyzed the average distance from the stadium that crime occurs within. Their results show that crime shifts closer to the stadium on game days, most likely due to tailgating and the dense and concentrated rush of people. On average, arrests made on game day occurred 0.69 miles from the stadium, while arrests happened an average of 2.1 miles away on non-game days (Merlo et al., 2010). A relation between game day and crime within a close radius of a stadium supports the hypothesis of Rees and Schnepel (2009) that opportunities for disputes and altercations unrelated to football increase as the population of the community temporarily increases. This means on game days it is reasonable to think the influx of people traveling to the game from areas outside of the Stadium District, a section of Philadelphia that is relatively empty on non-game days, could lead to disputes or altercations.
Both Rees & Schnepel (2009) and Merlo et al. (2010) found increases in arrests for alcohol possession by a minor, DUI, open container, and possession of alcohol in the stadium on game days. It is important to note that both of these studies were focused on NCAA Football and not NFL Football. While the findings from both studies are useful and significant, NFL teams, fan bases, stadiums, and cities present entirely new environments and possibly very different game day experiences.
This analysis will explain the connection between criminal incidents in the city of Philadelphia and the Philadelphia Eagles. Crime across the city will be subjected specific tests and figures showing concentration and clustering of crime around the stadium, bars, subway stops, and other local features. These crime levels and patterns can then be compared between other cities with NFL teams, painting a picture of how these teams affect their cities.
The methodology of this analysis will first include the gathering and cleaning of data. Working the datasets into desired formats will be vital to ensure data is ready for analysis despite being from many different sources. Then, data will need to be separated into home games and away games. This will be done by selecting only the observations in which the desired team is the home team. Once the NFL games have been wrangled, data from the city of Philadelphia can be brought into the analysis. Part I and Part II incidents were collected from the City of Philadelphia’s open data portal.
In this section, the data collected on NFL seasons and Philadelphia crime can be filtered and visualized in ways useful for this analysis. Again, this workflow could be easily adapted for another city, team, or year.
Importing raw data for wrangling and cleaning includes reading basic NFL data which holds information on historical NFL games, scores, betting information, and more.
nfl <- read.csv("Data/spreadspoke_scores.csv") %>%
filter(schedule_season == "2019")
# datatable(nfl, options = list(pageLength = 5,scrollX='400px'))
DT::datatable(nfl, options = list(pageLength = 5,scrollX='400px'))
Further data wrangling and cleaning can be completed by filtering this data to only include the teams included in this analysis. For instance, here is data extracted from the previous NFL data for only the Philadelphia Eagles.
eagles <- subset(nfl, team_home == "Philadelphia Eagles" | team_away == "Philadelphia Eagles") %>%
mutate(date = mdy(schedule_date))
DT::datatable(eagles, options = list(pageLength = 5,scrollX='400px'))
These scores and game details will become useful later into the analysis. Crime data must be imported so game day crimes can be spatially compared to non-game day crimes. For now, let’s take a look at crime across the city for the entire year of 2019:
phlCrime <- read_sf("Philadelphia Eagles/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
df_sub <- phlCrime[1:100,] # display the first 100 rows
df_sub$dispatch_t <- as.character(df_sub$dispatch_t)
DT::datatable(df_sub, options = list(pageLength = 5,scrollX='400px'))
Click the map to initialize
The figure above shows crime at multiple elevations. At the highest elevation, areas around Center City possess a staggering amount of crimes compared to the outer parts of the city. However, crime was a citywide issue in 2019. This is not new for Philadelphia. Crime in the city has been increasing each year, but what’s even more staggering is the types of crime pushing the totals so high. Violent crime has consistently been one of the highest categories in the recent past. Check out this interactive dashboard for Philadelphia’s shooting victimizations.
crime_cats <- phlCrime %>%
mutate(category = case_when(
text_gener == "Thefts" ~ "Thefts",
text_gener == "Theft from Vehicle" ~ "Thefts",
text_gener == "Aggravated Assault No Firearm" ~ "Violent",
text_gener == "Burglary Residential" ~ "Violent",
text_gener == "Robbery No Firearm" ~ "Violent",
text_gener == "Burglary Non-Residential" ~ "Violent",
text_gener == "Robbery Firearm" ~ "Violent",
text_gener == "Rape" ~ "Rape",
text_gener == "Aggravated Assault Firearm" ~ "Violent",
text_gener == "Other Assaults" ~ "Violent",
text_gener == "Narcotic / Drug Law Violations" ~ "Alcohol/Narcotics",
text_gener == "Weapon Violations" ~ "All Other Offenses",
text_gener == "All Other Offenses" ~ "All Other Offenses",
text_gener == "Vandalism/Criminal Mischief" ~ "All Other Offenses",
text_gener == "DRIVING UNDER THE INFLUENCE" ~ "Alcohol/Narcotics",
text_gener == "Fraud" ~ "Financial",
text_gener == "Forgery and Counterfeiting" ~ "Financial",
text_gener == "Embezzlement" ~ "Financial",
text_gener == "Disorderly Conduct" ~ "All Other Offenses",
text_gener == "Arson" ~ "All Other Offenses",
text_gener == "Offenses Against Family and Children" ~ "All Other Offenses",
text_gener == "Other Sex Offenses (Not Commercialized)" ~ "Sex",
text_gener == "Prostitution and Commercialized Vice" ~ "Sex",
text_gener == "Public Drunkenness" ~ "Alcohol/Narcotics",
text_gener == "Liquor Law Violations" ~ "Alcohol/Narcotics",
text_gener == "Gambling Violations" ~ "All Other Offenses",
text_gener == "Receiving Stolen Property" ~ "All Other Offenses",
text_gener == "Vagrancy/Loitering" ~ "All Other Offenses",
text_gener == "Homicide - Criminal" ~ "Homicide",
text_gener == "Motor Vehicle Theft" ~ "Thefts"
))
phlCrime_counts <- crime_cats %>%
group_by(category) %>%
summarize(count=n()) %>%
st_drop_geometry()
ggplot(data=phlCrime_counts,aes(x=reorder(category, -count), y=count)) +
geom_bar(stat="identity", width=0.5, fill = "#0b465c") +
labs(title = "2019 Crime by Offense Category", subtitle = "Philadelphia") +
plotTheme
The types of crime, seen above, with the highest number of offenses were violent crimes and thefts. According to the research by Rees and Schnepel, thefts are known to increase during NFL home games. Where these crimes occur hold just as much importance as their offense category. There are multiple ways of portraying this data. Aggregating criminal incidents from 2019 to a fishnet grid gives an idea of where the most crime is occurring, similar to the interactive map above.
phlBound <-
st_read("http://data.phl.opendata.arcgis.com/datasets/063f5f85ef17468ebfebc1d2498b7daf_0.geojson") %>%
st_transform('EPSG:3857')
## Reading layer `abf6322f-732c-41eb-be2f-a687a95d16462020329-1-11fkgla.x0gkf' from data source `http://data.phl.opendata.arcgis.com/datasets/063f5f85ef17468ebfebc1d2498b7daf_0.geojson'
## using driver `GeoJSON'
## Simple feature collection with 288 features and 4 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -75.28031 ymin: 39.86771 xmax: -74.95575 ymax: 40.13793
## Geodetic CRS: WGS 84
phlOutline <- st_union(phlBound) %>%
st_transform('EPSG:3857')
tracts <- st_read("Census_Tracts_2010.geojson") %>%
st_transform("EPSG:3857")
## Reading layer `db894b56-81d3-49af-9fdf-8a4f8170db4f2020328-1-1i3f5cj.8z1e' from data source `/Users/seanmcclellan/Desktop/School/Capstone/Census_Tracts_2010.geojson'
## using driver `GeoJSON'
## Simple feature collection with 384 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -75.28031 ymin: 39.86747 xmax: -74.95575 ymax: 40.13793
## Geodetic CRS: WGS 84
stadiums <-
st_read("Data/Stadiums/stadiums.json") %>%
st_transform("EPSG:3857")
## Reading layer `stadiums' from data source
## `/Users/seanmcclellan/Desktop/School/Capstone/Data/Stadiums/stadiums.json'
## using driver `GeoJSON'
## Simple feature collection with 32 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.3317 ymin: 25.95806 xmax: -71.26344 ymax: 47.59528
## Geodetic CRS: WGS 84
linc <-
stadiums %>%
filter(Team == "Philadelphia Eagles")
lincBuffer <-
st_buffer(linc, 1000) %>%
st_transform(st_crs(stadiums))
septa_stops <- st_read("SEPTA_-_Highspeed_Stations.geojson") %>%
st_transform("EPSG:3857") %>%
st_filter(.,phlBound)
## Reading layer `SEPTA_-_Highspeed_Stations' from data source
## `/Users/seanmcclellan/Desktop/School/Capstone/SEPTA_-_Highspeed_Stations.geojson'
## using driver `GeoJSON'
## Simple feature collection with 74 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -75.35358 ymin: 39.90543 xmax: -75.07788 ymax: 40.11349
## Geodetic CRS: WGS 84
septaBuffer <-
st_union(st_buffer(septa_stops, 275)) %>%
st_sf()
fishnetHexPhl <-
st_make_grid(phlBound,
cellsize = 750,
square = FALSE) %>%
.[phlBound] %>%
st_sf() %>%
mutate(uniqueID = rownames(.))
crime_netHex <-
dplyr::select(phlCrime) %>%
mutate(countCrime = 1) %>%
aggregate(., fishnetHexPhl, sum) %>%
mutate(countCrime = replace_na(countCrime, 0),
uniqueID = rownames(.),
cvID = sample(round(nrow(fishnetHexPhl) / 24),
size=nrow(fishnetHexPhl), replace = TRUE))
ggplot() +
geom_sf(data = crime_netHex, aes(fill = countCrime), color = "transparent", size = 0) +
scale_fill_viridis()+
labs(title = "Total Count of Crimes within Hex Grid",
subtitle = "Philadelphia, PA - 2019") +
mapTheme
crime_joined <- left_join(phlCrime, eagles, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
Plotting crime as point data, representative of each individual incident, provides a visual understanding that does not capture the true amount of events in a given area. Points can fall on top of each other when crimes throughout the year occur in the same place, or close enough in proximity to appear on the same location at a citywide elevation like above. Plotting the kernel density of crime, however, shows the same data while offering a better understanding of the spatial distribution of these incidents.
Separating crime into three categories will be the basis of the next component of analysis. Here, crimes from 2019 will be aggregated into three types of days: days with no Eagle’s game (NA), days on which the Eagle’s played at home (1), and days on which the Eagle’s played an away game (0).
ggplot() +
geom_sf(data = crime_joined, color = "#165269", alpha = 0.25) +
facet_wrap(~home_game) +
geom_sf(data = phlOutline, fill = "transparent", size = 1.25) +
mapTheme
no_game <- crime_joined %>%
filter(home_game != 1 | home_game != 0)
home_game <- crime_joined %>%
filter(home_game == 1)
away_game <- crime_joined %>%
filter(home_game == 0)
home_density <-
ggplot() +
geom_sf(data = phlOutline, fill = "grey40") +
stat_density2d(data = data.frame(st_coordinates(home_game)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_viridis() +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title = "Home Games") +
mapTheme +
theme(legend.position = "none")
away_density <-
ggplot() +
geom_sf(data = phlOutline, fill = "grey40") +
stat_density2d(data = data.frame(st_coordinates(away_game)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_viridis() +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title = "Away Games") +
mapTheme +
theme(legend.position = "none")
no_density <-
ggplot() +
geom_sf(data = phlOutline, fill = "grey40") +
stat_density2d(data = data.frame(st_coordinates(no_game)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_viridis() +
scale_alpha(range = c(0.00, 0.35), guide = FALSE) +
labs(title = "No Games") +
mapTheme +
theme(legend.position = "none")
grid.arrange(away_density,home_density,no_density, nrow = 1)
Some of the research in this area has found that specific game day factors can influence crime during NCAA sporting events. Upset wins and losses, for instance, can influence criminality more than a normal win or loss.
Do these factors influence crime in Philadelphia?
corr_data <- read.csv("Data/spreadspoke_scores.csv") %>%
filter(team_home == "Philadelphia Eagles" | team_away == "Philadelphia Eagles") %>%
filter(schedule_season >= 2014) %>%
mutate(date = mdy(schedule_date))
crime_2018 <- read_sf("Philadelphia Eagles/2018/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
crime_2017 <- read_sf("Philadelphia Eagles/2017/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
crime_2016 <- read_sf("Philadelphia Eagles/2016/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
crime_2015 <- read_sf("Philadelphia Eagles/2015/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
crime_2014 <- read_sf("Philadelphia Eagles/2014/incidents_part1_part2/incidents_part1_part2.shp") %>%
st_transform('EPSG:3857') %>%
mutate(date = ymd(dispatch_d),
count = 1) %>%
filter(point_y > 30)
all_crime <- do.call("rbind", list(phlCrime, crime_2018, crime_2017, crime_2016, crime_2015, crime_2014))
# EDIT - group by date AND hour? THIS DOESNT INCLUDE ZEROS - try complete(?)
crime_summary_hour <- all_crime %>%
group_by(date, hour_) %>%
summarize(crime_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
crime_summary_day <- all_crime %>%
group_by(date) %>%
summarize(daily_crime = sum(count)) %>%
st_drop_geometry()
crime_summary_geo <- all_crime %>%
group_by(date, hour_) %>%
summarize(crime_per_hour = sum(count))
## `summarise()` has grouped output by 'date'. You can override using the
## `.groups` argument.
final_prep <- left_join(crime_summary_hour, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
final <- left_join(final_prep, crime_summary_day, by = "date")
final_geo <- left_join(final_prep, crime_summary_geo, by = "date") %>%
st_as_sf()
final$date <- as.numeric(final$date)
final_gameday <- final_geo %>%
filter(gameday == 1)
corr <-
select_if(final, is.numeric)
corr[is.na(corr)] <- 0
corr_gameday <- corr[, -c(1,2)] %>%
filter(gameday == 1)
ggcorrplot(
round(cor(corr_gameday), 1),
p.mat = cor_pmat(corr_gameday),
colors = c("#691652", "white", "#165269"),
type="lower",
insig = "blank") +
labs(title = "Correlation across numeric variables") +
plotTheme
Total crime does not seem to be influenced by game day factors. This could be for a variety of reasons. Whereas the research that discovered connections between increases of crime and game days were conducted on NCAA teams in smaller towns, Philadelphia is one of the largest cities in the United States and experiences crime in a very different way than college towns.
The figure above suggests there are no significant correlations between game days or game day factors and the number of criminal incidents on a given day. However, if the same process is completed for specific crime categories, we see not much changes.
#EDIT - use corr instead of all crime?
all_categories <- all_crime %>%
mutate(category = case_when(
text_gener == "Thefts" ~ "Thefts",
text_gener == "Theft from Vehicle" ~ "Thefts",
text_gener == "Aggravated Assault No Firearm" ~ "Violent",
text_gener == "Burglary Residential" ~ "Violent",
text_gener == "Robbery No Firearm" ~ "Violent",
text_gener == "Burglary Non-Residential" ~ "Violent",
text_gener == "Robbery Firearm" ~ "Violent",
text_gener == "Rape" ~ "Rape",
text_gener == "Aggravated Assault Firearm" ~ "Violent",
text_gener == "Other Assaults" ~ "Violent",
text_gener == "Narcotic / Drug Law Violations" ~ "Alcohol/Narcotics",
text_gener == "Weapon Violations" ~ "All Other Offenses",
text_gener == "All Other Offenses" ~ "All Other Offenses",
text_gener == "Vandalism/Criminal Mischief" ~ "All Other Offenses",
text_gener == "DRIVING UNDER THE INFLUENCE" ~ "Alcohol/Narcotics",
text_gener == "Fraud" ~ "Financial",
text_gener == "Forgery and Counterfeiting" ~ "Financial",
text_gener == "Embezzlement" ~ "Financial",
text_gener == "Disorderly Conduct" ~ "All Other Offenses",
text_gener == "Arson" ~ "All Other Offenses",
text_gener == "Offenses Against Family and Children" ~ "All Other Offenses",
text_gener == "Other Sex Offenses (Not Commercialized)" ~ "Sex",
text_gener == "Prostitution and Commercialized Vice" ~ "Sex",
text_gener == "Public Drunkenness" ~ "Alcohol/Narcotics",
text_gener == "Liquor Law Violations" ~ "Alcohol/Narcotics",
text_gener == "Gambling Violations" ~ "All Other Offenses",
text_gener == "Receiving Stolen Property" ~ "All Other Offenses",
text_gener == "Vagrancy/Loitering" ~ "All Other Offenses",
text_gener == "Homicide - Criminal" ~ "Homicide",
text_gener == "Motor Vehicle Theft" ~ "Thefts"
))
violent_crimes <- all_categories %>%
filter(category == "Violent")
summary_violent <- violent_crimes %>%
group_by(date) %>%
summarize(crime_count = sum(count)) %>%
st_drop_geometry()
final_violent <- left_join(summary_violent, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
corr_violent <-
select_if(final_violent, is.numeric)
corr_violent[is.na(corr_violent)] <- 0
corr_violent %>%
filter(gameday == 1)
## # A tibble: 101 × 16
## crime_count schedule_season score_home score_away spread_favorite
## <dbl> <int> <int> <int> <dbl>
## 1 138 2014 34 17 -10
## 2 162 2014 27 30 -3
## 3 146 2014 37 34 -4
## 4 132 2014 26 21 -3.5
## 5 133 2014 34 28 -3.5
## 6 116 2014 27 0 -1
## 7 119 2014 24 20 -1
## 8 108 2014 21 31 -1.5
## 9 124 2014 45 21 -7
## 10 102 2014 53 20 -4.5
## # … with 91 more rows, and 11 more variables: over_under_line <dbl>,
## # weather_temperature <int>, weather_wind_mph <int>, weather_humidity <int>,
## # home_game <dbl>, away_game <dbl>, favorite <dbl>, rivalry <dbl>,
## # gameday <dbl>, dotw <dbl>, total_points <int>
ggcorrplot(
round(cor(corr_violent), 1),
p.mat = cor_pmat(corr_violent),
colors = c("#691652", "white", "#165269"),
type="lower",
insig = "blank") +
labs(title = "Violent Crimes") +
plotTheme
# ALCOHOL
alc_narc_crimes <- all_categories %>%
filter(category == "Alcohol/Narcotics")
summary_alc <- alc_narc_crimes %>%
group_by(date) %>%
summarize(crime_count = sum(count)) %>%
st_drop_geometry()
final_alc <- left_join(summary_alc, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
corr_alc <-
select_if(final_alc, is.numeric)
corr_alc[is.na(corr_alc)] <- 0
corr_alc %>%
filter(gameday == 1)
## # A tibble: 101 × 16
## crime_count schedule_season score_home score_away spread_favorite
## <dbl> <int> <int> <int> <dbl>
## 1 21 2014 34 17 -10
## 2 32 2014 27 30 -3
## 3 22 2014 37 34 -4
## 4 24 2014 26 21 -3.5
## 5 28 2014 34 28 -3.5
## 6 30 2014 27 0 -1
## 7 27 2014 24 20 -1
## 8 24 2014 21 31 -1.5
## 9 25 2014 45 21 -7
## 10 25 2014 53 20 -4.5
## # … with 91 more rows, and 11 more variables: over_under_line <dbl>,
## # weather_temperature <int>, weather_wind_mph <int>, weather_humidity <int>,
## # home_game <dbl>, away_game <dbl>, favorite <dbl>, rivalry <dbl>,
## # gameday <dbl>, dotw <dbl>, total_points <int>
ggcorrplot(
round(cor(corr_alc), 1),
p.mat = cor_pmat(corr_alc),
colors = c("#691652", "white", "#165269"),
type="lower",
insig = "blank") +
labs(title = "Alcohol/Narcotics") +
plotTheme
# THEFTS
theft_crimes <- all_categories %>%
filter(category == "Thefts")
summary_theft <- theft_crimes %>%
group_by(date) %>%
summarize(crime_count = sum(count)) %>%
st_drop_geometry()
final_theft <- left_join(summary_theft, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
corr_theft <-
select_if(final_theft, is.numeric)
corr_theft[is.na(corr_theft)] <- 0
corr_theft %>%
filter(gameday == 1)
## # A tibble: 101 × 16
## crime_count schedule_season score_home score_away spread_favorite
## <dbl> <int> <int> <int> <dbl>
## 1 115 2014 34 17 -10
## 2 124 2014 27 30 -3
## 3 131 2014 37 34 -4
## 4 97 2014 26 21 -3.5
## 5 114 2014 34 28 -3.5
## 6 93 2014 27 0 -1
## 7 101 2014 24 20 -1
## 8 105 2014 21 31 -1.5
## 9 138 2014 45 21 -7
## 10 98 2014 53 20 -4.5
## # … with 91 more rows, and 11 more variables: over_under_line <dbl>,
## # weather_temperature <int>, weather_wind_mph <int>, weather_humidity <int>,
## # home_game <dbl>, away_game <dbl>, favorite <dbl>, rivalry <dbl>,
## # gameday <dbl>, dotw <dbl>, total_points <int>
ggcorrplot(
round(cor(corr_theft), 1),
p.mat = cor_pmat(corr_theft),
colors = c("#691652", "white", "#165269"),
type="lower",
insig = "blank") +
labs(title = "Thefts") +
plotTheme
Since game day attributes do not seem to be significant, time is analyzed next.
all_time <- left_join(all_categories, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
library(viridis)
library(scales)
all_time %>%
group_by(category, hour_) %>%
summarise(count=n()) %>%
ggplot(aes(x=category, y=hour_)) +
geom_tile(aes(fill=count)) +
labs(x="Crime Type", y = "Hour (0-23)", title="2014-2019 Total") +
scale_fill_viridis_c("Number of Crimes",label=comma) +
coord_flip()
## `summarise()` has grouped output by 'category'. You can override using the
## `.groups` argument.
all_time %>%
filter(gameday == 1) %>%
group_by(category,hour_) %>%
summarise(count=n()) %>%
ggplot(aes(x=category, y=hour_)) +
geom_tile(aes(fill=count)) +
labs(x="Crime Type", y = "Hour (0-23)", title="2014-2019 Gamedays") +
scale_fill_viridis_c("Number of Crimes",label=comma) +
coord_flip()
## `summarise()` has grouped output by 'category'. You can override using the
## `.groups` argument.
crime_test <- crime_joined %>%
group_by(date, gameday, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday'. You can override using
## the `.groups` argument.
test_averages <- crime_test %>%
group_by(hour_, gameday) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_'. You can override using the
## `.groups` argument.
test_averages <- test_averages[-49,]
test_averages[is.na(test_averages)] <- 0
ggplot(test_averages, aes(x = hour_, y = avg_per_hour)) +
geom_bar(stat = "identity") +
facet_wrap(~gameday) +
plotTheme
The figure above suggests overnight crimes are a bit more common (on average) on days with eagles games.
We already know why crime would be likely to occur around a stadium, but here we will see how the Linc specifically relates to crime on game days.
# avg crime per hour on gamedays within buffer vs avg crime per hour within buffer on other days
linc_joined <- st_join(lincBuffer, all_categories, join = st_intersects)
linc_group <- left_join(linc_joined, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
# SEPARATE BY GAMEDAY FIRST - facet wrap will use same scale.
linc_group2 <- linc_group %>%
mutate(year = year(ymd(date)))
linc_prep <- linc_group %>%
group_by(date, gameday, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday'. You can override using
## the `.groups` argument.
linc_averages <- linc_prep %>%
group_by(hour_, gameday) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_'. You can override using the
## `.groups` argument.
# test_averages <- test_averages[-49,]
linc_averages[is.na(linc_averages)] <- 0
linc_table <- linc_averages %>%
group_by(gameday) %>%
summarize(per_hour_avg = mean(avg_per_hour))
# DT::datatable(linc_table, options = list(pageLength = 5,scrollX='400px'))
kable(linc_table, "simple")
gameday | per_hour_avg |
---|---|
0 | 1.192926 |
1 | 1.279609 |
The table above shows that within 1000 meters of Lincoln Financial Field, the average number of crimes committed each hour is slightly higher on game days. The average city block in Philadelphia is about 400-500 feet. This means our buffer is around 7 or 8 blocks around the Linc. This is an interesting result as Lincoln Financial Field, as well as Philadelphia’s other professional sports stadiums (Wells Fargo Center and Citizens Bank Park), are in their own part of the city. This small of an increase, nearly 1.2 to 1.3 crimes per hour, is once again not consistent with previous research.
Let’s see if repeating this process for our three offense categories show any interesting results.
Note: The following line graphs do not include zero values
linc_cat_prep <- linc_group %>%
group_by(date, gameday, category, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday', 'category'. You can
## override using the `.groups` argument.
linc_cat_averages <- linc_cat_prep %>%
group_by(hour_, gameday, category) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_', 'gameday'. You can override using
## the `.groups` argument.
linc_cat_averages[is.na(linc_cat_averages)] <- 0
linc_cat_table <- linc_cat_averages %>%
group_by(gameday,category) %>%
summarize(per_hour_avg = mean(avg_per_hour))
## `summarise()` has grouped output by 'gameday'. You can override using the
## `.groups` argument.
ggplot(linc_cat_table,
aes(x = reorder(category, -per_hour_avg),
y = per_hour_avg,
color = as.factor(gameday),
group = as.factor(gameday))) +
geom_line() +
geom_point() +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
plotTheme +
labs(title = "Average Crimes Per Hour Within 1000 m. of Lincoln Financial Field")+
plotTheme
linc_cat_prep <- linc_group2 %>%
group_by(date, year, gameday, category, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'year', 'gameday', 'category'. You
## can override using the `.groups` argument.
linc_cat_averages <- linc_cat_prep %>%
group_by(year, hour_, gameday, category) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'year', 'hour_', 'gameday'. You can
## override using the `.groups` argument.
linc_cat_averages[is.na(linc_cat_averages)] <- 0
linc_cat_table <- linc_cat_averages %>%
group_by(year,gameday,category) %>%
summarize(per_hour_avg = mean(avg_per_hour))
## `summarise()` has grouped output by 'year', 'gameday'. You can override using
## the `.groups` argument.
ggplot(linc_cat_table,
aes(x = reorder(category, -per_hour_avg),
y = per_hour_avg,
color = as.factor(gameday),
group = as.factor(gameday))) +
geom_line() +
geom_point() +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
plotTheme +
labs(title = "Average Crimes Per Hour Within 1000 m. of Lincoln Financial Field",
subtitle = "By Year")+
facet_wrap(~year, scales = "free")+
plotTheme
Let’s see if average crimes per hour changes within 2 blocks of Philly bars and pubs.
bars <- st_read("data/pub_point.geojson") %>%
st_transform("EPSG:3857") %>%
st_as_sf()
## Reading layer `sql_statement' from data source
## `/Users/seanmcclellan/Desktop/School/Capstone/Data/pub_point.geojson'
## using driver `GeoJSON'
## Simple feature collection with 142 features and 23 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -75.24387 ymin: 39.87382 xmax: -75.03771 ymax: 40.06808
## Geodetic CRS: WGS 84
bar_buffer <-
st_buffer(bars, 275) %>%
st_union() %>%
st_as_sf()
bars_joined <- st_join(bar_buffer, all_categories, join = st_intersects)
bar_group <- left_join(bars_joined, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
bar_group2 <- bar_group %>%
mutate(year = year(ymd(date)))
bar_prep <- bar_group %>%
group_by(date, gameday, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday'. You can override using
## the `.groups` argument.
bar_averages <- bar_prep %>%
group_by(hour_, gameday) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_'. You can override using the
## `.groups` argument.
# test_averages <- test_averages[-49,]
bar_averages[is.na(bar_averages)] <- 0
bar_table <- bar_averages %>%
group_by(gameday) %>%
summarize(per_hour_avg = mean(avg_per_hour))
# DT::datatable(bar_table, options = list(pageLength = 5,scrollX='400px'))
kable(bar_table, "simple")
gameday | per_hour_avg |
---|---|
0 | 2.603614 |
1 | 2.273201 |
According to this analysis, bars are actually a bit safer on days of an Eagles game.
bar_cat_prep <- bar_group %>%
group_by(date, gameday, category, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday', 'category'. You can
## override using the `.groups` argument.
bar_cat_averages <- bar_cat_prep %>%
group_by(hour_, gameday, category) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_', 'gameday'. You can override using
## the `.groups` argument.
bar_cat_averages[is.na(bar_cat_averages)] <- 0
bar_cat_table <- bar_cat_averages %>%
group_by(gameday,category) %>%
summarize(per_hour_avg = mean(avg_per_hour))
## `summarise()` has grouped output by 'gameday'. You can override using the
## `.groups` argument.
ggplot(bar_cat_table,
aes(x = reorder(category, -per_hour_avg),
y = per_hour_avg,
color = as.factor(gameday),
group = as.factor(gameday))) +
geom_line() +
geom_point() +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
plotTheme +
labs(title = "Average Crimes Per Hour Within 2 Blocks of Bars & Pubs")+
plotTheme
bar_cat_prep <- bar_group2 %>%
group_by(date, year, gameday, category, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'year', 'gameday', 'category'. You
## can override using the `.groups` argument.
bar_cat_averages <- bar_cat_prep %>%
group_by(year, hour_, gameday, category) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'year', 'hour_', 'gameday'. You can
## override using the `.groups` argument.
bar_cat_averages[is.na(bar_cat_averages)] <- 0
bar_cat_table <- bar_cat_averages %>%
group_by(year,gameday,category) %>%
summarize(per_hour_avg = mean(avg_per_hour))
## `summarise()` has grouped output by 'year', 'gameday'. You can override using
## the `.groups` argument.
ggplot(bar_cat_table,
aes(x = reorder(category, -per_hour_avg),
y = per_hour_avg,
color = as.factor(gameday),
group = as.factor(gameday))) +
geom_line() +
geom_point() +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
plotTheme +
labs(title = "Average Crimes Per Hour Within 2 Blocks of Bars & Pubs",
subtitle = "By Year")+
facet_wrap(~year, scales = "free")+
plotTheme
The final local feature in this assessment will be stations from SEPTA’s Broad Street and Market Frankford lines. First, let’s take a look at crime surrounding these stations for the entire year of 2019. Subway stops, and other transportation hubs, are known to be crime generators.
# crime_joined$ymd_hms <- paste(crime_joined$dispatch_d, crime_joined$dispatch_t)
#
# cleaned_crime <- crime_joined %>%
# mutate(interval60 = floor_date(ymd_hms(ymd_hms), unit = "hour"),
# interval15 = floor_date(ymd_hms(ymd_hms), unit = "15 mins"))
gameday_tracts <- st_join(tracts, crime_joined, join = st_intersects)
tract_summary <- gameday_tracts %>%
group_by(GEOID10) %>%
summarize(crime_count = sum(count))
clip <-
st_intersection(septaBuffer, tract_summary) %>%
dplyr::select(crime_count) %>%
mutate(Selection_Type = "Clip")
qBr <- function(df, variable, rnd) {
if (missing(rnd)) {
as.character(quantile(round(df[[variable]],0),
c(.01,.2,.4,.6,.8), na.rm=T))
} else if (rnd == FALSE | rnd == F) {
as.character(formatC(quantile(df[[variable]]), digits = 3),
c(.01,.2,.4,.6,.8), na.rm=T)
}
}
q5 <- function(variable) {as.factor(ntile(variable, 5))}
ggplot() +
geom_sf(data=phlOutline) +
geom_sf(data = tracts, size =0.2, color = "grey85", fill = "gray98") +
geom_sf(data=clip,
aes(fill=q5(crime_count)),
color = "transparent") +
scale_fill_viridis(discrete = TRUE, labels=qBr(clip,"crime_count"),
name="Total Crime\n(Quintile Breaks)") +
# scale_fill_manual(values=c("#2998C3","#2586AD","#207596","#1B6380","#165269"),
# labels=qBr(clip,"crime_count"),
# name="Total Crime\n(Quintile Breaks)")+
geom_sf(data=phlOutline, fill = "transparent", size = 1) +
# geom_sf(data = septa_stops, color = "white", size = 0.1) +
labs(title="Total Crimes Within 500 m. of SEPTA Stops",
subtitle="2019") +
mapTheme
septa_joined <- st_join(septaBuffer, all_categories, join = st_intersects)
septa_group <- left_join(septa_joined, corr_data, by = "date") %>%
mutate(home_game = ifelse(team_home == "Philadelphia Eagles", 1, 0),
away_game = ifelse(team_away == "Philadelphia Eagles", 1, 0),
favorite = ifelse(team_favorite_id == "PHI", 1, 0),
rivalry = ifelse(team_home == "Dallas Cowboys" | team_home == "Washington Redskins" |
team_home == "New York Giants" | team_away == "Dallas Cowboys" |
team_away == "Washington Redskins" | team_away == "New York Giants", 1, 0),
gameday = ifelse(schedule_playoff == "TRUE" | schedule_playoff == "FALSE", 1, 0),
dotw = wday(mdy(schedule_date)),
total_points = score_home + score_away)
# SEPARATE BY GAMEDAY FIRST - facet wrap will use same scale.
septa_group2 <- septa_group %>%
mutate(year = year(ymd(date)))
septa_prep <- septa_group %>%
group_by(date, gameday, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday'. You can override using
## the `.groups` argument.
septa_averages <- septa_prep %>%
group_by(hour_, gameday) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_'. You can override using the
## `.groups` argument.
# test_averages <- test_averages[-49,]
septa_averages[is.na(septa_averages)] <- 0
septa_table <- septa_averages %>%
group_by(gameday) %>%
summarize(per_hour_avg = mean(avg_per_hour))
# DT::datatable(septa_table, options = list(pageLength = 5,scrollX='400px'))
kable(septa_table, "simple")
gameday | per_hour_avg |
---|---|
0 | 2.779927 |
1 | 2.435396 |
septa_cat_prep <- septa_group %>%
group_by(date, gameday, category, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'gameday', 'category'. You can
## override using the `.groups` argument.
septa_cat_averages <- septa_cat_prep %>%
group_by(hour_, gameday, category) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'hour_', 'gameday'. You can override using
## the `.groups` argument.
#ERROR - cant convert double to character?
septa_cat_averages <- as.data.frame(septa_cat_averages)
septa_cat_averages[is.na(septa_cat_averages)] = 0
septa_cat_table <- septa_cat_averages %>%
group_by(gameday,category) %>%
summarize(per_hour_avg = mean(avg_per_hour))
## `summarise()` has grouped output by 'gameday'. You can override using the
## `.groups` argument.
ggplot(septa_cat_table,
aes(x = reorder(category, -per_hour_avg),
y = per_hour_avg,
color = as.factor(gameday),
group = as.factor(gameday))) +
geom_line() +
geom_point() +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
plotTheme +
labs(title = "Average Crimes Per Hour Within 2 Blocks of SEPTA Stops")+
plotTheme
septa_cat_prep <- septa_group2 %>%
group_by(date, year, gameday, category, hour_) %>%
summarize(crimes_per_hour = sum(count)) %>%
st_drop_geometry()
## `summarise()` has grouped output by 'date', 'year', 'gameday', 'category'. You
## can override using the `.groups` argument.
septa_cat_averages <- septa_cat_prep %>%
group_by(year, hour_, gameday, category) %>%
summarize(avg_per_hour = mean(crimes_per_hour))
## `summarise()` has grouped output by 'year', 'hour_', 'gameday'. You can
## override using the `.groups` argument.
septa_cat_averages <- as.data.frame(septa_cat_averages)
septa_cat_averages[is.na(septa_cat_averages)] <- 0
septa_cat_table <- septa_cat_averages %>%
group_by(year,gameday,category) %>%
summarize(per_hour_avg = mean(avg_per_hour))
## `summarise()` has grouped output by 'year', 'gameday'. You can override using
## the `.groups` argument.
ggplot(septa_cat_table,
aes(x = reorder(category, -per_hour_avg),
y = per_hour_avg,
color = as.factor(gameday),
group = as.factor(gameday))) +
geom_line() +
geom_point() +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
plotTheme +
labs(title = "Average Crimes Per Hour Within 2 Blocks of SEPTA Stops",
subtitle = "By Year")+
facet_wrap(~year, scales = "free")+
plotTheme
Repeating this analysis on another NFL team, perhaps a team in a smaller city, or a city where crime is less of a prevalent issue than it is in Philadelphia, could provide different results. Philadelphia has been experiencing increasing crime rates for quite a few years at this point. Pairing that recent trend with the fact that Philadelphia is one of the largest cities in the country offers an explanation as to why crime counts do not seem to change based on the presence of an Eagles game.
While some research has been completed on the relationship between crime and sports teams, only a few of these have dealt with professional sports, and even fewer have analyzed NFL Football. No research has been completed at this point that includes any geospatial analysis of this line of study.
In this geospatial analysis, I found increases (of varying levels) on game days in the following:
and game day decreases in the following:
Further research should be done in this area with a more in depth comparison. For instance, instead of aggregating all non-game days from an entire year and comparing them to just 16 game days, only 8 of which are home games, a more selective approach could be conducted. To be sure this approach sets up a stronger and more fair comparison, the Sunday of each bye-week from the season could be used as a control. That way the seasonality changes play a minimal role in any variance.
If this were to be replicated and improved, additional analysis should be completed on the hours during a game, and for hours following a game. Additionally, this further analysis should be completed with respect to the different effects of home vs. away games. A stronger control or method of standardizing will likely produce results similar to those of the research mentioned in this report.
While this report was simply meant for explaining past events, correlations, and patterns, this type of analysis could absolutely serve a real purpose. If it was completed with a stronger control and comparison, there is no reason a college town couldn’t use this data in a predictive model from weekend to weekend. Knowing what factors influence crime increases, such as opponent, game start time, or time of year can be pivotal in smaller towns’ responses to crime. Obviously, the regular concerns of predictive policing would still apply and require the utmost caution and unbiased attention in designing such a model, but the use case is definitely existent.