Photo of Salt Lake City via Reddit. What do you think of our air quality? #inversion https://t.co/8scLI2cPW6 pic.twitter.com/yokMauGdfc
— KUTV2news (@KUTV2News) December 15, 2017
SLC is famous for its winter air pollution (specifically PM2.5). We want to nail down if it’s getting better. In our last post we looked at hourly measurements from Hawthorne Elementary in SLC proper. The trend was encouragingly downward! But, that post only looked at one site and didn’t focus on the winter (when the problem is at its worst).
Let’s look at 2007-2017 daily data to compare Rose Park (EPA site 49-035-3010) and Hawthorne Elementary (site 49-035-3006), both of which are in SLC proper and 5.2 miles apart as the crow flies. We’ll work up to a winter by winter comparison over the last ten years.
We download the data from here and again use these parameters:
Let’s read the two flat files into R and combine.
library(tidyverse)
# While I could have combined this into one file, I didn't want to mask
# how the original .txt data looked coming from EPA website (reproducibility!)
dfH <- read_csv('../data/SLCPM2_5/SLCHawthorne_PM2.5_Daily_1999-2017.csv')
dfRP <- read_csv('../data/SLCPM2_5/SLCRosePark_PM2.5_Daily_2007-2017.csv')
df <- dplyr::bind_rows(dfH, dfRP)
For brevity, we’ve loaded in our data and now we’ll grab data from the most robust Hawthorne sensor and select only the helpful columns.
dfinal <- df %>%
# Hawthorne has several options; grab the most complete record
filter(POC == 1,
`AQS Parameter Desc` == 'PM2.5 - Local Conditions') %>%
# Change the Site column to be more readable
mutate(Site = ifelse(`Site Num` == 3006, 'Hawthorne', 'Rose Park')) %>%
mutate(Date = as.Date(`Date Local`, format="%m/%d/%Y")) %>%
rename(Measurement = `Sample Measurement`) %>%
select(Date, Site, Measurement) %>%
arrange(desc(Date))
# Look at the data
head(dfinal, 4)
## # A tibble: 4 x 3
## Date Site Measurement
## <date> <chr> <dbl>
## 1 2017-11-30 Hawthorne 21.2
## 2 2017-11-30 Rose Park 20.1
## 3 2017-11-29 Hawthorne 11.3
## 4 2017-11-29 Rose Park 12.4
Nice and simple. Note that this tibble is the handy tidy format, in that you could add any number of sites and it wouldn’t require the annoyance of adding new columns. Let’s now count up the average number of observations per year, to see how much data might be missing.
dfinal %>%
mutate(Year = lubridate::year(Date)) %>%
filter(Year %in% 2007:2011) %>%
group_by(Site, Year) %>%
summarize(`Measurements Per Site` = length(Measurement))
## # A tibble: 10 x 3
## # Groups: Site [?]
## Site Year `Measurements Per Site`
## <chr> <dbl> <int>
## 1 Hawthorne 2007 341
## 2 Hawthorne 2008 362
## 3 Hawthorne 2009 351
## 4 Hawthorne 2010 305
## 5 Hawthorne 2011 344
## 6 Rose Park 2007 255
## 7 Rose Park 2008 356
## 8 Rose Park 2009 320
## 9 Rose Park 2010 331
## 10 Rose Park 2011 346
Note that we weirdly have fewer than 365 observations per year (but foruntately still enough to work with). Note sure why these sites have such issues–anyone?
Now that we’ve simplified the dataset and feel good about its coverage over time, let’s see how well the measurements from Hawthorne and Rose Park correspond. While there are statistical ways to do this, let’s simply plot.
library(scales)
ggplot(dfinal, aes(x = Date, y = Measurement, color = Site)) +
geom_line() +
labs(title = "3-Day Running Average PM2.5 in SLC from 2007 - 2017") +
scale_colour_manual(values = c("black", "red")) +
scale_y_log10(name = "PM2.5 µg/m3",
limits = c(1,NA),
breaks = c(1, 5, 10, 15)) +
scale_x_date(name = "Year-Month",
labels = date_format("%Y-%m"),
limits = c(as.Date('2007-01-01'),NA),
breaks = date_breaks("1 year"))
While some missing data, it indeed looks like there is fairly good agreement between the Hawthorne and Rose Park daily PM2.5 values from 2007-2017. And it indeed appears that for both sites winter air quality has generally been improving.
Even though Hawthorne and Rose Park mostly track together, how to they differ overall? To check longer-term differences, let’s plot with a loess
smoother (which is built into ggplot2
).
ggplot(dfinal, aes(x = Date, y = Measurement, color = Site)) +
geom_smooth(method = 'loess') +
labs(title = "Smoothed PM2.5 in SLC from 1999 - 2017") +
scale_colour_manual(values = c("black", "red")) +
scale_y_continuous(name = "PM2.5 µg/m3",
breaks = c(6, 8, 10, 12)) +
scale_x_date(name = "Year",
limits = c(as.Date('1999-01-01'),
as.Date('2017-12-01')),
labels = date_format("%Y"),
breaks = date_breaks("1 year"))
Overall, note the downward Hawthorne trend since ~2004! Interestingly Rose Park PM2.5 levels are typically higher than those at Hawthorne Elementary (which is in the Liberty Wells neighborhood). Let’s zoom a bit, since Rose Park data starts so much later.
ggplot(dfinal, aes(x = Date, y = Measurement, color = Site)) +
geom_smooth(method = 'loess') +
labs(title = "Smoothed PM2.5 in SLC from 2008 - 2017") +
scale_colour_manual(values = c("black", "red")) +
scale_y_continuous(name = "PM2.5 µg/m3",
breaks = c(8, 10, 12)) +
scale_x_date(name = "Year-Month",
labels = date_format("%Y-%m"),
breaks = date_breaks("1 year")) +
# Zooming w/o affecting smoother
coord_cartesian(xlim=c(as.Date('2008-01-01'),
as.Date('2017-12-01')),
expand = FALSE)
Over much of the last six years there’s been a difference of ~1 µg/m3 between the sites. While that level of background difference doesn’t seem like much, this smoothing likely hides shorter-term spikes; what’s scary is even differences in low-level PM2.5 exposure can lead to adverse health effects.
Air pollution:
— HEAL Utah (@healutah) January 25, 2018
• during pregnancy has been linked to miscarriages, premature birth, autism spectrum disorder & asthma in children.
• Other conditions associated with high levels include emphysema & chronic bronchitis, as well as lung cancer. https://t.co/PWetIG7yQ7 #cleanairnow
While this is speculation, the heightened PM2.5 levels in Rose Park could be due to proximity to the local refinery, which is just 1.5 miles east of the air quality measurement site (although it’s totally lost on me why the difference is exacerbated in 2012).
Now, let’s make the year-over-year winter comparison crystal clear. Note that because we’re only comparing winters, we need a winter column that allows us to use faceting in ggplot2. (Thanks to Mike Levy for the seasonal support here.)
dividers <- as.Date(paste0(2007:2017, "06", "01"), format = "%Y%m%d")
dfinal <- dfinal %>%
mutate(Season = cut(as.numeric(Date),
as.numeric(dividers),
include.lowest = TRUE),
Season = as.numeric(Season),
Month = lubridate::month(Date)) %>%
filter(Date > as.Date('2006-06-01'),
Month %in% c(1,2,11,12), # Only care about winter months
!is.na(Season)) # This avoids an NA facet panel
And now we plot PM2.5 levels for five winters at Hawthorne and Rose Park.
# Add custom winter label for faceting
season_names <- c(
'1' = "2007-08",
'2' = "2008-09",
'3' = "2009-10",
'4' = "2010-11",
'5' = "2011-12",
'6' = "2012-13",
'7' = "2013-14",
'8' = "2014-15",
'9' = "2015-16",
'10' = "2016-17")
ggplot(dfinal, aes(x = Date, y = Measurement)) +
geom_smooth(method = "loess") +
scale_y_log10(name = "PM2.5 µg/m3",
breaks = c(3, 5, 7, 10, 15, 20, 30, 40, 50)) +
scale_x_date(name = "Winter Month",
labels = date_format("%m"),
breaks = date_breaks("1 month")) +
# Zooming w/o affecting smoother
coord_cartesian(ylim = c(3,40)) +
facet_grid(Site ~ Season,
scale = "free",
labeller = labeller(Season = as_labeller(season_names)))
First, note how similar the patterns are between Rose Park and Hawthorne for a given winter–it’s heartening to see the signal is roughly what we’d expect. There is some missing data, however, which is why there’s so much gray in Hawthorne for 2014-15.
Overall, for Hawthorne Elementary (which has the most accurate measurements in the Salt Lake Valley), it’s pretty clear that at least the last two winters have had nearly the lowest PM2.5 levels of the last 10 years. Note that winter 2010-2011 had remarkable air as well. Rose Park confirms this.
While a couple of relatively pollution-free winters could be due to meteorgological conditions (i.e., fewer high-pressure ridges), the fact that this decline in PM2.5 in the Salt Lake Valley is corroborated by the longer term trends shown above gives me confidence that this is a true decline in background pollution.
What does this mean for policy advocacy in Utah? This downward trend is encourgaging and, when combined with the fact that Salt Lake County is still an serious EPA PM2.5 non-attainment area, it gives us reason to believe that brave policy changes can eventually make SLC safe enough for all folks to go outside every winter day. Doesn’t seem too ambitious, right?
Note, see here to track the 2018 Utah Legislative session.