An analysis of the Long Beach Animal Shelter dataset
tidy tuesday
r
linear regression
data analysis
Author
Liam Cottrell
Published
March 3, 2025
Introduction
This week’s #TidyTuesday is an exploration of data from an animal shelter in California. I spent an unreasonable amount of time learning how to build a waffle plot.
I have also been learning about linear regression so have made a very shoddy attempt to build a model! It doesn’t work because the data is not normally distributed…so at least I know that now 🥲
Q1. How long does it take an animal to be adopted from a shelter in California?
# figure out adoption times, remove NAs and 0 days
adoption <- longbeach %>%
  select(intake_date, outcome_date, outcome_type) %>%
  filter(outcome_type == 'adoption') %>%
  # as.numeric() has no na.rm argument, so NAs are dropped by the filter below
  mutate(adoption_time = as.numeric(outcome_date - intake_date, units = "days")) %>%
  filter(!is.na(adoption_time)) %>%
  filter(adoption_time > 0) %>%
  arrange(-adoption_time)
head(adoption)
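To make the date arithmetic above easier to follow, here is a minimal base R sketch on a made-up three-row data frame. The rows in `toy` are hypothetical and are not taken from the shelter data; only the column names match. It shows that subtracting two `Date` columns gives a `difftime` in days, and that the zero-day and missing rows are dropped by the same two conditions.

```r
# Toy data (hypothetical rows, not from the shelter dataset)
toy <- data.frame(
  intake_date  = as.Date(c("2024-01-01", "2024-01-10", "2024-02-01")),
  outcome_date = as.Date(c("2024-01-15", "2024-01-10", NA)),
  outcome_type = c("adoption", "adoption", "adoption")
)

# Subtracting two Date columns returns a difftime measured in days
gap <- toy$outcome_date - toy$intake_date
class(gap)
units(gap)

# Convert the difftime to plain numbers of days
toy$adoption_time <- as.numeric(gap, units = "days")

# Keep only rows with a known, strictly positive adoption time
kept <- toy[!is.na(toy$adoption_time) & toy$adoption_time > 0, ]
kept$adoption_time
```

Only the first row survives: the second has an adoption time of 0 days and the third has no outcome date.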
# adoption times descriptive statistics
time <- adoption$adoption_time
summary(time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00   11.00   27.00   48.61   62.00 1107.00
# histogram of adoption times
ggplot(adoption, aes(x = adoption_time)) +
  geom_histogram(bins = 60, fill = "#FFA07A", color = "white") +
  scale_x_log10() +
  labs(x = "Length of time to adoption (days)",
       y = "Frequency",
       caption = "log10 scale") +
  theme_minimal()
Hypothesis test
The null hypothesis is \(H_0: \mu = 48.61\).
# hypothesis test
t.test(time, mu = 48.61, alternative = "two.sided")
One Sample t-test
data: time
t = -0.0028233, df = 6245, p-value = 0.9977
alternative hypothesis: true mean is not equal to 48.61
95 percent confidence interval:
46.93340 50.28178
sample estimates:
mean of x
48.60759
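The p-value is 0.998, so we fail to reject the null hypothesis. That is expected here: the hypothesised mean of 48.61 is the sample mean itself (rounded), so the test statistic is almost exactly zero. Based on the data, we are 95% confident that the average time taken for an animal to be adopted from this shelter is between 46.9 and 50.3 days.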
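To see where the numbers in the t-test output come from, the sketch below rebuilds the test statistic, p-value and confidence interval by hand and checks them against `t.test()`. It runs on simulated right-skewed data, not the shelter records, and the hypothesised mean `mu0 = 45` is an arbitrary value picked for illustration.

```r
# Simulated right-skewed "adoption times" (hypothetical data, not the shelter records)
set.seed(42)
x <- rexp(500, rate = 1 / 50)

mu0 <- 45                  # hypothesised mean (assumed value, fixed before looking at x)
n   <- length(x)
se  <- sd(x) / sqrt(n)     # standard error of the sample mean

t_stat  <- (mean(x) - mu0) / se                      # one-sample t statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)          # two-sided p-value
ci      <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se  # 95% confidence interval

# Compare the hand calculation with the built-in test
res <- t.test(x, mu = mu0, alternative = "two.sided")
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_value)
all.equal(as.numeric(res$conf.int), ci)

# Using the sample mean itself as mu makes the numerator zero, so the p-value is 1
p_same <- t.test(x, mu = mean(x))$p.value
p_same
```

The last line shows why a test against the sample mean cannot reject: the numerator of the t statistic is zero by construction. The confidence interval does not depend on `mu`, so it stays informative either way.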
Q2. Which animal colours are most common at the shelter?