02 - Getting Started
Slides
Your Turn
Set-up
Load necessary packages
library(tidyverse)
library(gt)
Preview data
glimpse(towny)
Rows: 414
Columns: 25
$ name <chr> "Addington Highlands", "Adelaide Metcalfe", "…
$ website <chr> "https://addingtonhighlands.ca", "https://ade…
$ status <chr> "lower-tier", "lower-tier", "lower-tier", "lo…
$ csd_type <chr> "township", "township", "township", "township…
$ census_div <chr> "Lennox and Addington", "Middlesex", "Simcoe"…
$ latitude <dbl> 45.00000, 42.95000, 44.13333, 45.52917, 43.85…
$ longitude <dbl> -77.25000, -81.70000, -79.93333, -76.89694, -…
$ land_area_km2 <dbl> 1293.99, 331.11, 371.53, 519.59, 66.64, 116.6…
$ population_1996 <int> 2429, 3128, 9359, 2837, 64430, 1027, 8315, 16…
$ population_2001 <int> 2402, 3149, 10082, 2824, 73753, 956, 8593, 18…
$ population_2006 <int> 2512, 3135, 10695, 2716, 90167, 958, 8654, 19…
$ population_2011 <int> 2517, 3028, 10603, 2844, 109600, 864, 9196, 2…
$ population_2016 <int> 2318, 2990, 10975, 2935, 119677, 969, 9680, 2…
$ population_2021 <int> 2534, 3011, 10989, 2995, 126666, 954, 9949, 2…
$ density_1996 <dbl> 1.88, 9.45, 25.19, 5.46, 966.84, 8.81, 21.22,…
$ density_2001 <dbl> 1.86, 9.51, 27.14, 5.44, 1106.74, 8.20, 21.93…
$ density_2006 <dbl> 1.94, 9.47, 28.79, 5.23, 1353.05, 8.22, 22.09…
$ density_2011 <dbl> 1.95, 9.14, 28.54, 5.47, 1644.66, 7.41, 23.47…
$ density_2016 <dbl> 1.79, 9.03, 29.54, 5.65, 1795.87, 8.31, 24.71…
$ density_2021 <dbl> 1.96, 9.09, 29.58, 5.76, 1900.75, 8.18, 25.39…
$ pop_change_1996_2001_pct <dbl> -0.0111, 0.0067, 0.0773, -0.0046, 0.1447, -0.…
$ pop_change_2001_2006_pct <dbl> 0.0458, -0.0044, 0.0608, -0.0382, 0.2226, 0.0…
$ pop_change_2006_2011_pct <dbl> 0.0020, -0.0341, -0.0086, 0.0471, 0.2155, -0.…
$ pop_change_2011_2016_pct <dbl> -0.0791, -0.0125, 0.0351, 0.0320, 0.0919, 0.1…
$ pop_change_2016_2021_pct <dbl> 0.0932, 0.0070, 0.0013, 0.0204, 0.0584, -0.01…
Exercises
- How many different types of CSD (csd_type) are there in the dataset?
- How many different combinations of CSD type (csd_type) and status (status) are there in the dataset?
- What is the proportion of each type of CSD?
- What is the proportion of each status within type of CSD?
- What is the mean population of all of the municipalities in 2021?
- What is the mean population by CSD Type in 2021?
- What is the mean population of all of the municipalities in 1996, 2001, 2006, 2011, 2016, and 2021? Try to use the across function.
- Advanced bonus exercise: Run a t-test to see if the average population in 1996 is different from the average population in 2016.
Solutions
- How many different types of CSD (csd_type) are there in the dataset?
towny %>%
  count(csd_type)
# A tibble: 5 × 2
csd_type n
<chr> <int>
1 city 52
2 municipality 68
3 town 88
4 township 195
5 village 11
- How many different combinations of CSD type (csd_type) and status (status) are there in the dataset?
towny %>%
  count(csd_type, status)
# A tibble: 10 × 3
csd_type status n
<chr> <chr> <int>
1 city lower-tier 20
2 city single-tier 32
3 municipality lower-tier 41
4 municipality single-tier 27
5 town lower-tier 61
6 town single-tier 27
7 township lower-tier 113
8 township single-tier 82
9 village lower-tier 6
10 village single-tier 5
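The two count() tables above list every category with its frequency, so the answer is the number of rows in each table. To get that number as a single value instead, one option is to count distinct values directly. This is a minimal sketch, assuming towny is available from the gt package as in the set-up; the object names n_csd_types and n_combos are only illustrative.

```r
library(dplyr)
library(gt)  # provides the towny dataset

# Number of distinct CSD types, as one value rather than a frequency table
n_csd_types <- towny %>%
  summarize(n_types = n_distinct(csd_type)) %>%
  pull(n_types)

# Number of distinct csd_type/status combinations that occur in the data
n_combos <- towny %>%
  distinct(csd_type, status) %>%
  nrow()

n_csd_types
n_combos
```

With the data shown above, these should return 5 and 10, matching the row counts of the two tables.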
- What is the proportion of each type of CSD?
towny %>%
  count(csd_type) %>%
  mutate(p = n / sum(n))
# A tibble: 5 × 3
csd_type n p
<chr> <int> <dbl>
1 city 52 0.126
2 municipality 68 0.164
3 town 88 0.213
4 township 195 0.471
5 village 11 0.0266
- What is the proportion of each status within type of CSD?
towny %>%
  count(csd_type, status) %>%
  group_by(csd_type) %>%
  mutate(p = n / sum(n))
# A tibble: 10 × 4
# Groups: csd_type [5]
csd_type status n p
<chr> <chr> <int> <dbl>
1 city lower-tier 20 0.385
2 city single-tier 32 0.615
3 municipality lower-tier 41 0.603
4 municipality single-tier 27 0.397
5 town lower-tier 61 0.693
6 town single-tier 27 0.307
7 township lower-tier 113 0.579
8 township single-tier 82 0.421
9 village lower-tier 6 0.545
10 village single-tier 5 0.455
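Because the proportions are computed after group_by(csd_type), they should sum to 1 within each CSD type rather than across the whole table. A quick way to confirm this is sketched below; the names props and check are illustrative, and the ungroup() step is an optional addition, not part of the solution above.

```r
library(dplyr)
library(gt)  # provides the towny dataset

props <- towny %>%
  count(csd_type, status) %>%
  group_by(csd_type) %>%
  mutate(p = n / sum(n)) %>%
  ungroup()  # drop the grouping so later steps are not silently grouped

# Sanity check: within each csd_type the proportions should sum to 1
check <- props %>%
  group_by(csd_type) %>%
  summarize(total = sum(p))

check
```

If the grouping step were left out, sum(n) would be the total number of rows and p would be the share of all municipalities instead.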
- What is the mean population of all of the municipalities in 2021?
towny %>%
  summarize(mean_pop = mean(population_2021, na.rm = TRUE))
# A tibble: 1 × 1
mean_pop
<dbl>
1 34142.
- What is the mean population by CSD Type in 2021?
towny %>%
  group_by(csd_type) %>%
  summarize(mean_pop = mean(population_2021, na.rm = TRUE))
# A tibble: 5 × 2
csd_type mean_pop
<chr> <dbl>
1 city 200005.
2 municipality 10102.
3 town 22579.
4 township 5367.
5 village 1276.
- What is the mean population of all of the municipalities in 1996, 2001, 2006, 2011, 2016, and 2021? Try to use the across function.
towny %>%
  summarize(across(starts_with("population"), ~ mean(.x, na.rm = TRUE)))
# A tibble: 1 × 6
population_1996 population_2001 population_2006 population_2011
<dbl> <dbl> <dbl> <dbl>
1 25866. 27538. 29173. 30838.
# ℹ 2 more variables: population_2016 <dbl>, population_2021 <dbl>
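The one-row result above is too wide to print in full, so the 2016 and 2021 means are hidden. One way to see all six values is to reshape the summary to one row per year. This is a sketch under the assumption that tidyr is available (it is part of the tidyverse); mean_pop_by_year and the column names year and mean_pop are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(gt)  # provides the towny dataset

mean_pop_by_year <- towny %>%
  summarize(across(starts_with("population"), ~ mean(.x, na.rm = TRUE))) %>%
  # Turn the six population_* columns into six rows, one per census year
  pivot_longer(
    everything(),
    names_to = "year",
    names_prefix = "population_",  # strip the prefix so only the year remains
    values_to = "mean_pop"
  )

mean_pop_by_year
```

The year column is character here; it could be converted with as.integer() if it is needed for plotting or sorting.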
- Run a t-test to see if the average population in 1996 is different from the average population in 2016.
t.test(towny$population_1996, towny$population_2016, paired = TRUE)
Paired t-test
data: towny$population_1996 and towny$population_2016
t = -4.3204, df = 410, p-value = 1.956e-05
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-9593.244 -3593.423
sample estimates:
mean difference
-6593.333
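The output reports df = 410 even though the dataset has 414 rows. A paired t-test only uses rows where both values are present, and its degrees of freedom are the number of complete pairs minus one. The sketch below counts those pairs; the variable names are illustrative.

```r
library(gt)  # provides the towny dataset

# TRUE for rows where both years are observed; t.test() drops incomplete pairs
complete_pairs <- complete.cases(towny$population_1996, towny$population_2016)

n_total <- nrow(towny)          # all municipalities
n_pairs <- sum(complete_pairs)  # pairs actually used by the test

n_total
n_pairs
n_pairs - 1  # degrees of freedom of the paired test
```

Given the output above, n_pairs should be 411, meaning three municipalities are missing a population value in at least one of the two years.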
or, equivalently, as a one-sample t-test on the differences:
t.test(population_1996 - population_2016 ~ 1, data = towny)
One Sample t-test
data: population_1996 - population_2016
t = -4.3204, df = 410, p-value = 1.956e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-9593.244 -3593.423
sample estimates:
mean of x
-6593.333
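The two outputs agree because a paired t-test is a one-sample t-test on the within-row differences. The sketch below checks that agreement programmatically rather than by eye; paired_fit and one_sample_fit are illustrative names, and the one-sample call uses the vector interface instead of the formula shown above.

```r
library(gt)  # provides the towny dataset

paired_fit <- t.test(towny$population_1996, towny$population_2016, paired = TRUE)
one_sample_fit <- t.test(towny$population_1996 - towny$population_2016)

# Compare the t statistic, degrees of freedom, and p-value from both calls
same_statistic <- all.equal(unname(paired_fit$statistic), unname(one_sample_fit$statistic))
same_df <- all.equal(unname(paired_fit$parameter), unname(one_sample_fit$parameter))
same_p <- all.equal(paired_fit$p.value, one_sample_fit$p.value)

c(same_statistic, same_df, same_p)
```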