02 - Getting Started

Your Turn

Set-up

Load necessary packages

library(tidyverse)
library(gt)

Preview data

glimpse(towny)
Rows: 414
Columns: 25
$ name                     <chr> "Addington Highlands", "Adelaide Metcalfe", "…
$ website                  <chr> "https://addingtonhighlands.ca", "https://ade…
$ status                   <chr> "lower-tier", "lower-tier", "lower-tier", "lo…
$ csd_type                 <chr> "township", "township", "township", "township…
$ census_div               <chr> "Lennox and Addington", "Middlesex", "Simcoe"…
$ latitude                 <dbl> 45.00000, 42.95000, 44.13333, 45.52917, 43.85…
$ longitude                <dbl> -77.25000, -81.70000, -79.93333, -76.89694, -…
$ land_area_km2            <dbl> 1293.99, 331.11, 371.53, 519.59, 66.64, 116.6…
$ population_1996          <int> 2429, 3128, 9359, 2837, 64430, 1027, 8315, 16…
$ population_2001          <int> 2402, 3149, 10082, 2824, 73753, 956, 8593, 18…
$ population_2006          <int> 2512, 3135, 10695, 2716, 90167, 958, 8654, 19…
$ population_2011          <int> 2517, 3028, 10603, 2844, 109600, 864, 9196, 2…
$ population_2016          <int> 2318, 2990, 10975, 2935, 119677, 969, 9680, 2…
$ population_2021          <int> 2534, 3011, 10989, 2995, 126666, 954, 9949, 2…
$ density_1996             <dbl> 1.88, 9.45, 25.19, 5.46, 966.84, 8.81, 21.22,…
$ density_2001             <dbl> 1.86, 9.51, 27.14, 5.44, 1106.74, 8.20, 21.93…
$ density_2006             <dbl> 1.94, 9.47, 28.79, 5.23, 1353.05, 8.22, 22.09…
$ density_2011             <dbl> 1.95, 9.14, 28.54, 5.47, 1644.66, 7.41, 23.47…
$ density_2016             <dbl> 1.79, 9.03, 29.54, 5.65, 1795.87, 8.31, 24.71…
$ density_2021             <dbl> 1.96, 9.09, 29.58, 5.76, 1900.75, 8.18, 25.39…
$ pop_change_1996_2001_pct <dbl> -0.0111, 0.0067, 0.0773, -0.0046, 0.1447, -0.…
$ pop_change_2001_2006_pct <dbl> 0.0458, -0.0044, 0.0608, -0.0382, 0.2226, 0.0…
$ pop_change_2006_2011_pct <dbl> 0.0020, -0.0341, -0.0086, 0.0471, 0.2155, -0.…
$ pop_change_2011_2016_pct <dbl> -0.0791, -0.0125, 0.0351, 0.0320, 0.0919, 0.1…
$ pop_change_2016_2021_pct <dbl> 0.0932, 0.0070, 0.0013, 0.0204, 0.0584, -0.01…

Exercises

  1. How many different types of CSD (csd_type) are there in the dataset?
  2. How many different combinations of CSD type and status (status) are there in the dataset?
  3. What is the proportion of each type of CSD?
  4. What is the proportion of each status within each type of CSD?
  5. What is the mean population of all of the municipalities in 2021?
  6. What is the mean population by CSD type in 2021?
  7. What is the mean population of all of the municipalities in 1996, 2001, 2006, 2011, 2016, and 2021? Try to use the across() function.
  8. Advanced bonus exercise: Run a t-test to see if the average population in 1996 is different from the average population in 2016.

Solutions

  1. How many different types of CSD (csd_type) are there in the dataset?
towny %>%
  count(csd_type)
# A tibble: 5 × 2
  csd_type         n
  <chr>        <int>
1 city            52
2 municipality    68
3 town            88
4 township       195
5 village         11
  2. How many different combinations of CSD type and status (status) are there in the dataset?
towny %>%
  count(csd_type, status)
# A tibble: 10 × 3
   csd_type     status          n
   <chr>        <chr>       <int>
 1 city         lower-tier     20
 2 city         single-tier    32
 3 municipality lower-tier     41
 4 municipality single-tier    27
 5 town         lower-tier     61
 6 town         single-tier    27
 7 township     lower-tier    113
 8 township     single-tier    82
 9 village      lower-tier      6
10 village      single-tier     5
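The answers to the first two questions are the row counts of the tibbles above (5 and 10). As an optional sketch, n_distinct() from dplyr returns those numbers directly. The object name n_types and the column names n_csd_types and n_combinations are made up for this illustration.

```r
library(tidyverse)
library(gt)  # provides the towny dataset

n_types <- towny %>%
  summarize(
    # number of distinct CSD types
    n_csd_types = n_distinct(csd_type),
    # number of distinct (csd_type, status) pairs that occur in the data
    n_combinations = n_distinct(csd_type, status)
  )

n_types
```

Passing two columns to n_distinct() counts the distinct pairs that actually appear, which matches the number of rows returned by count(csd_type, status).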
  3. What is the proportion of each type of CSD?
towny %>%
  count(csd_type) %>%
  mutate(p = n / sum(n))
# A tibble: 5 × 3
  csd_type         n      p
  <chr>        <int>  <dbl>
1 city            52 0.126 
2 municipality    68 0.164 
3 town            88 0.213 
4 township       195 0.471 
5 village         11 0.0266
  4. What is the proportion of each status within each type of CSD?
towny %>%
  count(csd_type, status) %>%
  group_by(csd_type) %>%
  mutate(p = n / sum(n))
# A tibble: 10 × 4
# Groups:   csd_type [5]
   csd_type     status          n     p
   <chr>        <chr>       <int> <dbl>
 1 city         lower-tier     20 0.385
 2 city         single-tier    32 0.615
 3 municipality lower-tier     41 0.603
 4 municipality single-tier    27 0.397
 5 town         lower-tier     61 0.693
 6 town         single-tier    27 0.307
 7 township     lower-tier    113 0.579
 8 township     single-tier    82 0.421
 9 village      lower-tier      6 0.545
10 village      single-tier     5 0.455
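Because the proportions are computed within each csd_type group, they should sum to 1 inside every group. A quick sanity check, sketched here with a hypothetical object name (p_check) and column name (total), might look like this:

```r
library(tidyverse)
library(gt)  # provides the towny dataset

p_check <- towny %>%
  count(csd_type, status) %>%
  group_by(csd_type) %>%
  mutate(p = n / sum(n)) %>%
  # collapse to one row per csd_type and add up the proportions
  summarize(total = sum(p))

p_check
```

If group_by(csd_type) were left out, sum(n) would be the overall total and the proportions would instead sum to 1 across the whole table.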
  5. What is the mean population of all of the municipalities in 2021?
towny %>%
  summarize(mean_pop = mean(population_2021, na.rm = TRUE))
# A tibble: 1 × 1
  mean_pop
     <dbl>
1   34142.
  6. What is the mean population by CSD type in 2021?
towny %>%
  group_by(csd_type) %>%
  summarize(mean_pop = mean(population_2021, na.rm = TRUE))
# A tibble: 5 × 2
  csd_type     mean_pop
  <chr>           <dbl>
1 city          200005.
2 municipality   10102.
3 town           22579.
4 township        5367.
5 village         1276.
  7. What is the mean population of all of the municipalities in 1996, 2001, 2006, 2011, 2016, and 2021? Try to use the across() function.
towny %>%
  summarize(across(starts_with("population"), ~ mean(.x, na.rm = TRUE)))
# A tibble: 1 × 6
  population_1996 population_2001 population_2006 population_2011
            <dbl>           <dbl>           <dbl>           <dbl>
1          25866.          27538.          29173.          30838.
# ℹ 2 more variables: population_2016 <dbl>, population_2021 <dbl>
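The one-row result is wide enough that the console hides the last two columns. One possible way to see all six means at once is to reshape the summary to long format with pivot_longer() from tidyr. This is only a sketch; the names mean_pop_by_year, year, and mean_pop are chosen for this illustration.

```r
library(tidyverse)
library(gt)  # provides the towny dataset

mean_pop_by_year <- towny %>%
  summarize(across(starts_with("population"), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(
    everything(),
    names_to = "year",
    names_prefix = "population_",  # strip the prefix so only the year remains
    values_to = "mean_pop"
  )

mean_pop_by_year
```

The result has one row per census year, so no columns are truncated when it is printed.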
  8. Run a t-test to see if the average population in 1996 is different from the average population in 2016.
t.test(towny$population_1996, towny$population_2016, paired = TRUE)

    Paired t-test

data:  towny$population_1996 and towny$population_2016
t = -4.3204, df = 410, p-value = 1.956e-05
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -9593.244 -3593.423
sample estimates:
mean difference 
      -6593.333 

Alternatively, run a one-sample t-test on the differences:
t.test(population_1996 - population_2016 ~ 1, data = towny)

    One Sample t-test

data:  population_1996 - population_2016
t = -4.3204, df = 410, p-value = 1.956e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -9593.244 -3593.423
sample estimates:
mean of x 
-6593.333
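
Both outputs report the same t, df, and p-value because a paired t-test is a one-sample t-test on the per-row differences. With df = 410, the test uses 411 of the 414 municipalities; rows missing either population value are dropped. The sketch below checks this equivalence directly. The object names paired, diffs, and one_sample are chosen for this illustration.

```r
library(gt)  # provides the towny dataset

# Paired test on the two population columns
paired <- t.test(towny$population_1996, towny$population_2016, paired = TRUE)

# One-sample test on the per-municipality differences
diffs <- towny$population_1996 - towny$population_2016
one_sample <- t.test(diffs)

# Both approaches use the same complete pairs, so the results should agree
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)

# Number of municipalities that actually enter the test
sum(!is.na(diffs))
```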