Use the svyttest() function to compare two proportions or means.
Syntax:
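As a rough sketch of the two basic call shapes, the example below uses the `api` data bundled with the survey package rather than the RECS data from this deck. The design object `dclus1` and the variables `api00` and `sch.wide` are stand-ins chosen for illustration only.

```r
library(survey)

# Example data shipped with the survey package (not the RECS data used here)
data(api, package = "survey")
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# One-sample form: is the mean of api00 different from 600?
# The left-hand side is the variable minus the hypothesized value.
one_sample <- svyttest(I(api00 - 600) ~ 0, design = dclus1)

# Two-sample form: does mean api00 differ between the two levels of sch.wide?
two_sample <- svyttest(api00 ~ sch.wide, design = dclus1)

one_sample
two_sample
```

In both cases the first argument is a formula and the second is the survey design object; with a pipe, the design is typically supplied as `design = .`.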
Let’s walk through an example using the towny data.
# A tibble: 414 × 25
name website status csd_type census_div latitude longitude land_area_km2
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Addin… https:… lower… township Lennox an… 45 -77.2 1294.
2 Adela… https:… lower… township Middlesex 43.0 -81.7 331.
3 Adjal… https:… lower… township Simcoe 44.1 -79.9 372.
4 Admas… https:… lower… township Renfrew 45.5 -76.9 520.
5 Ajax https:… lower… town Durham 43.9 -79.0 66.6
6 Alber… https:… singl… township Rainy Riv… 48.6 -93.5 117.
7 Alfre… https:… lower… township Prescott … 45.6 -74.9 392.
8 Algon… https:… lower… township Haliburton 45.4 -78.8 1000.
9 Alnwi… https:… lower… township Northumbe… 44.1 -78.0 398.
10 Amara… https:… lower… township Dufferin 44.0 -80.2 265.
# ℹ 404 more rows
# ℹ 17 more variables: population_1996 <int>, population_2001 <int>,
# population_2006 <int>, population_2011 <int>, population_2016 <int>,
# population_2021 <int>, density_1996 <dbl>, density_2001 <dbl>,
# density_2006 <dbl>, density_2011 <dbl>, density_2016 <dbl>,
# density_2021 <dbl>, pop_change_1996_2001_pct <dbl>,
# pop_change_2001_2006_pct <dbl>, pop_change_2006_2011_pct <dbl>, …
Using the towny data, let’s filter to only cities.
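A minimal sketch of that filter step, assuming `towny` is the data set from the gt package and that cities are identified by `csd_type == "city"` (as the printed rows suggest):

```r
library(dplyr)
library(gt) # assumption: towny is taken from the gt package

# Keep only the municipalities whose census subdivision type is "city"
cities <- towny %>%
  filter(csd_type == "city")

cities
```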
# A tibble: 52 × 25
name website status csd_type census_div latitude longitude land_area_km2
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Barrie https:… singl… city Simcoe 44.4 -79.7 99.0
2 Belle… https:… singl… city Hastings 44.2 -77.4 247.
3 Bramp… https:… lower… city Peel 43.7 -79.8 266.
4 Brant https:… singl… city Brant 43.1 -80.4 818.
5 Brant… https:… singl… city Brant 43.1 -80.3 98.6
6 Brock… https:… singl… city Leeds and… 44.6 -75.7 20.9
7 Burli… https:… lower… city Halton 43.4 -79.8 186.
8 Cambr… https:… lower… city Waterloo 43.4 -80.3 113.
9 Clare… https:… lower… city Prescott … 45.5 -75.2 297.
10 Cornw… https:… singl… city Stormont,… 45.0 -74.7 61.5
# ℹ 42 more rows
# ℹ 17 more variables: population_1996 <int>, population_2001 <int>,
# population_2006 <int>, population_2011 <int>, population_2016 <int>,
# population_2021 <int>, density_1996 <dbl>, density_2001 <dbl>,
# density_2006 <dbl>, density_2011 <dbl>, density_2016 <dbl>,
# density_2021 <dbl>, pop_change_1996_2001_pct <dbl>,
# pop_change_2001_2006_pct <dbl>, pop_change_2006_2011_pct <dbl>, …
Test if the average U.S. household sets its temperature at a value different from 68°F using svyttest():
The formula tests whether the SummerTempNight variable minus 68°F is equal to 0.
The recs_des object is passed into the design argument.
Design-based one-sample t-test
data: SummerTempNight - 68 ~ 0
t = 84.788, df = 58, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.287816 3.446810
sample estimates:
mean
3.367313
Test if the electricity expenditure is significantly different for homes with and without air-conditioning:
Use the variables DOLLAREL for the electricity bill and ACUsed to determine if households have air-conditioning.
The recs_des object is passed into the design argument.
Design-based t-test
data: DOLLAREL ~ ACUsed
t = 21.29, df = 58, p-value < 2.2e-16
alternative hypothesis: true difference in mean is not equal to 0
95 percent confidence interval:
331.3343 400.1054
sample estimates:
difference in mean
365.7199
Do U.S. households set their thermostat differently at night in the summer and winter?
Design-based one-sample t-test
data: SummerTempNight - WinterTempNight ~ 0
t = 50.833, df = 58, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
2.739403 2.963995
sample estimates:
mean
2.851699
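A paired comparison like this one is a one-sample test on the within-unit difference. The sketch below shows the pattern on the survey package's bundled `api` data (`api00` and `api99` are stand-in variables). The commented call at the end is only a guess at how the RECS version would be written, assuming `recs_des` and a pipe are available.

```r
library(survey)

data(api, package = "survey")
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Paired comparison: test whether the mean within-school difference is 0
paired <- svyttest(I(api00 - api99) ~ 0, design = dclus1)
paired

# Presumed analogue for this deck's data (assumes recs_des exists and a
# magrittr-style pipe is loaded); not run here:
# recs_des %>%
#   svyttest(
#     formula = SummerTempNight - WinterTempNight ~ 0,
#     design = .,
#     na.rm = TRUE
#   )
```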
Exercises: 04-testing-exercises.qmd (10:00)
There are two functions that we use to compare proportions:
svygofchisq(): For goodness-of-fit tests
svychisq(): For tests of independence and homogeneity
Let’s collapse Bachelor’s and Graduate degrees into a single category for comparison.
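One way to collapse factor levels is `fct_collapse()` from forcats. The sketch below runs on a small toy factor. The level names, including the combined label "Bachelor or Higher", are assumptions for illustration and may not match the ANES coding.

```r
library(forcats)

# Toy education factor with five levels (hypothetical labels)
education <- factor(
  c("Less than HS", "High school", "Post HS", "Bachelor's", "Graduate", "Bachelor's"),
  levels = c("Less than HS", "High school", "Post HS", "Bachelor's", "Graduate")
)

# Merge the two degree levels into one category, leaving the others unchanged
education2 <- fct_collapse(
  education,
  "Bachelor or Higher" = c("Bachelor's", "Graduate")
)

levels(education2)
table(education2)
```

On a survey design object, the same call would typically sit inside a `mutate()` step so that the new variable is added to the design.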
Test to see if the ANES education matches the population percentages
anes_des_educ %>%
  drop_na(Education2) %>%
  group_by(Education2) %>%
  # Observed proportions with confidence intervals
  summarize(Observed = survey_mean(vartype = "ci")) %>%
  rename(Education = Education2) %>%
  # Expected (population) proportions, in the order of the education levels
  mutate(Expected = c(0.11, 0.27, 0.29, 0.33)) %>%
  select(Education, Expected, everything()) %>%
  pivot_longer(
    cols = c("Expected", "Observed"),
    names_to = "Names",
    values_to = "Proportion"
  ) %>%
  # Keep interval bounds only for the observed rows
  mutate(
    Observed_low = if_else(Names == "Observed", Observed_low, NA_real_),
    Observed_upp = if_else(Names == "Observed", Observed_upp, NA_real_),
    Names = if_else(Names == "Observed",
      "ANES (observed)", "ACS (expected)"
    )
  ) %>%
  ggplot(aes(x = Education, y = Proportion, color = Names)) +
  geom_point(alpha = 0.75, size = 2) +
  geom_errorbar(aes(ymin = Observed_low, ymax = Observed_upp),
    width = 0.25
  ) +
  theme_bw() +
  scale_color_manual(name = "Type", values = c("#ff8484", "#0b3954")) +
  theme(legend.position = "bottom", legend.title = element_blank())
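The plot compares the two distributions visually; the formal comparison is a goodness-of-fit test with `svygofchisq()`. The sketch below uses the survey package's bundled `api` data with made-up benchmark shares, so only the call shape carries over. It assumes a survey version that includes `svygofchisq()`.

```r
library(survey)

data(api, package = "survey")
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Hypothetical benchmark shares for the three school types (E, H, M);
# these numbers are invented for illustration
expected <- c(0.70, 0.15, 0.15)

# Goodness-of-fit: do the weighted shares of stype match the benchmark?
gof <- svygofchisq(~stype, p = expected, design = dclus1)
gof

# Presumed analogue for this deck's data (not run here):
# anes_des_educ %>%
#   svygofchisq(
#     formula = ~Education2,
#     p = c(0.11, 0.27, 0.29, 0.33),
#     design = .,
#     na.rm = TRUE
#   )
```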
ANES asked respondents two questions about trust:
Run a test of independence to see if there is a relationship between these two questions.
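As a sketch of what a test of independence looks like with `svychisq()`, the example below crosses two categorical variables from the survey package's bundled `api` data. The variables `sch.wide` and `stype` are stand-ins; the names of the ANES trust variables are not assumed here.

```r
library(survey)

data(api, package = "survey")
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Test of independence between two categorical variables;
# both go on the right-hand side of a one-sided formula
indep <- svychisq(~ sch.wide + stype, design = dclus1)
indep
```

A small p-value would suggest the two variables are related in the population; the `statistic` argument selects among the available test variants.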
Exercises: 04-testing-exercises.qmd (15:00)
Use Analysis of Variance (ANOVA) to compare two or more means.
Syntax:
Example using the svyglm() function and the variables SummerTempNight and Region:
# A tibble: 4 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 69.7 0.103 674. 3.69e-111
2 RegionMidwest 1.34 0.138 9.68 1.46e- 13
3 RegionSouth 2.05 0.128 16.0 1.36e- 22
4 RegionWest 2.80 0.177 15.9 2.27e- 22
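A coefficient table like the one above comes from fitting a linear model with a categorical predictor. The sketch below shows the pattern on the survey package's bundled `api` data, with `api00` and `stype` as stand-in variables, and adds `regTermTest()` for an overall test of the grouping term.

```r
library(survey)

data(api, package = "survey")
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Design-based linear model: mean api00 by school type.
# The first level of stype is the reference category (the intercept).
fit <- svyglm(api00 ~ stype, design = dclus1)

# Coefficient table: intercept plus one row per non-reference level
summary(fit)$coefficients

# Overall Wald test that all stype coefficients are zero
regTermTest(fit, ~stype)
```

Each non-intercept row is the estimated difference from the reference group, which is how the Region rows in the table above can be read.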