Chapter 13 National Crime Victimization Survey vignette
For this chapter, load the following packages:
library(tidyverse)
library(srvyr)
library(srvyrexploR)
library(gt)
We use data from the United States National Crime Victimization Survey (NCVS). These data are available in the {srvyrexploR} package as ncvs_2021_incident, ncvs_2021_household, and ncvs_2021_person.
13.1 Introduction
The National Crime Victimization Survey (NCVS) is a household survey sponsored by the Bureau of Justice Statistics (BJS), which collects data on criminal victimization, including characteristics of the crimes, offenders, and victims. Crime types include both household and personal crimes, as well as violent and non-violent crimes. The population of interest of this survey is all people in the United States age 12 and older living in housing units and non-institutional group quarters.
The NCVS has been ongoing since 1992. An earlier survey, the National Crime Survey, was run from 1972 to 1991 (U.S. Bureau of Justice Statistics 2017). The survey is administered using a rotating panel. When an address enters the sample, the residents of that address are interviewed every 6 months for a total of 7 interviews. If the initial residents move away from the address during the period and new residents move in, the new residents are included in the survey, because the sample follows addresses and people are not followed when they move.
NCVS data are publicly available and distributed by the Inter-university Consortium for Political and Social Research (ICPSR), with data going back to 1992. The vignette in this book includes data from 2021 (U.S. Bureau of Justice Statistics 2022). The NCVS data structure is complicated, and the User’s Guide contains examples for analysis in SAS, SUDAAN, SPSS, and Stata, but not R (Shook-Sa, Couzens, and Berzofsky 2015). This vignette adapts those examples for R.
13.2 Data structure
The data from ICPSR are distributed with five files, each having its unique identifier indicated:
- Address Record - YEARQ, IDHH
- Household Record - YEARQ, IDHH
- Person Record - YEARQ, IDHH, IDPER
- Incident Record - YEARQ, IDHH, IDPER
- 2021 Collection Year Incident - YEARQ, IDHH, IDPER
In this vignette, we focus on the household, person, and incident files and have selected a subset of columns for use in the examples. We have included data in the {srvyrexploR} package with this subset of columns, but the complete data files can be downloaded from ICPSR.
13.3 Survey notation
The NCVS User’s Guide (Shook-Sa, Couzens, and Berzofsky 2015) uses the following notation:
- \(i\) represents NCVS households, identified on the household-level file with the household identification number IDHH.
- \(j\) represents NCVS individual respondents within household \(i\), identified on the person-level file with the person identification number IDPER.
- \(k\) represents reporting periods (i.e., YEARQ) for household \(i\) and individual respondent \(j\).
- \(l\) represents victimization records for respondent \(j\) in household \(i\) and reporting period \(k\). Each record on the NCVS incident-level file is associated with a victimization record \(l\).
- \(D\) represents one or more domain characteristics of interest in the calculation of NCVS estimates. For victimization totals and proportions, domains can be defined on the basis of crime types (e.g., violent crimes, property crimes), characteristics of victims (e.g., age, sex, household income), or characteristics of the victimizations (e.g., victimizations reported to police, victimizations committed with a weapon present). Domains could also be a combination of all of these types of characteristics. For example, in the calculation of victimization rates, domains are defined on the basis of the characteristics of the victims.
- \(A_a\) represents the level \(a\) of covariate \(A\). Covariate \(A\) is defined in the calculation of victimization proportions and represents the characteristic across which we want to obtain the distribution of victimizations in domain \(D\).
- \(C\) represents the personal or property crime for which we want to obtain a victimization rate.
In this vignette, we discuss four estimates:
- Victimization totals estimate the number of criminal victimizations with a given characteristic. As demonstrated below, these can be calculated from any of the data files. The victimization total, \(\hat{t}_D\), for domain \(D\) is estimated as
\[ \hat{t}_D = \sum_{ijkl \in D} v_{ijkl}\]
where \(v_{ijkl}\) is the series-adjusted victimization weight for household \(i\), respondent \(j\), reporting period \(k\), and victimization \(l\), represented in the data as WGTVICCY.
- Victimization proportions estimate characteristics among victimizations or victims. Victimization proportions are calculated using the incident data file. The estimated victimization proportion for domain \(D\) across level \(a\) of covariate \(A\), \(\hat{p}_{A_a,D}\), is
\[ \hat{p}_{A_a,D} =\frac{\sum_{ijkl \in A_a, D} v_{ijkl}}{\sum_{ijkl \in D} v_{ijkl}}.\]
The numerator is the number of incidents with a particular characteristic in a domain, and the denominator is the number of incidents in a domain.
- Victimization rates are estimates of the number of victimizations per 1,000 persons or households in the population. Victimization rates are calculated using the household or person-level data files. The estimated victimization rate for crime \(C\) in domain \(D\) is
\[\hat{VR}_{C,D}= \frac{\sum_{ijkl \in C,D} v_{ijkl}}{\sum_{ijk \in D} w_{ijk}}\times 1000\]
where \(w_{ijk}\) is the person weight (WGTPERCY) for personal crimes or household weight (WGTHHCY) for household crimes. The numerator is the number of incidents in a domain, and the denominator is the number of persons or households in a domain. Notice that the weights in the numerator and denominator are different; this is important, and in the syntax and examples below, we discuss how to make an estimate that involves two weights.
- Prevalence rates are estimates of the percentage of the population (persons or households) who are victims of a crime. These are estimated using the household or person-level data files. The estimated prevalence rate for crime \(C\) in domain \(D\) is
\[ \hat{PR}_{C, D}= \frac{\sum_{ijk \in {C,D}} I_{ij}w_{ijk}}{\sum_{ijk \in D} w_{ijk}} \times 100\]
where \(I_{ij}\) is an indicator that a person or household in domain \(D\) was a victim of crime \(C\) at any time in the year. The numerator is the number of victims in domain \(D\) for crime \(C\), and the denominator is the number of people or households in the population.
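To make the four formulas concrete, the following is a minimal sketch in base R that evaluates each of them by hand on a tiny, invented data set. The data frames inc and pers, their columns, and all values are hypothetical stand-ins for the incident and person files; the domain \(D\) is taken to be everyone, and these calculations give point estimates only, with no standard errors.

```r
# Toy incident-level records (all values invented for illustration).
# v        plays the role of the victimization weight v_ijkl
# violent  flags records belonging to the crime type C of interest
# reported flags one level A_a of a covariate (reported to police)
inc <- data.frame(
  person   = c(1, 1, 3),
  v        = c(100, 100, 250),
  violent  = c(TRUE, TRUE, TRUE),
  reported = c(TRUE, FALSE, TRUE)
)

# Toy person-level records: w plays the role of the person weight w_ijk
pers <- data.frame(
  person = 1:4,
  w      = c(1000, 1500, 2000, 500)
)

# Victimization total: sum of victimization weights in the domain
t_hat <- sum(inc$v[inc$violent])

# Victimization proportion: share of weighted victimizations with A_a
p_hat <- sum(inc$v[inc$violent & inc$reported]) / sum(inc$v[inc$violent])

# Victimization rate per 1,000 persons: victimization weights in the
# numerator, person weights in the denominator
vr_hat <- sum(inc$v[inc$violent]) / sum(pers$w) * 1000

# Prevalence rate: each victim counts once, however many victimizations
is_victim <- pers$person %in% inc$person[inc$violent]
pr_hat <- sum(pers$w[is_victim]) / sum(pers$w) * 100
```

Person 1 has two victimizations, so that person contributes twice to the victimization rate but only once to the prevalence rate, which is the distinction between the two measures.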
13.4 Data file preparation
Some work is necessary to prepare the files before analysis. The design variables indicating pseudo-stratum (V2117) and half-sample code (V2118) are only included on the household file, so they must be added to the person and incident files for any analysis.
For victimization rates, we need to know the victimization status for both victims and non-victims. Therefore, the incident file must be summarized and merged onto the household or person files for household-level and person-level crimes, respectively. We begin this vignette by discussing how to create these incident summary files. This follows Section 2.2 of the NCVS User’s Guide (Shook-Sa, Couzens, and Berzofsky 2015).
13.4.1 Preparing files for estimation of victimization rates
Each record on the incident file represents one victimization, which is not the same as one incident. Some victimizations involve several similar incidents whose details the victim cannot tell apart; these are labeled “series crimes.” Appendix A of the User’s Guide indicates how to calculate the series weight in other statistical languages.
Here, we adapt that code for R. Essentially, if a victimization is a series crime, its series weight is the number of actual victimizations, top-coded at 10. That is, even if the crime occurred more than 10 times, it is counted as 10 times to reduce the influence of extreme outliers. If an incident is a series crime but the number of occurrences is unknown, the series weight is set to 6. A description of the variables used to create indicators of series and the associated weights is included in Table 13.1.
| Variable | Description | Value | Label |
|---|---|---|---|
| V4016 | How many times incident occur last 6 months | 1–996 | Number of times |
| | | 997 | Don’t know |
| V4017 | How many incidents | 1 | 1–5 incidents (not a “series”) |
| | | 2 | 6 or more incidents |
| | | 8 | Residue (invalid data) |
| V4018 | Incidents similar in detail | 1 | Similar |
| | | 2 | Different (not in a “series”) |
| | | 8 | Residue (invalid data) |
| V4019 | Enough detail to distinguish incidents | 1 | Yes (not a “series”) |
| | | 2 | No (is a “series”) |
| | | 8 | Residue (invalid data) |
| WGTVICCY | Adjusted victimization weight | Numeric | |
We want to create four variables to identify series crimes and weight them accordingly. First, we create a variable called series using V4017, V4018, and V4019, where an incident is considered a series crime only if there are 6 or more incidents (V4017), the incidents are similar in detail (V4018), and there is not enough detail to distinguish the incidents (V4019). Second, we top-code the number of incidents (V4016) by creating a variable n10v4016, which is set to 10 if V4016 > 10 and to missing if the number of incidents is unknown. Third, we create serieswgt using the two new variables series and n10v4016: it equals the top-coded count for a series crime, 6 for a series crime with an unknown count, and 1 for any record that is not a series crime. Finally, we create the new weight (NEWWGT) as the product of serieswgt and the existing weight (WGTVICCY).
inc_series <- ncvs_2021_incident %>%
  mutate(
    # 1 = not a series crime, 2 = series crime
    series = case_when(
      V4017 %in% c(1, 8) ~ 1,
      V4018 %in% c(2, 8) ~ 1,
      V4019 %in% c(1, 8) ~ 1,
      TRUE ~ 2
    ),
    # number of incidents, top-coded at 10; unknown counts become NA
    n10v4016 = case_when(
      V4016 %in% c(997, 998) ~ NA_real_,
      V4016 > 10 ~ 10,
      TRUE ~ V4016
    ),
    # series crimes with an unknown count get a weight of 6
    serieswgt = case_when(
      series == 2 & is.na(n10v4016) ~ 6,
      series == 2 ~ n10v4016,
      TRUE ~ 1
    ),
    NEWWGT = WGTVICCY * serieswgt
  )
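As a check on the logic, the sketch below applies the same recoding to four invented records, one for each main case: not a series, a series with a known count, a series with a count above 10, and a series with an unknown count. The tibble toy_inc and its values are hypothetical; only the variable names come from the codebook above.

```r
library(dplyr)

# Hypothetical incident records (values invented for illustration)
toy_inc <- tibble(
  V4016 = c(2, 8, 25, 997),  # number of occurrences; 997 = don't know
  V4017 = c(1, 2, 2, 2),     # 2 = six or more incidents
  V4018 = c(NA, 1, 1, 1),    # 1 = similar in detail
  V4019 = c(NA, 2, 2, 2),    # 2 = not enough detail to distinguish
  WGTVICCY = c(1000, 1000, 1000, 1000)
)

toy_series <- toy_inc %>%
  mutate(
    series = case_when(
      V4017 %in% c(1, 8) ~ 1,
      V4018 %in% c(2, 8) ~ 1,
      V4019 %in% c(1, 8) ~ 1,
      TRUE ~ 2
    ),
    n10v4016 = case_when(
      V4016 %in% c(997, 998) ~ NA_real_,
      V4016 > 10 ~ 10,
      TRUE ~ V4016
    ),
    serieswgt = case_when(
      series == 2 & is.na(n10v4016) ~ 6,
      series == 2 ~ n10v4016,
      TRUE ~ 1
    ),
    NEWWGT = WGTVICCY * serieswgt
  )

# Series weights for the four cases, in order: 1, 8, 10, 6
toy_series$serieswgt
```

Because every condition uses %in%, a missing V4018 or V4019 on the first record does not trigger any branch, and the record is classified by V4017 alone.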
The next step in preparing the files for estimation is to create indicators on the victimization file for characteristics of interest. Almost all BJS publications limit the analysis to records where the victimization occurred in the United States (where V4022 is not equal to 1). We do this for all estimates as well. A brief codebook of variables for this task is located in Table 13.2.
| Variable | Description | Value | Label |
|---|---|---|---|
| V4022 | In what city/town/village | 1 | Outside U.S. |
| | | 2 | Not inside a city/town/village |
| | | 3 | Same city/town/village as present residence |
| | | 4 | Different city/town/village as present residence |
| | | 5 | Don’t know |
| | | 6 | Don’t know if 2, 4, or 5 |
| V4049 | Did offender have a weapon | 1 | Yes |
| | | 2 | No |
| | | 3 | Don’t know |
| V4050 | What was the weapon that offender had | 1 | At least one good entry |
| | | 3 | Indicates “Yes-Type Weapon-NA” |
| | | 7 | Indicates “Gun Type Unknown” |
| | | 8 | No good entry |
| V4051 | Hand gun | 0 | No |
| | | 1 | Yes |
| V4052 | Other gun | 0 | No |
| | | 1 | Yes |
| V4053 | Knife | 0 | No |
| | | 1 | Yes |
| V4399 | Reported to police | 1 | Yes |
| | | 2 | No |
| | | 3 | Don’t know |
| V4529 | Type of crime code | 01 | Completed rape |
| | | 02 | Attempted rape |
| | | 03 | Sexual attack with serious assault |
| | | 04 | Sexual attack with minor assault |
| | | 05 | Completed robbery with injury from serious assault |
| | | 06 | Completed robbery with injury from minor assault |
| | | 07 | Completed robbery without injury from minor assault |
| | | 08 | Attempted robbery with injury from serious assault |
| | | 09 | Attempted robbery with injury from minor assault |
| | | 10 | Attempted robbery without injury |
| | | 11 | Completed aggravated assault with injury |
| | | 12 | Attempted aggravated assault with weapon |
| | | 13 | Threatened assault with weapon |
| | | 14 | Simple assault completed with injury |
| | | 15 | Sexual assault without injury |
| | | 16 | Unwanted sexual contact without force |
| | | 17 | Assault without weapon without injury |
| | | 18 | Verbal threat of rape |
| | | 19 | Verbal threat of sexual assault |
| | | 20 | Verbal threat of assault |
| | | 21 | Completed purse snatching |
| | | 22 | Attempted purse snatching |
| | | 23 | Pocket picking (completed only) |
| | | 31 | Completed burglary, forcible entry |
| | | 32 | Completed burglary, unlawful entry without force |
| | | 33 | Attempted forcible entry |
| | | 40 | Completed motor vehicle theft |
| | | 41 | Attempted motor vehicle theft |
| | | 54 | Completed theft less than $10 |
| | | 55 | Completed theft $10 to $49 |
| | | 56 | Completed theft $50 to $249 |
| | | 57 | Completed theft $250 or greater |
| | | 58 | Completed theft value NA |
| | | 59 | Attempted theft |
Using these variables, we create the following indicators:
- Property crime
  - V4529 \(\ge\) 31
  - Variable: Property
- Violent crime
  - V4529 \(\le\) 20
  - Variable: Violent
- Property crime reported to the police
  - V4529 \(\ge\) 31 and V4399=1
  - Variable: Property_ReportPolice
- Violent crime reported to the police
  - V4529 \(\le\) 20 and V4399=1
  - Variable: Violent_ReportPolice
- Aggravated assault without a weapon
  - V4529 in 11:13 and V4049=2
  - Variable: AAST_NoWeap
- Aggravated assault with a firearm
  - V4529 in 11:13 and V4049=1 and (V4051=1 or V4052=1 or V4050=7)
  - Variable: AAST_Firearm
- Aggravated assault with a knife or sharp object
  - V4529 in 11:13 and V4049=1 and (V4053=1 or V4054=1)
  - Variable: AAST_Knife
- Aggravated assault with another type of weapon
  - V4529 in 11:13 and V4049=1 and V4050=1 and not firearm or knife
  - Variable: AAST_Other
inc_ind <- inc_series %>%
  filter(V4022 != 1) %>% # drop victimizations that occurred outside the U.S.
  mutate(
    # conditions are evaluated in order, so firearm takes precedence over knife
    WeapCat = case_when(
      is.na(V4049) ~ NA_character_,
      V4049 == 2 ~ "NoWeap",
      V4049 == 3 ~ "UnkWeapUse",
      V4050 == 3 ~ "Other",
      V4051 == 1 | V4052 == 1 | V4050 == 7 ~ "Firearm",
      V4053 == 1 | V4054 == 1 ~ "Knife",
      TRUE ~ "Other"
    ),
    V4529_num = parse_number(as.character(V4529)),
    ReportPolice = V4399 == 1,
    Property = V4529_num >= 31,
    Violent = V4529_num <= 20,
    Property_ReportPolice = Property & ReportPolice,
    Violent_ReportPolice = Violent & ReportPolice,
    AAST = V4529_num %in% 11:13,
    AAST_NoWeap = AAST & WeapCat == "NoWeap",
    AAST_Firearm = AAST & WeapCat == "Firearm",
    AAST_Knife = AAST & WeapCat == "Knife",
    AAST_Other = AAST & WeapCat == "Other"
  )
This is a good point to pause to look at the output of crosswalks between an original variable and a derived one to check that the logic was programmed correctly and that everything ends up in the expected category.
## # A tibble: 6 × 2
## V4022 n
## <fct> <int>
## 1 1 34
## 2 2 65
## 3 3 7697
## 4 4 1143
## 5 5 39
## 6 8 4
## # A tibble: 5 × 2
## V4022 n
## <fct> <int>
## 1 2 65
## 2 3 7697
## 3 4 1143
## 4 5 39
## 5 8 4
## # A tibble: 13 × 8
## WeapCat V4049 V4050 V4051 V4052 V4053 V4054 n
## <chr> <fct> <fct> <fct> <fct> <fct> <fct> <int>
## 1 Firearm 1 1 0 1 0 0 15
## 2 Firearm 1 1 0 1 1 1 1
## 3 Firearm 1 1 1 0 0 0 125
## 4 Firearm 1 1 1 0 1 0 2
## 5 Firearm 1 1 1 1 0 0 3
## 6 Firearm 1 7 0 0 0 0 3
## 7 Knife 1 1 0 0 0 1 14
## 8 Knife 1 1 0 0 1 0 71
## 9 NoWeap 2 <NA> <NA> <NA> <NA> <NA> 1794
## 10 Other 1 1 0 0 0 0 147
## 11 Other 1 3 0 0 0 0 26
## 12 UnkWeapUse 3 <NA> <NA> <NA> <NA> <NA> 519
## 13 <NA> <NA> <NA> <NA> <NA> <NA> <NA> 6228
## # A tibble: 34 × 5
## V4529 Property Violent AAST n
## <fct> <lgl> <lgl> <lgl> <int>
## 1 1 FALSE TRUE FALSE 45
## 2 2 FALSE TRUE FALSE 20
## 3 3 FALSE TRUE FALSE 11
## 4 4 FALSE TRUE FALSE 3
## 5 5 FALSE TRUE FALSE 24
## 6 6 FALSE TRUE FALSE 26
## 7 7 FALSE TRUE FALSE 59
## 8 8 FALSE TRUE FALSE 5
## 9 9 FALSE TRUE FALSE 7
## 10 10 FALSE TRUE FALSE 57
## 11 11 FALSE TRUE TRUE 97
## 12 12 FALSE TRUE TRUE 91
## 13 13 FALSE TRUE TRUE 163
## 14 14 FALSE TRUE FALSE 165
## 15 15 FALSE TRUE FALSE 24
## 16 16 FALSE TRUE FALSE 12
## 17 17 FALSE TRUE FALSE 357
## 18 18 FALSE TRUE FALSE 14
## 19 19 FALSE TRUE FALSE 3
## 20 20 FALSE TRUE FALSE 607
## 21 21 FALSE FALSE FALSE 2
## 22 22 FALSE FALSE FALSE 2
## 23 23 FALSE FALSE FALSE 19
## 24 31 TRUE FALSE FALSE 248
## 25 32 TRUE FALSE FALSE 634
## 26 33 TRUE FALSE FALSE 188
## 27 40 TRUE FALSE FALSE 256
## 28 41 TRUE FALSE FALSE 97
## 29 54 TRUE FALSE FALSE 407
## 30 55 TRUE FALSE FALSE 1006
## 31 56 TRUE FALSE FALSE 1686
## 32 57 TRUE FALSE FALSE 1420
## 33 58 TRUE FALSE FALSE 798
## 34 59 TRUE FALSE FALSE 395
## # A tibble: 4 × 3
## ReportPolice V4399 n
## <lgl> <fct> <int>
## 1 FALSE 2 5670
## 2 FALSE 3 103
## 3 FALSE 8 12
## 4 TRUE 1 3163
## # A tibble: 11 × 7
## AAST WeapCat AAST_NoWeap AAST_Firearm AAST_Knife AAST_Other n
## <lgl> <chr> <lgl> <lgl> <lgl> <lgl> <int>
## 1 FALSE Firearm FALSE FALSE FALSE FALSE 34
## 2 FALSE Knife FALSE FALSE FALSE FALSE 23
## 3 FALSE NoWeap FALSE FALSE FALSE FALSE 1769
## 4 FALSE Other FALSE FALSE FALSE FALSE 27
## 5 FALSE UnkWeapUse FALSE FALSE FALSE FALSE 516
## 6 FALSE <NA> FALSE FALSE FALSE FALSE 6228
## 7 TRUE Firearm FALSE TRUE FALSE FALSE 115
## 8 TRUE Knife FALSE FALSE TRUE FALSE 62
## 9 TRUE NoWeap TRUE FALSE FALSE FALSE 25
## 10 TRUE Other FALSE FALSE FALSE TRUE 146
## 11 TRUE UnkWeapUse FALSE FALSE FALSE FALSE 3
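Crosswalks of this kind can be produced by counting the combinations of an original variable and its derived variable. The following is a minimal sketch on invented data; the tibble toy, its values, and the helper object ambiguous are hypothetical and only illustrate the pattern of the check.

```r
library(dplyr)

# Hypothetical incident records with an original code and a derived flag
toy <- tibble(V4399 = c(1, 2, 3, 8, 1, 2)) %>%
  mutate(ReportPolice = V4399 == 1)

# Crosswalk: one row per observed (derived, original) combination
xwalk <- toy %>%
  count(ReportPolice, V4399)

xwalk

# Sanity check: each original code should map to exactly one derived value
ambiguous <- xwalk %>%
  group_by(V4399) %>%
  summarize(n_derived = n_distinct(ReportPolice), .groups = "drop") %>%
  filter(n_derived > 1)

nrow(ambiguous) # zero rows means no original code was split across categories
```

Reading the crosswalk by eye remains useful even when the automated check passes, because a recode can be one-to-one and still put a code in the wrong category.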
After creating indicators of victimization types and characteristics, the file is summarized, and crimes are summed across persons or households by YEARQ.
Property crimes (i.e., crimes committed against households, such as household burglary or motor vehicle theft) are summed across households, and personal crimes (i.e., crimes committed against an individual, such as assault, robbery, and personal theft) are summed across persons. The indicators are summed using our created series weight variable (serieswgt). Additionally, the existing weight variable (WGTVICCY) needs to be retained for later analysis.
inc_hh_sums <-
  inc_ind %>%
  filter(V4529_num > 23) %>% # restrict to household crimes
  group_by(YEARQ, IDHH) %>%
  summarize(
    WGTVICCY = WGTVICCY[1],
    across(starts_with("Property"),
      ~ sum(. * serieswgt),
      .names = "{.col}"
    ),
    .groups = "drop"
  )

inc_pers_sums <-
  inc_ind %>%
  filter(V4529_num <= 23) %>% # restrict to person crimes
  group_by(YEARQ, IDHH, IDPER) %>%
  summarize(
    WGTVICCY = WGTVICCY[1],
    across(c(starts_with("Violent"), starts_with("AAST")),
      ~ sum(. * serieswgt),
      .names = "{.col}"
    ),
    .groups = "drop"
  )
Now, we merge the victimization summary files into the appropriate files. For any record on the household or person file that is not on the victimization file, the victimization counts are set to 0 after merging. In this step, we also create the victimization adjustment factor. See Section 2.2.4 in the User’s Guide for details of why this adjustment is created (Shook-Sa, Couzens, and Berzofsky 2015). It is calculated as follows:
\[ A_{ijk}=\frac{v_{ijk}}{w_{ijk}}\]
where \(w_{ijk}\) is the person weight (WGTPERCY) for personal crimes or the household weight (WGTHHCY) for household crimes, and \(v_{ijk}\) is the victimization weight (WGTVICCY) for household \(i\), respondent \(j\), in reporting period \(k\). The adjustment factor is set to 0 if no incidents are reported.
# named lists of zeros used to fill counts for records with no incidents
hh_z_list <- rep(0, ncol(inc_hh_sums) - 3) %>%
  as.list() %>%
  setNames(names(inc_hh_sums)[-(1:3)])

pers_z_list <- rep(0, ncol(inc_pers_sums) - 4) %>%
  as.list() %>%
  setNames(names(inc_pers_sums)[-(1:4)])

hh_vsum <- ncvs_2021_household %>%
  full_join(inc_hh_sums, by = c("YEARQ", "IDHH")) %>%
  replace_na(hh_z_list) %>%
  mutate(ADJINC_WT = if_else(is.na(WGTVICCY), 0, WGTVICCY / WGTHHCY))

pers_vsum <- ncvs_2021_person %>%
  full_join(inc_pers_sums, by = c("YEARQ", "IDHH", "IDPER")) %>%
  replace_na(pers_z_list) %>%
  mutate(ADJINC_WT = if_else(is.na(WGTVICCY), 0, WGTVICCY / WGTPERCY))
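The role of the adjustment factor can be seen with a small numeric sketch in base R. The data frames pers and inc_sums, the id column, and all values are invented; the sketch assumes, as the summary code above does, a single victimization weight per person and reporting period. Multiplying the person weight by the adjustment factor recovers the victimization weight, so a total computed from the person file matches the total computed from the incident summary.

```r
# Hypothetical person file: one row per person with the person weight
pers <- data.frame(
  id = 1:3,
  WGTPERCY = c(1000, 2000, 1500)
)

# Hypothetical incident summary: series-weighted violent counts and the
# victimization weight for the persons who reported incidents
inc_sums <- data.frame(
  id = c(1, 3),
  WGTVICCY = c(1100, 1600),
  Violent = c(2, 1)
)

# Keep every person; persons with no incidents get NA after the merge
merged <- merge(pers, inc_sums, by = "id", all.x = TRUE)
merged$Violent[is.na(merged$Violent)] <- 0
merged$ADJINC_WT <- ifelse(
  is.na(merged$WGTVICCY), 0, merged$WGTVICCY / merged$WGTPERCY
)

# Total from the person file: person weight x count x adjustment factor
total_from_persons <- sum(merged$WGTPERCY * merged$Violent * merged$ADJINC_WT)

# Total from the incident summary: victimization weight x count
total_from_incidents <- sum(inc_sums$WGTVICCY * inc_sums$Violent)

all.equal(total_from_persons, total_from_incidents)
```

The person without incidents contributes nothing to the numerator but still counts in any denominator built from WGTPERCY, which is what a victimization rate requires.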
13.4.2 Derived demographic variables
A final step in file preparation for the household and person files is creating any derived variables on the household and person files, such as income categories or age categories, for subgroup analysis. We can do this step before or after merging the victimization counts.
13.4.2.1 Household variables
For the household file, we create categories for tenure (rental status), urbanicity, income, place size, and region. A codebook of the household variables is listed in Table 13.3.
| Variable | Description | Value | Label |
|---|---|---|---|
| V2015 | Tenure | 1 | Owned or being bought |
| | | 2 | Rented for cash |
| | | 3 | No cash rent |
| SC214A | Household Income | 01 | Less than $5,000 |
| | | 02 | $5,000–7,499 |
| | | 03 | $7,500–9,999 |
| | | 04 | $10,000–12,499 |
| | | 05 | $12,500–14,999 |
| | | 06 | $15,000–17,499 |
| | | 07 | $17,500–19,999 |
| | | 08 | $20,000–24,999 |
| | | 09 | $25,000–29,999 |
| | | 10 | $30,000–34,999 |
| | | 11 | $35,000–39,999 |
| | | 12 | $40,000–49,999 |
| | | 13 | $50,000–74,999 |
| | | 15 | $75,000–99,999 |
| | | 16 | $100,000–149,999 |
| | | 17 | $150,000–199,999 |
| | | 18 | $200,000 or more |
| V2126B | Place Size (Population) Code | 00 | Not in a place |
| | | 13 | Population under 10,000 |
| | | 16 | 10,000–49,999 |
| | | 17 | 50,000–99,999 |
| | | 18 | 100,000–249,999 |
| | | 19 | 250,000–499,999 |
| | | 20 | 500,000–999,999 |
| | | 21 | 1,000,000–2,499,999 |
| | | 22 | 2,500,000–4,999,999 |
| | | 23 | 5,000,000 or more |
| V2127B | Region | 1 | Northeast |
| | | 2 | Midwest |
| | | 3 | South |
| | | 4 | West |
| V2143 | Urbanicity | 1 | Urban |
| | | 2 | Suburban |
| | | 3 | Rural |
hh_vsum_der <- hh_vsum %>%
  mutate(
    # both "rented for cash" and "no cash rent" are treated as rented
    Tenure = factor(
      case_when(
        V2015 == 1 ~ "Owned",
        !is.na(V2015) ~ "Rented"
      ),
      levels = c("Owned", "Rented")
    ),
    Urbanicity = factor(
      case_when(
        V2143 == 1 ~ "Urban",
        V2143 == 2 ~ "Suburban",
        V2143 == 3 ~ "Rural"
      ),
      levels = c("Urban", "Suburban", "Rural")
    ),
    SC214A_num = as.numeric(as.character(SC214A)),
    Income = case_when(
      SC214A_num <= 8 ~ "Less than $25,000",
      SC214A_num <= 12 ~ "$25,000--49,999",
      SC214A_num <= 15 ~ "$50,000--99,999",
      SC214A_num <= 17 ~ "$100,000--199,999",
      SC214A_num <= 18 ~ "$200,000 or more"
    ),
    # order the income labels by the underlying numeric code
    Income = fct_reorder(Income, SC214A_num, .na_rm = FALSE),
    PlaceSize = case_match(
      as.numeric(as.character(V2126B)),
      0 ~ "Not in a place",
      13 ~ "Population under 10,000",
      16 ~ "10,000--49,999",
      17 ~ "50,000--99,999",
      18 ~ "100,000--249,999",
      19 ~ "250,000--499,999",
      20 ~ "500,000--999,999",
      c(21, 22, 23) ~ "1,000,000 or more"
    ),
    PlaceSize = fct_reorder(PlaceSize, as.numeric(V2126B)),
    Region = case_match(
      as.numeric(V2127B),
      1 ~ "Northeast",
      2 ~ "Midwest",
      3 ~ "South",
      4 ~ "West"
    ),
    Region = fct_reorder(Region, as.numeric(V2127B))
  )
As before, we want to check to make sure the recoded variables we create match the existing data as expected.
## # A tibble: 4 × 3
## Tenure V2015 n
## <fct> <fct> <int>
## 1 Owned 1 101944
## 2 Rented 2 46269
## 3 Rented 3 1925
## 4 <NA> <NA> 106322
## # A tibble: 3 × 3
## Urbanicity V2143 n
## <fct> <fct> <int>
## 1 Urban 1 26878
## 2 Suburban 2 173491
## 3 Rural 3 56091
## # A tibble: 18 × 3
## Income SC214A n
## <fct> <fct> <int>
## 1 Less than $25,000 1 7841
## 2 Less than $25,000 2 2626
## 3 Less than $25,000 3 3949
## 4 Less than $25,000 4 5546
## 5 Less than $25,000 5 5445
## 6 Less than $25,000 6 4821
## 7 Less than $25,000 7 5038
## 8 Less than $25,000 8 11887
## 9 $25,000--49,999 9 11550
## 10 $25,000--49,999 10 13689
## 11 $25,000--49,999 11 13655
## 12 $25,000--49,999 12 23282
## 13 $50,000--99,999 13 44601
## 14 $50,000--99,999 15 33353
## 15 $100,000--199,999 16 34287
## 16 $100,000--199,999 17 15317
## 17 $200,000 or more 18 16892
## 18 <NA> <NA> 2681
## # A tibble: 10 × 3
## PlaceSize V2126B n
## <fct> <fct> <int>
## 1 Not in a place 0 69484
## 2 Population under 10,000 13 39873
## 3 10,000--49,999 16 53002
## 4 50,000--99,999 17 27205
## 5 100,000--249,999 18 24461
## 6 250,000--499,999 19 13111
## 7 500,000--999,999 20 15194
## 8 1,000,000 or more 21 6167
## 9 1,000,000 or more 22 3857
## 10 1,000,000 or more 23 4106
## # A tibble: 4 × 3
## Region V2127B n
## <fct> <fct> <int>
## 1 Northeast 1 41585
## 2 Midwest 2 74666
## 3 South 3 87783
## 4 West 4 52426
13.4.2.2 Person variables
For the person file, we create categories for sex, race/Hispanic origin, age, and marital status. A codebook of the person variables is located in Table 13.4. We also merge the household demographics and the design variables (V2117 and V2118) onto the person file.
| Variable | Description | Value | Label |
|---|---|---|---|
| V3014 | Age | 12–90 | |
| V3015 | Current Marital Status | 1 | Married |
| | | 2 | Widowed |
| | | 3 | Divorced |
| | | 4 | Separated |
| | | 5 | Never married |
| V3018 | Sex | 1 | Male |
| | | 2 | Female |
| V3023A | Race | 01 | White only |
| | | 02 | Black only |
| | | 03 | American Indian, Alaska native only |
| | | 04 | Asian only |
| | | 05 | Hawaiian/Pacific Islander only |
| | | 06 | White-Black |
| | | 07 | White-American Indian |
| | | 08 | White-Asian |
| | | 09 | White-Hawaiian |
| | | 10 | Black-American Indian |
| | | 11 | Black-Asian |
| | | 12 | Black-Hawaiian/Pacific Islander |
| | | 13 | American Indian-Asian |
| | | 14 | Asian-Hawaiian/Pacific Islander |
| | | 15 | White-Black-American Indian |
| | | 16 | White-Black-Asian |
| | | 17 | White-American Indian-Asian |
| | | 18 | White-Asian-Hawaiian |
| | | 19 | 2 or 3 races |
| | | 20 | 4 or 5 races |
| V3024 | Hispanic Origin | 1 | Yes |
| | | 2 | No |
NHOPI <- "Native Hawaiian or Other Pacific Islander"

pers_vsum_der <- pers_vsum %>%
  mutate(
    Sex = factor(case_when(
      V3018 == 1 ~ "Male",
      V3018 == 2 ~ "Female"
    )),
    # Hispanic origin is checked first, so race categories are non-Hispanic
    RaceHispOrigin = factor(
      case_when(
        V3024 == 1 ~ "Hispanic",
        V3023A == 1 ~ "White",
        V3023A == 2 ~ "Black",
        V3023A == 4 ~ "Asian",
        V3023A == 5 ~ NHOPI,
        TRUE ~ "Other"
      ),
      levels = c(
        "White", "Black", "Hispanic",
        "Asian", NHOPI, "Other"
      )
    ),
    V3014_num = as.numeric(as.character(V3014)),
    AgeGroup = case_when(
      V3014_num <= 17 ~ "12--17",
      V3014_num <= 24 ~ "18--24",
      V3014_num <= 34 ~ "25--34",
      V3014_num <= 49 ~ "35--49",
      V3014_num <= 64 ~ "50--64",
      V3014_num <= 90 ~ "65 or older"
    ),
    AgeGroup = fct_reorder(AgeGroup, V3014_num),
    MaritalStatus = factor(
      case_when(
        V3015 == 1 ~ "Married",
        V3015 == 2 ~ "Widowed",
        V3015 == 3 ~ "Divorced",
        V3015 == 4 ~ "Separated",
        V3015 == 5 ~ "Never married"
      ),
      levels = c(
        "Never married", "Married",
        "Widowed", "Divorced",
        "Separated"
      )
    )
  ) %>%
  # add the design variables and household demographics
  left_join(
    hh_vsum_der %>% select(
      YEARQ, IDHH,
      V2117, V2118, Tenure:Region
    ),
    by = c("YEARQ", "IDHH")
  )
As before, we want to check to make sure the recoded variables we create match the existing data as expected.
## # A tibble: 2 × 3
## Sex V3018 n
## <fct> <fct> <int>
## 1 Female 2 150956
## 2 Male 1 140922
## # A tibble: 11 × 3
## RaceHispOrigin V3024 n
## <fct> <fct> <int>
## 1 White 2 197292
## 2 White 8 883
## 3 Black 2 29947
## 4 Black 8 120
## 5 Hispanic 1 41450
## 6 Asian 2 16015
## 7 Asian 8 61
## 8 Native Hawaiian or Other Pacific Islander 2 891
## 9 Native Hawaiian or Other Pacific Islander 8 9
## 10 Other 2 5161
## 11 Other 8 49
pers_vsum_der %>%
  filter(RaceHispOrigin != "Hispanic" |
    is.na(RaceHispOrigin)) %>%
  count(RaceHispOrigin, V3023A)
## # A tibble: 20 × 3
## RaceHispOrigin V3023A n
## <fct> <fct> <int>
## 1 White 1 198175
## 2 Black 2 30067
## 3 Asian 4 16076
## 4 Native Hawaiian or Other Pacific Islander 5 900
## 5 Other 3 1319
## 6 Other 6 1217
## 7 Other 7 1025
## 8 Other 8 837
## 9 Other 9 184
## 10 Other 10 178
## 11 Other 11 87
## 12 Other 12 27
## 13 Other 13 13
## 14 Other 14 53
## 15 Other 15 136
## 16 Other 16 45
## 17 Other 17 11
## 18 Other 18 33
## 19 Other 19 22
## 20 Other 20 23
pers_vsum_der %>%
  group_by(AgeGroup) %>%
  summarize(
    minAge = min(V3014),
    maxAge = max(V3014),
    .groups = "drop"
  )
## # A tibble: 6 × 3
## AgeGroup minAge maxAge
## <fct> <dbl> <dbl>
## 1 12--17 12 17
## 2 18--24 18 24
## 3 25--34 25 34
## 4 35--49 35 49
## 5 50--64 50 64
## 6 65 or older 65 90
## # A tibble: 6 × 3
## MaritalStatus V3015 n
## <fct> <fct> <int>
## 1 Never married 5 90425
## 2 Married 1 148131
## 3 Widowed 2 17668
## 4 Divorced 3 28596
## 5 Separated 4 4524
## 6 <NA> 8 2534
We then create tibbles that contain only the variables we need, which makes it easier to use them for analyses.
hh_vsum_slim <- hh_vsum_der %>%
  select(
    YEARQ:V2118,
    WGTVICCY:ADJINC_WT,
    Tenure,
    Urbanicity,
    Income,
    PlaceSize,
    Region
  )

pers_vsum_slim <- pers_vsum_der %>%
  select(YEARQ:WGTPERCY, WGTVICCY:ADJINC_WT, Sex:Region)
To calculate estimates about types of crime, such as what percentage of violent crimes are reported to the police, we must use the incident file. The incident file is not guaranteed to have every pseudo-stratum and half-sample code, so dummy records are created to append before estimation. Finally, we merge demographic variables onto the incident tibble.
# one dummy record per pseudo-stratum and half-sample code combination
dummy_records <- hh_vsum_slim %>%
  distinct(V2117, V2118) %>%
  mutate(
    Dummy = 1,
    WGTVICCY = 1,
    NEWWGT = 1
  )

inc_analysis <- inc_ind %>%
  mutate(Dummy = 0) %>%
  left_join(select(pers_vsum_slim, YEARQ, IDHH, IDPER, Sex:Region),
    by = c("YEARQ", "IDHH", "IDPER")
  ) %>%
  bind_rows(dummy_records) %>%
  select(
    YEARQ:IDPER,
    WGTVICCY,
    NEWWGT,
    V4529,
    WeapCat,
    ReportPolice,
    Property:Region
  )
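To see why dummy records can be needed, the sketch below looks for design cells (combinations of pseudo-stratum and half-sample code) that are present on a household-level file but have no incident records. The tibbles design_cells and toy_inc and all their values are invented; the code above takes the simpler route of appending one dummy record for every cell rather than only for the missing ones.

```r
library(dplyr)

# Hypothetical design cells found on the household file
design_cells <- tibble(
  V2117 = c(1, 1, 2, 2), # pseudo-stratum
  V2118 = c(1, 2, 1, 2)  # half-sample code
)

# Hypothetical incident records: no incidents in stratum 2, half-sample 2
toy_inc <- tibble(
  V2117 = c(1, 1, 2),
  V2118 = c(1, 2, 1),
  NEWWGT = c(500, 700, 600)
)

# Cells on the household file with no matching incident record
missing_cells <- anti_join(
  design_cells, toy_inc,
  by = c("V2117", "V2118")
)

missing_cells
```

Without a record in that cell, a design object built from the incident data alone would describe a different stratum and PSU structure than the one the sample was drawn from.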
The tibbles hh_vsum_slim, pers_vsum_slim, and inc_analysis can now be used to create design objects and calculate crime rate estimates.
13.5 Survey design objects
All the data preparation above is necessary to create the design objects and finally begin analysis. We create three design objects for different types of analysis, depending on the estimate we are creating. For the incident data, the analysis weight is NEWWGT, which we constructed previously. The household and person-level data use WGTHHCY and WGTPERCY, respectively. For all analyses, V2117 is the strata variable, and V2118 is the cluster/PSU variable. This information can be found in the User’s Guide (Shook-Sa, Couzens, and Berzofsky 2015).
# the argument is spelled out as weights rather than relying on partial matching
inc_des <- inc_analysis %>%
  as_survey_design(
    weights = NEWWGT,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

hh_des <- hh_vsum_slim %>%
  as_survey_design(
    weights = WGTHHCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

pers_des <- pers_vsum_slim %>%
  as_survey_design(
    weights = WGTPERCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )
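The same pattern can be tried on a very small data set to see what the design arguments do. In the sketch below, the tibble toy, its columns wt and y, and all values are invented. The half-sample codes repeat across pseudo-strata, which is why nest = TRUE is used: it tells the design that PSU 1 in stratum 1 is a different cluster from PSU 1 in stratum 2.

```r
library(dplyr)
library(srvyr)

# Hypothetical data: 2 pseudo-strata, each with 2 half-samples (PSUs)
toy <- tibble(
  V2117 = c(1, 1, 1, 1, 2, 2, 2, 2), # pseudo-stratum
  V2118 = c(1, 1, 2, 2, 1, 1, 2, 2), # half-sample code, reused across strata
  wt = c(10, 20, 10, 20, 15, 15, 25, 25),
  y = c(1, 0, 1, 1, 0, 1, 1, 0)
)

toy_des <- toy %>%
  as_survey_design(
    weights = wt,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

# Weighted total of y with its standard error
est <- toy_des %>%
  summarize(total = survey_total(y))

est

# The point estimate equals the plain weighted sum
sum(toy$wt * toy$y)
```

The design changes only the standard error, not the point estimate, so comparing the estimated total against the weighted sum is a quick way to confirm that the intended weight was used.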
13.6 Calculating estimates
Now that we have prepared our data and created the design objects, we can calculate our estimates. As a reminder, those are:
- Victimization totals estimate the number of criminal victimizations with a given characteristic.
- Victimization proportions estimate characteristics among victimizations or victims.
- Victimization rates are estimates of the number of victimizations per 1,000 persons or households in the population.
- Prevalence rates are estimates of the percentage of the population (persons or households) who are victims of a crime.
13.6.1 Estimation 1: Victimization totals
There are two ways to calculate victimization totals. Using the incident design object (inc_des) is the most straightforward method, but the person (pers_des) and household (hh_des) design objects can be used as well if the adjustment factor (ADJINC_WT) is incorporated. In the example below, the total number of property and violent victimizations is first calculated using the incident file and then using the household and person design objects. The incident file is smaller, so estimation is faster using that file, but the estimates are the same, as illustrated in Table 13.5, Table 13.6, and Table 13.7.
# totals from the incident design object
vt1 <-
  inc_des %>%
  summarize(
    Property_Vzn = survey_total(Property, na.rm = TRUE),
    Violent_Vzn = survey_total(Violent, na.rm = TRUE)
  ) %>%
  gt() %>%
  tab_spanner(
    label = "Property Crime",
    columns = starts_with("Property")
  ) %>%
  tab_spanner(
    label = "Violent Crime",
    columns = starts_with("Violent")
  ) %>%
  cols_label(
    ends_with("Vzn") ~ "Total",
    ends_with("se") ~ "S.E."
  ) %>%
  fmt_number(decimals = 0)

# property total from the household design object, using the adjustment factor
vt2a <- hh_des %>%
  summarize(Property_Vzn = survey_total(Property * ADJINC_WT,
    na.rm = TRUE
  )) %>%
  gt() %>%
  tab_spanner(
    label = "Property Crime",
    columns = starts_with("Property")
  ) %>%
  cols_label(
    ends_with("Vzn") ~ "Total",
    ends_with("se") ~ "S.E."
  ) %>%
  fmt_number(decimals = 0)

# violent total from the person design object, using the adjustment factor
vt2b <- pers_des %>%
  summarize(Violent_Vzn = survey_total(Violent * ADJINC_WT,
    na.rm = TRUE
  )) %>%
  gt() %>%
  tab_spanner(
    label = "Violent Crime",
    columns = starts_with("Violent")
  ) %>%
  cols_label(
    ends_with("Vzn") ~ "Total",
    ends_with("se") ~ "S.E."
  ) %>%
  fmt_number(decimals = 0)
| Property Crime: Total | Property Crime: S.E. | Violent Crime: Total | Violent Crime: S.E. |
|---|---|---|---|
| 11,682,056 | 263,844 | 4,598,306 | 198,115 |

| Property Crime: Total | Property Crime: S.E. |
|---|---|
| 11,682,056 | 263,844 |

| Violent Crime: Total | Violent Crime: S.E. |
|---|---|
| 4,598,306 | 198,115 |
The number of victimizations estimated using the incident file is equivalent to the person and household file method. There were an estimated 11,682,056 property victimizations and 4,598,306 violent victimizations in 2021.
13.6.2 Estimation 2: Victimization proportions
Victimization proportions are proportions describing features of a victimization. The key here is that these are estimates among victimizations, not among the population. These types of estimates can only be calculated using the incident design object (inc_des).
For example, we could be interested in the percentage of property victimizations reported to the police. The following code returns the estimate, its standard error, and a 95% confidence interval:
prop1 <- inc_des %>%
  filter(Property) %>%
  summarize(Pct = survey_mean(ReportPolice,
    na.rm = TRUE,
    proportion = TRUE,
    vartype = c("se", "ci")
  ) * 100)

prop1
## # A tibble: 1 × 4
## Pct Pct_se Pct_low Pct_upp
## <dbl> <dbl> <dbl> <dbl>
## 1 30.8 0.798 29.2 32.4
Or, we could estimate the percentage of violent victimizations that occur in urban areas:
prop2 <- inc_des %>%
  filter(Violent) %>%
  summarize(Pct = survey_mean(Urbanicity == "Urban",
    na.rm = TRUE
  ) * 100)

prop2
## # A tibble: 1 × 2
## Pct Pct_se
## <dbl> <dbl>
## 1 18.1 1.49
In 2021, we estimate that 30.8% of property crimes were reported to the police, and 18.1% of violent crimes occurred in urban areas.
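The point estimate behind these proportions is a weighted mean restricted to the victimizations of interest. The following base R sketch illustrates this with made-up incident records. The data frame toy_inc and its weight column wt are hypothetical, and the sketch does not compute a design-based standard error.

```r
# Hypothetical toy incident records (not NCVS data)
toy_inc <- data.frame(
  wt           = c(100, 150, 200, 120, 80, 90),          # incident weight
  Property     = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE), # property crime?
  ReportPolice = c(TRUE, FALSE, FALSE, TRUE, TRUE, NA)    # reported to police?
)

# Keep only property victimizations, mirroring filter(Property)
prop_only <- toy_inc[toy_inc$Property, ]

# Weighted percentage reported to the police among those victimizations
pct_reported <- 100 * weighted.mean(
  prop_only$ReportPolice,
  w = prop_only$wt,
  na.rm = TRUE
)

pct_reported
```

The denominator here is the weighted number of property victimizations, not the number of households or persons, which is why these estimates describe victimizations rather than the population.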
13.6.3 Estimation 3: Victimization rates
Victimization rates measure the number of victimizations per population. They are not an estimate of the proportion of households or persons who are victimized, which is the prevalence rate described in Section 13.6.4. Victimization rates are estimated using the household (hh_des) or person (pers_des) design objects depending on the type of crime, and the adjustment factor (ADJINC_WT) must be incorporated. We return to the example of property and violent victimizations used for victimization totals (Section 13.6.1). In the following example, the property victimization total is calculated as above, along with the property victimization rate (using survey_mean()) and the population size (using survey_total()).
Victimization rates use the incident weight in the numerator and the person or household weight in the denominator. This is accomplished by multiplying the variable of interest by the weight adjustment (ADJINC_WT) when calculating the rates. Let’s look at an example of property victimization.
vr_prop <- hh_des %>%
  summarize(
    Property_Vzn = survey_total(Property * ADJINC_WT,
      na.rm = TRUE
    ),
    Property_Rate = survey_mean(Property * ADJINC_WT * 1000,
      na.rm = TRUE
    ),
    PopSize = survey_total(1, vartype = NULL)
  )

vr_prop
## # A tibble: 1 × 5
## Property_Vzn Property_Vzn_se Property_Rate Property_Rate_se PopSize
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 11682056. 263844. 90.3 1.95 129319232.
In the output above, we see that the estimated property victimization rate in 2021 was 90.3 per 1,000 households. This matches the rate obtained by dividing the estimated number of victimizations by the estimated number of households and multiplying by 1,000, as demonstrated in the following code and output.
vr_prop %>%
  select(-ends_with("se")) %>%
  mutate(Property_Rate_manual = Property_Vzn / PopSize * 1000)
## # A tibble: 1 × 4
## Property_Vzn Property_Rate PopSize Property_Rate_manual
## <dbl> <dbl> <dbl> <dbl>
## 1 11682056. 90.3 129319232. 90.3
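Written out, the point estimate returned by survey_mean() here is a weighted mean. In the notation below, which is introduced only for this sketch, w_i is the household weight, a_i is the adjustment factor (ADJINC_WT), and y_i is the value of Property for household i:

```latex
\hat{R}
  = \frac{\sum_i w_i \,(1000 \, a_i \, y_i)}{\sum_i w_i}
  = 1000 \times \frac{\sum_i w_i \, a_i \, y_i}{\sum_i w_i}
  \approx 1000 \times \frac{11{,}682{,}056}{129{,}319{,}232}
  \approx 90.3
```

The numerator is the estimated victimization total and the denominator is the estimated number of households, so the manual calculation and survey_mean() agree.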
Victimization rates can also be calculated based on particular characteristics of the victimization. In the following example, we calculate the rate of aggravated assault by weapon type: no weapon, firearm, knife, or another weapon.
pers_des %>%
  summarize(across(
    starts_with("AAST_"),
    ~ survey_mean(. * ADJINC_WT * 1000, na.rm = TRUE)
  ))
## # A tibble: 1 × 8
## AAST_NoWeap AAST_NoWeap_se AAST_Firearm AAST_Firearm_se AAST_Knife
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.249 0.0595 0.860 0.101 0.455
## # ℹ 3 more variables: AAST_Knife_se <dbl>, AAST_Other <dbl>,
## # AAST_Other_se <dbl>
Analysts often want to calculate victimization rates by several characteristics. For example, we may want to calculate the violent victimization rate and aggravated assault rate by sex, race/Hispanic origin, age group, marital status, and household income. This requires a separate group_by() statement for each categorization. Thus, we make a function to do this and then use the map() function from the {purrr} package to loop through the variables (Wickham and Henry 2023). This function takes a demographic variable as its input (byvar) and calculates the violent and aggravated assault victimization rates for each level. It then creates columns with the variable name, the level of each variable, and a numeric version of the level (LevelNum) for sorting later. The function is run across multiple variables using map(), and the results are stacked into a single output using bind_rows().
# Estimate violent and aggravated assault rates for each level of one variable
pers_est_by <- function(byvar) {
  pers_des %>%
    rename(Level := {{ byvar }}) %>%
    filter(!is.na(Level)) %>%
    group_by(Level) %>%
    summarize(
      Violent = survey_mean(Violent * ADJINC_WT * 1000, na.rm = TRUE),
      AAST = survey_mean(AAST * ADJINC_WT * 1000, na.rm = TRUE)
    ) %>%
    mutate(
      Variable = byvar,
      LevelNum = as.numeric(Level),
      Level = as.character(Level)
    ) %>%
    select(Variable, Level, LevelNum, everything())
}

# Apply the function to each demographic variable and stack the results
pers_est_df <-
  c("Sex", "RaceHispOrigin", "AgeGroup", "MaritalStatus", "Income") %>%
  map(pers_est_by) %>%
  bind_rows()
The output from all the estimates is cleaned to create better labels, such as going from “RaceHispOrigin” to “Race/Hispanic Origin.” Finally, the {gt} package is used to make a publishable table (Table 13.8). Using the functions from the {gt} package, we add column labels and footnotes and present estimates rounded to the first decimal place (Iannone et al. 2024).
vr_gt <- pers_est_df %>%
  mutate(
    Variable = case_when(
      Variable == "RaceHispOrigin" ~ "Race/Hispanic Origin",
      Variable == "MaritalStatus" ~ "Marital Status",
      Variable == "AgeGroup" ~ "Age",
      TRUE ~ Variable
    )
  ) %>%
  select(-LevelNum) %>%
  group_by(Variable) %>%
  gt(rowname_col = "Level") %>%
  tab_spanner(
    label = "Violent Crime",
    id = "viol_span",
    columns = c("Violent", "Violent_se")
  ) %>%
  tab_spanner(
    label = "Aggravated Assault",
    columns = c("AAST", "AAST_se")
  ) %>%
  cols_label(
    Violent = "Rate",
    Violent_se = "S.E.",
    AAST = "Rate",
    AAST_se = "S.E.",
  ) %>%
  fmt_number(
    columns = c("Violent", "Violent_se", "AAST", "AAST_se"),
    decimals = 1
  ) %>%
  tab_footnote(
    footnote = "Includes rape or sexual assault, robbery,
    aggravated assault, and simple assault.",
    locations = cells_column_spanners(spanners = "viol_span")
  ) %>%
  tab_footnote(
    footnote = "Excludes persons of Hispanic origin.",
    locations =
      cells_stub(rows = Level %in%
        c("White", "Black", "Asian", NHOPI, "Other"))
  ) %>%
  tab_footnote(
    footnote = "Includes persons who identified as
    Native Hawaiian or Other Pacific Islander only.",
    locations = cells_stub(rows = Level == NHOPI)
  ) %>%
  tab_footnote(
    footnote = "Includes persons who identified as American Indian or
    Alaska Native only or as two or more races.",
    locations = cells_stub(rows = Level == "Other")
  ) %>%
  tab_source_note(
    source_note = md("*Note*: Rates per 1,000 persons age 12 or older.")
  ) %>%
  tab_source_note(
    source_note = md("*Source*: Bureau of Justice Statistics,
    National Crime Victimization Survey, 2021.")
  ) %>%
  tab_stubhead(label = "Victim Demographic") %>%
  tab_caption("Rate and standard error of violent victimization,
  by type of crime and demographic characteristics, 2021")
Table 13.8: Rate and standard error of violent victimization, by type of crime and demographic characteristics, 2021

Victim Demographic | Violent Crime [1]: Rate | Violent Crime [1]: S.E. | Aggravated Assault: Rate | Aggravated Assault: S.E.
---|---|---|---|---
Sex | | | |
Female | 15.5 | 0.9 | 2.3 | 0.2
Male | 17.5 | 1.1 | 3.2 | 0.3
Race/Hispanic Origin | | | |
White [2] | 16.1 | 0.9 | 2.7 | 0.3
Black [2] | 18.5 | 2.2 | 3.7 | 0.7
Hispanic | 15.9 | 1.7 | 2.3 | 0.4
Asian [2] | 8.6 | 1.3 | 1.9 | 0.6
Native Hawaiian or Other Pacific Islander [2][3] | 36.1 | 34.4 | 0.0 | 0.0
Other [2][4] | 45.4 | 13.0 | 6.2 | 2.0
Age | | | |
12--17 | 13.2 | 2.2 | 2.5 | 0.8
18--24 | 23.1 | 2.1 | 3.9 | 0.9
25--34 | 22.0 | 2.1 | 4.0 | 0.6
35--49 | 19.4 | 1.6 | 3.6 | 0.5
50--64 | 16.9 | 1.9 | 2.0 | 0.3
65 or older | 6.4 | 1.1 | 1.1 | 0.3
Marital Status | | | |
Never married | 22.2 | 1.4 | 4.0 | 0.4
Married | 9.5 | 0.9 | 1.5 | 0.2
Widowed | 10.7 | 3.5 | 0.9 | 0.2
Divorced | 27.4 | 2.9 | 4.0 | 0.7
Separated | 36.8 | 6.7 | 8.8 | 3.1
Income | | | |
Less than $25,000 | 29.6 | 2.5 | 5.1 | 0.7
$25,000--49,999 | 16.9 | 1.5 | 3.0 | 0.4
$50,000--99,999 | 14.6 | 1.1 | 1.9 | 0.3
$100,000--199,999 | 12.2 | 1.3 | 2.5 | 0.4
$200,000 or more | 9.7 | 1.4 | 1.7 | 0.6

Note: Rates per 1,000 persons age 12 or older.
Source: Bureau of Justice Statistics, National Crime Victimization Survey, 2021.
[1] Includes rape or sexual assault, robbery, aggravated assault, and simple assault.
[2] Excludes persons of Hispanic origin.
[3] Includes persons who identified as Native Hawaiian or Other Pacific Islander only.
[4] Includes persons who identified as American Indian or Alaska Native only or as two or more races.
13.6.4 Estimation 4: Prevalence rates
Prevalence rates differ from victimization rates, as the numerator is the number of people or households victimized rather than the number of victimizations. To calculate the prevalence rates, we must run another summary of the data by calculating an indicator for whether a person or household is a victim of a particular crime at any point in the year. Below is an example of calculating the indicator and then the prevalence rate of violent crime and aggravated assault.
pers_prev_des <-
  pers_vsum_slim %>%
  mutate(Year = floor(YEARQ)) %>%
  # Flag each person as a victim if they had any victimization during the year
  mutate(
    Violent_Ind = sum(Violent) > 0,
    AAST_Ind = sum(AAST) > 0,
    .by = c("Year", "IDHH", "IDPER")
  ) %>%
  as_survey(
    weight = WGTPERCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

pers_prev_ests <- pers_prev_des %>%
  summarize(
    Violent_Prev = survey_mean(Violent_Ind * 100),
    AAST_Prev = survey_mean(AAST_Ind * 100)
  )

pers_prev_ests
## # A tibble: 1 × 4
## Violent_Prev Violent_Prev_se AAST_Prev AAST_Prev_se
## <dbl> <dbl> <dbl> <dbl>
## 1 0.980 0.0349 0.215 0.0143
In the example above, the indicator is multiplied by 100 to return a percentage rather than a proportion. In 2021, we estimate that 0.98% of people aged 12 and older were victims of violent crime in the United States, and 0.22% were victims of aggravated assault.
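The distinction between counting victimizations and counting victims can be illustrated with a small base R sketch. The data below are made up, the column names are hypothetical, and weights are omitted for simplicity, so this shows only the logic of the indicator, not a survey estimate.

```r
# Hypothetical toy data: one row per person-interview (not NCVS data)
toy_pers <- data.frame(
  id      = c("A", "A", "B", "B", "C", "C"),
  Violent = c(2, 1, 0, 0, 0, 1) # violent victimizations at each interview
)

# Victimizations per person over the year
vzn_per_person <- tapply(toy_pers$Violent, toy_pers$id, sum)

# Prevalence indicator: TRUE if the person was victimized at least once
victim_ind <- vzn_per_person > 0

n_victimizations <- sum(vzn_per_person) # counts every victimization
n_victims <- sum(victim_ind)            # counts each victimized person once
```

Person A contributes three victimizations to a victimization rate but only one victim to a prevalence rate, which is why the two measures can differ substantially.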
13.7 Statistical testing
For any of the types of estimates discussed, we can also perform statistical testing. For example, we could test whether property victimization rates are different between properties that are owned versus rented. First, we calculate the point estimates.
prop_tenure <- hh_des %>%
  group_by(Tenure) %>%
  summarize(
    Property_Rate = survey_mean(Property * ADJINC_WT * 1000,
      na.rm = TRUE, vartype = "ci"
    )
  )

prop_tenure
## # A tibble: 3 × 4
## Tenure Property_Rate Property_Rate_low Property_Rate_upp
## <fct> <dbl> <dbl> <dbl>
## 1 Owned 68.2 64.3 72.1
## 2 Rented 130. 123. 137.
## 3 <NA> NaN NaN NaN
The property victimization rate for rented households is 129.8 per 1,000 households, while the rate for owned households is 68.2. These rates seem very different, especially given the non-overlapping confidence intervals. However, survey data are inherently non-independent, so statistical testing cannot be done by comparing confidence intervals. To conduct the statistical test, we first need to create a variable that incorporates the adjusted incident weight (ADJINC_WT), and then the test can be conducted on this adjusted variable as discussed in Chapter 6.
prop_tenure_test <- hh_des %>%
  mutate(
    Prop_Adj = Property * ADJINC_WT * 1000
  ) %>%
  svyttest(
    formula = Prop_Adj ~ Tenure,
    design = .,
    na.rm = TRUE
  ) %>%
  broom::tidy()
estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
61.62 | 16.04 | <0.0001 | 169.00 | 54.03 | 69.21 | Design-based t-test | two.sided |
The output of the statistical test shown in Table 13.9 indicates a difference of 61.6 victimizations per 1,000 households between the property victimization rates of renters and owners, and the test is highly significant, with a p-value of <0.0001.
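As a quick consistency check on the columns of the test output, the following base R sketch recovers the implied standard error and confidence interval from the reported estimate, statistic, and degrees of freedom. It assumes the interval is the usual estimate plus or minus a t quantile times the standard error, and it works from rounded values, so small rounding differences are expected.

```r
# Values copied from the rounded test output
estimate  <- 61.62
statistic <- 16.04
df        <- 169

# Implied standard error of the difference: estimate / t statistic
se_diff <- estimate / statistic

# 95% confidence interval based on the t distribution
half_width <- qt(0.975, df) * se_diff
ci <- c(low = estimate - half_width, high = estimate + half_width)

round(ci, 2)
```

The recovered interval agrees with the reported conf.low and conf.high up to rounding, which confirms how the estimate, statistic, and parameter columns relate to one another.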
13.8 Exercises
1. What proportion of completed motor vehicle thefts are not reported to the police? Hint: Use the codebook to look at the definition of Type of Crime (V4529).
2. How many violent crimes occur in each region?
3. What is the property victimization rate among each income level?
4. What is the difference in the violent victimization rate between males and females? Is the difference statistically significant?
References
BJS publishes victimization rates per 1,000, which are also presented in these examples.