Chapter 13 National Crime Victimization Survey vignette

Prerequisites

For this chapter, load the following packages:

library(tidyverse)
library(survey)
library(srvyr)
library(srvyrexploR)
library(gt)

We use data from the United States National Crime Victimization Survey (NCVS). These data are available in the {srvyrexploR} package as ncvs_2021_incident, ncvs_2021_household, and ncvs_2021_person.
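
Before starting, we can quickly verify that the data files are available. This is a minimal sketch assuming the packages listed above are loaded; it only prints the dimensions of each file.

# Confirm the three NCVS files from {srvyrexploR} are available
dim(ncvs_2021_incident)
dim(ncvs_2021_household)
dim(ncvs_2021_person)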

13.1 Introduction

The National Crime Victimization Survey (NCVS) is a household survey sponsored by the Bureau of Justice Statistics (BJS), which collects data on criminal victimization, including characteristics of the crimes, offenders, and victims. Crime types include both household and personal crimes, as well as violent and non-violent crimes. The population of interest of this survey is all people in the United States age 12 and older living in housing units and non-institutional group quarters.

The NCVS has been ongoing since 1992. An earlier survey, the National Crime Survey, was run from 1972 to 1991 (U. S. Bureau of Justice Statistics 2017). The survey is administered using a rotating panel. When an address enters the sample, the residents of that address are interviewed every 6 months for a total of 7 interviews. If the initial residents move away from the address during the period and new residents move in, the new residents are included in the survey, as people are not followed when they move.

NCVS data are publicly available and distributed by Inter-university Consortium for Political and Social Research (ICPSR), with data going back to 1992. The vignette in this book includes data from 2021 (U.S. Bureau of Justice Statistics 2022). The NCVS data structure is complicated, and the User’s Guide contains examples for analysis in SAS, SUDAAN, SPSS, and Stata, but not R (Shook-Sa, Couzens, and Berzofsky 2015). This vignette adapts those examples for R.

13.2 Data structure

The data from ICPSR are distributed with five files, each having its unique identifier indicated:

  • Address Record - YEARQ, IDHH
  • Household Record - YEARQ, IDHH
  • Person Record - YEARQ, IDHH, IDPER
  • Incident Record - YEARQ, IDHH, IDPER
  • 2021 Collection Year Incident - YEARQ, IDHH, IDPER

In this vignette, we focus on the household, person, and incident files and have selected a subset of columns for use in the examples. We have included data with this subset of columns in the {srvyrexploR} package, but the complete data files can be downloaded from ICPSR.
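
As a light check on this structure, the following sketch (assuming the {srvyrexploR} subsets loaded above) verifies that the listed identifiers uniquely identify rows on the household and person files; the incident file can legitimately contain multiple records per person, one per reported victimization.

# Both calls should return zero rows if the identifiers are unique
ncvs_2021_household %>%
  count(YEARQ, IDHH) %>%
  filter(n > 1)

ncvs_2021_person %>%
  count(YEARQ, IDHH, IDPER) %>%
  filter(n > 1)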

13.3 Survey notation

The NCVS User Guide (Shook-Sa, Couzens, and Berzofsky 2015) uses the following notation:

  • \(i\) represents NCVS households, identified on the household-level file with the household identification number IDHH.
  • \(j\) represents NCVS individual respondents within household \(i\), identified on the person-level file with the person identification number IDPER.
  • \(k\) represents reporting periods (i.e., YEARQ) for household \(i\) and individual respondent \(j\).
  • \(l\) represents victimization records for respondent \(j\) in household \(i\) and reporting period \(k\). Each record on the NCVS incident-level file is associated with a victimization record \(l\).
  • \(D\) represents one or more domain characteristics of interest in the calculation of NCVS estimates. For victimization totals and proportions, domains can be defined on the basis of crime types (e.g., violent crimes, property crimes), characteristics of victims (e.g., age, sex, household income), or characteristics of the victimizations (e.g., victimizations reported to police, victimizations committed with a weapon present). Domains could also be a combination of all of these types of characteristics. For example, in the calculation of victimization rates, domains are defined on the basis of the characteristics of the victims.
  • \(A_a\) represents level \(a\) of covariate \(A\). Covariate \(A\) is used in the calculation of victimization proportions and represents the characteristic for which we want to obtain the distribution of victimizations within domain \(D\).
  • \(C\) represents the personal or property crime for which we want to obtain a victimization rate.

In this vignette, we discuss four estimates:

  1. Victimization totals estimate the number of criminal victimizations with a given characteristic. As demonstrated below, these can be calculated from any of the data files. The victimization total for domain \(D\), \(\hat{t}_D\), is estimated as

\[ \hat{t}_D = \sum_{ijkl \in D} v_{ijkl}\]

where \(v_{ijkl}\) is the series-adjusted victimization weight for household \(i\), respondent \(j\), reporting period \(k\), and victimization \(l\), represented in the data as WGTVICCY.

  2. Victimization proportions estimate characteristics among victimizations or victims. Victimization proportions are calculated using the incident data file. The estimated victimization proportion for domain \(D\) across level \(a\) of covariate \(A\), \(\hat{p}_{A_a,D}\), is

\[ \hat{p}_{A_a,D} =\frac{\sum_{ijkl \in A_a, D} v_{ijkl}}{\sum_{ijkl \in D} v_{ijkl}}.\] The numerator is the number of incidents with a particular characteristic in a domain, and the denominator is the number of incidents in a domain.

  3. Victimization rates are estimates of the number of victimizations per 1,000 persons or households in the population (BJS publishes victimization rates per 1,000, and they are presented that way in these examples). Victimization rates are calculated using the household or person-level data files. The estimated victimization rate for crime \(C\) in domain \(D\) is

\[\hat{VR}_{C,D}= \frac{\sum_{ijkl \in C,D} v_{ijkl}}{\sum_{ijk \in D} w_{ijk}}\times 1000\] where \(w_{ijk}\) is the person weight (WGTPERCY) for personal crimes or household weight (WGTHHCY) for household crimes. The numerator is the number of incidents in a domain, and the denominator is the number of persons or households in a domain. Notice that the weights in the numerator and denominator are different; this is important, and in the syntax and examples below, we discuss how to make an estimate that involves two weights.

  4. Prevalence rates are estimates of the percentage of the population (persons or households) who are victims of a crime. These are estimated using the household or person-level data files. The estimated prevalence rate for crime \(C\) in domain \(D\) is

\[ \hat{PR}_{C, D}= \frac{\sum_{ijk \in {C,D}} I_{ij}w_{ijk}}{\sum_{ijk \in D} w_{ijk}} \times 100\]

where \(I_{ij}\) is an indicator that a person or household in domain \(D\) was a victim of crime \(C\) at any time in the year. The numerator is the number of victims in domain \(D\) for crime \(C\), and the denominator is the number of people or households in the population.
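
To make these formulas concrete, the sketch below works through a victimization total, a victimization rate, and a prevalence rate for a single domain using a tiny, entirely made-up dataset (the object names, values, and weights are invented for illustration and are not NCVS data). The calculations ignore the survey design; the remainder of this vignette computes design-based versions with {srvyr}.

# Toy person file: one row per person, with person weight w_ijk
toy_pers <- tibble(
  IDPER = 1:5,
  WGTPER = c(1000, 1500, 800, 1200, 500)
)

# Toy incident file: one row per victimization, with victimization weight
# v_ijkl (persons 1 and 4 are victims; person 1 has two victimizations)
toy_inc <- tibble(
  IDPER = c(1, 1, 4),
  WGTVIC = c(900, 900, 1100)
)

# Victimization total: sum of the victimization weights
sum(toy_inc$WGTVIC)
# 2900

# Victimization rate per 1,000 persons: victimization weights in the
# numerator, person weights in the denominator
sum(toy_inc$WGTVIC) / sum(toy_pers$WGTPER) * 1000
# 580

# Prevalence rate (%): weighted share of persons with at least one victimization
toy_pers %>%
  mutate(Victim = IDPER %in% toy_inc$IDPER) %>%
  summarize(Prevalence = sum(Victim * WGTPER) / sum(WGTPER) * 100)
# 44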

13.4 Data file preparation

Some work is necessary to prepare the files before analysis. The design variables indicating pseudo-stratum (V2117) and half-sample code (V2118) are only included on the household file, so they must be added to the person and incident files for any analysis.
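
As a minimal illustration of what this merge looks like (the object name pers_with_design is hypothetical and used only here), the design variables can be carried from the household file onto the person file with a join on the shared identifiers; the person-file preparation in Section 13.4.2.2 does this as part of a larger join.

# Carry the pseudo-stratum (V2117) and half-sample code (V2118) from the
# household file onto the person file using the shared identifiers
pers_with_design <- ncvs_2021_person %>%
  left_join(
    ncvs_2021_household %>% select(YEARQ, IDHH, V2117, V2118),
    by = c("YEARQ", "IDHH")
  )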

For victimization rates, we need to know the victimization status for both victims and non-victims. Therefore, the incident file must be summarized and merged onto the household or person files for household-level and person-level crimes, respectively. We begin by discussing how to create these incident summary files, following Section 2.2 of the NCVS User’s Guide (Shook-Sa, Couzens, and Berzofsky 2015).

13.4.1 Preparing files for estimation of victimization rates

Each record on the incident file represents one victimization, which is not the same as one incident. Some victimizations consist of several similar instances that the victim cannot differentiate in detail; these are labeled “series crimes.” Appendix A of the User’s Guide indicates how to calculate the series weight in other statistical languages.

Here, we adapt that code for R. Essentially, if a victimization is a series crime, its series weight is the number of actual victimizations top-coded at 10; that is, even if the crime occurred more than 10 times, it is counted as 10 times to reduce the influence of extreme outliers. If an incident is a series crime but the number of occurrences is unknown, the series weight is set to 6. Table 13.1 describes the variables used to create the series indicators and associated weights.

TABLE 13.1: Codebook for incident variables, related to series weight
Variable Description Value Label
V4016 How many times incident occur last 6 months 1–996 Number of times
997 Don’t know
V4017 How many incidents 1 1–5 incidents (not a “series”)
2 6 or more incidents
8 Residue (invalid data)
V4018 Incidents similar in detail 1 Similar
2 Different (not in a “series”)
8 Residue (invalid data)
V4019 Enough detail to distinguish incidents 1 Yes (not a “series”)
2 No (is a “series”)
8 Residue (invalid data)
WGTVICCY Adjusted victimization weight Numeric

We create four new variables related to series crimes. First, we create a variable called series using V4017, V4018, and V4019, where an incident is considered a series crime if there are 6 or more incidents (V4017), the incidents are similar in detail (V4018), or there is not enough detail to distinguish the incidents (V4019). Second, we top-code the number of incidents (V4016) by creating a variable n10v4016, which is set to 10 if V4016 > 10. Third, we create serieswgt from series and n10v4016: for series crimes it equals the top-coded number of incidents (or 6 when that number is unknown), and for all other incidents it equals 1. Finally, we create the new weight (NEWWGT) by multiplying serieswgt by the existing victimization weight (WGTVICCY).

inc_series <- ncvs_2021_incident %>%
  mutate(
    series = case_when(
      V4017 %in% c(1, 8) ~ 1,
      V4018 %in% c(2, 8) ~ 1,
      V4019 %in% c(1, 8) ~ 1,
      TRUE ~ 2
    ),
    n10v4016 = case_when(
      V4016 %in% c(997, 998) ~ NA_real_,
      V4016 > 10 ~ 10,
      TRUE ~ V4016
    ),
    serieswgt = case_when(
      series == 2 & is.na(n10v4016) ~ 6,
      series == 2 ~ n10v4016,
      TRUE ~ 1
    ),
    NEWWGT = WGTVICCY * serieswgt
  )

The next step in preparing the files for estimation is to create indicators on the victimization file for characteristics of interest. Almost all BJS publications limit the analysis to records where the victimization occurred in the United States (where V4022 is not equal to 1). We do this for all estimates as well. A brief codebook of variables for this task is located in Table 13.2.

TABLE 13.2: Codebook for incident variables, crime type indicators and characteristics
Variable Description Value Label
V4022 In what city/town/village 1 Outside U.S.
2 Not inside a city/town/village
3 Same city/town/village as present residence
4 Different city/town/village as present residence
5 Don’t know
6 Don’t know if 2, 4, or 5
V4049 Did offender have a weapon 1 Yes
2 No
3 Don’t know
V4050 What was the weapon that offender had 1 At least one good entry
3 Indicates “Yes-Type Weapon-NA”
7 Indicates “Gun Type Unknown”
8 No good entry
V4051 Hand gun 0 No
1 Yes
V4052 Other gun 0 No
1 Yes
V4053 Knife 0 No
1 Yes
V4399 Reported to police 1 Yes
2 No
3 Don’t know
V4529 Type of crime code 01 Completed rape
02 Attempted rape
03 Sexual attack with serious assault
04 Sexual attack with minor assault
05 Completed robbery with injury from serious assault
06 Completed robbery with injury from minor assault
07 Completed robbery without injury from minor assault
08 Attempted robbery with injury from serious assault
09 Attempted robbery with injury from minor assault
10 Attempted robbery without injury
11 Completed aggravated assault with injury
12 Attempted aggravated assault with weapon
13 Threatened assault with weapon
14 Simple assault completed with injury
15 Sexual assault without injury
16 Unwanted sexual contact without force
17 Assault without weapon without injury
18 Verbal threat of rape
19 Verbal threat of sexual assault
20 Verbal threat of assault
21 Completed purse snatching
22 Attempted purse snatching
23 Pocket picking (completed only)
31 Completed burglary, forcible entry
32 Completed burglary, unlawful entry without force
33 Attempted forcible entry
40 Completed motor vehicle theft
41 Attempted motor vehicle theft
54 Completed theft less than $10
55 Completed theft $10 to $49
56 Completed theft $50 to $249
57 Completed theft $250 or greater
58 Completed theft value NA
59 Attempted theft

Using these variables, we create the following indicators:

  1. Property crime
    • V4529 \(\ge\) 31
    • Variable: Property
  2. Violent crime
    • V4529 \(\le\) 20
    • Variable: Violent
  3. Property crime reported to the police
    • V4529 \(\ge\) 31 and V4399=1
    • Variable: Property_ReportPolice
  4. Violent crime reported to the police
    • V4529 \(\le\) 20 and V4399=1
    • Variable: Violent_ReportPolice
  5. Aggravated assault without a weapon
    • V4529 in 11:13 and V4049=2
    • Variable: AAST_NoWeap
  6. Aggravated assault with a firearm
    • V4529 in 11:13 and V4049=1 and (V4051=1 or V4052=1 or V4050=7)
    • Variable: AAST_Firearm
  7. Aggravated assault with a knife or sharp object
    • V4529 in 11:13 and V4049=1 and (V4053=1 or V4054=1)
    • Variable: AAST_Knife
  8. Aggravated assault with another type of weapon
    • V4529 in 11:13 and V4049=1 and V4050=1 and not firearm or knife
    • Variable: AAST_Other

inc_ind <- inc_series %>%
  filter(V4022 != 1) %>%
  mutate(
    WeapCat = case_when(
      is.na(V4049) ~ NA_character_,
      V4049 == 2 ~ "NoWeap",
      V4049 == 3 ~ "UnkWeapUse",
      V4050 == 3 ~ "Other",
      V4051 == 1 | V4052 == 1 | V4050 == 7 ~ "Firearm",
      V4053 == 1 | V4054 == 1 ~ "Knife",
      TRUE ~ "Other"
    ),
    V4529_num = parse_number(as.character(V4529)),
    ReportPolice = V4399 == 1,
    Property = V4529_num >= 31,
    Violent = V4529_num <= 20,
    Property_ReportPolice = Property & ReportPolice,
    Violent_ReportPolice = Violent & ReportPolice,
    AAST = V4529_num %in% 11:13,
    AAST_NoWeap = AAST & WeapCat == "NoWeap",
    AAST_Firearm = AAST & WeapCat == "Firearm",
    AAST_Knife = AAST & WeapCat == "Knife",
    AAST_Other = AAST & WeapCat == "Other"
  )

This is a good point to pause and check crosswalks between the original variables and the derived variables, to confirm that the logic was programmed correctly and that everything ends up in the expected category.

inc_series %>% count(V4022)
## # A tibble: 6 × 2
##   V4022     n
##   <fct> <int>
## 1 1        34
## 2 2        65
## 3 3      7697
## 4 4      1143
## 5 5        39
## 6 8         4
inc_ind %>% count(V4022)
## # A tibble: 5 × 2
##   V4022     n
##   <fct> <int>
## 1 2        65
## 2 3      7697
## 3 4      1143
## 4 5        39
## 5 8         4
inc_ind %>%
  count(WeapCat, V4049, V4050, V4051, V4052, V4053, V4054)
## # A tibble: 13 × 8
##    WeapCat    V4049 V4050 V4051 V4052 V4053 V4054     n
##    <chr>      <fct> <fct> <fct> <fct> <fct> <fct> <int>
##  1 Firearm    1     1     0     1     0     0        15
##  2 Firearm    1     1     0     1     1     1         1
##  3 Firearm    1     1     1     0     0     0       125
##  4 Firearm    1     1     1     0     1     0         2
##  5 Firearm    1     1     1     1     0     0         3
##  6 Firearm    1     7     0     0     0     0         3
##  7 Knife      1     1     0     0     0     1        14
##  8 Knife      1     1     0     0     1     0        71
##  9 NoWeap     2     <NA>  <NA>  <NA>  <NA>  <NA>   1794
## 10 Other      1     1     0     0     0     0       147
## 11 Other      1     3     0     0     0     0        26
## 12 UnkWeapUse 3     <NA>  <NA>  <NA>  <NA>  <NA>    519
## 13 <NA>       <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   6228
inc_ind %>%
  count(V4529, Property, Violent, AAST) %>%
  print(n = 40)
## # A tibble: 34 × 5
##    V4529 Property Violent AAST      n
##    <fct> <lgl>    <lgl>   <lgl> <int>
##  1 1     FALSE    TRUE    FALSE    45
##  2 2     FALSE    TRUE    FALSE    20
##  3 3     FALSE    TRUE    FALSE    11
##  4 4     FALSE    TRUE    FALSE     3
##  5 5     FALSE    TRUE    FALSE    24
##  6 6     FALSE    TRUE    FALSE    26
##  7 7     FALSE    TRUE    FALSE    59
##  8 8     FALSE    TRUE    FALSE     5
##  9 9     FALSE    TRUE    FALSE     7
## 10 10    FALSE    TRUE    FALSE    57
## 11 11    FALSE    TRUE    TRUE     97
## 12 12    FALSE    TRUE    TRUE     91
## 13 13    FALSE    TRUE    TRUE    163
## 14 14    FALSE    TRUE    FALSE   165
## 15 15    FALSE    TRUE    FALSE    24
## 16 16    FALSE    TRUE    FALSE    12
## 17 17    FALSE    TRUE    FALSE   357
## 18 18    FALSE    TRUE    FALSE    14
## 19 19    FALSE    TRUE    FALSE     3
## 20 20    FALSE    TRUE    FALSE   607
## 21 21    FALSE    FALSE   FALSE     2
## 22 22    FALSE    FALSE   FALSE     2
## 23 23    FALSE    FALSE   FALSE    19
## 24 31    TRUE     FALSE   FALSE   248
## 25 32    TRUE     FALSE   FALSE   634
## 26 33    TRUE     FALSE   FALSE   188
## 27 40    TRUE     FALSE   FALSE   256
## 28 41    TRUE     FALSE   FALSE    97
## 29 54    TRUE     FALSE   FALSE   407
## 30 55    TRUE     FALSE   FALSE  1006
## 31 56    TRUE     FALSE   FALSE  1686
## 32 57    TRUE     FALSE   FALSE  1420
## 33 58    TRUE     FALSE   FALSE   798
## 34 59    TRUE     FALSE   FALSE   395
inc_ind %>% count(ReportPolice, V4399)
## # A tibble: 4 × 3
##   ReportPolice V4399     n
##   <lgl>        <fct> <int>
## 1 FALSE        2      5670
## 2 FALSE        3       103
## 3 FALSE        8        12
## 4 TRUE         1      3163
inc_ind %>%
  count(
    AAST,
    WeapCat,
    AAST_NoWeap,
    AAST_Firearm,
    AAST_Knife,
    AAST_Other
  )
## # A tibble: 11 × 7
##    AAST  WeapCat    AAST_NoWeap AAST_Firearm AAST_Knife AAST_Other     n
##    <lgl> <chr>      <lgl>       <lgl>        <lgl>      <lgl>      <int>
##  1 FALSE Firearm    FALSE       FALSE        FALSE      FALSE         34
##  2 FALSE Knife      FALSE       FALSE        FALSE      FALSE         23
##  3 FALSE NoWeap     FALSE       FALSE        FALSE      FALSE       1769
##  4 FALSE Other      FALSE       FALSE        FALSE      FALSE         27
##  5 FALSE UnkWeapUse FALSE       FALSE        FALSE      FALSE        516
##  6 FALSE <NA>       FALSE       FALSE        FALSE      FALSE       6228
##  7 TRUE  Firearm    FALSE       TRUE         FALSE      FALSE        115
##  8 TRUE  Knife      FALSE       FALSE        TRUE       FALSE         62
##  9 TRUE  NoWeap     TRUE        FALSE        FALSE      FALSE         25
## 10 TRUE  Other      FALSE       FALSE        FALSE      TRUE         146
## 11 TRUE  UnkWeapUse FALSE       FALSE        FALSE      FALSE          3

After creating indicators of victimization types and characteristics, the file is summarized, and crimes are summed across persons or households by YEARQ. Property crimes (i.e., crimes committed against households, such as household burglary or motor vehicle theft) are summed across households, and personal crimes (i.e., crimes committed against an individual, such as assault, robbery, and personal theft) are summed across persons. The indicators are summed using our created series weight variable (serieswgt). Additionally, the existing weight variable (WGTVICCY) needs to be retained for later analysis.

inc_hh_sums <-
  inc_ind %>%
  filter(V4529_num > 23) %>% # restrict to household crimes
  group_by(YEARQ, IDHH) %>%
  summarize(
    WGTVICCY = WGTVICCY[1],
    across(starts_with("Property"),
      ~ sum(. * serieswgt),
      .names = "{.col}"
    ),
    .groups = "drop"
  )

inc_pers_sums <-
  inc_ind %>%
  filter(V4529_num <= 23) %>% # restrict to person crimes
  group_by(YEARQ, IDHH, IDPER) %>%
  summarize(
    WGTVICCY = WGTVICCY[1],
    across(c(starts_with("Violent"), starts_with("AAST")),
      ~ sum(. * serieswgt),
      .names = "{.col}"
    ),
    .groups = "drop"
  )

Now, we merge the victimization summary files into the appropriate files. For any record on the household or person file that is not on the victimization file, the victimization counts are set to 0 after merging. In this step, we also create the victimization adjustment factor. See Section 2.2.4 in the User’s Guide for details of why this adjustment is created (Shook-Sa, Couzens, and Berzofsky 2015). It is calculated as follows:

\[ A_{ijk}=\frac{v_{ijk}}{w_{ijk}}\]

where \(w_{ijk}\) is the person weight (WGTPERCY) for personal crimes or the household weight (WGTHHCY) for household crimes, and \(v_{ijk}\) is the victimization weight (WGTVICCY) for household \(i\), respondent \(j\), in reporting period \(k\). The adjustment factor is set to 0 if no incidents are reported.

# Named lists of zeros: used below to set the victimization counts to 0
# for households and persons that do not appear on the incident file
hh_z_list <- rep(0, ncol(inc_hh_sums) - 3) %>%
  as.list() %>%
  setNames(names(inc_hh_sums)[-(1:3)])
pers_z_list <- rep(0, ncol(inc_pers_sums) - 4) %>%
  as.list() %>%
  setNames(names(inc_pers_sums)[-(1:4)])

hh_vsum <- ncvs_2021_household %>%
  full_join(inc_hh_sums, by = c("YEARQ", "IDHH")) %>%
  replace_na(hh_z_list) %>%
  mutate(ADJINC_WT = if_else(is.na(WGTVICCY), 0, WGTVICCY / WGTHHCY))

pers_vsum <- ncvs_2021_person %>%
  full_join(inc_pers_sums, by = c("YEARQ", "IDHH", "IDPER")) %>%
  replace_na(pers_z_list) %>%
  mutate(ADJINC_WT = if_else(is.na(WGTVICCY), 0, WGTVICCY / WGTPERCY))

13.4.2 Derived demographic variables

A final step in file preparation for the household and person files is creating any derived variables on the household and person files, such as income categories or age categories, for subgroup analysis. We can do this step before or after merging the victimization counts.

13.4.2.1 Household variables

For the household file, we create categories for tenure (rental status), urbanicity, income, place size, and region. A codebook of the household variables is listed in Table 13.3.

TABLE 13.3: Codebook for household variables
Variable Description Value Label
V2015 Tenure 1 Owned or being bought
2 Rented for cash
3 No cash rent
SC214A Household Income 01 Less than $5,000
02 $5,000–7,499
03 $7,500–9,999
04 $10,000–12,499
05 $12,500–14,999
06 $15,000–17,499
07 $17,500–19,999
08 $20,000–24,999
09 $25,000–29,999
10 $30,000–34,999
11 $35,000–39,999
12 $40,000–49,999
13 $50,000–74,999
15 $75,000–99,999
16 $100,000–149,999
17 $150,000–199,999
18 $200,000 or more
V2126B Place Size (Population) Code 00 Not in a place
13 Population under 10,000
16 10,000–49,999
17 50,000–99,999
18 100,000–249,999
19 250,000–499,999
20 500,000–999,999
21 1,000,000–2,499,999
22 2,500,000–4,999,999
23 5,000,000 or more
V2127B Region 1 Northeast
2 Midwest
3 South
4 West
V2143 Urbanicity 1 Urban
2 Suburban
3 Rural

hh_vsum_der <- hh_vsum %>%
  mutate(
    Tenure = factor(
      case_when(
        V2015 == 1 ~ "Owned",
        !is.na(V2015) ~ "Rented"
      ),
      levels = c("Owned", "Rented")
    ),
    Urbanicity = factor(
      case_when(
        V2143 == 1 ~ "Urban",
        V2143 == 2 ~ "Suburban",
        V2143 == 3 ~ "Rural"
      ),
      levels = c("Urban", "Suburban", "Rural")
    ),
    SC214A_num = as.numeric(as.character(SC214A)),
    Income = case_when(
      SC214A_num <= 8 ~ "Less than $25,000",
      SC214A_num <= 12 ~ "$25,000--49,999",
      SC214A_num <= 15 ~ "$50,000--99,999",
      SC214A_num <= 17 ~ "$100,000--199,999",
      SC214A_num <= 18 ~ "$200,000 or more"
    ),
    Income = fct_reorder(Income, SC214A_num, .na_rm = FALSE),
    PlaceSize = case_match(
      as.numeric(as.character(V2126B)),
      0 ~ "Not in a place",
      13 ~ "Population under 10,000",
      16 ~ "10,000--49,999",
      17 ~ "50,000--99,999",
      18 ~ "100,000--249,999",
      19 ~ "250,000--499,999",
      20 ~ "500,000--999,999",
      c(21, 22, 23) ~ "1,000,000 or more"
    ),
    PlaceSize = fct_reorder(PlaceSize, as.numeric(V2126B)),
    Region = case_match(
      as.numeric(V2127B),
      1 ~ "Northeast",
      2 ~ "Midwest",
      3 ~ "South",
      4 ~ "West"
    ),
    Region = fct_reorder(Region, as.numeric(V2127B))
  )

As before, we want to check to make sure the recoded variables we create match the existing data as expected.

hh_vsum_der %>% count(Tenure, V2015)
## # A tibble: 4 × 3
##   Tenure V2015      n
##   <fct>  <fct>  <int>
## 1 Owned  1     101944
## 2 Rented 2      46269
## 3 Rented 3       1925
## 4 <NA>   <NA>  106322
hh_vsum_der %>% count(Urbanicity, V2143)
## # A tibble: 3 × 3
##   Urbanicity V2143      n
##   <fct>      <fct>  <int>
## 1 Urban      1      26878
## 2 Suburban   2     173491
## 3 Rural      3      56091
hh_vsum_der %>% count(Income, SC214A)
## # A tibble: 18 × 3
##    Income            SC214A     n
##    <fct>             <fct>  <int>
##  1 Less than $25,000 1       7841
##  2 Less than $25,000 2       2626
##  3 Less than $25,000 3       3949
##  4 Less than $25,000 4       5546
##  5 Less than $25,000 5       5445
##  6 Less than $25,000 6       4821
##  7 Less than $25,000 7       5038
##  8 Less than $25,000 8      11887
##  9 $25,000--49,999   9      11550
## 10 $25,000--49,999   10     13689
## 11 $25,000--49,999   11     13655
## 12 $25,000--49,999   12     23282
## 13 $50,000--99,999   13     44601
## 14 $50,000--99,999   15     33353
## 15 $100,000--199,999 16     34287
## 16 $100,000--199,999 17     15317
## 17 $200,000 or more  18     16892
## 18 <NA>              <NA>    2681
hh_vsum_der %>% count(PlaceSize, V2126B)
## # A tibble: 10 × 3
##    PlaceSize               V2126B     n
##    <fct>                   <fct>  <int>
##  1 Not in a place          0      69484
##  2 Population under 10,000 13     39873
##  3 10,000--49,999          16     53002
##  4 50,000--99,999          17     27205
##  5 100,000--249,999        18     24461
##  6 250,000--499,999        19     13111
##  7 500,000--999,999        20     15194
##  8 1,000,000 or more       21      6167
##  9 1,000,000 or more       22      3857
## 10 1,000,000 or more       23      4106
hh_vsum_der %>% count(Region, V2127B)
## # A tibble: 4 × 3
##   Region    V2127B     n
##   <fct>     <fct>  <int>
## 1 Northeast 1      41585
## 2 Midwest   2      74666
## 3 South     3      87783
## 4 West      4      52426

13.4.2.2 Person variables

For the person file, we create categories for sex, race/Hispanic origin, age group, and marital status. A codebook of the person variables is located in Table 13.4. We also merge the household demographic variables and the design variables (V2117 and V2118) onto the person file.

TABLE 13.4: Codebook for person variables
Variable Description Value Label
V3014 Age 12–90
V3015 Current Marital Status 1 Married
2 Widowed
3 Divorced
4 Separated
5 Never married
V3018 Sex 1 Male
2 Female
V3023A Race 01 White only
02 Black only
03 American Indian, Alaska native only
04 Asian only
05 Hawaiian/Pacific Islander only
06 White-Black
07 White-American Indian
08 White-Asian
09 White-Hawaiian
10 Black-American Indian
11 Black-Asian
12 Black-Hawaiian/Pacific Islander
13 American Indian-Asian
14 Asian-Hawaiian/Pacific Islander
15 White-Black-American Indian
16 White-Black-Asian
17 White-American Indian-Asian
18 White-Asian-Hawaiian
19 2 or 3 races
20 4 or 5 races
V3024 Hispanic Origin 1 Yes
2 No

NHOPI <- "Native Hawaiian or Other Pacific Islander"

pers_vsum_der <- pers_vsum %>%
  mutate(
    Sex = factor(case_when(
      V3018 == 1 ~ "Male",
      V3018 == 2 ~ "Female"
    )),
    RaceHispOrigin = factor(
      case_when(
        V3024 == 1 ~ "Hispanic",
        V3023A == 1 ~ "White",
        V3023A == 2 ~ "Black",
        V3023A == 4 ~ "Asian",
        V3023A == 5 ~ NHOPI,
        TRUE ~ "Other"
      ),
      levels = c(
        "White", "Black", "Hispanic",
        "Asian", NHOPI, "Other"
      )
    ),
    V3014_num = as.numeric(as.character(V3014)),
    AgeGroup = case_when(
      V3014_num <= 17 ~ "12--17",
      V3014_num <= 24 ~ "18--24",
      V3014_num <= 34 ~ "25--34",
      V3014_num <= 49 ~ "35--49",
      V3014_num <= 64 ~ "50--64",
      V3014_num <= 90 ~ "65 or older"
    ),
    AgeGroup = fct_reorder(AgeGroup, V3014_num),
    MaritalStatus = factor(
      case_when(
        V3015 == 1 ~ "Married",
        V3015 == 2 ~ "Widowed",
        V3015 == 3 ~ "Divorced",
        V3015 == 4 ~ "Separated",
        V3015 == 5 ~ "Never married"
      ),
      levels = c(
        "Never married", "Married",
        "Widowed", "Divorced",
        "Separated"
      )
    )
  ) %>%
  left_join(
    hh_vsum_der %>% select(
      YEARQ, IDHH,
      V2117, V2118, Tenure:Region
    ),
    by = c("YEARQ", "IDHH")
  )

As before, we want to check to make sure the recoded variables we create match the existing data as expected.

pers_vsum_der %>% count(Sex, V3018)
## # A tibble: 2 × 3
##   Sex    V3018      n
##   <fct>  <fct>  <int>
## 1 Female 2     150956
## 2 Male   1     140922
pers_vsum_der %>% count(RaceHispOrigin, V3024)
## # A tibble: 11 × 3
##    RaceHispOrigin                            V3024      n
##    <fct>                                     <fct>  <int>
##  1 White                                     2     197292
##  2 White                                     8        883
##  3 Black                                     2      29947
##  4 Black                                     8        120
##  5 Hispanic                                  1      41450
##  6 Asian                                     2      16015
##  7 Asian                                     8         61
##  8 Native Hawaiian or Other Pacific Islander 2        891
##  9 Native Hawaiian or Other Pacific Islander 8          9
## 10 Other                                     2       5161
## 11 Other                                     8         49
pers_vsum_der %>%
  filter(RaceHispOrigin != "Hispanic" |
    is.na(RaceHispOrigin)) %>%
  count(RaceHispOrigin, V3023A)
## # A tibble: 20 × 3
##    RaceHispOrigin                            V3023A      n
##    <fct>                                     <fct>   <int>
##  1 White                                     1      198175
##  2 Black                                     2       30067
##  3 Asian                                     4       16076
##  4 Native Hawaiian or Other Pacific Islander 5         900
##  5 Other                                     3        1319
##  6 Other                                     6        1217
##  7 Other                                     7        1025
##  8 Other                                     8         837
##  9 Other                                     9         184
## 10 Other                                     10        178
## 11 Other                                     11         87
## 12 Other                                     12         27
## 13 Other                                     13         13
## 14 Other                                     14         53
## 15 Other                                     15        136
## 16 Other                                     16         45
## 17 Other                                     17         11
## 18 Other                                     18         33
## 19 Other                                     19         22
## 20 Other                                     20         23
pers_vsum_der %>%
  group_by(AgeGroup) %>%
  summarize(
    minAge = min(V3014),
    maxAge = max(V3014),
    .groups = "drop"
  )
## # A tibble: 6 × 3
##   AgeGroup    minAge maxAge
##   <fct>        <dbl>  <dbl>
## 1 12--17          12     17
## 2 18--24          18     24
## 3 25--34          25     34
## 4 35--49          35     49
## 5 50--64          50     64
## 6 65 or older     65     90
pers_vsum_der %>% count(MaritalStatus, V3015)
## # A tibble: 6 × 3
##   MaritalStatus V3015      n
##   <fct>         <fct>  <int>
## 1 Never married 5      90425
## 2 Married       1     148131
## 3 Widowed       2      17668
## 4 Divorced      3      28596
## 5 Separated     4       4524
## 6 <NA>          8       2534

We then create tibbles that contain only the variables we need, which makes it easier to use them for analyses.

hh_vsum_slim <- hh_vsum_der %>%
  select(
    YEARQ:V2118,
    WGTVICCY:ADJINC_WT,
    Tenure,
    Urbanicity,
    Income,
    PlaceSize,
    Region
  )

pers_vsum_slim <- pers_vsum_der %>%
  select(YEARQ:WGTPERCY, WGTVICCY:ADJINC_WT, Sex:Region)

To calculate estimates about types of crime, such as what percentage of violent crimes are reported to the police, we must use the incident file. The incident file is not guaranteed to have every pseudo-stratum and half-sample code, so dummy records are created and appended to it before estimation. Finally, we merge demographic variables onto the incident tibble.

dummy_records <- hh_vsum_slim %>%
  distinct(V2117, V2118) %>%
  mutate(
    Dummy = 1,
    WGTVICCY = 1,
    NEWWGT = 1
  )

inc_analysis <- inc_ind %>%
  mutate(Dummy = 0) %>%
  left_join(select(pers_vsum_slim, YEARQ, IDHH, IDPER, Sex:Region),
    by = c("YEARQ", "IDHH", "IDPER")
  ) %>%
  bind_rows(dummy_records) %>%
  select(
    YEARQ:IDPER,
    WGTVICCY,
    NEWWGT,
    V4529,
    WeapCat,
    ReportPolice,
    Property:Region
  )

The tibbles hh_vsum_slim, pers_vsum_slim, and inc_analysis can now be used to create design objects and calculate crime rate estimates.

13.5 Survey design objects

All the data preparation above is necessary to create the design objects and finally begin analysis. We create three design objects, depending on the estimate we are creating. For the incident data, the analysis weight is NEWWGT, which we constructed previously. The household-level and person-level data use WGTHHCY and WGTPERCY, respectively. For all analyses, V2117 is the stratum variable and V2118 is the cluster/PSU variable. This information can be found in the User’s Guide (Shook-Sa, Couzens, and Berzofsky 2015).

inc_des <- inc_analysis %>%
  as_survey_design(
    weight = NEWWGT,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

hh_des <- hh_vsum_slim %>%
  as_survey_design(
    weight = WGTHHCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

pers_des <- pers_vsum_slim %>%
  as_survey_design(
    weight = WGTPERCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

13.6 Calculating estimates

Now that we have prepared our data and created the design objects, we can calculate our estimates. As a reminder, those are:

  1. Victimization totals estimate the number of criminal victimizations with a given characteristic.

  2. Victimization proportions estimate characteristics among victimizations or victims.

  3. Victimization rates are estimates of the number of victimizations per 1,000 persons or households in the population.

  4. Prevalence rates are estimates of the percentage of the population (persons or households) who are victims of a crime.

13.6.1 Estimation 1: Victimization totals

There are two ways to calculate victimization totals. Using the incident design object (inc_des) is the most straightforward method, but the person (pers_des) and household (hh_des) design objects can be used as well if the adjustment factor (ADJINC_WT) is incorporated. In the example below, the total numbers of property and violent victimizations are first calculated using the incident file and then using the household and person design objects. The incident file is smaller, so estimation is faster using that file, but the estimates are the same, as illustrated in Tables 13.5, 13.6, and 13.7.

vt1 <-
  inc_des %>%
  summarize(
    Property_Vzn = survey_total(Property, na.rm = TRUE),
    Violent_Vzn = survey_total(Violent, na.rm = TRUE)
  ) %>%
  gt() %>%
  tab_spanner(
    label = "Property Crime",
    columns = starts_with("Property")
  ) %>%
  tab_spanner(
    label = "Violent Crime",
    columns = starts_with("Violent")
  ) %>%
  cols_label(
    ends_with("Vzn") ~ "Total",
    ends_with("se") ~ "S.E."
  ) %>%
  fmt_number(decimals = 0)

vt2a <- hh_des %>%
  summarize(Property_Vzn = survey_total(Property * ADJINC_WT,
    na.rm = TRUE
  )) %>%
  gt() %>%
  tab_spanner(
    label = "Property Crime",
    columns = starts_with("Property")
  ) %>%
  cols_label(
    ends_with("Vzn") ~ "Total",
    ends_with("se") ~ "S.E."
  ) %>%
  fmt_number(decimals = 0)

vt2b <- pers_des %>%
  summarize(Violent_Vzn = survey_total(Violent * ADJINC_WT,
    na.rm = TRUE
  )) %>%
  gt() %>%
  tab_spanner(
    label = "Violent Crime",
    columns = starts_with("Violent")
  ) %>%
  cols_label(
    ends_with("Vzn") ~ "Total",
    ends_with("se") ~ "S.E."
  ) %>%
  fmt_number(decimals = 0)

TABLE 13.5: Estimates of total property and violent victimizations with standard errors calculated using the incident design object, 2021 (vt1)

  Property Crime           Violent Crime
  Total         S.E.       Total         S.E.
  11,682,056    263,844    4,598,306     198,115

TABLE 13.6: Estimates of total property victimizations with standard errors calculated using the household design object, 2021 (vt2a)

  Property Crime
  Total         S.E.
  11,682,056    263,844

TABLE 13.7: Estimates of total violent victimizations with standard errors calculated using the person design object, 2021 (vt2b)

  Violent Crime
  Total         S.E.
  4,598,306     198,115

The number of victimizations estimated using the incident file is equivalent to the number estimated using the household and person files. There were an estimated 11,682,056 property victimizations and 4,598,306 violent victimizations in 2021.

13.6.2 Estimation 2: Victimization proportions

Victimization proportions are proportions describing features of a victimization. The key here is that these are estimates among victimizations, not among the population. These types of estimates can only be calculated using the incident design object (inc_des).

For example, we could be interested in the percentage of property victimizations reported to the police as shown in the following code with an estimate, the standard error, and 95% confidence interval:

prop1 <- inc_des %>%
  filter(Property) %>%
  summarize(Pct = survey_mean(ReportPolice,
    na.rm = TRUE,
    proportion = TRUE,
    vartype = c("se", "ci")
  ) * 100)

prop1
## # A tibble: 1 × 4
##     Pct Pct_se Pct_low Pct_upp
##   <dbl>  <dbl>   <dbl>   <dbl>
## 1  30.8  0.798    29.2    32.4

Or, the percentage of violent victimizations that are in urban areas:

prop2 <- inc_des %>%
  filter(Violent) %>%
  summarize(Pct = survey_mean(Urbanicity == "Urban",
    na.rm = TRUE
  ) * 100)

prop2
## # A tibble: 1 × 2
##     Pct Pct_se
##   <dbl>  <dbl>
## 1  18.1   1.49

In 2021, we estimate that 30.8% of property crimes were reported to the police, and 18.1% of violent crimes occurred in urban areas.

13.6.3 Estimation 3: Victimization rates

Victimization rates measure the number of victimizations per population. They are not an estimate of the proportion of households or persons who are victimized, which is the prevalence rate described in Section 13.6.4. Victimization rates are estimated using the household (hh_des) or person (pers_des) design objects depending on the type of crime, and the adjustment factor (ADJINC_WT) must be incorporated. We return to the property and violent victimizations used in the victimization totals example (Section 13.6.1). In the following example, we calculate the property victimization total (as above), the property victimization rate (using survey_mean()), and the population size (using survey_total()).

Victimization rates use the incident weight in the numerator and the person or household weight in the denominator. This is accomplished by calculating the rates with the weight adjustment (ADJINC_WT) multiplied by the estimate of interest. Let’s look at an example of property victimization.

vr_prop <- hh_des %>%
  summarize(
    Property_Vzn = survey_total(Property * ADJINC_WT,
      na.rm = TRUE
    ),
    Property_Rate = survey_mean(Property * ADJINC_WT * 1000,
      na.rm = TRUE
    ),
    PopSize = survey_total(1, vartype = NULL)
  )

vr_prop
## # A tibble: 1 × 5
##   Property_Vzn Property_Vzn_se Property_Rate Property_Rate_se    PopSize
##          <dbl>           <dbl>         <dbl>            <dbl>      <dbl>
## 1    11682056.         263844.          90.3             1.95 129319232.

In the output above, we see that the estimated property victimization rate in 2021 was 90.3 per 1,000 households. This is consistent with dividing the total number of victimizations by the population size and multiplying by 1,000, as demonstrated in the following code output.

vr_prop %>%
  select(-ends_with("se")) %>%
  mutate(Property_Rate_manual = Property_Vzn / PopSize * 1000)
## # A tibble: 1 × 4
##   Property_Vzn Property_Rate    PopSize Property_Rate_manual
##          <dbl>         <dbl>      <dbl>                <dbl>
## 1    11682056.          90.3 129319232.                 90.3

Victimization rates can also be calculated based on particular characteristics of the victimization. In the following example, we calculate the rates of aggravated assault with no weapon, with a firearm, with a knife, and with another type of weapon.

pers_des %>%
  summarize(across(
    starts_with("AAST_"),
    ~ survey_mean(. * ADJINC_WT * 1000, na.rm = TRUE)
  ))
## # A tibble: 1 × 8
##   AAST_NoWeap AAST_NoWeap_se AAST_Firearm AAST_Firearm_se AAST_Knife
##         <dbl>          <dbl>        <dbl>           <dbl>      <dbl>
## 1       0.249         0.0595        0.860           0.101      0.455
## # ℹ 3 more variables: AAST_Knife_se <dbl>, AAST_Other <dbl>,
## #   AAST_Other_se <dbl>

A common desire is to calculate victimization rates by several characteristics. For example, we may want to calculate the violent victimization rate and aggravated assault rate by sex, race/Hispanic origin, age group, marital status, and household income. This requires a separate group_by() statement for each categorization, so we write a function and use the map() function from the {purrr} package to loop through the variables (Wickham and Henry 2023). The function takes a demographic variable as its input (byvar) and calculates the violent and aggravated assault victimization rates for each level. It also creates columns with the variable name, the level of the variable, and a numeric version of the level (LevelNum) for sorting later. The function is run across multiple variables using map(), and the results are stacked into a single output using bind_rows().

pers_est_by <- function(byvar) {
  pers_des %>%
    rename(Level := {{ byvar }}) %>%
    filter(!is.na(Level)) %>%
    group_by(Level) %>%
    summarize(
      Violent = survey_mean(Violent * ADJINC_WT * 1000, na.rm = TRUE),
      AAST = survey_mean(AAST * ADJINC_WT * 1000, na.rm = TRUE)
    ) %>%
    mutate(
      Variable = byvar,
      LevelNum = as.numeric(Level),
      Level = as.character(Level)
    ) %>%
    select(Variable, Level, LevelNum, everything())
}

pers_est_df <-
  c("Sex", "RaceHispOrigin", "AgeGroup", "MaritalStatus", "Income") %>%
  map(pers_est_by) %>%
  bind_rows()

The output from all the estimates is cleaned to create better labels, such as going from “RaceHispOrigin” to “Race/Hispanic Origin.” Finally, the {gt} package is used to make a publishable table (Table 13.8). Using the functions from the {gt} package, we add column labels and footnotes and present estimates rounded to the first decimal place (Iannone et al. 2024).

vr_gt <- pers_est_df %>%
  mutate(
    Variable = case_when(
      Variable == "RaceHispOrigin" ~ "Race/Hispanic Origin",
      Variable == "MaritalStatus" ~ "Marital Status",
      Variable == "AgeGroup" ~ "Age",
      TRUE ~ Variable
    )
  ) %>%
  select(-LevelNum) %>%
  group_by(Variable) %>%
  gt(rowname_col = "Level") %>%
  tab_spanner(
    label = "Violent Crime",
    id = "viol_span",
    columns = c("Violent", "Violent_se")
  ) %>%
  tab_spanner(
    label = "Aggravated Assault",
    columns = c("AAST", "AAST_se")
  ) %>%
  cols_label(
    Violent = "Rate",
    Violent_se = "S.E.",
    AAST = "Rate",
    AAST_se = "S.E.",
  ) %>%
  fmt_number(
    columns = c("Violent", "Violent_se", "AAST", "AAST_se"),
    decimals = 1
  ) %>%
  tab_footnote(
    footnote = "Includes rape or sexual assault, robbery,
    aggravated assault, and simple assault.",
    locations = cells_column_spanners(spanners = "viol_span")
  ) %>%
  tab_footnote(
    footnote = "Excludes persons of Hispanic origin.",
    locations =
      cells_stub(rows = Level %in%
        c("White", "Black", "Asian", NHOPI, "Other"))
  ) %>%
  tab_footnote(
    footnote = "Includes persons who identified as
    Native Hawaiian or Other Pacific Islander only.",
    locations = cells_stub(rows = Level == NHOPI)
  ) %>%
  tab_footnote(
    footnote = "Includes persons who identified as American Indian or
    Alaska Native only or as two or more races.",
    locations = cells_stub(rows = Level == "Other")
  ) %>%
  tab_source_note(
    source_note = md("*Note*: Rates per 1,000 persons age 12 or older.")
  ) %>%
  tab_source_note(
    source_note = md("*Source*: Bureau of Justice Statistics,
                     National Crime Victimization Survey, 2021.")
  ) %>%
  tab_stubhead(label = "Victim Demographic") %>%
  tab_caption("Rate and standard error of violent victimization,
              by type of crime and demographic characteristics, 2021")
vr_gt

TABLE 13.8: Rate and standard error of violent victimization, by type of crime and demographic characteristics, 2021
Victim Demographic Violent Crime (1) Aggravated Assault
Rate S.E. Rate S.E.
Sex
Female 15.5 0.9 2.3 0.2
Male 17.5 1.1 3.2 0.3
Race/Hispanic Origin
White (2) 16.1 0.9 2.7 0.3
Black (2) 18.5 2.2 3.7 0.7
Hispanic 15.9 1.7 2.3 0.4
Asian (2) 8.6 1.3 1.9 0.6
Native Hawaiian or Other Pacific Islander (2,3) 36.1 34.4 0.0 0.0
Other (2,4) 45.4 13.0 6.2 2.0
Age
12--17 13.2 2.2 2.5 0.8
18--24 23.1 2.1 3.9 0.9
25--34 22.0 2.1 4.0 0.6
35--49 19.4 1.6 3.6 0.5
50--64 16.9 1.9 2.0 0.3
65 or older 6.4 1.1 1.1 0.3
Marital Status
Never married 22.2 1.4 4.0 0.4
Married 9.5 0.9 1.5 0.2
Widowed 10.7 3.5 0.9 0.2
Divorced 27.4 2.9 4.0 0.7
Separated 36.8 6.7 8.8 3.1
Income
Less than $25,000 29.6 2.5 5.1 0.7
$25,000--49,999 16.9 1.5 3.0 0.4
$50,000--99,999 14.6 1.1 1.9 0.3
$100,000--199,999 12.2 1.3 2.5 0.4
$200,000 or more 9.7 1.4 1.7 0.6
Note: Rates per 1,000 persons age 12 or older.
Source: Bureau of Justice Statistics, National Crime Victimization Survey, 2021.
(1) Includes rape or sexual assault, robbery, aggravated assault, and simple assault.
(2) Excludes persons of Hispanic origin.
(3) Includes persons who identified as Native Hawaiian or Other Pacific Islander only.
(4) Includes persons who identified as American Indian or Alaska Native only or as two or more races.

13.6.4 Estimation 4: Prevalence rates

Prevalence rates differ from victimization rates, as the numerator is the number of people or households victimized rather than the number of victimizations. To calculate the prevalence rates, we must run another summary of the data by calculating an indicator for whether a person or household is a victim of a particular crime at any point in the year. Below is an example of calculating the indicator and then the prevalence rate of violent crime and aggravated assault.

pers_prev_des <-
  pers_vsum_slim %>%
  mutate(Year = floor(YEARQ)) %>%
  mutate(
    Violent_Ind = sum(Violent) > 0,
    AAST_Ind = sum(AAST) > 0,
    .by = c("Year", "IDHH", "IDPER")
  ) %>%
  as_survey(
    weight = WGTPERCY,
    strata = V2117,
    ids = V2118,
    nest = TRUE
  )

pers_prev_ests <- pers_prev_des %>%
  summarize(
    Violent_Prev = survey_mean(Violent_Ind * 100),
    AAST_Prev = survey_mean(AAST_Ind * 100)
  )

pers_prev_ests
## # A tibble: 1 × 4
##   Violent_Prev Violent_Prev_se AAST_Prev AAST_Prev_se
##          <dbl>           <dbl>     <dbl>        <dbl>
## 1        0.980          0.0349     0.215       0.0143

In the example above, the indicator is multiplied by 100 to return a percentage rather than a proportion. In 2021, we estimate that 0.98% of people aged 12 and older were victims of violent crime in the United States, and 0.22% were victims of aggravated assault.

13.7 Statistical testing

For any of the types of estimates discussed, we can also perform statistical testing. For example, we could test whether property victimization rates are different between properties that are owned versus rented. First, we calculate the point estimates.

prop_tenure <- hh_des %>%
  group_by(Tenure) %>%
  summarize(
    Property_Rate = survey_mean(Property * ADJINC_WT * 1000,
      na.rm = TRUE, vartype = "ci"
    ),
  )

prop_tenure
## # A tibble: 3 × 4
##   Tenure Property_Rate Property_Rate_low Property_Rate_upp
##   <fct>          <dbl>             <dbl>             <dbl>
## 1 Owned           68.2              64.3              72.1
## 2 Rented         130.              123.              137. 
## 3 <NA>           NaN               NaN               NaN

The property victimization rate for rented households is 129.8 per 1,000 households, while the rate for owned households is 68.2. These look very different, especially given the non-overlapping confidence intervals. However, the two estimates come from the same survey and are not independent, so statistical testing cannot be done simply by comparing confidence intervals. To conduct the statistical test, we first need to create a variable that incorporates the adjusted incident weight (ADJINC_WT); the test can then be conducted on this adjusted variable, as discussed in Chapter 6.

prop_tenure_test <- hh_des %>%
  mutate(
    Prop_Adj = Property * ADJINC_WT * 1000
  ) %>%
  svyttest(
    formula = Prop_Adj ~ Tenure,
    design = .,
    na.rm = TRUE
  ) %>%
  broom::tidy()
prop_tenure_test %>%
  mutate(p.value = pretty_p_value(p.value)) %>%
  gt() %>%
  fmt_number()

TABLE 13.9: T-test output for estimates of property victimization rates between properties that are owned versus rented, NCVS 2021

  estimate  statistic  p.value  parameter  conf.low  conf.high  method               alternative
  61.62     16.04      <0.0001  169.00     54.03     69.21      Design-based t-test  two.sided

The output of the statistical test shown in Table 13.9 indicates a difference of 61.6 between the property victimization rates of renters and owners, and the test is highly significant, with a p-value below 0.0001.

13.8 Exercises

  1. What proportion of completed motor vehicle thefts are not reported to the police? Hint: Use the codebook to look at the definition of Type of Crime (V4529).

  2. How many violent crimes occur in each region?

  3. What is the property victimization rate among each income level?

  4. What is the difference between the violent victimization rate between males and females? Is it statistically different?

References

Iannone, Richard, Joe Cheng, Barret Schloerke, Ellis Hughes, Alexandra Lauer, JooYoung Seo, Ken Brevoort, and Olivier Roy. 2024. gt: Easily Create Presentation-Ready Display Tables. https://github.com/rstudio/gt.
Shook-Sa, Bonnie, G. Lance Couzens, and Marcus Berzofsky. 2015. “Users’ Guide to the National Crime Victimization Survey (NCVS) Direct Variance Estimation.” U.S. Bureau of Justice Statistics. https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/ncvs_variance_user_guide_11.06.14.pdf.
U. S. Bureau of Justice Statistics. 2017. “National Crime Victimization Survey, 2016: Technical Documentation.” https://bjs.ojp.gov/sites/g/files/xyckuh236/files/media/document/ncvstd16.pdf.
U.S. Bureau of Justice Statistics. 2022. “National Crime Victimization Survey, [United States], 2021.” Inter-university Consortium for Political and Social Research [distributor]. https://www.icpsr.umich.edu/web/NACJD/studies/38429. https://doi.org/10.3886/ICPSR38429.v1.
Wickham, Hadley, and Lionel Henry. 2023. purrr: Functional Programming Tools. https://purrr.tidyverse.org/.
