2  Generating Qualtrics Dictionaries

Working with Qualtrics files can be tricky because surveys can use question families that contain multiple levels of responses. As a result, variables belong to variable groups. The raw Qualtrics names are not very helpful, so dictionary crosswalk files have been created to facilitate the data preparation workflow.

Example Variable Family:

Raw name     Final name            Label
Staff#1_1_1  Staff_Fulltime_2021   Number of full time staff in 2021
Staff#1_2_1  Staff_Parttime_2021   Number of part time staff in 2021
Staff#1_3_1  Staff_Boardmmbr_2021  Number of Board Members in 2021
Staff#2_1_1  Staff_Fulltime_2022   Number of full time staff in 2022
Staff#2_2_1  Staff_Parttime_2022   Number of part time staff in 2022
Staff#2_3_1  Staff_Boardmmbr_2022  Number of Board Members in 2022

Some utility scripts have been written to extract variable dictionary elements from Qualtrics survey file exports and convert them into a basic crosswalk file.
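To make the naming pattern in the example family concrete, the sketch below splits raw names of the form `Group#a_b_c` into a group name and its first two indices using base R only. The helper `parse_raw_name()` is hypothetical and is not one of the project's utility functions; the reading of the indices (first index as survey year, second as staff category) is inferred from the example table alone and may not hold for other question families.

```r
# Raw names copied from the example variable family above
raw <- c( "Staff#1_1_1", "Staff#1_2_1", "Staff#1_3_1",
          "Staff#2_1_1", "Staff#2_2_1", "Staff#2_3_1" )

# Hypothetical helper (not part of the project's utilities):
# split "Group#a_b_c" into the group name and its first two indices.
# Names without a "#" get NA for both indices.
parse_raw_name <- function( x ) {
  parts <- strsplit( x, "#", fixed = TRUE )
  group <- vapply( parts, function(p) p[1], character(1) )
  idx   <- vapply( parts,
                   function(p) if ( length(p) > 1 ) p[2] else NA_character_,
                   character(1) )
  lev   <- strsplit( idx, "_", fixed = TRUE )
  data.frame(
    vname_raw = x,
    group     = group,
    index_1   = vapply( lev, function(p) p[1], character(1) ),
    index_2   = vapply( lev, function(p) p[2], character(1) ),
    stringsAsFactors = FALSE )
}

parsed <- parse_raw_name( raw )
print( parsed )
```

In the example family, `index_1` distinguishes the two years and `index_2` distinguishes full-time staff, part-time staff, and board members, which is the information the final variable names spell out.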

2.1 Qualtrics File to Crosswalk

# source( "../data-dictionaries/R/00-data-processing-utils.R" )

URL <- "https://raw.githubusercontent.com/UrbanInstitute/nccs-nptrends/main/data-dictionaries/R/00-data-processing-utils.R"
source( URL )
library( dplyr )   # provides %>%, mutate(), group_by(), select() used below

###########   DATA DICTIONARY


d <- 
    legacy = T )

head( as.data.frame(d) )

# extract the data dictionary (column map)

dd <- extract_colmap( d )

# add group variables
# and factor labels 

dd <- 
  dd %>% 
  mutate( group_name = append_groups(qname) ) %>%
  group_by( group_name ) %>% 
  mutate( group_n = n(),
          group_name = ifelse( group_n > 1, group_name, "" ),
          is_group = ifelse( group_n > 1, "1", "0" ),
          group_levels = ifelse( group_n > 1, get_categories(description), "" ) ) %>%
  ungroup() %>% 
  mutate( type = ifelse( group_n > 1, "factor", "" ) ) %>% 
  select( qname, is_group, group_name, group_n, group_levels, 
          description, main, sub )

write.csv( dd, "../data-dictionaries/dd-nptrends-wave-02.csv" )

2.2 Dictionary Files

The script generates only the skeleton of the crosswalk file. A research assistant then completes it by filling in the fields marked user in the table:

Field            Description                                                           Source
q                question number (order)                                               Qualtrics
vname_raw        variable name (from Qualtrics export)                                 Qualtrics
vname            variable name (final)                                                 user
vlabel           variable label                                                        user
type             data type (numeric, character, factor, logical, date)                 user
group            group name                                                            R script
group_lev1       factor levels                                                         user
group_lev2       second factor level for double-grouped variables (e.g. finances_1_1)  user
group_lev_draft  parsed categories (clean up and use as group_lev labels)              R script
add_noise        add noise to this variable to anonymize?                              user
description      survey question (full)                                                Qualtrics
main             survey question (sub)                                                 Qualtrics
sub              survey question response categories (kind of)                         Qualtrics

These steps need to be completed each year. Most of the completed dictionaries can be reused if the questions do not change, but note that changing the order of the questions in the survey can change the raw names that Qualtrics assigns, and any new questions need documentation.
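One way to check how much of a previous dictionary can be reused is to compare the raw names and question text across waves. The sketch below does this in base R on two toy dictionaries; the raw names (`Q1` to `Q4`) and question text are made up for illustration and are not taken from the actual survey. It assumes both dictionaries carry the `vname_raw` and `description` fields listed in the table above.

```r
# Toy dictionaries for two waves (illustrative values only)
dd_prev <- data.frame(
  vname_raw   = c( "Q1", "Q2", "Q3" ),
  description = c( "Year founded", "Number of staff", "Annual budget" ),
  stringsAsFactors = FALSE )

dd_new <- data.frame(
  vname_raw   = c( "Q1", "Q2", "Q3", "Q4" ),
  description = c( "Year founded", "Annual budget",
                   "Number of staff", "Number of volunteers" ),
  stringsAsFactors = FALSE )

# Raw names that appear in only one of the two waves
added   <- setdiff( dd_new$vname_raw,  dd_prev$vname_raw )
dropped <- setdiff( dd_prev$vname_raw, dd_new$vname_raw )

# Raw names present in both waves whose question text differs,
# e.g. because questions were reordered in the survey
both    <- merge( dd_prev, dd_new, by = "vname_raw",
                  suffixes = c( "_prev", "_new" ) )
changed <- both$vname_raw[ both$description_prev != both$description_new ]

print( added )
print( dropped )
print( changed )
```

Rows flagged in `added` or `changed` are the ones that would need fresh documentation; the rest could be carried over from the earlier dictionary.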

These dictionary crosswalk files are used in subsequent steps.
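As a rough illustration of how a completed crosswalk might be applied downstream, the sketch below renames the columns of a toy survey export from `vname_raw` to `vname`. The data values and the `ResponseId` column are made up for the example, and this is not necessarily how the project's later scripts perform the step.

```r
# Toy crosswalk using two rows from the example variable family
dd <- data.frame(
  vname_raw = c( "Staff#1_1_1", "Staff#1_2_1" ),
  vname     = c( "Staff_Fulltime_2021", "Staff_Parttime_2021" ),
  stringsAsFactors = FALSE )

# Toy survey export (illustrative values); check.names = FALSE keeps
# the "#" in the raw column names instead of converting it to "."
survey <- data.frame(
  `Staff#1_1_1` = c( 3, 10 ),
  `Staff#1_2_1` = c( 1, 4 ),
  ResponseId    = c( "R_001", "R_002" ),
  check.names = FALSE,
  stringsAsFactors = FALSE )

# Look up each column in the crosswalk and rename the ones that match;
# columns without a crosswalk entry keep their original names
idx   <- match( names( survey ), dd$vname_raw )
found <- !is.na( idx )
names( survey )[ found ] <- dd$vname[ idx[ found ] ]

print( names( survey ) )
```

Using `match()` rather than positional renaming means the result does not depend on the column order of the export, which matters if question order changes between waves.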

2.2.1 Preview

dd <- readxl::read_xlsx( "../data-dictionaries/dd-nptrends-wave-02.xlsx", sheet = "data dictionary" )

head( dd[30:50,] ) 
# A tibble: 6 × 13
      q vname vname_raw vlabel type  group group_lev1 group_lev2 group_lev_draft
  <dbl> <chr> <chr>     <chr>  <chr> <chr> <chr>      <chr>      <chr>          
1    30 Addr… MainAddr… Nonpr… char… Main… address    <NA>       City and State 
2    31 Addr… MainAddr… Nonpr… char… Main… address    <NA>       ZIP Code       
3    32 PrgS… ProgChan… Indic… bool… Prog… number of… increase   Increased the …
4    33 PrgS… ProgChan… indic… bool… Prog… number of… decrease   Dcrsuced the n…
5    34 PrgS… ProgChan… indic… bool… Prog… services   suspend    Suspended or p…
6    35 PrgS… ProgChan… Indic… bool… Prog… people se… increase   Increased the …
# ℹ 4 more variables: add_noise <chr>, description <chr>, main <chr>, sub <chr>