
Bring GES/CRSS data into the current environment, whether by downloading it anew or by using pre-existing files.

Usage

get_gescrss(
  years = 2011:2022,
  regions = c("mw", "ne", "s", "w"),
  dir = NULL,
  proceed = FALSE,
  cache = NULL
)

Arguments

years

Years to be downloaded, as yyyy (character or numeric), currently limited to 2011-2022.

regions

(Optional) Regions to keep: mw = Midwest, ne = Northeast, s = South, w = West.

dir

Directory in which to search for or save a 'GESCRSS data' folder. If NULL (the default), files are downloaded and unzipped to temporary directories and prepared in memory.

proceed

Logical; whether to proceed with downloading files without asking for user permission. Defaults to FALSE, which asks for permission.

cache

The name of an RDS file to save or use. If the specified file (e.g., 'myGESCRSS.rds') exists in 'dir', it is returned; if not, an RDS file of this name is saved in 'dir' for quick use in subsequent calls.

Value

A GESCRSS data object (a list with six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook).

Details

This function downloads raw data from the GES and CRSS crash databases. If no directory (dir) is specified, raw CSV files are downloaded into a tempdir(), where they are prepared and combined before being brought into the current environment. If you specify a directory (dir), the function looks there for a 'GESCRSS data' folder. If the folder is not found, it is created and populated with raw and prepared SAS and RDS files. If the folder is found, the function checks that all requested years are present and asks permission to download any missing years (unless proceed = TRUE).
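A sketch of a persistent-storage workflow that uses only the documented arguments (dir, proceed, cache). The folder location and cache name are placeholders, and the download call is wrapped in if (FALSE), following this page's own example convention, because it fetches data over the network.

```r
# Placeholder location for the 'GESCRSS data' folder; use a persistent path in practice
data_dir <- file.path(tempdir(), "crash-data")
dir.create(data_dir, showWarnings = FALSE)

# Placeholder cache name; per the 'cache' description, it is looked for in 'dir'
cache_file <- "myGESCRSS.rds"
file.exists(file.path(data_dir, cache_file))  # FALSE before the first call

if (FALSE) {
  # Downloads any missing years into data_dir without prompting, then saves
  # cache_file there; a later call with the same cache name returns that file
  myGESCRSS <- get_gescrss(
    years   = 2020:2021,
    regions = c("ne", "s"),
    dir     = data_dir,
    proceed = TRUE,
    cache   = cache_file
  )
}
```

Because an existing cache file is returned when found, using a distinct cache name for each combination of years and regions may help avoid silently reusing an older selection.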

The object returned is a list with class 'GESCRSS'. It contains six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.

The flat tibble is wide-formatted, with one row per person. All crashes involve at least one motor vehicle, each of which may carry one or more people; crashes, vehicles, and people are the three entities of crash data. Because the flat tibble is at the person level, crash- and vehicle-level data elements are repeated across multiple rows. Please conduct your analysis with your entity (crash, vehicle, or person) in mind.
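To illustrate why the entity matters, the base R sketch below reduces a person-level table to vehicle and crash level before counting. It uses a toy data frame in place of the real flat tibble; the column names casenum, veh_no, per_no, and light are hypothetical placeholders, not necessarily the names rfars uses.

```r
# Toy person-level table standing in for the flat tibble.
# Column names (casenum, veh_no, per_no, light) are hypothetical placeholders.
flat <- data.frame(
  casenum = c(1, 1, 1, 2),
  veh_no  = c(1, 1, 2, 1),
  per_no  = c(1, 2, 1, 1),
  light   = c("Dark", "Dark", "Dark", "Daylight")  # crash-level value, repeated per person
)

nrow(flat)  # 4 rows: one per person

# Reduce to one row per vehicle before analyzing vehicle-level variables
vehicles <- unique(flat[, c("casenum", "veh_no")])
nrow(vehicles)  # 3 vehicles

# Reduce to one row per crash before analyzing crash-level variables
crashes <- unique(flat[, c("casenum", "light")])
nrow(crashes)  # 2 crashes

# Counting person rows would report "Dark" 3 times; counting crashes reports it once
table(crashes$light)
```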

Some data elements can take multiple values at any data level (e.g., multiple weather conditions for a crash, or multiple contributing factors for a vehicle or person). These elements are collected in long format in the multi_acc, multi_veh, and multi_per tibbles (saved as yyyy_multi_[acc/veh/per].rds files). These files contain crash, vehicle, and person identifiers, plus two variables labelled name and value, which hold the variable name from the raw data files and its corresponding value, respectively.
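Because one crash can appear on several rows of a long-format table, counts are usually taken over distinct identifiers. A base R sketch on a toy stand-in for multi_acc; the identifier column casenum and the element name "weather" are hypothetical placeholders.

```r
# Toy long-format table standing in for multi_acc; the identifier column and
# the "weather" element name are hypothetical placeholders.
multi_acc <- data.frame(
  casenum = c(1, 1, 2, 3),
  name    = c("weather", "weather", "weather", "weather"),
  value   = c("Rain", "Fog", "Clear", "Rain")
)

# Keep the rows for one data element
weather <- multi_acc[multi_acc$name == "weather", ]

# Crash 1 reports two conditions, so it appears on two rows
n_crashes <- length(unique(weather$casenum))  # 3 crashes across 4 rows

# Number of distinct crashes reporting each condition; these can sum to more
# than n_crashes because one crash may report several conditions
crashes_per_value <- tapply(weather$casenum, weather$value,
                            function(ids) length(unique(ids)))
crashes_per_value
```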

The events tibble provides a sequence of events for all vehicles involved in the crash. See the Crash Sequences vignette for an example.
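One way to work with per-vehicle event sequences is to order the events and collapse each vehicle's events into a single string. This is a base R sketch on a toy table, not the approach taken in the vignette; the columns casenum, veh_no, seq_no, and event, and the event descriptions, are hypothetical.

```r
# Toy events table (hypothetical columns and event descriptions).
events <- data.frame(
  casenum = c(1, 1, 1, 2),
  veh_no  = c(1, 1, 2, 1),
  seq_no  = c(2, 1, 1, 1),
  event   = c("Struck another vehicle", "Ran off road (right)",
              "Struck another vehicle", "Struck tree")
)

# Put each vehicle's events in order
events <- events[order(events$casenum, events$veh_no, events$seq_no), ]

# Collapse each vehicle's ordered events into one sequence string, keyed by crash-vehicle
vehicle_key <- paste(events$casenum, events$veh_no, sep = "-")
sequences <- tapply(events$event, vehicle_key, paste, collapse = " > ")
sequences
```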

The codebook tibble serves as a searchable codebook for all files of any given year.
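A codebook search can be as simple as a case-insensitive pattern match over its text columns. The sketch uses a toy data frame; its column names and entries, and the search_codebook() helper, are hypothetical and not part of the package.

```r
# Toy codebook (hypothetical column names and entries).
codebook <- data.frame(
  name        = c("weather", "weather", "lgt_cond"),
  label       = c("Atmospheric Conditions", "Atmospheric Conditions", "Light Condition"),
  value       = c("1", "2", "2"),
  value_label = c("Clear", "Rain", "Dark - Not Lighted")
)

# Hypothetical helper: keep rows where any of the given columns matches the pattern
search_codebook <- function(cb, pattern, cols = c("name", "label", "value_label")) {
  matches <- lapply(cols, function(col) grepl(pattern, cb[[col]], ignore.case = TRUE))
  cb[Reduce(`|`, matches), , drop = FALSE]
}

search_codebook(codebook, "light")    # the lgt_cond row
search_codebook(codebook, "weather")  # both weather rows
```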

Please review the CRSS Analytical User's Manual.

Regions are as follows:

mw = Midwest = OH, IN, IL, MI, WI, MN, ND, SD, NE, IA, MO, KS
ne = Northeast = PA, NJ, NY, NH, VT, RI, MA, ME, CT
s = South = MD, DE, DC, WV, VA, KY, TN, NC, SC, GA, FL, AL, MS, LA, AR, OK, TX
w = West = MT, ID, WA, OR, CA, NV, NM, AZ, UT, CO, WY, AK, HI
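The region definitions can be inverted into a state-to-region lookup, for example to find which code to pass to regions for a given state. This base R sketch is built only from the state lists on this page; the names regions and state_region are illustrative, not part of the package.

```r
# Region codes and member states, copied from the list on this page
regions <- list(
  mw = c("OH", "IN", "IL", "MI", "WI", "MN", "ND", "SD", "NE", "IA", "MO", "KS"),
  ne = c("PA", "NJ", "NY", "NH", "VT", "RI", "MA", "ME", "CT"),
  s  = c("MD", "DE", "DC", "WV", "VA", "KY", "TN", "NC", "SC", "GA", "FL",
         "AL", "MS", "LA", "AR", "OK", "TX"),
  w  = c("MT", "ID", "WA", "OR", "CA", "NV", "NM", "AZ", "UT", "CO", "WY", "AK", "HI")
)

# Invert to a named vector: state abbreviation -> region code
state_region <- setNames(rep(names(regions), lengths(regions)), unlist(regions))

state_region[c("TX", "ME", "OR")]  # region codes to pass as 'regions'
```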

Examples


  if (FALSE) {
    myGESCRSS <- get_gescrss(years = 2021, regions = "s")
  }