Bring GES/CRSS data into the current environment, whether by downloading it anew or by using pre-existing files.
Usage
get_gescrss(
  years = 2011:2022,
  regions = c("mw", "ne", "s", "w"),
  dir = NULL,
  proceed = FALSE,
  cache = NULL
)
Arguments
- years
Years to be downloaded, in yyyy format (character or numeric), currently limited to 2011-2022.
- regions
(Optional) Regions to keep: mw=midwest, ne=northeast, s=south, w=west.
- dir
Directory in which to search for or save a 'GESCRSS data' folder. If NULL (the default), files are downloaded and unzipped to temporary directories and prepared in memory.
- proceed
Logical, whether or not to proceed with downloading files without asking for user permission (defaults to FALSE, thus asking permission).
- cache
The name of an RDS file to save or use. If the specified file (e.g., 'myGESCRSS.rds') exists in 'dir', it will be returned; if not, an RDS file of this name will be saved in 'dir' for quick use in subsequent calls.
Value
A GESCRSS data object (a list with six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook).
Details
This function downloads raw data from the GES and CRSS crash databases. If no directory (dir) is specified, raw SAS files are downloaded into tempdir(), where they are prepared, combined, and then brought into the current environment. If you specify a directory (dir), the function looks there for a 'GESCRSS data' folder. If the folder is not found, it is created and populated with raw and prepared SAS and RDS files. If the folder is found, the function checks that all requested years are present and asks permission to download any missing years.
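The call below is a minimal sketch of using dir and cache together. It assumes the function is provided by the rfars package; the directory location and cache file name are illustrative choices, not package defaults. The call is guarded so that nothing is downloaded unless the package is installed and the session is interactive.

```r
# Arguments follow the Usage section above.
years   <- 2020:2021
regions <- c("s", "w")

# Hypothetical location and file name, chosen only for this example.
data_dir   <- file.path(tempdir(), "crash_data")
cache_file <- "gescrss_s_w_2020_2021.rds"

# Guard: only attempt the download in an interactive session with the
# package available (assumed here to be rfars).
if (interactive() && requireNamespace("rfars", quietly = TRUE)) {
  dir.create(data_dir, showWarnings = FALSE)
  crss <- rfars::get_gescrss(
    years   = years,
    regions = regions,
    dir     = data_dir,
    proceed = TRUE,       # skip the permission prompt
    cache   = cache_file  # reused on later calls if it already exists in dir
  )
  names(crss)  # the six tibbles listed under Value
}
```

On a second call with the same dir and cache, the saved RDS file should be returned instead of the data being prepared again, as described under the cache argument.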
The object returned is a list with class 'GESCRSS'. It contains six tibbles: flat, multi_acc, multi_veh, multi_per, events, and codebook.
Flat files are wide-formatted and presented at the person level. All crashes involve at least one motor vehicle, each of which may contain one or multiple people. Crash, vehicle, and person are the three entities of crash data. The flat files therefore repeat crash-level and vehicle-level data elements across multiple rows. Please conduct your analysis with your entity in mind.
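The effect of this repetition can be sketched with a small mock table. The column names below (id, veh_no, per_no, weather) are hypothetical stand-ins and may not match the actual columns of the flat tibble.

```r
# Mock person-level rows; all column names are hypothetical.
flat <- data.frame(
  id      = c("A", "A", "A", "B"),               # crash identifier
  veh_no  = c(1, 1, 2, 1),                       # vehicle within crash
  per_no  = c(1, 2, 1, 1),                       # person within vehicle
  weather = c("Rain", "Rain", "Rain", "Clear"),  # crash-level element, repeated
  stringsAsFactors = FALSE
)

# Counting rows counts people, not crashes.
n_people <- nrow(flat)

# Reduce to one row per entity before summarising at that level.
crashes  <- unique(flat[, c("id", "weather")])
vehicles <- unique(flat[, c("id", "veh_no")])

# Crash-level tabulation: each crash contributes once.
table(crashes$weather)
```

Tabulating weather directly on flat would count crash A three times, once per person, which is why the reduction to one row per crash comes first.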
Some data elements can include multiple values for any data level (e.g., multiple weather conditions corresponding to the crash, or multiple crash factors related to vehicle or person). These elements have been collected in the yyyy_multi_[acc/veh/per].rds files in long format. These files contain crash, vehicle, and person identifiers, and two variables labelled name and value. These correspond to variable names from the raw data files and the corresponding values, respectively.
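Working with the long format can be sketched as follows. The mock table uses the name and value variables described above; the identifier column (id) and the example values are hypothetical.

```r
# Mock long-format table; 'id' and the values are hypothetical.
multi_acc <- data.frame(
  id    = c("A", "A", "B"),
  name  = c("weather", "weather", "weather"),
  value = c("Rain", "Fog", "Clear"),
  stringsAsFactors = FALSE
)

# Keep only the rows for one raw variable.
w <- multi_acc[multi_acc$name == "weather", ]

# Crashes where any recorded weather condition was fog.
fog_ids <- unique(w$id[w$value == "Fog"])

# Collapse to one string per crash, e.g. for joining back to crash-level data.
weather_by_crash <- tapply(w$value, w$id, paste, collapse = "; ")
weather_by_crash
```

Filtering on name first keeps values from different raw variables from being mixed when several of them share one table.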
The events tibble provides a sequence of events for all vehicles involved in the crash. See the Crash Sequences vignette for an example.
The codebook tibble serves as a searchable codebook for all files of any given year.
Please review the CRSS Analytical User's Manual.
Regions are as follows:
- mw = Midwest = OH, IN, IL, MI, WI, MN, ND, SD, NE, IA, MO, KS
- ne = Northeast = PA, NJ, NY, NH, VT, RI, MA, ME, CT
- s = South = MD, DE, DC, WV, VA, KY, TN, NC, SC, GA, FL, AL, MS, LA, AR, OK, TX
- w = West = MT, ID, WA, OR, CA, NV, NM, AZ, UT, CO, WY, AK, HI
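To choose a value for regions starting from a state, the mapping above can be written as a lookup. This is a sketch built only from the region definitions listed here; the object names are illustrative and are not part of the package.

```r
# Region definitions as listed above.
region_states <- list(
  mw = c("OH", "IN", "IL", "MI", "WI", "MN", "ND", "SD", "NE", "IA", "MO", "KS"),
  ne = c("PA", "NJ", "NY", "NH", "VT", "RI", "MA", "ME", "CT"),
  s  = c("MD", "DE", "DC", "WV", "VA", "KY", "TN", "NC", "SC", "GA", "FL",
         "AL", "MS", "LA", "AR", "OK", "TX"),
  w  = c("MT", "ID", "WA", "OR", "CA", "NV", "NM", "AZ", "UT", "CO", "WY",
         "AK", "HI")
)

# Invert the list into a named vector: state abbreviation -> region code.
state_to_region <- setNames(
  rep(names(region_states), lengths(region_states)),
  unlist(region_states, use.names = FALSE)
)

# Region codes to request for a set of states of interest.
unique(state_to_region[c("TX", "OK", "NM")])
```

The resulting codes can be passed as the regions argument. The data are then limited to the whole region, not to the individual states.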