A table describing each GESCRSS variable name, value, and corresponding value label.
Format
A data frame with 34,662 rows and 19 variables:
- source
The source of the data (either FARS or GESCRSS).
- file
The data file that contains the given variable.
- name_ncsa
The original name of the data element.
- name_rfars
The modified data element name used in rfars.
- label
The label of the data element itself (not its constituent values).
- Definition
The data element's definition, pulled from the Analytical User Manual.
- Additional Information
Additional information on the data element, pulled from the Analytical User Manual.
- value
The original value of the data element.
- value_label
The de-coded value label.
- 2014
Indicator: 1 if valid for 2014, NA otherwise.
- 2015
Indicator: 1 if valid for 2015, NA otherwise.
- 2016
Indicator: 1 if valid for 2016, NA otherwise.
- 2017
Indicator: 1 if valid for 2017, NA otherwise.
- 2018
Indicator: 1 if valid for 2018, NA otherwise.
- 2019
Indicator: 1 if valid for 2019, NA otherwise.
- 2020
Indicator: 1 if valid for 2020, NA otherwise.
- 2021
Indicator: 1 if valid for 2021, NA otherwise.
- 2022
Indicator: 1 if valid for 2022, NA otherwise.
- 2023
Indicator: 1 if valid for 2023, NA otherwise.
Details
This codebook is a reference for researchers using GES/CRSS data. The 'source' variable is intended to help combine it with the fars_codebook. Data elements are relatively stable but are occasionally discontinued, created anew, or modified. The year columns ('2014' through '2023') indicate when each data element and value is available, and differentiate between definitions that changed over time. Users should always check for discontinuities when tabulating cases.
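As a rough illustration of checking for discontinuities, the sketch below lists the years in which each data element has at least one valid value. It runs on a small toy data frame (the name `cb` is hypothetical) that copies a few rows and year columns from the printed output in the Examples section; with the package installed, rfars::gescrss_codebook could presumably be used in its place.

```r
# Toy stand-in for the codebook: a subset of the rows and year columns
# shown in the Examples output. `cb` is an illustrative name, not a package object.
cb <- data.frame(
  name_rfars  = c("land_use", "land_use", "region"),
  value       = c("1", "2", "1"),
  `2014`      = c("1", "1", "1"),
  `2015`      = c("1", "1", "1"),
  `2016`      = c(NA, NA, "1"),
  `2017`      = c(NA, NA, "1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Year columns are the ones whose names are four digits.
year_cols <- grep("^[0-9]{4}$", names(cb), value = TRUE)

# For each data element, keep the years in which at least one value is valid
# (non-NA indicator).
valid_years <- lapply(split(cb[year_cols], cb$name_rfars), function(d) {
  year_cols[colSums(!is.na(d)) > 0]
})

valid_years
```

In this toy subset, 'land_use' is valid only for 2014 and 2015, which is the kind of gap to look for before tabulating across years.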
The 'file' variable indicates the file in which the given data element originally appeared. Here, 'file' refers to the SAS files downloaded from NHTSA. Most data elements stayed in their original file. Those that did not were moved to the multi_ files. For example, 'weather' originates from the 'accident' file, but appears in the multi_acc data object created by rfars.
The 'name_ncsa' variable describes the data element's name as assigned by NCSA (the organization within NHTSA that manages the database). To maximize compatibility between years and ease of use for programming, 'name_rfars' provides a cleaned naming convention (via janitor::clean_names()).
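A minimal sketch of using these two columns as a name crosswalk follows. The toy data frame `cb` and the helper vector `ncsa_to_rfars` are hypothetical names introduced for illustration; the rows are copied from the Examples output.

```r
# Toy stand-in for the codebook's naming columns (rows taken from the
# Examples output). Each data element repeats once per value row.
cb <- data.frame(
  name_ncsa  = c("LAND_USE", "LAND_USE", "REGION"),
  name_rfars = c("land_use", "land_use", "region"),
  stringsAsFactors = FALSE
)

# One row per data element: drop the repeats caused by multiple value rows.
crosswalk <- unique(cb[, c("name_ncsa", "name_rfars")])

# Named vector for translating NCSA names into rfars names.
ncsa_to_rfars <- setNames(crosswalk$name_rfars, crosswalk$name_ncsa)

ncsa_to_rfars[["REGION"]]
```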
Each data element has a 'label', a more human-readable version of the element name. For example, the label for 'harm_ev' is 'First Harmful Event'. Labels are not definitions but may provide enough information to help users conduct their analysis. Consult the CRSS User Manual for definitions and further details.
'Definition' and 'Additional Information' were extracted from the Analytical User’s Manual.
Each data element has multiple 'value'-'value_label' pairs: 'value' represents the original, non-human-readable value (usually a number), and 'value_label' represents the corresponding text value. For example, for 'harm_ev', 1 (the 'value') corresponds to 'Rollover/Overturn' (the 'value_label'), 2 corresponds to 'Fire/Explosion', etc.
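The pairing can be sketched as a lookup. The example below decodes a vector of raw codes with match(), using only the two 'harm_ev' pairs named above. The data frame `cb` and the vector `raw` are hypothetical, and the code 99 is included only to show what happens to a value that is missing from this toy lookup.

```r
# Toy stand-in for the codebook, holding only the two harm_ev pairs
# mentioned in the text. 'value' is stored as character, as in the codebook.
cb <- data.frame(
  name_rfars  = c("harm_ev", "harm_ev"),
  value       = c("1", "2"),
  value_label = c("Rollover/Overturn", "Fire/Explosion"),
  stringsAsFactors = FALSE
)

# Hypothetical raw codes; 99 is not present in the toy lookup.
raw <- c(2, 1, 1, 99)

# Restrict the lookup to one data element, then map each code to its label.
lookup  <- cb[cb$name_rfars == "harm_ev", ]
decoded <- lookup$value_label[match(as.character(raw), lookup$value)]

decoded
```

Codes with no matching 'value' come back as NA, which makes unmatched or discontinued values easy to spot.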
Source
Codebooks are automatically generated by extracting SAS format catalogs (.sas7bcat files) and VALUE statements from .sas files during data processing, then consolidating variable names, labels, and value-label mappings across all years into searchable reference tables. Source files are published by NHTSA and available here.
Examples
head(rfars::gescrss_codebook)
#> source file name_ncsa name_rfars label
#> <char> <char> <char> <char> <char>
#> 1: GESCRSS accident LAND_USE land_use Land Use
#> 2: GESCRSS accident LAND_USE land_use Land Use
#> 3: GESCRSS accident LAND_USE land_use Land Use
#> 4: GESCRSS accident LAND_USE land_use Land Use
#> 5: GESCRSS accident LAND_USE land_use Land Use
#> 6: GESCRSS accident REGION region Region of the Country
#> Definition
#> <char>
#> 1: <NA>
#> 2: <NA>
#> 3: <NA>
#> 4: <NA>
#> 5: <NA>
#> 6: This data element identifies the region of the country where the crash occurred.
#> Additional Information
#> <char>
#> 1: <NA>
#> 2: <NA>
#> 3: <NA>
#> 4: <NA>
#> 5: <NA>
#> 6: This data element is derived based on the State in which the Primary Sampling Unit is located where the crash occurred. See Appendix B: Rules for Derived Data Elements for an explanation of this data element and how it is derived.
#> value value_label 2014 2015 2016
#> <char> <char> <char> <char> <char>
#> 1: 1 Within area of population 25,000 - 49,999 1 1 <NA>
#> 2: 2 Within area of population 50,000 - 100,000 1 1 <NA>
#> 3: 3 Within area of population 100,000+ 1 1 <NA>
#> 4: 8 Other area 1 1 <NA>
#> 5: 9 Unknown 1 1 <NA>
#> 6: 1 Northeast (PA, NJ, NY, NH, VT, RI, MA, ME, CT) 1 1 1
#> 2017 2018 2019 2020 2021 2022 2023
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 2: <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 3: <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4: <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 5: <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 6: 1 1 1 1 1 1 1